
● Developed Python scripts to transform raw data into analysis-ready data as specified by business users
● Worked closely with data modelers to model new incoming data sets
● Developed solutions to load data into HDFS, analyzed the data using PySpark and Hive/Impala/Kudu, and delivered results to downstream systems
● Developed Shell and Python scripts to automate manual monitoring of ETL jobs
● Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive
● Read ETL job data from Oracle using PySpark and stored it in Hive/Impala
● Skilled in interacting with clients and coordinating with multiple stakeholders for data exchange
● Wrote Python scripts for a Kafka producer and consumer
● Functional knowledge of designing and developing Spark applications in Scala, including comparing the performance of Spark with Hive
● Knowledge of Hadoop ecosystem components such as HDFS, Resource Manager, Node Manager, Name Node, and Data Node, and of the MapReduce programming paradigm
● In-depth knowledge of loading and transforming large sets of structured, semi-structured, and unstructured data and analyzing them with Hive queries and Python scripts
● Possess strong analytical and problem-solving skills; an effective leader with excellent skills in motivating individual employee performance
Title: Falcon (Birlasoft - JNJ)
Role: Development, Coding, Testing
Domain: Big Data (Pharma)
Technologies: Spark, Python, Cloudera, Hive/Impala/Kudu, Oozie, Oracle
Synopsis:
Captured logs from different applications such as GSDL, GSDR, and Safety using PySpark and shell scripts and stored them in Hive/Impala. Also collected server availability and ETL job status for display in a Tableau dashboard.
Title: Galaxy (Condeco)
Role: Development, Coding, Testing
Domain: Big Data (Sales)
Technologies: Spark, Scala, Azure Synapse Studio, Azure Data Pipeline, SQL
Synopsis:
Built and automated data pipelines to ingest data from the source (SQL) to the destination (blob storage). Applied transformations using Spark and stored the results in the data warehouse for reporting. Also wrote stored procedures for custom logging.
Title: AML Reconciliation (Capgemini – Citi Bank)
Role: Development, Coding, Testing
Domain: Big Data (Financial)
Technologies: Spark, Scala, Hive, CDH5, Oracle
Synopsis:
Data is ingested into HDFS in Parquet format. We read it from HDFS and process it through successive layers using the Spark DataFrame/SQL API, applying business rules at each stage. The final layer produces a reconciliation report for valid transactions.
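The layered flow described in this synopsis can be illustrated with a minimal plain-Python sketch. It is not the project's Spark code. The field names (txn_id, amount, status), the sample records, and the validity rule (positive amount) are all assumptions made for illustration only.

```python
from collections import Counter

# Hypothetical raw-layer records; field names and values are illustrative only.
raw = [
    {"txn_id": "T1", "amount": "100.50", "status": "SETTLED"},
    {"txn_id": "T2", "amount": "-5", "status": "SETTLED"},
    {"txn_id": "T3", "amount": "abc", "status": "PENDING"},
    {"txn_id": "T4", "amount": "20", "status": "PENDING"},
]


def to_clean_layer(rows):
    """Cast amounts to float; rows whose amount cannot be cast are dropped."""
    clean = []
    for row in rows:
        try:
            clean.append({**row, "amount": float(row["amount"])})
        except ValueError:
            continue
    return clean


def is_valid(row):
    """Assumed business rule: a valid transaction has a positive amount."""
    return row["amount"] > 0


def reconciliation_report(raw_rows):
    """Summarize how many rows survive each layer and total the valid ones."""
    clean = to_clean_layer(raw_rows)
    valid = [r for r in clean if is_valid(r)]
    return {
        "raw_count": len(raw_rows),
        "clean_count": len(clean),
        "valid_count": len(valid),
        "valid_total": sum(r["amount"] for r in valid),
        "valid_by_status": dict(Counter(r["status"] for r in valid)),
    }


report = reconciliation_report(raw)
print(report)
```

In a Spark job, each function would roughly correspond to a DataFrame transformation writing to its own layer. The per-layer counts are what make the report usable for reconciliation.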
Title: Novus Loyalty Analytics (Clavax Technologies)
Role: Development, Coding, Testing
Domain: Big Data Analytics
Technologies: Java, Hadoop, MongoDB, Power BI
Synopsis:
Novus is a loyalty program for the RuPay debit card. The project provides vouchers, points, and other loyalty benefits to customers who use a RuPay card. Clients include banks, NPCI, Tempe Golf Range, BookMyShow, and others. Data arrives in different formats (text, CSV, TSV, JSON, and XML) and is stored in HDFS. It is processed using MapReduce and Hive, and Power BI is used to prepare reports from the data.
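As a rough illustration of handling mixed input formats before processing, the sketch below normalizes CSV, TSV, JSON, and XML payloads into one list of records using only the Python standard library. The field names (card_id, points), the sample payloads, and the XML layout are assumptions. Free-form text input is not covered, and the project's actual MapReduce/Hive processing is not shown.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET


def parse_records(text, fmt):
    """Parse one payload into a list of dicts keyed by field name."""
    if fmt in ("csv", "tsv"):
        delimiter = "," if fmt == "csv" else "\t"
        return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))
    if fmt == "json":
        # Assumed to be a JSON array of flat objects.
        return json.loads(text)
    if fmt == "xml":
        # Assumed layout: <records><record><field>value</field></record></records>
        root = ET.fromstring(text)
        return [{child.tag: child.text for child in rec} for rec in root]
    raise ValueError(f"unsupported format: {fmt}")


# Hypothetical sample payloads, one per format.
samples = {
    "csv": "card_id,points\nC1,10\n",
    "tsv": "card_id\tpoints\nC2\t20\n",
    "json": '[{"card_id": "C3", "points": "30"}]',
    "xml": "<records><record><card_id>C4</card_id><points>40</points></record></records>",
}

all_records = [
    rec for fmt, text in samples.items() for rec in parse_records(text, fmt)
]
total_points = sum(int(rec["points"]) for rec in all_records)
print(all_records, total_points)
```

Mapping every format to the same record shape means downstream aggregation only has to be written once.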
Spark, MapReduce, HDFS, Hive/Impala, Kudu, Oozie, Sqoop, YARN, Python, Scala, Java, SQL, Cassandra, Spring MVC, Hibernate, Flask, Eclipse, VSCode, NetBeans, IntelliJ, Azure, AWS