To work in a challenging and creative environment and contribute towards the goals of the organization.
PySpark, Spark, Azure Databricks, MapReduce
+91-8074405932
+91-9705903982
Project 1: Data Migration from SQL Server to Snowflake
Client: Nature Sweet
Role: Staff Engineer
Environment: Azure, PySpark, Azure Databricks, Snowflake, Python-based Snowflake stored procedures, LAZSA Platform, Agile Methodology, Jira
Duration: Nov 2023 to present
Project Scope: The program objective is to perform data engineering tasks on various data sources and ingest/cleanse the data into Snowflake.
Roles & Responsibilities:
· Provided architecture-level solutions for building new pipelines and re-designing existing ones.
· Worked closely with the client and Business Analysts for requirement gathering.
· Developed Python-based Snowflake stored procedures to move data between layers of Snowflake tables (see the sketch after this list).
· Managed data coming from different sources and loaded structured data into the Snowflake landing layer.
· Used PySpark on Azure Databricks to load data from SQL Server tables into Snowflake tables.
· Conceptualized and designed an end-to-end framework for batch processing.
· Worked on the LAZSA platform to design pipelines and schedule batch-processing jobs.
· Used Jira for project tracking, bug tracking and project management.
· Involved in the complete end-to-end code deployment process in production.
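As an illustration of the Python-based Snowflake stored procedures mentioned above, here is a minimal Snowpark sketch; the table names, columns and cleansing rules are hypothetical, not the actual client objects:

```python
# Hypothetical sketch of a Python (Snowpark) stored procedure handler that
# moves data from a landing-layer table to a curated-layer table.
from snowflake.snowpark import Session
from snowflake.snowpark.functions import col, current_timestamp


def load_curated_orders(session: Session) -> str:
    # Read the landing-layer table populated by the Databricks/PySpark pipeline.
    landing = session.table("LANDING.ORDERS")

    # Basic cleansing: drop rows without a key, dedupe, stamp the load time.
    curated = (
        landing.filter(col("ORDER_ID").is_not_null())
               .drop_duplicates("ORDER_ID")
               .with_column("LOAD_TS", current_timestamp())
    )

    # Overwrite the curated-layer table.
    curated.write.mode("overwrite").save_as_table("CURATED.ORDERS")
    return f"Loaded {curated.count()} rows into CURATED.ORDERS"
```

In Snowflake such a handler would typically be registered with CREATE PROCEDURE ... LANGUAGE PYTHON ... HANDLER = 'load_curated_orders' and invoked by the orchestration pipeline between layers.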
Project 2: CALIBO
Client: CALIBO
Role: Staff Engineer
Environment: AWS S3, PySpark, Azure Databricks, Snowflake, SQL stored procedures, LAZSA Platform, Agile Methodology, Jira
Duration: Nov 2022 to present
Project Scope: The program objective is to perform data engineering tasks on various data sources and ingest/cleanse the data into Snowflake.
Roles & Responsibilities:
· Provided architecture-level solutions for building new pipelines and re-designing existing ones.
· Worked closely with the client and Business Analysts for requirement gathering.
· Developed SQL stored procedures to move data between layers of Snowflake tables.
· Managed data coming from different sources, maintained AWS S3 storage and loaded structured data.
· Used PySpark on Azure Databricks to load data from AWS S3 into Snowflake tables (see the sketch after this list).
· Conceptualized and designed an end-to-end framework for batch processing.
· Worked on the LAZSA platform to design pipelines and schedule batch-processing jobs.
· Used Jira for project tracking, bug tracking and project management.
· Involved in the complete end-to-end code deployment process in production.
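A minimal PySpark sketch of the S3-to-Snowflake load referenced above, assuming a Databricks notebook (where spark and dbutils are provided by the runtime) and the Spark-Snowflake connector; the bucket, secret scope and table names are hypothetical:

```python
# Hypothetical Databricks/PySpark sketch: read structured files landed in S3
# and write them to a Snowflake table through the Spark-Snowflake connector.
# Assumes a Databricks notebook where `spark` and `dbutils` already exist;
# bucket, secret scope and table names are illustrative.

# Read the structured source data from S3 (assumed Parquet here).
df = spark.read.parquet("s3a://example-landing-bucket/sales/")

# Connector options; credentials come from a Databricks secret scope rather
# than being hard-coded.
sf_options = {
    "sfURL": "example_account.snowflakecomputing.com",
    "sfUser": dbutils.secrets.get("example-scope", "sf-user"),
    "sfPassword": dbutils.secrets.get("example-scope", "sf-password"),
    "sfDatabase": "ANALYTICS",
    "sfSchema": "LANDING",
    "sfWarehouse": "LOAD_WH",
}

(df.write
   .format("snowflake")          # short name for net.snowflake.spark.snowflake
   .options(**sf_options)
   .option("dbtable", "SALES")
   .mode("overwrite")
   .save())
```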
Project 3: Retailer Data Ingestion System
Client: UK Retailer Company
Role: Application Development Team Lead
Environment: AWS S3, HDFS 2.7.3, Spark 2.4.5, Hive 1.1.0, PySpark, Hue, Jenkins, AWS services, WinSCP, YARN, Agile Methodology, Jira
Duration: April 2021 to Nov 2022
Project Scope: The client is a British multinational consumer goods company headquartered in London, England. Unilever products include food, condiments, ice cream, cleaning agents, beauty products, and personal care items; it is the largest soap producer in the world and its products are available in around 190 countries. The project deals with retailers' sales, product and review data from across the globe: data arrives from various retailer sources in different file formats (e.g. .csv, .xlsx, .parquet, .csv.gz, .json), is dumped into S3 buckets, processed in two stages, and loaded into final Hive tables in Parquet/ORC format, from where downstream teams consume it (a representative PySpark sketch follows the responsibilities list).
Roles & Responsibilities:
· Managed a team of 10 members.
· Provided architecture-level solutions for building new pipelines and re-designing existing ones.
· Worked closely with the client and Business Analysts for requirement gathering.
· Developed Hive queries on different data formats (text, Parquet, ORC) and leveraged time-based partitioning in HiveQL to improve performance.
· Managed data coming from different sources, maintained AWS S3 storage and loaded structured data.
· Worked on PySpark and Spark SQL to convert legacy Hive scripts.
· Implemented extensive Impala queries and created views for ad hoc and business processing.
· Conceptualized and designed an end-to-end framework for batch processing.
· Handled up to 10 terabytes of data per day.
· Developed shell scripts to automate and configure jobs in Jenkins.
· Used Jenkins to schedule batch-processing jobs.
· Used Jira for project tracking, bug tracking and project management.
· Involved in the complete end-to-end code deployment process in production.
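A minimal PySpark sketch of the ingestion pattern described in the project scope, assuming hypothetical S3 paths, columns and table names (the actual pipelines differed per retailer and format):

```python
# Hypothetical PySpark sketch of the retailer ingestion pattern: read a raw
# feed from S3 (format varies per retailer), standardise it, and write it to
# a partitioned Hive table in Parquet. All names are illustrative and the
# target database is assumed to exist already.
from pyspark.sql import SparkSession
from pyspark.sql.functions import input_file_name, to_date

spark = (SparkSession.builder
         .appName("retailer_ingestion")
         .enableHiveSupport()
         .getOrCreate())

def read_raw(path: str, fmt: str):
    """Stage 1: read a retailer feed in whatever format it arrives in."""
    if fmt == "csv":              # also covers .csv.gz, which Spark decompresses itself
        return spark.read.option("header", "true").csv(path)
    if fmt == "json":
        return spark.read.json(path)
    if fmt == "parquet":
        return spark.read.parquet(path)
    raise ValueError(f"Unsupported format: {fmt}")

raw = read_raw("s3a://example-retailer-bucket/sales/2022-06-01/", "csv")

# Stage 2: light standardisation before the final load (assumes a sale_date column).
cleaned = (raw.withColumn("source_file", input_file_name())
              .withColumn("sale_date", to_date("sale_date")))

# Final load: partitioned Parquet Hive table consumed by downstream teams.
(cleaned.write
        .mode("append")
        .format("parquet")
        .partitionBy("sale_date")
        .saveAsTable("retail_db.sales"))
```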
Project 4: Health Legacy System
Client: US Health Insurance Company
Role: Associate Hadoop Developer
Environment: CDH 5.16.0, HDFS 2.7.3, Spark 1.6.0, Hive 1.1.0, Impala 2.7.0, Sqoop 1.4.6, Hue, Rundeck, Putty, Big Decision, YARN, Agile Methodology, HP ALM
Duration: December 2018 to April 2021
Project Scope: The client is one of the United States' largest nonprofit health plans, established in 1937 to provide New York's working families with access to medical services regardless of cost. The project deals with the health insurance legacy system data: data is extracted from Oracle (source) and loaded into Hive tables for downstream teams. It covers several subject areas such as Membership, Claim, Accounting, Product and Commission. Data arrives from Oracle in text file format, is processed in three stages, and is loaded into final Hive tables in Parquet format, from where downstream teams consume it.
Roles & Responsibilities:
· Worked closely with Business Analysts for requirement gathering.
· Developed Hive queries on different data formats (text, Parquet) and leveraged time-based partitioning in HiveQL to improve performance (a sketch follows this list).
· Managed data coming from different sources, maintained HDFS and loaded structured data.
· Imported and exported data between HDFS and RDBMS using Sqoop.
· Implemented extensive Impala queries and created views for ad hoc and business processing.
· Conceptualized and designed an end-to-end framework for batch processing 180 tables every day.
· Handled up to 8 terabytes of data per day.
· Developed shell scripts to automate and configure jobs in Rundeck.
· Used the Big Decision tool for data cleansing.
· Used Rundeck to schedule batch-processing jobs.
· Used HP ALM for project tracking, bug tracking and project management.
· Involved in the complete end-to-end code deployment process in production.
· Beyond the above, took full ownership of deliverables, reviewed teammates' assigned work and provided inputs where required.
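A sketch of the time-partitioned HiveQL pattern referenced above, issued through PySpark purely for illustration (the project worked in Hive/Impala directly on an older Spark release); table and column names are hypothetical and the staging table is assumed to exist:

```python
# Hypothetical sketch: a Parquet Hive table partitioned by load date, loaded
# one partition at a time from a text-format staging table. Names and the
# partition value are illustrative only.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("claims_partition_load")
         .enableHiveSupport()
         .getOrCreate())

# Final layer: Parquet table partitioned by load date to allow partition pruning.
spark.sql("""
    CREATE TABLE IF NOT EXISTS health_db.claims (
        claim_id     STRING,
        member_id    STRING,
        claim_amount DECIMAL(18,2)
    )
    PARTITIONED BY (load_dt STRING)
    STORED AS PARQUET
""")

# Load one day's data from the text-format staging table into its partition.
spark.sql("""
    INSERT OVERWRITE TABLE health_db.claims PARTITION (load_dt = '2020-01-15')
    SELECT claim_id, member_id, claim_amount
    FROM   health_db.claims_stage_txt
    WHERE  load_dt = '2020-01-15'
""")
```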
Project 5: Customer Service Expectations
Client: US Insurance Company
Role: Hadoop Developer
Environment: Hadoop, HDFS, Hive, YARN, Sqoop, Java, Spring Tool Suite, IBM DB2, UNIX, RMS, Zookeeper and Putty
Duration: June 2017 to November 2018
Project Scope: The main aim is to deal with agents' licenses. The project came into the picture as a replacement for the Select Agent criteria. The Select Agent Indicator is a special criterion under which an agent is given priority over normal agents when a search is run through the user interfaces; an agent is called a Select Agent when certain conditions are satisfied. In this project the Select Agent criterion was made obsolete and a new concept, the Customer Service Expectations (CSE) Agent, was introduced: in addition to the existing Select Agent criteria, further parameters decide whether an agent qualifies as a CSE Agent.
Roles & Responsibilities:
· Worked closely with business customers for requirement gathering.
· Developed Sqoop jobs with incremental loads from heterogeneous RDBMS (IBM DB2) using native DB connectors.
· Designed a Hive repository with external tables, internal tables, partitions, ACID properties and UDFs for incremental loads of parsed data feeding analytical and operational dashboards.
· Developed Hive queries on different data formats (text, CSV, log files) and leveraged time-based partitioning in HiveQL to improve performance.
· Created Hive external tables for data in HDFS and moved data from the archive layer to the business layer with Hive transformations.
· Developed a Spark application in Scala against Hive tables to determine the CSE Agents (see the sketch after this list).
· Worked on the Revision Management System (RMS) to deploy changes in production.
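The production application was written in Scala; the following is a hypothetical PySpark sketch of the same kind of logic, with made-up CSE criteria and table/column names, purely to illustrate the approach:

```python
# Hypothetical sketch of flagging CSE Agents from Hive tables. The real
# criteria, tables and columns differ; these are illustrative placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("cse_agent_flagging")
         .enableHiveSupport()
         .getOrCreate())

agents   = spark.table("agency_db.agents")
licenses = spark.table("agency_db.agent_licenses")

# Illustrative criteria: an active license plus minimum tenure and a
# customer-satisfaction threshold decide whether an agent is flagged as CSE.
cse_agents = (
    agents.join(licenses, "agent_id")
          .where(
              (F.col("license_status") == "ACTIVE")
              & (F.col("tenure_years") >= 2)
              & (F.col("csat_score") >= 4.0)
          )
          .select("agent_id", "agent_name")
          .withColumn("cse_indicator", F.lit("Y"))
)

cse_agents.write.mode("overwrite").saveAsTable("agency_db.cse_agents")
```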
Project 6: Associate Data Movement
Client: US Insurance Company
Role: Software Development Engineer
Environment: Hadoop, HDFS, Hive, Sqoop, Java, Spring Tool Suite, IBM DB2, UNIX, RMS, Zookeeper and Putty
Duration: June 2016 to May 2017
Project Scope: The main purpose of this release is to migrate associate data related to Authorizations, Contacts, Registrations, etc. from the Associate Register to Hadoop. While it is not critical for all associate information to be updated in real time, a subset of the data must be continuously available for retrieval on the Integrated Customer Platform/Technical Platform. Many Hadoop applications require associate data such as associate name, contact information and authorization to service specific products.
Roles & Responsibilities:
· Worked closely with business customers for requirement gathering.
· Developed Sqoop jobs with incremental loads from heterogeneous RDBMS (IBM DB2) using native DB connectors.
· Designed a Hive repository with external tables, internal tables, partitions and UDFs for incremental loads of parsed data feeding analytical and operational dashboards (a sketch follows this list).
· Created Hive external tables for data in HDFS and moved data from the archive layer to the business layer with Hive transformations.
· Developed business logic.
· Performed unit testing.
· Worked on the Revision Management System (RMS) to deploy changes in production.
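The Hive repository here was built directly in HiveQL (with Java-based UDFs); the following PySpark sketch only illustrates the external-table-plus-UDF pattern used when promoting data from the archive layer to the business layer, with hypothetical names and UDF logic:

```python
# Hypothetical sketch: external Hive table over raw HDFS files plus a small
# UDF applied while loading the business layer. The business-layer table is
# assumed to exist already; all names and logic are illustrative.
from pyspark.sql import SparkSession
from pyspark.sql.types import StringType

spark = (SparkSession.builder
         .appName("associate_data_movement")
         .enableHiveSupport()
         .getOrCreate())

# External table over the raw associate files already landed in HDFS.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS assoc_db.associates_archive (
        associate_id STRING,
        full_name    STRING,
        phone        STRING
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
    LOCATION '/data/archive/associates'
""")

# Small illustrative UDF: normalise phone numbers to digits only.
spark.udf.register(
    "normalize_phone",
    lambda s: "".join(ch for ch in s if ch.isdigit()) if s else None,
    StringType(),
)

# Promote cleansed data from the archive layer to the business layer.
spark.sql("""
    INSERT OVERWRITE TABLE assoc_db.associates_business
    SELECT associate_id, full_name, normalize_phone(phone)
    FROM   assoc_db.associates_archive
""")
```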
Project 7: Associate Data Movement
Client: US Insurance Company
Role: Software Development Engineer
Environment: COBOL, JCL, IBM DB2, RMS
Duration: September 2014 to May 2016
Project Scope: This was a leasing project concerning leases for movable goods such as trucks. Agents, agent staff, employees and externals each have different roles; agents, agent staff and employees are internal to the client, while the others are external. All associates' personal and business data is stored in tables. Agents have policies, agreements and products to sell to customers; agent staff and employees work under agents and handle an agent's work. The whole process is controlled through different applications.
Roles & Responsibilities:
· Prepared project documentation.
· Analyzed the functional documents.
· Developed code for new modules and enhanced existing modules.
· Performed unit testing.