Accomplished Senior Data Engineer with a proven track record at DELL Technologies Inc, enhancing data processes and analytics. Expert in Apache Hadoop and Cloudera, skilled at transforming complex data into actionable insights. Demonstrates strong leadership in cross-functional team collaboration, improving data job performance by over 75%.
Overview
14 years of professional experience
1 Certification
Work History
Senior Data Engineer
DELL Technologies Inc
Hyderabad
12.2019 - Current
Dell Technologies is an American multinational computer technology company that develops, sells, repairs, and supports computers and related products and services, and is one of the largest technology corporations in the world
Part of the Data Engineering team, which is responsible for the creation, enhancement, and maintenance of different types of datasets for analytical purposes
We help the business create insights and improve processes by providing on-time, high-quality data
Boomerang is the main tool used in investigations with development teams regarding the performance of their pages
Responsibilities:
Responsible for the data architecture and data flow of the speed and stability project within the Data Engineering team
Designed and developed an OLAP tabular cube reading 100M records of data every week
Responsible for developing an automated framework that automates the development process in the data lake
Deployed and maintained CI/CD and Git pipelines
Collected requirements from the product owner and converted them into user stories to define tasks for developing the product
Orchestrated and improved the performance of Hadoop and OLAP jobs, reducing runtime from 16 hours to 3 hours
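The runtime improvement above is stated without implementation details; as a hypothetical illustration of one common technique for speeding up such Hadoop/OLAP refreshes, the PySpark sketch below (table, column, and path names are invented) partitions the weekly extract by load date so downstream reads are pruned to the latest partition instead of scanning full history.

```python
# Hypothetical sketch: partition the weekly extract by load date so the
# downstream OLAP cube refresh reads one partition instead of a full scan.
# Table, column, and path names are invented for illustration.
from pyspark.sql import SparkSession, functions as F

spark = (
    SparkSession.builder
    .appName("weekly-fact-partitioned-load")
    .enableHiveSupport()
    .getOrCreate()
)

# Read the raw weekly extract (~100M rows) landed by upstream jobs.
raw = spark.read.parquet("/datalake/raw/page_performance/weekly")

# Derive the partition column once, instead of filtering on a computed
# expression in every downstream query.
curated = raw.withColumn("load_date", F.to_date("event_ts"))

# Write as a partitioned Hive table; the cube refresh then reads only
# WHERE load_date = <latest week>, which is partition-pruned.
(
    curated.repartition("load_date")
    .write.mode("overwrite")
    .partitionBy("load_date")
    .saveAsTable("speed_stability.page_performance_weekly")
)
```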
Catalina combines our deep analytics and insights with the largest buyer-history database in the world to power our buyR3science™ solutions
Catalina solutions pinpoint the "why" behind every buy and mobilize meaningful, real-time engagement and results with the relevant 2% of buyers who drive 80% of brand volume (on average)
As a Retailer, I want to have easy access to advanced insights about my business and shoppers, so that I don't have to build out my own reports
The project delivers insights to our retailer partners about their private-label brands in support of their partnership with Catalina
This gives retailers easy access to reporting that might be difficult for smaller retailers to create themselves, or time-consuming and too costly for larger retailers to build on their own
Provides retailers with both analytics/insights and media activation capabilities, creating synergies for retailers
Supports retailer renewals and account penetration by establishing a consultative relationship with retailers where marketing and promotional decisions are taking place
Generally, this creates stickiness with retail partners and/or mitigates account erosion
Responsibilities:
Designed and developed Solr indexes as endpoints for the web team to build the retail hub (see the sketch after this list)
Responsible for developing an automated framework that automates the development process in the data lake
Responsible for creating and maintaining user stories and leading the team on the technology side
Collected requirements from the product owner and converted them into user stories to define tasks for developing the product
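The Solr indexes above served as query endpoints for the web team; a hypothetical sketch using the pysolr client is shown below, with the core URL, document fields, and query invented for illustration.

```python
# Hypothetical sketch of indexing retailer insight documents into Solr so the
# web team can query them as an endpoint. Core name, URL, and fields invented.
import pysolr

# Point at a Solr core serving the retail hub documents.
solr = pysolr.Solr("http://solr.example.com:8983/solr/retail_hub", timeout=10)

# Index a small batch of documents; commit so they are immediately searchable.
solr.add(
    [
        {"id": "brand-1001", "retailer": "acme", "category": "private_label",
         "weekly_sales": 125000.0},
        {"id": "brand-1002", "retailer": "acme", "category": "national_brand",
         "weekly_sales": 98000.0},
    ],
    commit=True,
)

# The web team hits the same core with a filtered query, e.g. all
# private-label documents for one retailer.
results = solr.search("retailer:acme AND category:private_label", rows=10)
for doc in results:
    print(doc["id"], doc["weekly_sales"])
```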
PepsiCo is an American multinational food, snack, and beverage corporation with interests in the manufacturing, marketing, and distribution of grain-based snack foods, beverages, and other products
The PepsiCo Analytics Center (PAC) team, part of the PepsiCo Business Intelligence Services (BIS) team, supports the broader PepsiCo organization in managing all major data assets and technology investments and in developing proprietary solutions that help PepsiCo leverage its data assets to drive the business
PAC provides insights on demand creation to shape business strategies and elevate PepsiCo's value as a partner
Builds best-in-class tools, capabilities, and partnerships to revolutionize how we work in a rapidly changing retail landscape
PAC provides advantaged execution of key commercial capabilities, changing the way we work with our customers to jointly capture growth
Provides a bridge across BUs and functions (marketing and sales) to break down silos and create seamless interaction pre-, during-, and post-purchase with the consumer/shopper
At the heart of this capability is a household-level database of U.S. shoppers, which predicts household demand for our products (spend propensity for our product categories and brands) and household retailer behavior (shopping propensity for our top 30 retailers), and which we can use for hyper-targeting and engagement across our portfolio
Responsibilities:
Created a development framework in Python that enables developers to save time and money (see the sketch after this list)
Planned, designed, developed, tested, and documented big data, data lake, and analytics solutions
Provided guidance on how to better integrate our data assets into a single household database (vs. the current plan)
Responsible for designing and developing an automated framework that automates the development process in the Enterprise Data Lake
Worked with cross-functional consulting teams within the data science and analytics organization to design, develop, and execute solutions that derive business insights and solve clients' operational and strategic problems
Involved in migrating Hive queries into Apache NiFi
Worked on Azure Data Lake and Blob Storage
Worked in an Agile Scrum model and participated in sprint activities
Gathered and analyzed business requirements
Developed with GitHub, Apache NiFi, and Microsoft Azure tools, and deployed projects into production environments
Responsible for Hive query performance, improving the efficiency and latency of time-consuming queries to save business users time and money
Wrote automated Apache NiFi dataflow scripts and processors
Responsible for coding new development and maintaining existing systems
Converted project specifications into detailed instructions and logical coding steps
Analyzed workflow charts and diagrams, applying knowledge of requirements analysis, design, testing, and software application implementation
Applied broad knowledge of programming techniques and big data solutions to evaluate business user requests for new applications
Performed further development after due analysis and design
Responsible for knowledge transfer/user training activities
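The Python development framework above is internal, so the sketch below is a hypothetical, simplified illustration of the general idea (table spec, database, and storage location are invented): generating Hive DDL from a small configuration so developers avoid hand-writing repetitive boilerplate.

```python
# Hypothetical sketch of a config-driven helper in the spirit of the
# development framework described above: it turns a small table spec into
# Hive DDL so developers don't hand-write repetitive boilerplate.
# Names and layout are invented for illustration.
from typing import Dict


def build_create_table(db: str, table: str, columns: Dict[str, str],
                       partitions: Dict[str, str], location: str) -> str:
    """Render an external, partitioned Hive table definition from a spec."""
    col_ddl = ",\n  ".join(f"`{name}` {dtype}" for name, dtype in columns.items())
    part_ddl = ", ".join(f"`{name}` {dtype}" for name, dtype in partitions.items())
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {db}.{table} (\n  {col_ddl}\n)\n"
        f"PARTITIONED BY ({part_ddl})\n"
        f"STORED AS PARQUET\n"
        f"LOCATION '{location}'"
    )


if __name__ == "__main__":
    ddl = build_create_table(
        db="pac_curated",
        table="household_demand",
        columns={"household_id": "STRING", "brand": "STRING",
                 "spend_propensity": "DOUBLE"},
        partitions={"load_date": "STRING"},
        location="abfss://curated@datalake.dfs.core.windows.net/household_demand",
    )
    print(ddl)  # reviewed, then executed through the team's deployment pipeline
```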
Blue Cross and Blue Shield insurance companies are licensees, independent of the association and traditionally of each other, offering insurance plans within defined regions under one or both of the association's brands
Blue Cross Blue Shield insurers offer some form of health insurance coverage in every U.S. state
They also act as administrators of Medicare in many states or regions of the U.S. and provide coverage to state government employees, as well as to federal government employees under a nationwide option of the Federal Employees Health Benefits Program
The project deals with the Membership Gold Layer, whose purpose is to create an active, current, rationalized, integrated, enterprise-accepted, and consumable membership structure; in other words, a single consistent view of membership irrespective of source
Components delivered for the membership structure enable the creation of additional consumable structures
Responsibilities:
Designed and developed the data lake enterprise-layer gold conformed process, which is available to the consumption team and business users to perform analytics
Responsible for designing and developing an automated framework that automates the development process in the data lake
Integrated Talend with HBase to store processed enterprise data into separate column families and column qualifiers
Used crontab and Zena scheduling to trigger jobs in production
Worked with cross-functional consulting teams within the data science and analytics team to design, develop, and execute solutions to derive business insights and solve clients' operational and strategic problems
Involved in migrating Teradata queries to Snowflake data warehouse queries
Worked in an Agile Scrum model and participated in sprint activities
Gathered and analyzed business requirements
Worked on various Talend integrations with HBase, Avro format, Hive, Phoenix, and Pig components
Worked with GitHub, Zena, Jira, and Jenkins, and deployed projects into production environments
Involved in cluster coordination services through ZooKeeper
Worked on integration with Phoenix thick and thin clients, and involved in installing and developing Phoenix-Hive and Hive-HBase integrations
Wrote automated UNIX shell scripts and developed an automation framework with Talend and UNIX
Created merge, update, and delete scripts in Hive and worked on performance tuning of Hive joins
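A hypothetical, trimmed-down illustration of Hive merge/update/delete logic like the scripts mentioned above is sketched below using PyHive; the host, database, table, and column names are invented, and it assumes a transactional (ACID) Hive target table.

```python
# Hypothetical sketch of a Hive MERGE upsert for a membership-style gold
# layer: apply staged deltas (inserts, updates, deletes) to a target table.
# Host, database, table, and column names are invented for illustration.
from pyhive import hive

MERGE_SQL = """
MERGE INTO membership_gold.member AS t
USING membership_stage.member_delta AS s
ON t.member_id = s.member_id
WHEN MATCHED AND s.op_type = 'D' THEN DELETE
WHEN MATCHED THEN UPDATE SET
  plan_code = s.plan_code,
  effective_date = s.effective_date
WHEN NOT MATCHED THEN INSERT VALUES
  (s.member_id, s.plan_code, s.effective_date)
"""

conn = hive.Connection(host="hive-gateway.example.com", port=10000,
                       username="etl_svc", database="membership_gold")
cursor = conn.cursor()
cursor.execute(MERGE_SQL)   # requires a transactional (ACID) target table
cursor.close()
conn.close()
```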
The Enterprise Data Warehouse provides consistent, complete, integrated, accurate, and timely core business data to support applications and information needs across the enterprise
The Enterprise Data Warehouse contains key business data from across the enterprise, organized around the customer
EDW provides this data to business users and applications to meet their information needs
This data is kept for several years to gain visibility not only into current activities but to analyze trends in the business as a whole
It is a very powerful tool, but the combination of such a large amount of sensitive company data in a single location requires specific security restrictions
The project deals with the creation and maintenance of the enterprise data warehouse, which contains details about customer behavior and Walgreens program, product, and location information
Information from different source systems is received, transformed, verified and loaded into the data warehouse
From the data warehouse information is extracted as reports for the users
The transformation of data, loading of the data warehouse, and extraction of data from the data warehouse are done using Ab Initio
The loaded information will be further used for reporting using other tools and applications
The EDW has now been migrated to Hadoop using a cost-effective mechanism
Responsibilities:
Responsible for designing and implementing ETL processes to load data from different sources, perform data mining, and analyze data using visualization/reporting tools
Developed big data solutions that enabled the business and technology teams to make data-driven decisions on the best ways to acquire customers and provide them with business solutions
Involved in migrating Teradata queries such as updates, inserts, and deletes into Hive queries
Developed Pig scripts for noise reduction
Developed Sqoop scripts for historical data loads of big tables larger than 4 TB
Worked in an Agile Scrum model and participated in sprint activities
Gathered and analyzed business requirements
Involved in optimizing query performance and data load times in Pig, Hive, and MapReduce applications
Expert in optimizing Hive performance using partitioning and bucketing
Experienced in working with data scientists to implement ad hoc queries using HiveQL, partitioning, bucketing, and custom Hive UDFs
Experienced in optimizing Hive queries and joins and in using different data file formats with custom SerDes
Designed the process for historical/incremental loads
Involved in Sqooping more than 20 TB of data from Teradata to Hadoop
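The large Teradata-to-Hadoop transfers above were done with Sqoop; the sketch below is a hypothetical Python wrapper around a parallel, incremental Sqoop import (the JDBC URL, credentials path, table, and split column are invented), showing the general shape of such a job rather than the project's exact options.

```python
# Hypothetical sketch of a Sqoop import wrapper for pulling a large Teradata
# table into HDFS in parallel. JDBC URL, table, and column names are invented.
import subprocess

sqoop_cmd = [
    "sqoop", "import",
    "--connect", "jdbc:teradata://tdprod.example.com/DATABASE=edw",
    "--driver", "com.teradata.jdbc.TeraDriver",
    "--username", "etl_svc",
    "--password-file", "/user/etl_svc/.td_password",
    "--table", "SALES_TXN_HIST",
    "--target-dir", "/datalake/raw/sales_txn_hist",
    "--split-by", "TXN_ID",          # spread the copy across mappers
    "--num-mappers", "16",           # parallelism sized to source capacity
    "--incremental", "append",       # only rows beyond the last watermark
    "--check-column", "TXN_ID",
    "--last-value", "0",
]

# Run the import and fail loudly if Sqoop returns a non-zero exit code.
subprocess.run(sqoop_cmd, check=True)
```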
The objective of the project is to leverage the power of Data Fabric to improve performance and readiness while lowering IT costs
Analyze growing volumes of data immediately to streamline business processes and improve decision-making
Connect customer, product, and TLOG data for scientific analysis of pricing
Accurately segment customers and products through pricing analysis of customer data, grouping them into segments based on their purchasing patterns
Uncover pricing opportunities by doing the heavy lifting of comparing how products are priced today versus the optimal pricing for each segment
Implement the data validation rules defined by business users based on their requirements
Data remediation aims to remove invalid data from the data sources
Responsibilities:
Developed big data solutions that enabled the business and technology teams to make data-driven decisions on the best ways to acquire customers and provide them with business solutions
Analyzed assigned user stories in Jira (Agile software) and created design documents
Attended daily stand-ups and updated burned-down hours in Jira
Worked in an Agile Scrum model and participated in sprint activities
Gathered and analyzed business requirements
Created and implemented business, validation, coverage, and price-gap rules in Talend on Hive and Greenplum databases
Involved in developing Talend components to validate data quality across different data sources
Involved in analyzing business validation rules and identifying options for implementing them in Talend
Exceptions thrown during data validation rule execution are sent back to business users to remediate the data and ensure clean data across data sources (see the sketch after this list)
Worked on the Global ID tool to apply business rules
Automated and scheduled the rules on a weekly and monthly basis in TAC (Talend Administration Center)
Created a scheduling process with cron on a weekly basis
Created and maintained Hive and Greenplum tables on a weekly basis
Collected data from an FTP server and loaded it into Hive tables
Partitioned the collected logs by date/timestamp and host name
Developed data quality rules on top of external vendors' data
Imported data frequently from MySQL to HDFS using Sqoop
Supported operations team in Hadoop cluster maintenance activities including commissioning and decommissioning nodes and upgrades
Used QlikView for visualization and report generation
Worked with systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters
Managed and scheduled jobs using Oozie on a Hadoop cluster
Involved in data modeling using QlikView integration of data sources (ETL with QlikView reports)
Involved in defining job flows, managing and reviewing log files
Monitored workload, job performance, and capacity planning using Cloudera Manager
Installed the Oozie workflow engine to run multiple MapReduce, Hive, and Pig jobs
Responsible for loading and transforming large sets of structured, semi-structured, and unstructured data
Responsible for managing data coming from different sources
Implemented pushing data from Hadoop to Greenplum
Worked on pre-processing data using Pig regular expressions
Gained experience with NoSQL databases
Worked on scheduling jobs through the Resource Manager
Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Talend jobs
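The validation and price-gap rules above were built in Talend; as a language-neutral illustration, the hypothetical PySpark sketch below (tables, columns, and the 15% tolerance are invented) shows the shape of such a rule: flag rows whose price deviates too far from the segment's optimal price and route them back as exceptions for remediation.

```python
# Hypothetical sketch of a price-gap validation rule like those described
# above: flag rows whose retail price deviates too far from the segment's
# optimal price, and route the exceptions back for business remediation.
# Table, column, and threshold values are invented for illustration.
from pyspark.sql import SparkSession, functions as F

spark = (
    SparkSession.builder
    .appName("price-gap-rule")
    .enableHiveSupport()
    .getOrCreate()
)

prices = spark.table("pricing.customer_product_prices")

checked = prices.withColumn(
    "price_gap_pct",
    (F.col("retail_price") - F.col("optimal_price")) / F.col("optimal_price"),
)

# Rows breaching the tolerance become exceptions for business remediation;
# the rest continue into the clean/curated layer.
exceptions = checked.filter(F.abs(F.col("price_gap_pct")) > 0.15)
clean = checked.filter(F.abs(F.col("price_gap_pct")) <= 0.15)

exceptions.write.mode("overwrite").saveAsTable("pricing.price_gap_exceptions")
clean.write.mode("overwrite").saveAsTable("pricing.customer_product_prices_clean")
```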
HEDIS is a set of standardized performance measures designed to ensure that purchasers and consumers have the information they need to reliably compare the performance of healthcare plans
It is a tool widely used by health plans to measure performance on important dimensions of care and service
To ensure the validity of HEDIS results, provider, member, and claims data are audited using an annual process designed by the National Committee for Quality Assurance (NCQA)
HEDIS results are used as a performance benchmark and comparison of healthcare plans by purchasers and consumers
The scope of the project includes generation of medical, pharmacy, and lab claims, provider, provider specialty, member enrollment, and member extracts for the external vendor
Responsibilities:
Worked in an Agile Scrum model and participated in sprint activities
Analyzed business requirements and implemented customer-friendly dashboards
Implemented Section Access for security
Involved in data modeling using QlikView integration of data sources (ETL with QlikView reports)
Identified and improved weak areas in the applications through performance reviews and code walkthroughs to ensure quality
Created QVDs and designed QlikView dashboards using different types of QlikView objects
Modified ETL scripts while loading the data, resolving loops and ambiguous joins
Wrote complex expressions using aggregation functions to match the logic of the business SQL
Performed performance tuning by analyzing and comparing turnaround times between SQL and QlikView
Worked with QlikView extensions such as SVG Maps and HTML content
Developed set analysis to provide custom functionality in the QlikView application
Used binary load, resident load, preceding load, and incremental load during data modeling (a simplified sketch of the incremental pattern follows below)
Environment: Hadoop, HDFS, Hive, SQL and QlikView.
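The QlikView loads above include an incremental-load pattern; the hypothetical Python sketch below (file names and columns invented) shows the same idea in a generic form, analogous to a stored QVD: keep a snapshot, load only rows newer than its watermark, and append them.

```python
# Hypothetical Python analogue of the QlikView QVD incremental-load pattern
# noted above: reload only rows newer than the last stored watermark and
# append them to the existing snapshot. File paths and columns are invented.
import pandas as pd

SNAPSHOT = "claims_snapshot.parquet"     # plays the role of the stored QVD
SOURCE = "claims_extract.csv"            # latest extract from the source system

history = pd.read_parquet(SNAPSHOT)
watermark = history["load_ts"].max()

latest = pd.read_csv(SOURCE, parse_dates=["load_ts"])
new_rows = latest[latest["load_ts"] > watermark]   # incremental slice only

updated = pd.concat([history, new_rows], ignore_index=True)
updated.to_parquet(SNAPSHOT, index=False)
```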
Hadoop Developer
AIG
Hyderabad
03.2011 - 12.2012
The AIGPC Claims organization seeks to maintain the full history of changes to a claim during its life cycle and to run analytics on top of XML semi-structured data using the OneClaim Hadoop system
The OneClaim Hadoop system is the single source of claims data, covering both current and historical records and providing complete XMLs as well as their reference data
The data is ingested into Hadoop from the OneClaim ODS
Key data attributes are exposed, and other unexposed attributes are available for querying on an on-demand basis
QlikView, Cognos, and other reporting tools can perform analytics on the data provided by the OneClaim Hadoop ODS
Responsibilities:
Worked on analyzing the Hadoop cluster using different big data analytic tools, including Pig, Hive, and MapReduce
Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis
Worked on debugging and performance tuning of Hive and Pig jobs
Created HBase tables to store various formats of PII data coming from different portfolios
Implemented test scripts to support test-driven development and continuous integration
Worked on tuning the performance of Pig queries
Provided cluster coordination services through ZooKeeper
Experienced in managing development time, bug tracking, project releases, development speed, release forecasting, and scheduling
Involved in loading data from the Linux file system to HDFS
Imported and exported data into HDFS and Hive using Sqoop
Developed a Java program to extract values from XML using XPath (see the sketch after this list)
Experienced in processing unstructured data using Pig and Hive
Supported MapReduce programs running on the cluster
Gained experience in managing and reviewing Hadoop log files
Performed end-to-end performance tuning of Hadoop clusters and Hadoop MapReduce routines against very large data sets
Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs
Assisted in monitoring the Hadoop cluster using tools like Cloudera Manager
Experienced in optimizing MapReduce algorithms using combiners and partitioners to deliver the best results, and worked on application performance optimization for HDFS
Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts
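The XML extraction above was written in Java; for consistency with the other sketches here, the hypothetical example below shows the same XPath idea using Python's standard library (the claim elements and fields are invented), not the original program.

```python
# Hypothetical sketch of pulling key claim attributes out of semi-structured
# XML with XPath-style expressions, analogous to the Java program described
# above. Element and field names are invented for illustration.
import xml.etree.ElementTree as ET

SAMPLE = """
<claims>
  <claim id="C-1001">
    <status>OPEN</status>
    <reserve currency="USD">12500.00</reserve>
  </claim>
  <claim id="C-1002">
    <status>CLOSED</status>
    <reserve currency="USD">0.00</reserve>
  </claim>
</claims>
"""

root = ET.fromstring(SAMPLE)

# ElementTree supports a limited XPath subset: enough for path + predicate use.
for claim in root.findall("./claim"):
    claim_id = claim.get("id")
    status = claim.findtext("status")
    reserve = claim.findtext("reserve")
    print(claim_id, status, reserve)

# Predicate example: only claims that are still open.
open_claims = root.findall("./claim[status='OPEN']")
print(len(open_claims), "open claim(s)")
```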