Ravikanth Vittala

Hyderabad

Summary

Accomplished Senior Data Engineer with a proven track record at DELL Technologies Inc, enhancing data processes and analytics. Expert in Apache Hadoop and Cloudera, with a knack for transforming complex data into actionable insights. Demonstrates strong leadership in cross-functional team collaboration, significantly improving data job performance by over 75%.

Overview

14 years of professional experience
1 Certification

Work History

Senior Data Engineer

DELL Technologies Inc
Hyderabad
12.2019 - Current
  • Dell Technologies is an American multinational computer technology company that develops, sells, repairs, and supports computers and related products and services, and is one of the largest technology corporations in the world
  • Part of the Data Engineering team responsible for the creation, enhancement, and maintenance of different types of datasets for analytical purposes
  • The team helps the business create insights to improve processes by providing on-time data with quality
  • Boomerang is the main tool used with development teams to investigate the performance of their pages
  • Responsibilities:
  • Responsible for data architecture and flow for the speed and stability project in the data engineering team
  • Designed and developed an OLAP tabular cube reading 100M records of data every week
  • Responsible for developing an automated framework that automates the development process in the data lake
  • Deploying and maintaining CI/CD and Git pipelines
  • Collecting requirements from the product owner and converting them into user stories to set the tasks for developing the product
  • Orchestrated and improved the performance of Hadoop and OLAP jobs, reducing runtime from 16 hours to 3 hours (a minimal orchestration sketch follows this list)
  • Environment: Hadoop, HDFS, Hive, Spark, Python, Hue, SQL Server, OLAP Cubes, Airflow, CDP
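
The weekly Hadoop-to-OLAP pipeline above was orchestrated with Airflow; the sketch below is a minimal, hedged example of that pattern, assuming a Spark job that builds the weekly dataset followed by a step that refreshes the tabular cube. The DAG id, script paths, and commands are hypothetical placeholders, not the actual Dell pipeline.

```python
# Minimal Airflow DAG sketch (hypothetical names and paths) for a weekly
# Hadoop -> OLAP refresh: build the dataset with Spark, then trigger the
# cube processing step once the data has landed.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="weekly_speed_stability_refresh",  # placeholder name
    start_date=datetime(2023, 1, 1),
    schedule_interval="@weekly",
    catchup=False,
) as dag:
    # Build the weekly aggregate in the data lake (script path is illustrative).
    build_dataset = BashOperator(
        task_id="build_weekly_dataset",
        bash_command=(
            "spark-submit --master yarn --deploy-mode cluster "
            "/opt/jobs/build_weekly_dataset.py"
        ),
    )

    # Kick off the OLAP tabular cube refresh after the dataset is ready
    # (a placeholder Python entry point stands in for the cube processing call).
    refresh_cube = BashOperator(
        task_id="refresh_olap_cube",
        bash_command="python /opt/jobs/refresh_tabular_cube.py",
    )

    build_dataset >> refresh_cube
```

The runtime reduction itself came from restructuring the underlying Hadoop jobs; the DAG only sequences the weekly build and the downstream cube refresh.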

Senior Hadoop Developer

Catalina Marketing Corporation
Tampa
06.2019 - 12.2019
  • Catalina combines deep analytics and insights with the largest buyer history database in the world to power its buyR3science™ solutions
  • Catalina solutions pinpoint the why behind every buy and mobilize meaningful, real-time engagement and results with the relevant 2% of buyers who drive 80% of brand volume (on average)
  • The guiding user story: as a retailer, I want easy access to advanced insights about my business and shoppers, so that I don't have to build out my own reports
  • The project delivers insights to retailer partners about their private label brands in support of their partnership with Catalina
  • This gives retailers easy access to reporting that might be difficult for smaller retailers to create themselves, or too time-consuming and costly for larger retailers to build out on their own
  • Provides retailers with both analytics/insights and media activation capabilities, creating synergies for retailers
  • Supports retailer renewals and account penetration by starting a consultative relationship with retailers where marketing and promotional decisions take place
  • Generally, this creates stickiness with retail partners and/or mitigates account erosion
  • Responsibilities:
  • Designed and developed Solr indexes as endpoints for the web team to build the retail hub (a minimal query sketch follows this list)
  • Responsible for developing an automated framework that automates the development process in the data lake
  • Responsible for creating and maintaining user stories and leading the team on the technology side
  • Collecting requirements from the product owner and converting them into user stories to set the tasks for developing the product
  • Environment: Hadoop, HDFS, Hive, Spark, Python, Hue, HBase, Solr indexes, Talend, Netezza.
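
The Solr indexes above were exposed as query endpoints for the web team; the sketch below shows, under assumed names, how such an endpoint might be queried over Solr's standard HTTP select API. The host, collection, and field names are hypothetical and do not reflect the actual Catalina schema.

```python
# Hypothetical client-side sketch: querying a Solr collection over the
# standard /select HTTP API (host, collection, and fields are placeholders).
import requests

SOLR_URL = "http://solr-host:8983/solr/retail_hub"  # assumed endpoint

def search_retailer_insights(retailer_id: str, rows: int = 20) -> list[dict]:
    """Return matching insight documents for one retailer."""
    params = {
        "q": f"retailer_id:{retailer_id}",  # main query
        "fq": "doc_type:insight",           # filter query narrows by document type
        "rows": rows,                        # page size
        "wt": "json",                        # JSON response writer
    }
    resp = requests.get(f"{SOLR_URL}/select", params=params, timeout=30)
    resp.raise_for_status()
    return resp.json()["response"]["docs"]

if __name__ == "__main__":
    for doc in search_retailer_insights("R12345", rows=5):
        print(doc)
```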

Senior Hadoop Developer

Pepsico Inc
Plano
11.2018 - 06.2019
  • PepsiCo, Inc. is an American multinational food, snack, and beverage corporation with interests in the manufacturing, marketing, and distribution of grain-based snack foods, beverages, and other products
  • The PepsiCo Analytics Center (PAC) team, part of the PepsiCo Business Intelligence Services (BIS) team, supports the broad PepsiCo organization in managing all major data assets and technology investments and in developing proprietary solutions that help PepsiCo leverage its data assets to drive the business
  • PAC provides insights on demand creation to shape business strategies and elevate PepsiCo's value as a partner
  • It builds best-in-class tools, capabilities, and partnerships to revolutionize how the organization works in a rapidly changing retail landscape
  • PAC provides advantaged execution of key commercial capabilities, changing the way PepsiCo works with its customers to jointly capture growth
  • It provides a bridge across BUs and functions (marketing and sales) to break down silos and create seamless interaction with the consumer/shopper before, during, and after purchase
  • At the heart of this capability is a household-level database of U.S. shoppers, which predicts household demand for PepsiCo products (spend propensity for product categories and brands) and household retailer behavior (shopping propensity for the top 30 retailers), which can be used for hyper-targeting and engagement across the portfolio
  • Responsibilities:
  • Created a development framework in Python that enables developers to save time and money
  • Planned, designed, developed, tested, and documented big data, data lake, and analytics solutions
  • Provided guidance on how to better integrate data assets into a single household database (vs. the current plan)
  • Responsible for designing and developing an automated framework that automates the development process in the Enterprise Data Lake
  • Worked with cross-functional consulting teams within the data science and analytics team to design, develop, and execute solutions that derive business insights and solve clients' operational and strategic problems
  • Involved in migrating Hive queries into Apache NiFi
  • Worked on Azure Data Lake and Blob Storage
  • Worked in an Agile Scrum model and was involved in sprint activities
  • Gathered and analyzed business requirements
  • Developed with GitHub, Apache NiFi, and Microsoft Azure tools and deployed the projects into production environments
  • Responsible for Hive query performance, improving the efficiency and latency of time-consuming queries to save business users' money and time (a minimal tuning sketch follows this list)
  • Wrote Apache NiFi automated dataflow scripts and processors
  • Responsible for coding new development and maintaining existing systems
  • Converted project specifications into detailed instructions and logical steps for coding in languages processed by computers
  • Analyzed workflow charts and diagrams, applying knowledge of requirements, analysis, design, testing, and software application implementation
  • Applied broad knowledge of programming techniques and big data solutions to evaluate business users' requests for new applications
  • Performed further development after due analysis and design
  • Responsible for knowledge transfer and user training activities
  • Environment: Hadoop, HDFS, Hive, QlikView, UNIX shell scripting, Hue, HBase, Apache NiFi, Microsoft Azure, Python.
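
Much of the Hive tuning above came down to restructuring large tables so queries only scan the data they need. The PySpark sketch below illustrates one such pattern under assumed names: partitioning a fact table by date so typical date-bounded queries prune partitions instead of scanning full history. The database, table, and column names are hypothetical.

```python
# Illustrative PySpark/Hive sketch (hypothetical table and column names):
# rewrite a large unpartitioned fact table as a date-partitioned table so
# "last N days" queries prune partitions instead of running full scans.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hive_partition_tuning_sketch")
    .enableHiveSupport()
    .getOrCreate()
)

# Allow dynamic partitioning for the INSERT below.
spark.sql("SET hive.exec.dynamic.partition=true")
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

spark.sql("""
    CREATE TABLE IF NOT EXISTS analytics.sales_fact_part (
        household_id BIGINT,
        product_id   BIGINT,
        spend        DOUBLE
    )
    PARTITIONED BY (sale_date STRING)
    STORED AS ORC
""")

# Load from the legacy unpartitioned table; the partition column goes last.
spark.sql("""
    INSERT OVERWRITE TABLE analytics.sales_fact_part PARTITION (sale_date)
    SELECT household_id, product_id, spend, sale_date
    FROM analytics.sales_fact_raw
""")

# A date-bounded query now reads only the matching partitions.
recent = spark.sql("""
    SELECT product_id, SUM(spend) AS total_spend
    FROM analytics.sales_fact_part
    WHERE sale_date >= '2019-01-01'
    GROUP BY product_id
""")
recent.show(10)
```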

Senior Hadoop Developer

Blue Cross Blue Shield of Illinois
Chicago
05.2017 - 06.2018
  • Blue Cross and Blue Shield insurance companies are licensees, independent of the association and traditionally of each other, offering insurance plans within defined regions under one or both of the association's brands
  • Blue Cross Blue Shield insurers offer some form of health insurance coverage in every U.S. state
  • They also act as administrators of Medicare in many states or regions of the U.S. and provide coverage to state government employees as well as to federal government employees under a nationwide option of the Federal Employees Health Benefits Program
  • The project deals with the Membership Gold Layer, whose goal is to create an active, current, rationalized, integrated, enterprise-accepted, and consumable membership structure; in other words, a single consistent view of membership irrespective of source
  • Components delivered for the membership structure enable the creation of additional consumable structures
  • Responsibilities:
  • Designed and developed the data lake enterprise-layer gold conformed process, which is available to the consumption team and business users to perform analytics
  • Responsible for designing and developing an automated framework that automates the development process in the data lake
  • Integrated Talend with HBase for storing the processed enterprise data into separate column families and column qualifiers
  • Used crontab and Zena scheduling to trigger jobs in production
  • Worked with cross-functional consulting teams within the data science and analytics team to design, develop, and execute solutions to derive business insights and solve clients' operational and strategic problems
  • Involved in migrating Teradata queries to Snowflake data warehouse queries
  • Worked in an Agile Scrum model and was involved in sprint activities
  • Gathered and analyzed business requirements
  • Worked on various Talend integrations with HBase, Avro format, Hive, Phoenix, and Pig components
  • Worked with GitHub, Zena, Jira, and Jenkins tools and deployed the projects into production environments
  • Involved in cluster coordination services through Zookeeper
  • Worked on integration with Phoenix thick and thin clients, and was also involved in installing and developing Phoenix-Hive and Hive-HBase integrations
  • Wrote automated UNIX shell scripts and developed an automation framework with Talend and UNIX
  • Created merge, update, and delete scripts in Hive and worked on performance tuning of joins in Hive (a minimal merge sketch follows this list)
  • Environment: Hadoop, HDFS, Hive, QlikView, UNIX shell scripting, Hue, HBase, Avro format, Phoenix, Talend, Snowflake.
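
The merge/update/delete scripts above maintained the gold-layer membership tables; the sketch below shows, in hedged form, what a Hive MERGE of daily membership changes into a conformed target might look like when submitted through beeline. It assumes ACID (transactional, ORC-backed) Hive tables and uses hypothetical connection, database, and table names rather than the actual BCBS ones.

```python
# Hedged sketch: submit a Hive MERGE (requires ACID/transactional ORC tables)
# through beeline. The JDBC URL, databases, tables, and columns are hypothetical.
import subprocess

JDBC_URL = "jdbc:hive2://hive-host:10000/gold"  # assumed HiveServer2 endpoint

MERGE_SQL = """
MERGE INTO gold.membership AS tgt
USING staging.membership_changes AS src
ON tgt.member_id = src.member_id
WHEN MATCHED AND src.op = 'D' THEN DELETE
WHEN MATCHED THEN UPDATE SET
    plan_code  = src.plan_code,
    status     = src.status,
    updated_ts = src.updated_ts
WHEN NOT MATCHED THEN INSERT VALUES
    (src.member_id, src.plan_code, src.status, src.updated_ts)
"""

def run_merge() -> None:
    """Run the daily membership merge via beeline and fail loudly on error."""
    subprocess.run(
        ["beeline", "-u", JDBC_URL, "-e", MERGE_SQL],
        check=True,
    )

if __name__ == "__main__":
    run_merge()
```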

Hadoop Developer

Walgreens
Deerfield
04.2016 - 05.2017
  • The Enterprise Data Warehouse provides consistent, complete, integrated, accurate, and timely core business data to support applications and information needs across the enterprise
  • The Enterprise Data Warehouse contains key business data from across the enterprise, organized around the customer
  • EDW provides this data to business users and applications to meet their information needs
  • This data is kept for several years to gain visibility not only into current activities but to analyze trends in the business as a whole
  • It is a very powerful tool, but the combination of such a large amount of sensitive company data in a single location will require some specific security restrictions
  • The project deals with the creation and maintenance of the enterprise data warehouse, which contains details about customer behavior and Walgreens program, product, and location information
  • Information from different source systems is received, transformed, verified, and loaded into the data warehouse
  • From the data warehouse, information is extracted as reports for the users
  • The transformation of data, loading of the data warehouse, and extraction of data from the data warehouse are done using Ab Initio
  • The loaded information is further used for reporting with other tools and applications
  • The EDW is now being migrated to Hadoop as a cost-effective mechanism
  • Responsibilities:
  • Responsible for designing and implementing the ETL process to load data from different sources, perform data mining, and analyze data using visualization/reporting
  • Developed Big Data solutions that enabled the business and technology teams to make data-driven decisions on the best ways to acquire customers and provide them business solutions
  • Involved in migrating Teradata queries such as updates, inserts, and deletes into Hive queries
  • Developed Pig scripts for noise reduction
  • Developed Sqoop scripts for historical data loads of big tables exceeding 4 TB (a minimal import sketch follows this list)
  • Worked in an Agile Scrum model and was involved in sprint activities
  • Gathered and analyzed business requirements
  • Involved in optimizing query performance and data load times in Pig, Hive, and MapReduce applications
  • Optimized performance in Hive using partitioning and bucketing concepts
  • Interacted with data scientists to implement ad-hoc queries using HiveQL, partitioning, bucketing, and custom Hive UDFs
  • Optimized Hive queries and joins and used different data file formats with custom SerDes
  • Designed the process to do historical/incremental loads
  • Involved in Sqooping more than 20 TB of data from Teradata to Hadoop
  • Environment: Hadoop, HDFS, Hive, QlikView, UNIX shell scripting, Hue, HBase, Pig, Sqoop, Talend.
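
The multi-terabyte historical loads above relied on Sqoop's parallel import; the sketch below is a hedged illustration of how such an import from Teradata into Hive might be invoked from Python, splitting the work across mappers on a numeric key. The JDBC URL format, credential path, and table names are assumptions for illustration, not Walgreens systems.

```python
# Hedged sketch of a parallel Sqoop import from a Teradata source into Hive.
# The JDBC URL, credential file, and table names are hypothetical placeholders.
import subprocess

def sqoop_import_history(source_table: str, hive_table: str, mappers: int = 16) -> None:
    """Import one large source table into Hive, split across `mappers` tasks."""
    cmd = [
        "sqoop", "import",
        "--connect", "jdbc:teradata://edw-host/DATABASE=sales",  # assumed URL format
        "--username", "etl_user",
        "--password-file", "hdfs:///user/etl/.sqoop.pwd",
        "--table", source_table,
        "--split-by", "transaction_id",   # numeric key used to parallelize the extract
        "--num-mappers", str(mappers),
        "--hive-import",
        "--hive-table", hive_table,
    ]
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    # Example: one historical fact table; very large tables get more mappers.
    sqoop_import_history("TLOG_HISTORY", "edw.tlog_history", mappers=32)
```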

Hadoop Developer

Walmart stores Inc
Bentonville
06.2014 - 04.2016
  • The objective of the project is to leverage the power of Data Fabric to improve performance and readiness while lowering IT costs
  • Analyze growing volumes of data immediately to streamline business processes and improve decision-making
  • Connect customer, product, and TLOG data for scientific analysis of pricing
  • Accurately segment customers and products by pricing analysis of customer data and group them into sections based on their purchasing patterns
  • Uncover pricing opportunities by doing the heavy lifting of comparing how products are priced today versus the optimal pricing for each segment
  • To implement the data validation rules formed by the business users based on their requirements
  • Remediation of data aims at removing the invalid data from data sources
  • Responsibilities:
  • Developed Big Data Solutions that enabled the business and technology teams to make data-driven decisions on the best ways to acquire customers and provide them business solutions
  • Analyzed the assigned user stories in JIRA (Agile software) and created design documents
  • Attended daily stand-ups and updated the hours burned down in JIRA
  • Worked in an Agile Scrum model and was involved in sprint activities
  • Gathered and analyzed business requirements
  • Created and implemented business, validation, coverage, and price-gap rules on Hive and Greenplum databases using the Talend tool
  • Involved in developing Talend components to validate data quality across different data sources
  • Involved in analyzing business validation rules and finding options for implementing the rules in Talend
  • Exceptions thrown during data validation rule execution are sent back to the business users for remediating the data and ensuring clean data across data sources
  • Worked on the Global ID tool to apply the business rules
  • Automated and scheduled the rules on a weekly and monthly basis in TAC (Talend Administration Center)
  • Created a scheduling process with cron on a weekly basis
  • Created and maintained the Hive tables and Greenplum tables on a weekly basis
  • Collected data from the FTP server and loaded it into Hive tables
  • Partitioned the collected logs by date/timestamp and host name (a minimal load sketch follows this list)
  • Developed data quality rules on top of external vendors' data
  • Imported data frequently from MySQL to HDFS using Sqoop
  • Supported operations team in Hadoop cluster maintenance activities including commissioning and decommissioning nodes and upgrades
  • Used QlikView for visualizing and to generate reports
  • Worked with systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters
  • Managed and scheduled jobs using Oozie on the Hadoop cluster
  • Involved in data modeling using QlikView integration of data sources (ETL) with QlikView reports
  • Involved in defining job flows and managing and reviewing log files
  • Monitored workload, job performance, and capacity planning using Cloudera Manager
  • Installed the Oozie workflow engine to run multiple MapReduce, Hive, and Pig jobs
  • Responsible for loading and transforming large sets of structured, semi-structured, and unstructured data
  • Responsible for managing data coming from different sources
  • Implemented pushing the data from Hadoop to Greenplum
  • Worked on pre-processing the data using Pig regular expressions
  • Gained experience with NoSQL databases
  • Worked on scheduling the jobs through Resource Manager
  • Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Talend jobs
  • Environment: Hadoop, HDFS, Hive, QlikView, UNIX shell scripting, Hue, Greenplum, Talend.
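
The log loads above were partitioned by date and host so downstream Hive queries could prune down to the slice they needed; the PySpark sketch below illustrates that layout under assumed paths, schema, and table names (not the actual Walmart feeds).

```python
# Illustrative PySpark sketch: land FTP-collected logs in a Hive table
# partitioned by log date and host. Paths, schema, and names are assumed.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("log_partition_load_sketch")
    .enableHiveSupport()
    .getOrCreate()
)

# Raw logs previously pulled from the FTP server into HDFS (assumed layout:
# CSV with a header that includes event_ts and host columns).
raw = spark.read.option("header", True).csv("hdfs:///landing/ftp_logs/")

# Derive the date partition column from the event timestamp.
logs = raw.withColumn("log_date", F.to_date(F.col("event_ts")))

# Write as a Hive table partitioned by date and host so queries on a single
# day or host only touch the matching directories.
(
    logs.write
    .mode("append")
    .partitionBy("log_date", "host")
    .format("orc")
    .saveAsTable("edw.weekly_logs")
)
```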

QlikView Developer

Humana Inc
Hyderabad
01.2013 - 05.2014
  • HEDIS is a set of standardized performance measures designed to ensure that purchasers and consumers have the information they need to reliably compare the performance of healthcare plans
  • It is a tool widely used by health plans to measure performance on important dimensions of care and service
  • To ensure the validity of HEDIS results, provider, member, and claims data are audited using an annual process designed by the NCQA
  • HEDIS results are used as a performance benchmark and for comparison of health care plans by purchasers and consumers
  • The scope of the project includes generation of Medical, Pharmacy, and Lab claims, Provider, Provider Specialty, Member Enrollment, and Member Extract for the external vendor
  • Responsibilities:
  • Worked in an Agile Scrum model and was involved in sprint activities
  • Analyzed business requirements and implemented customer-friendly dashboards
  • Implemented Section Access for security
  • Involved in data modeling using QlikView integration of data sources (ETL) with QlikView reports
  • Identified and improved weak areas in the applications through performance reviews and code walkthroughs to ensure quality
  • Created QVDs and designed QlikView dashboards using different types of QlikView objects
  • Modified ETL scripts while loading the data, resolving loops and ambiguous joins
  • Wrote complex expressions using aggregation functions to match the logic with the business SQL
  • Performed performance tuning by analyzing and comparing turnaround times between SQL and QlikView
  • Worked with QlikView extensions like SVG Maps and HTML Content
  • Developed Set Analysis to provide custom functionality in the QlikView application
  • Used binary load, resident load, preceding load, and incremental load during data modeling
  • Environment: Hadoop, HDFS, Hive, SQL and QlikView.

Hadoop Developer

AIG
Hyderabad
03.2011 - 12.2012
  • The AIGPC Claims organization seeks to maintain the full history of changes to a claim during its life cycle and to run analytics on top of semi-structured XML data using the OneClaim Hadoop system
  • The OneClaim Hadoop system is the single source of claims data, covering both current and historical records, and provides complete XMLs as well as their reference data
  • The data will be ingested into Hadoop from the OneClaim ODS
  • The key data attributes will be exposed, and the other unexposed attributes will be available for querying only on an on-demand basis
  • QlikView, Cognos, and other reporting tools can perform analytics on the data provided by the OneClaim Hadoop ODS
  • Responsibilities:
  • Worked on analyzing the Hadoop cluster using different big data analytic tools, including Pig, Hive, and MapReduce
  • Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis
  • Worked on debugging and performance tuning of Hive and Pig jobs
  • Created HBase tables to store various data formats of PII data coming from different portfolios
  • Implemented test scripts to support test-driven development and continuous integration
  • Worked on tuning the performance of Pig queries
  • Handled cluster coordination services through Zookeeper
  • Experience in managing development time, bug tracking, project releases, development speed, release forecasts, scheduling, and more
  • Involved in loading data from the Linux file system to HDFS
  • Imported and exported data into HDFS and Hive using Sqoop
  • Developed a Java program to extract values from XML using XPaths (an illustrative sketch follows this list)
  • Experience working on processing unstructured data using Pig and Hive
  • Supported MapReduce programs running on the cluster
  • Gained experience in managing and reviewing Hadoop log files
  • End-to-end performance tuning of Hadoop clusters and Hadoop Map/Reduce routines against very large data sets
  • Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs
  • Assisted in monitoring the Hadoop cluster using tools like Cloudera Manager
  • Experience in optimizing MapReduce algorithms using combiners and partitioners to deliver the best results, and worked on application performance optimization for HDFS
  • Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts
  • Environment: Hadoop (CDH4), MapReduce, HBase, Hive, Sqoop, Oozie.
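
The claim-history work above centered on pulling key attributes out of semi-structured claim XML via XPath. The original program was written in Java; the sketch below uses Python's standard-library ElementTree only to illustrate the same XPath-style extraction idea, and the claim XML structure shown is hypothetical rather than the OneClaim schema.

```python
# Illustrative sketch of the XPath-style extraction idea (Python stand-in
# for the original Java program; the claim XML structure is hypothetical).
import xml.etree.ElementTree as ET

SAMPLE_CLAIM_XML = """
<claim id="C-1001">
  <status>OPEN</status>
  <history>
    <event ts="2012-01-15"><status>OPEN</status></event>
    <event ts="2012-03-02"><status>UNDER_REVIEW</status></event>
  </history>
</claim>
"""

def extract_claim_fields(xml_text: str) -> dict:
    """Pull key attributes out of one claim XML document."""
    root = ET.fromstring(xml_text)
    return {
        "claim_id": root.get("id"),
        "current_status": root.findtext("status"),
        # XPath-style search: every status recorded under <history>.
        "status_history": [e.text for e in root.findall(".//history/event/status")],
    }

if __name__ == "__main__":
    print(extract_claim_fields(SAMPLE_CLAIM_XML))
```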

Skills

  • Apache Hadoop Ecosystem
  • Data integration
  • Advanced SQL
  • Machine learning
  • Cloudera Hadoop Distribution
  • Hortonworks, IBM BigInsights
  • HDFS, Map-Reduce, Spark, Kafka, Hive, Pig, Sqoop, Oozie, HUE, HBase, Solr
  • SQL Server, ORACLE, MySQL, Greenplum, Snowflake, Phoenix, Presto
  • SSMS, SSIS, Linux, Red Hat, CentOS
  • Talend, Cloudera Manager, QlikView
  • Data pipeline design
  • Data modeling
  • ETL development
  • Linux administration
  • Performance tuning
  • Python programming
  • Big data processing
  • NoSQL databases
  • Git version control

Certification

  • CCDH-410: Cloudera Certified Developer for Apache Hadoop
  • MapR certification
  • IBM BigInsights Mastery Test V2
  • Dell Data Scientist and Data Engineering Optimize
  • NVIDIA Efficient Large Language Model (LLM) Customization

Timeline

Senior Data Engineer

DELL Technologies Inc
12.2019 - Current

Senior Hadoop Developer

Catalina Marketing Corporation
06.2019 - 12.2019

Senior Hadoop Developer

Pepsico Inc
11.2018 - 06.2019

Senior Hadoop Developer

Blue Cross Blue Shield of Illinois
05.2017 - 06.2018

Hadoop Developer

Walgreens
04.2016 - 05.2017

Hadoop Developer

Walmart stores Inc
06.2014 - 04.2016

QlikView Developer

Humana Inc
01.2013 - 05.2014

Hadoop Developer

AIG
03.2011 - 12.2012