Hi, I’m

VENKAT RAMANAN R

DATA/ML Engineer
Chennai, Tamil Nadu

Summary

Experience:

With over 12 years of experience in the IT industry, including more than 8 years focused on Hadoop, I have expertise in tools across the Hadoop ecosystem, including Spark, Spark Streaming, Kafka, KSQL, DataStax Cassandra, Pig, Hive, HDFS, MapReduce, Sqoop, Storm, YARN, Oozie, and Oracle GoldenGate for Big Data.

In addition, I have 5+ years of experience designing, architecting, and developing on cloud platforms such as Databricks, GCP, AWS, and Azure using Medallion, event-driven, serverless, and North Star architectures. Highly motivated professional with a desire to take on new challenges.

Strong work ethic, adaptability, and exceptional interpersonal skills. Adept at working effectively unsupervised and at quickly mastering new skills.

Able to design and implement data architecture solutions, including Data Warehousing, Data Lake, and Data Mart concepts. Well-acquainted with ETL (Extract, Transform, Load) and ELT (Extract, Load, Transform) processes for efficient data integration and transformation. Extensive experience in business processes, requirements gathering, data modeling, and the full SDLC.

Overview

13 years of professional experience
3 certifications

Work History

Data Architect

02.2021 - Current

Job overview

  • Technologies Used: GCP, AWS EMR, Kinesis, S3, CloudWatch, Lambda, AWS Glue, Airflow, Redshift, DynamoDB, Hadoop, Kafka, HBase, Spark Streaming, Core Java, Scala, Python, Spring, J2EE, Teradata
  • Summary: on-prem to AWS migration (historical data and cloud-native pipelines)
  • Migrated the on-prem Hadoop data lake to the AWS data lake (S3, Redshift & DynamoDB)
  • Created a common batch framework using AWS Glue targeted at S3 & Redshift
  • Utilized AWS Lambda for API calls to MongoDB and MySQL
  • Conducted discovery and data cataloging for all migrated data using AWS Data Catalog
  • Set up a Kinesis stream environment to process and analyze real-time streaming data, deploying it to Redshift & S3 via Firehose
  • Employed Apache Airflow on top of ECS & EC2 to orchestrate 700 production batch jobs (see the orchestration sketch after this list)
  • Achieved the movement of 300TB of data daily through incremental loads
  • Integrated various sources like ADLS, Oracle, Teradata, and Hadoop feed files
  • Utilized CloudFormation with AWS CodeCommit, CodePipeline, and CodeDeploy for continuous deployment
  • Implemented observability and monitoring with CloudWatch logs, Splunk, PagerDuty, and New Relic
  • Developed an automated Spark/Scala framework for both batch and real-time streaming
  • Created a synthetic data generator to produce mock data for lower environments
  • Implemented masking algorithms and encryption logic as needed
  • Collaborate regularly with the ML and data science teams to enable ad hoc or incremental data tasks
  • Created a road map for designing, building, and automating data pipelines using Dataproc, Data Fusion, and Dataflow
  • Created data lakes and gathered data with batch and streaming pipelines on top of GCS and DPMS
  • Created a curated environment on top of BigQuery with a fine-grained access model
  • Used Cloud Composer and Cloud Scheduler for orchestration
  • Created CI/CD pipelines using a combination of Cloud Build and Terraform
  • Added monitoring and logging with Stackdriver, also integrated with Splunk and New Relic
  • Performed detailed analysis of business problems and technical environments and used it to design solutions
  • Worked creatively and analytically in a problem-solving environment
  • Created end-to-end low-latency streaming connecting Kafka and Pub/Sub using Dataflow and targeting Bigtable
  • Used Google Data Catalog for data classification, tightly integrated with BigQuery to enable persona-based access
  • Worked with the network and governance teams to build the GCP Interconnect and define persona-based roles
  • Migrated historical data and moved all brownfield applications to greenfield, cloud-native implementations
  • Worked with the security team to get approval to enable GCP managed services.
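
A minimal orchestration sketch in the spirit of the Airflow + Glue batch framework above, assuming the apache-airflow-providers-amazon package is installed; the DAG name, Glue job name, and region are hypothetical placeholders, not the actual production pipeline.

  from datetime import datetime

  from airflow import DAG
  from airflow.providers.amazon.aws.operators.glue import GlueJobOperator

  # Hypothetical daily DAG that triggers one pre-registered Glue curation job.
  with DAG(
      dag_id="s3_to_redshift_batch",
      start_date=datetime(2021, 2, 1),
      schedule_interval="@daily",
      catchup=False,
  ) as dag:
      curate = GlueJobOperator(
          task_id="run_glue_curation_job",
          job_name="curate_orders_to_redshift",  # hypothetical Glue job name
          region_name="us-east-1",               # hypothetical region
          wait_for_completion=True,              # block until the Glue run finishes
      )

In practice, one such DAG per subject area (with retries, SLAs, and CloudWatch alerting) is how a few hundred Glue batch jobs can be scheduled and monitored from a single Airflow deployment.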

CTS

Senior Full Stack DevOps & Data Engineer
08.2019 - Current

Job overview

  • Technologies Used: AWS EMR, Kinesis, S3, CloudWatch, Lambda, Azure Functions, Event Hub, Blob, Azure SQL DW, Hadoop, Kafka, HBase, Spark Streaming, Core Java, Scala, Python
  • Summary
  • Developed real-time streaming applications using AWS Kinesis, Kafka, Spark Streaming, Kafka Streams, Cassandra, and Oracle
  • Built the data streaming pipeline from Salesforce to AWS Kinesis (see the sketch after this list)
  • Built and deployed the developed components using Bamboo CI/CD
  • Transformed and flattened the data received in the raw layer using Spark
  • Created a monitoring mechanism using Splunk dashboards
  • Automated the deployment using Jenkins
  • After flattening, ensured the data was replicated to the access instances (Redshift and Azure Synapse) for users
  • Built ad hoc analytical batches using Spark jobs
  • Built the data streaming pipeline from Salesforce to Azure using Event Hubs
  • Built and deployed the developed components using Bamboo CI/CD
  • Transformed and flattened the data received in the raw layer using Databricks
  • Utilized Azure PolyBase with ADLS and Azure Synapse.
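
A minimal illustrative sketch of the Salesforce-to-Kinesis hand-off above, using boto3; the stream name, partition key, and sample event are hypothetical placeholders.

  import json

  import boto3

  kinesis = boto3.client("kinesis", region_name="us-east-1")  # hypothetical region

  def publish_salesforce_event(event: dict) -> None:
      # One Kinesis record per Salesforce change event; partitioning by account Id
      # keeps events for the same account on the same shard, preserving their order.
      kinesis.put_record(
          StreamName="salesforce-cdc-stream",          # hypothetical stream name
          Data=json.dumps(event).encode("utf-8"),
          PartitionKey=event.get("accountId", "unknown"),
      )

  publish_salesforce_event({"accountId": "001xx0000001", "field": "Status", "newValue": "Closed"})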

CTS, NBC UNIVERSAL

Senior Data Engineer
12.2018 - 08.2019

Job overview

  • Technologies Used: Hadoop, Kafka, Cassandra, Spark Streaming, MongoDB, Teradata, SBT, OGG, Oracle, Core Java, Scala, Hibernate, Python, AWS
  • Summary
  • Developed real-time streaming applications using AWS Kinesis, Kafka, Spark Streaming, Kafka Streams, Cassandra, and Oracle
  • Performed the data modelling and de-normalization in Cassandra
  • Handled the Kafka producer side, integrating it with Oracle tables using Oracle GoldenGate
  • On the consumer side, used Spark Streaming to replicate the streaming data into Cassandra
  • Automated the deployment using Jenkins and Ansible
  • Created Grafana visualizations for metrics monitoring (Kafka, Cassandra)
  • For reconciliation (comparing Oracle and Cassandra data counts), created a Spark batch process (see the sketch after this list)
  • On the admin side, enabled the Spark history server and tuned the Cassandra database
  • Maintained tombstones and retention periods
  • Involved in writing business logic using Oracle PL/SQL in Oracle DB
  • Also created batch jobs migrating data from MongoDB to Cassandra and from Teradata to Cassandra.
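
A minimal sketch of the Oracle-vs-Cassandra reconciliation batch above, assuming the Oracle JDBC driver and the DataStax spark-cassandra-connector are on the Spark classpath; hosts, credentials, and table names are hypothetical.

  from pyspark.sql import SparkSession

  spark = SparkSession.builder.appName("oracle-cassandra-reconciliation").getOrCreate()

  # Row count on the Oracle side, read over JDBC.
  oracle_count = (
      spark.read.format("jdbc")
      .option("url", "jdbc:oracle:thin:@//oracle-host:1521/ORCLPDB")  # hypothetical host/service
      .option("dbtable", "SALES.ORDERS")                              # hypothetical table
      .option("user", "etl_user")
      .option("password", "***")
      .load()
      .count()
  )

  # Row count on the Cassandra side, read through the spark-cassandra-connector.
  cassandra_count = (
      spark.read.format("org.apache.spark.sql.cassandra")
      .options(keyspace="sales", table="orders")                      # hypothetical keyspace/table
      .load()
      .count()
  )

  print(f"Oracle={oracle_count} Cassandra={cassandra_count} match={oracle_count == cassandra_count}")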

CAPGEMINI, GLOBAL BANKING

Senior Data Engineer
03.2015 - 12.2018

Job overview

  • Technologies Used: Hadoop, Kafka, Cassandra, Spark Streaming, MongoDB, Teradata, SBT, OGG, Oracle, Core Java, Scala, Hibernate, Python, AWS, IBM CDC, Flume
  • Summary
  • Created data pipelines using IBM CDC and Kafka with Spark direct streaming
  • Developed Spark scripts using the Scala shell as per requirements
  • Responsible for building scalable distributed data solutions using Hadoop
  • Optimized existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, and pair RDDs
  • Performed advanced procedures like text analytics and processing using the in-memory computing capabilities of Spark
  • Experienced in handling large datasets using partitions, Spark in-memory capabilities, broadcasts, and effective and efficient joins and transformations during the ingestion process itself
  • Worked extensively with Sqoop for importing metadata from Oracle
  • Involved in creating Hive tables and loading and analyzing data using Hive queries
  • Developed Hive queries to process the data and generate data cubes for visualization
  • Implemented schema extraction for Parquet, ORC, and Avro file formats in Hive
  • Implemented partitioning, dynamic partitions, and buckets in Hive (see the sketch after this list)
  • Used reporting tools like Tableau to connect with Hive for generating daily data reports
  • Collaborated with the infrastructure, network, database, application, and BI teams to ensure data quality and availability.
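
A minimal sketch of the Hive partitioning and bucketing approach above, written against Spark with Hive support enabled; the staging and curated table names and columns are hypothetical.

  from pyspark.sql import SparkSession

  spark = (
      SparkSession.builder
      .appName("hive-partitioned-load")
      .enableHiveSupport()        # read/write Hive tables through the metastore
      .getOrCreate()
  )

  events = spark.table("staging.raw_events")     # hypothetical staging table

  (
      events.write
      .mode("overwrite")
      .format("parquet")
      .partitionBy("event_date")                 # Hive-style partition column
      .bucketBy(8, "user_id")                    # bucketing to speed up joins on user_id
      .sortBy("user_id")
      .saveAsTable("curated.events_partitioned") # hypothetical curated table
  )

Partitioning by date keeps daily loads incremental, while bucketing on the join key limits shuffle for downstream queries.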

3i Infotech

Senior Data Engineer
10.2012 - 03.2015

Job overview

  • Technologies Used: Hadoop, Spark, Data Warehouse, PL/SQL, SQL, Java, UNIX shell scripting
  • Summary
  • Involved in business process analysis, requirements review, and identification of business impact
  • Loaded data from source systems to the data warehouse through migration
  • Produced ad hoc user reports and analyzed the big data environment
  • Worked in MapReduce, Hive, and Spark environments
  • Worked extensively with Sqoop for importing metadata from Oracle
  • Involved in creating Hive tables and loading and analyzing data using Hive queries
  • Implemented partitioning, dynamic partitions, and buckets in Hive
  • Used reporting tools like Tableau to connect with Hive for generating daily data reports
  • Wrote programs for reports using MapReduce, Hive, and Spark SQL when required
  • Handled user web logs using Kafka by creating topics and analyzed them using Spark Streaming (see the sketch after this list)
  • Coordinated defect status review meetings with the testing teams on the open defects
  • Wrote test cases, test scripts, and processing flow scenarios
  • Coded business and technical requirements with PL/SQL, SQL, and UNIX
  • Performed periodic SQL performance tuning using tools like EXPLAIN PLAN
  • Created stored procedures, functions, packages, and triggers
  • Extensively used ref cursors, bulk collect, PL/SQL collections, and dynamic SQL
  • Performed unit testing, system testing, regression testing, and integration testing
  • In addition, involved in SQL query tuning.
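
A minimal sketch of the Kafka web-log streaming above, using the legacy Spark DStream direct-stream API that matches this period (pre-Structured Streaming); the broker address, topic name, and log layout are hypothetical.

  from pyspark import SparkContext
  from pyspark.streaming import StreamingContext
  from pyspark.streaming.kafka import KafkaUtils

  sc = SparkContext(appName="weblog-streaming")
  ssc = StreamingContext(sc, batchDuration=10)   # 10-second micro-batches

  # Direct stream over the Kafka topic carrying raw web-log lines.
  logs = KafkaUtils.createDirectStream(
      ssc, ["weblogs"], {"metadata.broker.list": "kafka-broker:9092"}
  )

  # Count requests per HTTP status code in each micro-batch
  # (status assumed to be the 9th field of a combined-log-format line).
  status_counts = (
      logs.map(lambda kv: kv[1])
          .map(lambda line: line.split(" "))
          .map(lambda parts: (parts[8] if len(parts) > 8 else "unknown", 1))
          .reduceByKey(lambda a, b: a + b)
  )
  status_counts.pprint()

  ssc.start()
  ssc.awaitTermination()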

TCS, Mortgage Banking

Data Engineer
10.2010 - 09.2013

Job overview

  • Technologies Used: Hadoop, Spark, Data Warehouse, PL/SQL, SQL, Java, UNIX shell scripting
  • Summary
  • Involved in business process analysis, requirements review, and identification of business impact
  • Loaded data from source systems to the data warehouse through migration
  • Produced ad hoc user reports and analyzed the big data environment
  • Worked in MapReduce, Hive, and Spark environments
  • Worked extensively with Sqoop for importing metadata from Oracle
  • Involved in creating Hive tables and loading and analyzing data using Hive queries
  • Implemented partitioning, dynamic partitions, and buckets in Hive
  • Used reporting tools like Tableau to connect with Hive for generating daily data reports
  • Wrote programs for reports using MapReduce, Hive, and Spark SQL when required
  • Handled user web logs using Kafka by creating topics and analyzed them using Spark Streaming
  • Coordinated defect status review meetings with the testing teams on the open defects
  • Wrote test cases, test scripts, and processing flow scenarios
  • Coded business and technical requirements with PL/SQL, SQL, and UNIX
  • Performed periodic SQL performance tuning using tools like EXPLAIN PLAN (see the sketch after this list)
  • Created stored procedures, functions, packages, and triggers
  • Extensively used ref cursors, bulk collect, PL/SQL collections, and dynamic SQL.
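
A minimal sketch of the EXPLAIN PLAN tuning step above, driven from Python via cx_Oracle; the connection string, table, and query are hypothetical.

  import cx_Oracle

  conn = cx_Oracle.connect("etl_user", "***", "oracle-host:1521/ORCLPDB")  # hypothetical DSN
  cur = conn.cursor()

  # Ask the optimizer for an execution plan without running the statement.
  cur.execute("EXPLAIN PLAN FOR SELECT * FROM loans WHERE status = 'ACTIVE'")

  # Read the formatted plan back from PLAN_TABLE.
  cur.execute("SELECT plan_table_output FROM TABLE(DBMS_XPLAN.DISPLAY())")
  for (line,) in cur:
      print(line)

  cur.close()
  conn.close()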

Education

Anna University

Bachelor of Electrical and Electronics Engineering
01.2008


Skills

  • Proficient in various data architecture concepts such as Data Warehousing, Data Lake, and Data Mart; skilled in designing and implementing data architecture solutions that align with business needs and data strategies
  • Well-versed in data warehousing methodologies, ETL (Extract, Transform, Load), and ELT (Extract, Load, Transform) processes, ensuring efficient data integration and transformation
  • In addition, proficient in the programming languages Python, Scala, Java, and Go
  • Worked with event-driven architecture, serverless architecture, and Medallion architecture
  • AWS: EMR, S3, Kinesis, Lambda, Redshift, DynamoDB, AWS CodeDeploy, Beanstalk, CloudWatch, Data Pipeline, AWS Glue
  • Azure: Azure HDInsight, Blob, Databricks, Event Hub, SQL DW, ADLS, Data Factory
  • GCP: Dataflow, Dataproc, GCS, Data Fusion, Data Catalog, Pub/Sub, BigQuery, Bigtable, Cloud Composer
  • ETL: Informatica, Ab Initio, Talend
  • DW: Teradata, Netezza
  • Orchestration: Espx, Ctrl-M, NiFi, Airflow, Oozie, Autosys
  • Distribution System: DataStax, Cloudera
  • Languages: Scala, Core Java, Oracle SQL/PL SQL
  • Scripting: Python, Unix shell script
  • Cloud: AWS, Azure, GCP
  • Hadoop Tools: Hive, Pig, Sqoop, HBase, Spark, Impala
  • Data Extraction/Transformation: Kafka, Spark, MapReduce
  • NoSQL DB: Cassandra, HBase, Prometheus, Elastic, MongoDB
  • Streaming Tools: Flume, Kafka, Spark Streaming, AWS Kinesis, OGG, Apache NiFi, Azure Event Hub
  • Databases: Oracle, Teradata, PostgreSQL
  • Development Tools: IntelliJ, Eclipse, SQL Developer, DbVisualizer, Git
  • Continuous Deployment Tools (CI/CD): Jenkins, Bamboo, Ansible, Azure DevOps
  • Management Skills: Project Lead, Solution Architect, Principal Data Engineer, Migration Consultant
  • Other: Git, Bitbucket, GitLab
  • Logging Mechanisms: Splunk (Splunk queries, reports, alerts & dashboards)
  • Visualization: Tableau, QuickSight, Looker
  • Unit testing, system testing, regression testing, and integration testing
  • SQL query tuning
  • Reverse engineering

Certification

  • Azure – Certified Data Engineer, DevOps Engineer & Architect
  • GCP – Certified Cloud Data Engineer
  • AWS – Certified Associate Developer & Certified Data Analytics

Timeline

Data Architect
02.2021 - Current
Senior Full Stack DevOps & Data Engineer
CTS
08.2019 - Current
Senior Data Engineer
CTS, NBC UNIVERSAL
12.2018 - 08.2019
Senior Data Engineer
CAPGEMINI, GLOBAL BANKING
03.2015 - 12.2018
Senior Data Engineer
3i Infotech
10.2012 - 03.2015
Data Engineer
TCS, Mortgage Banking
10.2010 - 09.2013
Anna University
Bachelor of Electrical and Electronics Engineering