Hi, I’m

VENKAT RAMANAN R

DATA/ML Engineer
Chennai, Tamil Nadu

Summary

Experience:

With over 12 years of experience in the IT industry, including more than 8 years focused on Hadoop, I have expertise in tools across the Hadoop ecosystem, including Spark, Spark Streaming, Kafka, KSQL, DataStax Cassandra, Pig, Hive, HDFS, MapReduce, Sqoop, Storm, YARN, Oozie, and Oracle GoldenGate for Big Data.

In addition, I have 5+ years of experience designing, architecting, and developing on cloud platforms such as Databricks, GCP, AWS, and Azure using Medallion, event-driven, serverless, and North Star architectures. Highly motivated professional with a desire to take on new challenges.

Strong work ethic, adaptability, and exceptional interpersonal skills. Adept at working effectively unsupervised and at quickly mastering new skills.

Able to design and implement data architecture solutions, including Data Warehousing, Data Lake, and Data Mart concepts. Well-acquainted with ETL (Extract, Transform, Load) and ELT (Extract, Load, Transform) processes for efficient data integration and transformation. Extensive experience in business processes, requirements gathering, data modeling, and the full SDLC.

Overview

13 years of professional experience
3 certifications

Work History

Data Architect

02.2021 - Current

Job overview

  • Technologies Used: GCP, AWS EMR, Kinesis, S3, CloudWatch, Lambda, AWS Glue, Airflow, Redshift, DynamoDB, Hadoop, Kafka, HBase, Spark Streaming, Core Java, Scala, Python, Spring, J2EE, Teradata
  • Summary: on-prem to AWS migration (historical data and cloud-native pipelines)
  • Migrated the on-prem Hadoop data lake to the AWS data lake (S3, Redshift & DynamoDB)
  • Created a common batch framework using AWS Glue targeted at S3 & Redshift
  • Utilized AWS Lambda for API calls to MongoDB and MySQL
  • Conducted discovery and data cataloging for all migrated data using AWS Data Catalog
  • Set up a Kinesis stream environment to process and analyze real-time streaming data, deploying it to Redshift & S3 via Firehose
  • Employed Apache Airflow on top of ECS & EC2 to orchestrate 700 production batch jobs (see the orchestration sketch after this list)
  • Achieved the movement of 300TB of data daily through incremental loads
  • Integrated various sources like ADLS, Oracle, Teradata, and Hadoop feed files
  • Utilized CloudFormation with AWS CodeCommit, CodePipeline, and CodeDeploy for continuous deployment
  • Implemented observability and monitoring with CloudWatch logs, Splunk, PagerDuty, and New Relic
  • Developed an automated Spark/Scala framework for both batch and real-time streaming
  • Created a synthetic data generator to produce mock data for lower environments
  • Implemented masking algorithms and encryption logic as needed
  • Collaborate regularly with the ML and data science teams to enable ad hoc or incremental data tasks
  • Created a road map for designing, building, and automating data pipelines using Dataproc, Data Fusion, and Dataflow
  • Created data lakes and gathered data with batch and streaming pipelines on top of GCS and DPMS
  • Created a curated environment on top of BigQuery with a fine-grained access model
  • Used Cloud Composer and Cloud Scheduler for orchestration
  • Created CI/CD pipelines using a combination of Cloud Build and Terraform
  • Added monitoring and logging with Stackdriver, also integrated with Splunk and New Relic
  • Performed detailed analysis of business problems and technical environments and used it to design solutions
  • Worked creatively and analytically in a problem-solving environment
  • Created end-to-end low-latency streaming connecting Kafka and Pub/Sub using Dataflow and targeting Bigtable
  • Used Google Data Catalog for data classification, tightly integrated with BigQuery to enable persona-based access
  • Worked with the network and governance teams to build the GCP Interconnect and define persona-based roles
  • Migrated historical data and moved all brownfield applications to greenfield, cloud-native implementations
  • Worked with the security team to get approval to enable GCP managed services.
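
A minimal orchestration sketch in the spirit of the Airflow + Glue batch framework above, assuming the apache-airflow-providers-amazon package is installed; the DAG name, Glue job name, and region are hypothetical placeholders, not the actual production pipeline.

  from datetime import datetime

  from airflow import DAG
  from airflow.providers.amazon.aws.operators.glue import GlueJobOperator

  # Hypothetical daily DAG that triggers one pre-registered Glue curation job.
  with DAG(
      dag_id="s3_to_redshift_batch",
      start_date=datetime(2021, 2, 1),
      schedule_interval="@daily",
      catchup=False,
  ) as dag:
      curate = GlueJobOperator(
          task_id="run_glue_curation_job",
          job_name="curate_orders_to_redshift",  # hypothetical Glue job name
          region_name="us-east-1",               # hypothetical region
          wait_for_completion=True,              # block until the Glue run finishes
      )

In practice, one such DAG per subject area (with retries, SLAs, and CloudWatch alerting) is how a few hundred Glue batch jobs can be scheduled and monitored from a single Airflow deployment.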

CTS

Senior Full Stack DevOps & Data Engineer
08.2019 - Current

Job overview

  • Technologies Used: AWS EMR, Kinesis, S3, CloudWatch, Lambda, Azure Functions, Event Hub, Blob, Azure SQL DW, Hadoop, Kafka, HBase, Spark Streaming, Core Java, Scala, Python
  • Summary
  • Developed real-time streaming applications using AWS Kinesis, Kafka, Spark Streaming, Kafka Streams, Cassandra, and Oracle
  • Built the data streaming pipeline from Salesforce to AWS Kinesis (see the sketch after this list)
  • Built and deployed the developed components using Bamboo CI/CD
  • Transformed and flattened the data received in the raw layer using Spark
  • Created a monitoring mechanism using Splunk dashboards
  • Automated the deployment using Jenkins
  • After flattening, ensured the data was replicated to the access instances (Redshift and Azure Synapse) for users
  • Built ad hoc analytical batches using Spark jobs
  • Built the data streaming pipeline from Salesforce to Azure using Event Hubs
  • Built and deployed the developed components using Bamboo CI/CD
  • Transformed and flattened the data received in the raw layer using Databricks
  • Utilized Azure PolyBase with ADLS and Azure Synapse.
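
A minimal illustrative sketch of the Salesforce-to-Kinesis hand-off above, using boto3; the stream name, partition key, and sample event are hypothetical placeholders.

  import json

  import boto3

  kinesis = boto3.client("kinesis", region_name="us-east-1")  # hypothetical region

  def publish_salesforce_event(event: dict) -> None:
      # One Kinesis record per Salesforce change event; partitioning by account Id
      # keeps events for the same account on the same shard, preserving their order.
      kinesis.put_record(
          StreamName="salesforce-cdc-stream",          # hypothetical stream name
          Data=json.dumps(event).encode("utf-8"),
          PartitionKey=event.get("accountId", "unknown"),
      )

  publish_salesforce_event({"accountId": "001xx0000001", "field": "Status", "newValue": "Closed"})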

CTS, NBC UNIVERSAL

Senior Data Engineer
12.2018 - 08.2019

Job overview

  • Technologies Used: Hadoop, Kafka, Cassandra, Spark Streaming, MongoDB, Teradata, SBT, OGG, Oracle, Core Java, Scala, Hibernate, Python, AWS
  • Summary
  • Developed real-time streaming applications using AWS Kinesis, Kafka, Spark Streaming, Kafka Streams, Cassandra, and Oracle
  • Performed the data modelling and de-normalization in Cassandra
  • Handled the Kafka producer side, integrating it with Oracle tables using Oracle GoldenGate
  • On the consumer side, used Spark Streaming to replicate the streaming data into Cassandra
  • Automated the deployment using Jenkins and Ansible
  • Created Grafana visualizations for metrics monitoring (Kafka, Cassandra)
  • For reconciliation (comparing Oracle and Cassandra data counts), created a Spark batch process (see the sketch after this list)
  • On the admin side, enabled the Spark history server and tuned the Cassandra database
  • Maintained tombstones and retention periods
  • Involved in writing business logic using Oracle PL/SQL in Oracle DB
  • Also created batch jobs migrating data from MongoDB to Cassandra and from Teradata to Cassandra.
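
A minimal sketch of the Oracle-vs-Cassandra reconciliation batch above, assuming the Oracle JDBC driver and the DataStax spark-cassandra-connector are on the Spark classpath; hosts, credentials, and table names are hypothetical.

  from pyspark.sql import SparkSession

  spark = SparkSession.builder.appName("oracle-cassandra-reconciliation").getOrCreate()

  # Row count on the Oracle side, read over JDBC.
  oracle_count = (
      spark.read.format("jdbc")
      .option("url", "jdbc:oracle:thin:@//oracle-host:1521/ORCLPDB")  # hypothetical host/service
      .option("dbtable", "SALES.ORDERS")                              # hypothetical table
      .option("user", "etl_user")
      .option("password", "***")
      .load()
      .count()
  )

  # Row count on the Cassandra side, read through the spark-cassandra-connector.
  cassandra_count = (
      spark.read.format("org.apache.spark.sql.cassandra")
      .options(keyspace="sales", table="orders")                      # hypothetical keyspace/table
      .load()
      .count()
  )

  print(f"Oracle={oracle_count} Cassandra={cassandra_count} match={oracle_count == cassandra_count}")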

CAPGEMINI, GLOBAL BANKING

Senior Data Engineer
03.2015 - 12.2018

Job overview

  • Technologies Used: Hadoop, Kafka, Cassandra, Spark Streaming, MongoDB, Teradata, SBT, OGG, Oracle, Core Java, Scala, Hibernate, Python, AWS, IBM CDC, Flume
  • Summary
  • Created data pipelines using IBM CDC and Kafka with Spark direct streaming
  • Developed Spark scripts using the Scala shell as per requirements
  • Responsible for building scalable distributed data solutions using Hadoop
  • Optimized existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, and pair RDDs
  • Performed advanced procedures like text analytics and processing using the in-memory computing capabilities of Spark
  • Experienced in handling large datasets using partitions, Spark in-memory capabilities, broadcasts, and effective and efficient joins and transformations during the ingestion process itself
  • Worked extensively with Sqoop for importing metadata from Oracle
  • Involved in creating Hive tables and loading and analyzing data using Hive queries
  • Developed Hive queries to process the data and generate data cubes for visualization
  • Implemented schema extraction for Parquet, ORC, and Avro file formats in Hive
  • Implemented partitioning, dynamic partitions, and buckets in Hive (see the sketch after this list)
  • Used reporting tools like Tableau to connect with Hive for generating daily data reports
  • Collaborated with the infrastructure, network, database, application, and BI teams to ensure data quality and availability.
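
A minimal sketch of the Hive partitioning and bucketing approach above, written against Spark with Hive support enabled; the staging and curated table names and columns are hypothetical.

  from pyspark.sql import SparkSession

  spark = (
      SparkSession.builder
      .appName("hive-partitioned-load")
      .enableHiveSupport()        # read/write Hive tables through the metastore
      .getOrCreate()
  )

  events = spark.table("staging.raw_events")     # hypothetical staging table

  (
      events.write
      .mode("overwrite")
      .format("parquet")
      .partitionBy("event_date")                 # Hive-style partition column
      .bucketBy(8, "user_id")                    # bucketing to speed up joins on user_id
      .sortBy("user_id")
      .saveAsTable("curated.events_partitioned") # hypothetical curated table
  )

Partitioning by date keeps daily loads incremental, while bucketing on the join key limits shuffle for downstream queries.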

3i Infotech

Senior Data Engineer
10.2012 - 03.2015

Job overview

  • Technologies Used: Hadoop, Spark, Data Warehouse, PL/SQL, SQL, Java, UNIX shell scripting
  • Summary
  • Involved in business process analysis, requirements review, and identification of business impact
  • Loaded data from source systems to the data warehouse through migration
  • Produced ad hoc user reports and analyzed the big data environment
  • Worked in MapReduce, Hive, and Spark environments
  • Worked extensively with Sqoop for importing metadata from Oracle
  • Involved in creating Hive tables and loading and analyzing data using Hive queries
  • Implemented partitioning, dynamic partitions, and buckets in Hive
  • Used reporting tools like Tableau to connect with Hive for generating daily data reports
  • Wrote programs for reports using MapReduce, Hive, and Spark SQL when required
  • Handled user web logs using Kafka by creating topics and analyzed them using Spark Streaming (see the sketch after this list)
  • Coordinated defect status review meetings with the testing teams on the open defects
  • Wrote test cases, test scripts, and processing flow scenarios
  • Coded business and technical requirements with PL/SQL, SQL, and UNIX
  • Performed periodic SQL performance tuning using tools like EXPLAIN PLAN
  • Created stored procedures, functions, packages, and triggers
  • Extensively used ref cursors, bulk collect, PL/SQL collections, and dynamic SQL
  • Performed unit testing, system testing, regression testing, and integration testing
  • In addition, involved in SQL query tuning.
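
A minimal sketch of the Kafka web-log streaming above, using the legacy Spark DStream direct-stream API that matches this period (pre-Structured Streaming); the broker address, topic name, and log layout are hypothetical.

  from pyspark import SparkContext
  from pyspark.streaming import StreamingContext
  from pyspark.streaming.kafka import KafkaUtils

  sc = SparkContext(appName="weblog-streaming")
  ssc = StreamingContext(sc, batchDuration=10)   # 10-second micro-batches

  # Direct stream over the Kafka topic carrying raw web-log lines.
  logs = KafkaUtils.createDirectStream(
      ssc, ["weblogs"], {"metadata.broker.list": "kafka-broker:9092"}
  )

  # Count requests per HTTP status code in each micro-batch
  # (status assumed to be the 9th field of a combined-log-format line).
  status_counts = (
      logs.map(lambda kv: kv[1])
          .map(lambda line: line.split(" "))
          .map(lambda parts: (parts[8] if len(parts) > 8 else "unknown", 1))
          .reduceByKey(lambda a, b: a + b)
  )
  status_counts.pprint()

  ssc.start()
  ssc.awaitTermination()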

TCS, Mortgage Banking

Data Engineer
10.2010 - 09.2013

Job overview

  • Technologies Used: Hadoop, Spark, Data Warehouse, PL/SQL, SQL, Java, UNIX shell scripting
  • Summary
  • Involved in business process analysis, requirements review, and identification of business impact
  • Loaded data from source systems to the data warehouse through migration
  • Produced ad hoc user reports and analyzed the big data environment
  • Worked in MapReduce, Hive, and Spark environments
  • Worked extensively with Sqoop for importing metadata from Oracle
  • Involved in creating Hive tables and loading and analyzing data using Hive queries
  • Implemented partitioning, dynamic partitions, and buckets in Hive
  • Used reporting tools like Tableau to connect with Hive for generating daily data reports
  • Wrote programs for reports using MapReduce, Hive, and Spark SQL when required
  • Handled user web logs using Kafka by creating topics and analyzed them using Spark Streaming
  • Coordinated defect status review meetings with the testing teams on the open defects
  • Wrote test cases, test scripts, and processing flow scenarios
  • Coded business and technical requirements with PL/SQL, SQL, and UNIX
  • Performed periodic SQL performance tuning using tools like EXPLAIN PLAN (see the sketch after this list)
  • Created stored procedures, functions, packages, and triggers
  • Extensively used ref cursors, bulk collect, PL/SQL collections, and dynamic SQL.
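
A minimal sketch of the EXPLAIN PLAN tuning step above, driven from Python via cx_Oracle; the connection string, table, and query are hypothetical.

  import cx_Oracle

  conn = cx_Oracle.connect("etl_user", "***", "oracle-host:1521/ORCLPDB")  # hypothetical DSN
  cur = conn.cursor()

  # Ask the optimizer for an execution plan without running the statement.
  cur.execute("EXPLAIN PLAN FOR SELECT * FROM loans WHERE status = 'ACTIVE'")

  # Read the formatted plan back from PLAN_TABLE.
  cur.execute("SELECT plan_table_output FROM TABLE(DBMS_XPLAN.DISPLAY())")
  for (line,) in cur:
      print(line)

  cur.close()
  conn.close()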

Education

Anna University

Bachelor of Electrical and Electronics Engineering
01.2008


Skills

  • Proficient in various data architecture concepts such as Data Warehousing, Data Lake, and Data Mart; skilled in designing and implementing data architecture solutions that align with business needs and data strategies
  • Well-versed in data warehousing methodologies, ETL (Extract, Transform, Load), and ELT (Extract, Load, Transform) processes, ensuring efficient data integration and transformation
  • In addition, proficient in the programming languages Python, Scala, Java, and Go
  • Worked with event-driven architecture, serverless architecture, and Medallion architecture
  • AWS: EMR, S3, Kinesis, Lambda, Redshift, DynamoDB, AWS CodeDeploy, Beanstalk, CloudWatch, Data Pipeline, AWS Glue
  • Azure: Azure HDInsight, Blob, Databricks, Event Hub, SQL DW, ADLS, Data Factory
  • GCP: Dataflow, Dataproc, GCS, Data Fusion, Data Catalog, Pub/Sub, BigQuery, Bigtable, Cloud Composer
  • ETL: Informatica, Ab Initio, Talend
  • DW: Teradata, Netezza
  • Orchestration: Espx, Ctrl-M, NiFi, Airflow, Oozie, Autosys
  • Distribution System: DataStax, Cloudera
  • Languages: Scala, Core Java, Oracle SQL/PL SQL
  • Scripting: Python, Unix shell script
  • Cloud: AWS, Azure, GCP
  • Hadoop Tools: Hive, Pig, Sqoop, HBase, Spark, Impala
  • Data Extraction/Transformation: Kafka, Spark, MapReduce
  • NoSQL DB: Cassandra, HBase, Prometheus, Elastic, MongoDB
  • Streaming Tools: Flume, Kafka, Spark Streaming, AWS Kinesis, OGG, Apache NiFi, Azure Event Hub
  • Databases: Oracle, Teradata, PostgreSQL
  • Development Tools: IntelliJ, Eclipse, SQL Developer, DbVisualizer, Git
  • Continuous Deployment Tools (CI/CD): Jenkins, Bamboo, Ansible, Azure DevOps
  • Management Skills: Project Lead, Solution Architect, Principal Data Engineer, Migration Consultant
  • Other: Git, Bitbucket, GitLab
  • Logging Mechanisms: Splunk (Splunk queries, reports, alerts & dashboards)
  • Visualization: Tableau, QuickSight, Looker
  • Unit testing, system testing, regression testing, and integration testing
  • SQL query tuning
  • Reverse engineering

Certification

  • Azure – Certified Data Engineer, DevOps Engineer & Architect
  • GCP – Certified Cloud Data Engineer
  • AWS – Certified Associate Developer & Certified Data Analytics

Timeline

Data Architect
02.2021 - Current
Senior Full Stack DevOps & Data Engineer
CTS
08.2019 - Current
Senior Data Engineer
CTS, NBC UNIVERSAL
12.2018 - 08.2019
Senior Data Engineer
CAPGEMINI, GLOBAL BANKING
03.2015 - 12.2018
Senior Data Engineer
3i Infotech
10.2012 - 03.2015
Data Engineer
TCS, Mortgage Banking
10.2010 - 09.2013
Anna University
Bachelor of Electrical and Electronics Engineering