Jyoti Kanwar


Technical Architect - Big Data
Pune

Summary

• Over 10 years of IT experience in Data Engineering and Application Development using Big Data technologies and Cloud services.
• Working experience with the Hadoop ecosystem (Gen-1 and Gen-2) and its various components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, Resource Manager (YARN), Application Master, and Node Manager.
• Experience with the Cloudera distribution, encompassing components such as MapReduce, Spark, SQL, Hive, HBase, Sqoop, and PySpark.
• Good skills with the NoSQL database Cassandra.
• Proficient in developing Hive scripts for various business requirements.
• Knowledge of data warehousing concepts, OLTP/OLAP system analysis, and designing database schemas such as Star Schema and Snowflake Schema for relational and dimensional modelling.
• Good hands-on experience creating custom UDFs in Hive.
• Loaded and transformed large sets of structured, semi-structured, and unstructured data between relational database systems and HDFS using Sqoop.
• Working knowledge of Hive UDFs and various join types.
• Good experience with the architecture and components of Spark; efficient in working with Spark Core, DataFrames/Datasets/RDD APIs, Spark SQL, and Spark Streaming, with expertise in building PySpark and Spark-Scala applications for interactive analysis, batch processing, and stream processing.
• Hands-on experience with Spark, Scala, Spark SQL, and HiveContext for data processing.
• Working knowledge of GCP services such as Cloud Functions, Dataproc, and BigQuery.
• Experience with Azure cloud services such as ADF, ADLS, Blob Storage, Databricks, and Synapse.
• Extensive working experience with Agile development methodology and working knowledge of Linux.
• Expertise in working with big data distributions like Cloudera and Hortonworks.
• Automated data pipelines using streams and tasks; involved in loading structured and semi-structured data into Spark clusters using Spark SQL and the DataFrames API.
• Experience with the Hive data warehouse tool: creating tables, distributing data through static and dynamic partitioning and bucketing, and using Hive optimization techniques.
• Experience in tuning and debugging Spark applications and using Spark optimization techniques.

• Knowledge of the architecture and components of Spark, with demonstrated efficiency in optimizing and tuning compute and memory for performance and cost.

• Expertise in developing batch data processing applications using Spark, Hive and Sqoop.
• Experience in working with CSV, JSON, XML, ORC, Avro and Parquet file formats.

• Good experience in creating and designing data ingestion pipelines using technologies such as Apache Kafka.
• Worked with popular AWS services such as S3, EC2, EMR, and Athena.
• Good knowledge of ETL methods for data extraction, transformation, and loading in corporate-wide ETL solutions, and of data warehouse tools for reporting and data analysis.

• Basic experience implementing Snowflake Data Warehouse.
• Experience working with version control systems such as Git and GitHub, and with CI/CD pipelines.

Overview

10 years of professional experience
4 years of post-secondary education
6 Certifications
1 Language

Work History

Manager - Data Engineering

lentra.ai
Pune, Maharashtra
07.2023 - 04.2025
  • Led the design and implementation of scalable data pipelines, enhancing data accessibility for cross-functional teams.
  • Streamlined data integration processes, resulting in improved accuracy and reduced processing time across systems.
  • Mentored junior engineers, fostering a collaborative environment that improved team performance and project delivery.
  • Spearheaded the migration to cloud-based data storage solutions, increasing operational efficiency and reducing costs.
  • Established team priorities, maintained schedules and monitored performance.
  • Reduced operational costs through comprehensive process improvement initiatives and resource management.

Technical Architect

ValueLabs
Hyderabad
10.2022 - 02.2023

Client: Pepperstone (A FinTech Startup)

In this role, I was responsible for building scalable and efficient data pipelines with AWS services, Kafka Streams, Spark Structured Streaming, Kubernetes, and Docker.

  • Introduced the gRPC framework to connect distributed components, improving client-server connectivity by exchanging data in binary format over HTTP/2; developed in Golang following a TDD process.
  • Involved in the development of a real-time data pipeline that streams trades from different trading platforms (e.g., MT4, MT5, and cTrader) into a single, unified data source. This unified and centralized view of Pepperstone’s trades has revolutionized the company’s decision making, risk management, fraud detection, and analytical research capabilities and has helped meet regulators’ increasingly stringent reporting requirements.
  • Moved from manual deployments to automated deployments using infrastructure as code.
  • Built, led, and managed a team of senior and junior data engineers to develop the entire data transformation architecture and deliver scalable data solutions.
  • Proactively identified issues with the existing data architecture and prepared technical solutions to mitigate those risks in the future.
  • Improved the data transformations module to increase performance and user experience while minimizing compute costs and technical issues.
  • Provided direction on data engineering and architecture, determining the right tools for the job and what got built based on requirements.
  • Created and maintained analytics data pipelines that generate data and insights to power business decision-making.
  • Bootstrapped a data engineering team at an early stage in the team's evolution.
  • Performed quality code review and removed technical debt and security vulnerabilities.
  • Monitored and tested application performance to identify potential bottlenecks, developed solutions, and collaborated with developers on their implementation.
  • Managed and monitored installed systems for the highest level of availability.
  • Acted on feedback by working closely with the Product team to prioritize and define the product roadmap and engineering backlog.
  • Skills/Technologies/Tools: Apache Spark, Apache Kafka, Apache Airflow, Python, Golang, Scala, SQL, AWS, Docker, Apache NiFi, Git, Jira, TDD, Kubernetes, Microservices, Apache Hudi, Data Warehousing, Data Lake.

Assistant Vice President

Credit Suisse
Pune
11.2021 - 08.2022

Data Analytics and Integration Services (DAAIS)

• Responsible for building scalable distributed data solutions using Spark.
• Ingested log files from source servers into HDFS data lakes using Sqoop.
• Developed Sqoop Jobs to ingest customer and product data into HDFS data lakes.
• Developed Spark streaming applications to ingest transactional data from Kafka topics into Cassandra tables in near real time.
• Developed a Spark application to flatten incoming transactional data using various dimensional tables and persist it to Cassandra tables.
• Involved in developing a framework for metadata management on HDFS data lakes.
• Worked on various Hive optimizations such as partitioning, bucketing, vectorization, and indexing, and used the right types of Hive joins such as Bucket Map Join and SMB Join.
• Worked with various file formats such as CSV, JSON, ORC, Avro, and Parquet.
• Developed HQL scripts to create external tables and analyze incoming and intermediate data for analytics applications in Hive.
• Optimized Spark jobs using various optimization techniques such as broadcasting, executor tuning, and persisting.
• Responsible for developing custom UDFs, UDAFs and UDTFs in Hive.
• Analyzed tweets JSON data using the Hive SerDe API to deserialize and convert it into a readable format.
• Orchestrated Hadoop and Spark jobs using Oozie workflows to create job dependencies and run multiple jobs in sequence for data processing.
• Continuously monitored and managed the Hadoop cluster through Cloudera Manager.

Skills/Technologies/Tools: Apache Hadoop, Apache Spark, Spark SQL, Spark Streaming, Hive, Cassandra, MySQL, HDFS, Apache Kafka, Python, Scala.

Manager

Morgan Stanley
Mumbai
10.2020 - 10.2021

Wealth Management Technology (WMT)

• Analyzed data using Hadoop components with Hive, Pig, and HBase queries.
• Loaded and transformed large sets of structured, semi-structured, and unstructured data using Hadoop/Big Data concepts.
• Involved in loading data from the UNIX file system to HDFS.
• Responsible for creating Hive tables, loading data, and writing Hive queries.
• Handled importing data from various data sources, performed transformations using Hive and MapReduce/Apache Spark, and loaded data into HDFS.
• Extracted data from Oracle Database into HDFS using Sqoop.
• Loaded data from web servers and Teradata using Sqoop and the Spark Streaming API.
• Utilized the Spark Streaming API to stream data from various sources; optimized existing Scala code and improved cluster performance.
• Tuned Spark applications (batch interval time, level of parallelism, memory) to improve processing time and efficiency.

  • Skills/Technologies/Tools: Hadoop 2.x, HDFS, Spark, Spark SQL, Spark Streaming, Sqoop, Eclipse, Python, Hive, Linux, MySQL, Apache Kafka, Unix, DWH, ETL, SQL.


Digital Solution Architect Sr. Advisor

NTT Data Information Processing Services
Noida
07.2025 - Current
  • Collaborated with cross-functional teams to deliver high-quality software products on time and within budget.
  • Evaluated emerging technologies, recommending appropriate adoption strategies for maximum return on investment.
  • Developed scalable architecture solutions for a diverse range of clients, resulting in improved performance and future growth opportunities.
  • Successfully migrated legacy applications to modern platforms, ensuring seamless integration and minimal disruption to operations.

ETL Developer

Amdocs
Pune
05.2019 - 09.2020

Client: Vodafone Italy (VFIT)

  • Built a big data pipeline for Customer Services Analytics for Vodafone.
  • Created data pipelines, a warehouse, and an analytics pipeline for telecommunication services exposed to more than 10 million customers.
  • Worked on the Kafka-Spark streaming ADH (Amdocs Data Hub) framework to provide data from Oracle to downstream systems after performing transformations in near real time.
  • Created multiple utilities, automations, and alerts to gain insights from the real-time process, also useful for operations and monitoring.
  • Wrote and optimized in-application SQL statements.
  • Created multiple critical extracts on top of Hive data using HiveQL.
  • Skills/Technologies/Tools: Hive, Hadoop, HBase, Spark, Kafka, Oracle, Python, Shell scripting, Jenkins/CI-CD, Grafana, Prometheus, JIRA.

Programmer Analyst

Cognizant
Pune
03.2016 - 04.2019

Project Undertaken: Building a big data pipeline and warehousing solution to analyze data transformations that match a legacy system's data.

This involved building a data lake: Hadoop tools were used to transfer data to and from HDFS, some of the sources were imported using Sqoop, and the raw data was stored in Hive tables in ORC format so that data scientists could perform analytics using Hive. New use cases were developed and loaded into a NoSQL database (HBase) for further analytics.

• Developed Sqoop scripts to import source data from an Oracle database into HDFS for further processing.

• Developed Hive scripts to store raw data in ORC format.

• Involved in requirements gathering, design, development, and testing.

• Generated reports using Hive for ad hoc business requirements.
Skills/Technologies/Tools: Cloudera CDH, Hadoop, HDFS, Hive, Sqoop, HBase.

Education

B.Tech (CSE) - Computer Science

Rajasthan Technical University, Kota
Swami Keshvanand Institute Of Technology, Jaipur
08.2011 - 07.2015

Skills

    Data Engineering


Additional Information

  • Dream Team Award, Amdocs.
  • Team Recognition Award, Cognizant.
  • Personal Details: Date of Birth: , Gender: Female

Accomplishments

    CERTIFICATE OF RECOGNITION

    • Received appreciation for invaluable contribution in two projects.

    CERTIFICATION OF EXCELLENCE

    • Recognized for performance, dedication and support provided in the project.

  • Supervised a team of 16 members.
  • Achieved a unified and centralized view of Pepperstone’s trades, which revolutionized the company’s decision making, risk management, fraud detection, and analytical research capabilities and helped meet regulators’ increasingly stringent reporting requirements.

Certification

AWS Certified Solutions Architect - Associate

Timeline

Digital Solution Architect Sr. Advisor

NTT Data Information Processing Services
07.2025 - Current

Manager - Data Engineering

lentra.ai
07.2023 - 04.2025

Technical Architect

ValueLabs
10.2022 - 02.2023

Assistant Vice President

Credit Suisse
11.2021 - 08.2022

Manager

Morgan Stanley
10.2020 - 10.2021

ETL Developer

Amdocs
05.2019 - 09.2020

Programmer Analyst

Cognizant
03.2016 - 04.2019

B.Tech (CSE) - Computer Science

Rajasthan Technical University, Kota
08.2011 - 07.2015