Jyoti Kanwar

Pune

Summary

A highly skilled Data Engineering Leader with over 9 years of experience in designing and implementing scalable data solutions, leveraging Big Data technologies, AI, and cloud platforms. Proven track record in building high-performance data engineering teams, driving innovation, and delivering business-impacting solutions. Expertise in crafting end-to-end data pipelines, building real-time and batch data processing systems, and deploying machine learning models for actionable insights.

Data Engineering Expertise: Proficient in the Hadoop ecosystem (HDFS, MapReduce, Spark, Hive, HBase), NoSQL databases (Cassandra), and cloud services (AWS, Azure, GCP). Experience in designing and optimizing data architectures, creating data ingestion pipelines using Kafka, and implementing ETL frameworks for large-scale data processing.

AI and Machine Learning: Expertise in integrating AI-driven solutions into data pipelines, including anomaly detection, predictive analytics, and NLP techniques. Skilled in applying machine learning models and automating data processes for enhanced decision-making and operational efficiency.

Leadership and Team Management: Led and mentored high-performing teams of data engineers, fostering a culture of innovation and excellence. Spearheaded the development and optimization of complex data architectures, aligning team efforts with business goals to drive growth and strategic advantage.

Cloud & Distributed Systems: Deep experience with cloud tools (AWS S3, EC2, GCP BigQuery) and distributed computing systems like Spark and Kafka. Expertise in designing serverless architectures and managing data pipelines with minimal latency and high availability.

Collaboration and Communication: Adept at working with cross-functional teams, providing technical direction, and translating business requirements into actionable data solutions. Strong communication and stakeholder management skills, ensuring alignment and successful project delivery.

Overview

9 years of professional experience
6 Certifications

Work History

Manager, Data Engineering

Lentra.ai
Pune
07.2023 - Current

Clients: Chola, PFL, HDFC, Bandhan

  • Lead a team of data engineers in designing and developing AI-powered data pipelines that automate credit risk assessment, fraud detection, and financial analysis, leveraging big data tools such as Apache Spark, Kafka, Hadoop, and Flink (in Scala), combined with machine learning models to detect and prevent fraudulent activity.
  • Spearheaded the implementation of fraud detection models by integrating AI-driven insights into the data workflow, enabling early identification of suspicious transactions and activities, significantly reducing fraud-related losses for clients in banking and financial services.
  • Played a key role in the development and optimization of the Cadenz Platform, specifically the Deduplication Service, improving real-time data processing, eliminating redundant data, and enhancing data quality for accurate fraud detection.
  • Utilized advanced AI techniques like anomaly detection, supervised and unsupervised learning, and predictive modeling to enhance fraud detection algorithms, continuously evolving the system to combat new fraud tactics and behavioral patterns.
  • Utilized AWS services such as EC2, Lambda, API Gateway, and DynamoDB to build scalable, efficient AI-powered applications; integrated DynamoDB for exact-match lookups and OpenSearch for fuzzy search, optimizing data retrieval and fraud detection accuracy.
  • Led the development of robust API services through Lambda functions and API Gateway, enabling seamless integration of fraud detection services into client systems with high performance and low latency.
  • Worked closely with business stakeholders to ensure AI models aligned with organizational goals and met regulatory standards, improving operational efficiency and data-driven decision-making.
  • Managed and optimized cloud infrastructure (AWS, Azure), ensuring secure and scalable fraud detection systems, while leveraging cloud-native services for real-time processing and ensuring high availability.
  • Led initiatives to enhance ETL processes, ensuring seamless integration and transformation of data from diverse sources, improving data consistency and accuracy.
  • Built data architectures and pipelines that support large-scale machine learning models, with a focus on real-time streaming, data lakes, and data warehouses, optimizing performance and compliance.
  • Managed key client relationships and delivered tailored AI-based fraud detection solutions, improving business outcomes for clients like Chola, PFL, HDFC, and Bandhan, and ensuring regulatory compliance and data security.

Skills/Technologies/Tools:
Big Data & Streaming: Apache Spark, Kafka, Hadoop, Flink
Machine Learning: Fraud Detection, Anomaly Detection, Predictive Modeling
Cloud: AWS (EC2, Lambda, API Gateway, OpenSearch, DynamoDB), Azure, GCP
Other: AI Solutions for Financial Services, ETL Pipelines, API Development, Data Architecture, AI Model Integration, Leadership and Team Management, Client Relationship Management

Technical Architect - Data

ValueLabs
Hyderabad
10.2022 - 02.2023

Client: Pepperstone (FinTech Startup, Australia)

  • Architected and developed scalable, real-time data pipelines to stream trades from platforms like MT4, MT5, and cTrader into a unified data source, enabling improved decision-making, fraud detection, risk management, and regulatory compliance.
  • Spearheaded the integration of the gRPC framework for efficient client-server communication over HTTP/2, developed in Golang using Test-Driven Development (TDD), significantly improving client-server connectivity and data exchange.
  • Designed and implemented data solutions using AWS, Apache Kafka, Apache Spark, Kubernetes, and Docker, ensuring high availability and low-latency data processing for real-time analytics and decision-making.
  • Automated infrastructure deployments using Infrastructure as Code (IaC), reducing manual errors and accelerating scaling while lowering operational costs.
  • Built and led a high-performance team of data engineers to develop and deliver data transformation architectures, improving overall performance and reliability of data pipelines.
  • Collaborated with Product Managers and Business Stakeholders to align technical solutions with business needs and ensure high-value delivery.
  • Enhanced data transformation modules, optimizing performance while minimizing costs and ensuring seamless user experience.
  • Worked on data governance, ensuring compliance with industry regulations and privacy standards, safeguarding data integrity and security.
  • Monitored and tested application performance, identifying bottlenecks and collaborating with developers to ensure the highest system availability and efficiency.
  • Drove the evolution of the data engineering team, fostering a collaborative environment and implementing best practices for data architecture and engineering.

Skills/Technologies/Tools:
Big Data & Streaming: Apache Kafka, Apache Spark, AWS (EC2, Lambda, API Gateway), Kubernetes, Docker
Programming: Python, Golang, Scala, SQL
Architecture: Microservices, gRPC, Apache Airflow, Apache NiFi
Data Governance & Security: Data Warehousing, Data Lakes, Compliance, Data Privacy
Tools: Git, Jira, TDD

Assistant Vice President

Credit Suisse
Pune
11.2021 - 08.2022

Data Analytics and Integration Services (DAAIS)

  • Designed and developed scalable distributed data solutions using Apache Spark for processing and transforming large datasets.
  • Ingested log files from source servers into HDFS data lakes using Sqoop, ensuring seamless data integration and accessibility.
  • Developed Sqoop jobs for ingesting customer and product data into HDFS, enabling centralized data storage and streamlined access for analytics.
  • Built Spark Streaming applications to ingest transactional data from Kafka topics into Cassandra tables in near real-time, ensuring low-latency data processing and rapid decision-making.
  • Developed Spark applications to flatten transactional data and persist the data in Cassandra tables using various dimensional tables, enhancing query performance and data consistency.
  • Contributed to the development of a framework for metadata management on HDFS data lakes, improving data discovery and governance.
  • Performed extensive Hive optimizations, including partitioning, bucketing, vectorization, and indexing, to improve query performance; utilized advanced Hive joins like Bucket Map Join and SMB Join for more efficient data processing.
  • Worked with a variety of data formats, including CSV, JSON, ORC, Avro, and Parquet, ensuring compatibility across different systems and use cases.
  • Developed HQL scripts to create external tables and analyze incoming data in Hive, supporting analytics applications and enabling data exploration.
  • Optimized Spark jobs using techniques like broadcasting, executor tuning, and data persistence, improving job performance and reducing execution times.
  • Developed custom UDFs, UDAFs, and UDTFs in Hive, enabling customized transformations and analytics on data.
  • Analyzed tweet JSON data using the Hive SerDe API to deserialize and convert data into readable formats for downstream processing.
  • Orchestrated Hadoop and Spark jobs using Oozie workflows, managing job dependencies and enabling efficient scheduling of multiple jobs for end-to-end data processing.
  • Continuously monitored and managed the Hadoop cluster using Cloudera Manager, ensuring optimal performance and reliability of the data infrastructure.

Skills/Technologies/Tools:
Big Data & Streaming: Apache Hadoop, Apache Spark, Spark SQL, Spark Streaming, Apache Kafka
Data Storage: Hive, Cassandra, HDFS, MySQL
Programming: Python, Scala
Data Formats: CSV, JSON, ORC, Avro, Parquet
Orchestration & Management: Oozie, Cloudera Manager

Manager

Morgan Stanley
Mumbai
10.2020 - 10.2021

Wealth Management Technology (WMT)

  • Analyzed data using Hadoop components, including Hive, Pig, and HBase, to drive insights for wealth management applications.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data using Hadoop/Big Data concepts, optimizing data pipelines for improved efficiency.
  • Loaded data from UNIX file systems into HDFS, ensuring smooth data flow and reliable storage for analytics processing.
  • Created Hive tables, loaded data, and wrote efficient Hive queries to support data transformations and provide insights for wealth management.
  • Handled data extraction from various data sources, including Oracle Database and Teradata, using Sqoop, enabling centralized storage in HDFS for downstream processing.
  • Streamed data from various sources using Spark Streaming API, enabling near real-time data processing and reducing latency for critical decision-making.
  • Optimized Scala code and fine-tuned Spark cluster performance, improving processing time and increasing system efficiency.
  • Improved Spark applications by adjusting batch interval times, level of parallelism, and memory tuning, reducing the time required for data processing tasks.

Skills/Technologies/Tools:
Big Data & Streaming: Hadoop 2.x, Spark, Spark SQL, Spark Streaming
Data Storage: Hive, HDFS, MySQL
Programming: Python, Scala
Data Integration: Sqoop, Apache Kafka
Operating Systems: Linux, Unix
Other: DWH, ETL, SQL, Eclipse

ETL Developer

Amdocs
Pune
05.2019 - 09.2020

Client: Vodafone Italy - VFIT

  • Built and optimized Big Data Pipelines for Customer Services Analytics to handle data from Vodafone Italy, supporting over 10 million customers.
  • Developed Data Pipelines/Warehouses and analytics pipelines for telecommunication services, enabling seamless data flow and real-time insights for the client.
  • Worked on Kafka-Spark streaming within the AmdocsDataHub (ADH) framework to transfer data from Oracle to downstream systems after performing data transformations in near real-time.
  • Automated multiple processes, created utilities, and set up alerts to monitor and gain actionable insights from real-time processes, aiding operational efficiency and monitoring.
  • Wrote and optimized in-application SQL statements for data extraction and transformation, improving the performance of data processing tasks.
  • Utilized HiveQL to create critical extracts from Hive data, providing essential reporting and analytics capabilities for the client.

Skills/Technologies/Tools:
Big Data: Hive, Hadoop, HBase, Spark, Kafka
Databases: Oracle
Programming: Python, Shell Scripting
CI/CD & Monitoring: Jenkins, Grafana, Prometheus
Project Management: JIRA

Programmer Analyst

Cognizant
Pune
03.2016 - 04.2019

Client: Sanofi

Project: Building Big Data Pipeline and Warehousing Solution for Data Transformation and Legacy System Data Matching

  • Designed and developed a data lake using Hadoop tools to efficiently transfer data to and from HDFS.
  • Utilized Sqoop to import source data from Oracle Database into HDFS for processing.
  • Stored raw data into Hive tables in ORC format, enabling data scientists to perform advanced analytics using Hive.
  • Developed new use cases and stored the results in HBase for further analysis.
  • Created and optimized Sqoop scripts for data ingestion, improving efficiency and data accuracy.
  • Wrote Hive scripts to store raw data in ORC format, enhancing storage efficiency and query performance.
  • Gathered requirements, designed, developed, and tested solutions to meet business needs.
  • Generated ad-hoc reports using Hive for business teams, meeting dynamic reporting requirements.

Skills/Technologies/Tools:
Big Data: Cloudera CDH, Hadoop, HDFS, Hive, Sqoop, HBase

Education

B.Tech - Computer Science and Engineering (CSE)

Rajasthan Technical University, Kota
Swami Keshvanand Institute of Technology, Jaipur, India

Skills

  • Data Engineering: Proficient in building scalable data pipelines using Apache Spark, Kafka, AWS (S3, Lambda, EC2), and Azure
  • Artificial Intelligence (AI): Expertise in integrating AI for fraud detection, machine learning model development, and data analysis automation
  • Big Data: In-depth experience with HDFS, Hive, Apache Kafka, and other big data frameworks
  • Cloud Services: Hands-on experience with AWS, GCP, and Azure for data management and integration
  • Data Validation & Quality Assurance: Ensuring data accuracy, consistency, and integrity through robust validation and quality assurance processes
  • Data Analysis: Strong analytical skills to process large datasets and provide actionable insights
  • Product Development: Expertise in data-driven product development and feature enhancement
  • Technology Integration: Skilled in integrating technologies like Kafka, Spark, and AI into existing systems
  • Problem-Solving & Critical Thinking: Ability to troubleshoot data issues and optimize solutions effectively
  • Team Leadership: Experienced in managing teams, fostering collaboration, and delivering results
  • Project Management: Strong project management skills, ensuring timely delivery in fast-paced environments
  • Multitasking & Communication: Able to manage multiple projects simultaneously and communicate complex ideas clearly to stakeholders

Accomplishments

Team Excellence Award (2022) – Recognized for outstanding leadership in building a high-performing data engineering team.
Innovation in Data Engineering (2021) – Awarded for pioneering the use of AI-driven automation in data processing pipelines.
Leadership Excellence (2020) – Awarded for exceptional leadership in driving cross-functional collaboration and mentoring junior engineers.
Top Performer in Data Engineering (2019) – Recognized for exceeding performance metrics and delivering key projects ahead of deadlines.

Certification

  • AWS Certified Solutions Architect - Associate
  • Google Cloud Professional Data Engineer
  • Certified Azure Data Engineer
  • Databricks Certified Associate Developer for Apache Spark
  • Snowflake Data Warehouse Certification
  • Machine Learning Specialization

Affiliations

Wellness Advocate: Passionate about mental health and wellness; actively promotes mindfulness and meditation as effective tools for stress management and focus, both personally and within teams.
Sustainable Living: Enthusiast of sustainable living practices, actively participating in eco-friendly initiatives like urban gardening, waste reduction, and sustainable fashion.
Community Engagement: Volunteers in local community programs focused on digital literacy and skill-building for underprivileged youth.

Languages

English: First Language (C2)
Hindi: Proficient (C2)
French: Beginner (A1)

Additional Information

Public Speaking & Mentorship: Regularly speak at industry conferences on data engineering, AI, and cloud technologies. Actively mentor young professionals in tech through community-driven platforms.
Creative Interests: A keen interest in writing and blogging about the intersection of technology and personal growth.

References

References available upon request.
