Ajay Singh Negi

Dehradun

Summary

Organized and self-motivated Data Engineer with six years of success designing and implementing database solutions. Goal-oriented, with the ability to understand business problems and build systems that improve functionality. Works effectively both independently and in collaborative settings.

Overview

6 years of professional experience
1 Certification

Work History

Lead Consultant

ITC INFOTECH
Gurugram
11.2024 - Current

- Built streaming pipelines on Amazon Managed Service for Apache Flink (Python), integrating Kinesis Data Streams, Redis, Kinesis Firehose, and S3 with CloudWatch/SNS alerting.
- TypeAhead Search: Designed an end-to-end Flink solution that ingests from Kinesis, applies entity identification and business rules, writes low-latency features to Redis, and ships enriched data via Firehose to S3 for downstream analytics (Athena).
- Context Engine (POCs): Implemented multiple Flink POCs and a Neo4j-backed graph layer; consumed Kinesis streams and persisted entities/relationships for graph-driven queries.
- Packaging/Deployment: Standardized Flink packaging—built dependency JAR (Maven/pom.xml), bundled Python code, requirements, connectors, and extra libs into code.zip; published to S3 and configured Managed Flink to consume artifacts.
- Reliability/Observability: Added CloudWatch metrics/alarms for Redis and Flink crash detection; integrated SNS email notifications for proactive incident response.
- Data Quality on Glue: Authored a reusable Python package for batch data quality checks; enabled teams to import and run standardized validations across datasets.
- Stack: Lambda, Managed Flink (Python), Kinesis Streams, Firehose, Redis, S3, Neo4j, Glue, EMR, EC2, IAM, SNS, Athena, SageMaker, EMR Studio, CloudWatch.

Senior Software Engineer - Data Engineering Team

Atmecs Technologies
Bengaluru
07.2019 - 11.2024

- Led migration of legacy ETL to PySpark on Hadoop/EMR, increasing throughput and system load capacity; optimized Spark jobs (~65% faster) via partitioning, join strategy tuning, and Parquet optimizations.
- Built scalable ETL/ELT pipelines in Python/PySpark to cleanse, transform, and load data into an S3-based data lake (Parquet); enabled downstream consumption in analytics/BI.
- Orchestrated batch pipelines with Airflow DAGs on EC2, managing dependencies, scheduling, retries, and automation for end-to-end workflows and backfills.
- Created automation with Bash/Shell to export from relational stores (MySQL/PostgreSQL) to HDFS/S3 as Parquet; reduced manual ops and improved reliability.
- Implemented features to extract, transform, and load weekly PostgreSQL updates into target stores, ensuring schema compatibility and data freshness SLAs.
- Established comprehensive test and data validation coverage (unittest/pytest) for transformed datasets; enforced business rules and anomaly checks prior to release.
- Set up CI/CD pipelines in Azure DevOps (YAML) for packaging, testing, and deploying data jobs; improved release cadence and consistency.
- Performed data analysis to design optimal aggregations and reporting flows, improving processing efficiency by ~30%; documented logic and data lineage.
- Drove code reviews, version control best practices (Git), and production readiness checks; collaborated with cross-functional teams for requirements and delivery.
- Supported platform hygiene and stability (OS/Hadoop updates, patches, version upgrades); built diagnostic monitors to detect output changes and notify via email.
- Contributed to UAT, defect triage, and production issue resolution within Agile/Scrum ceremonies using Jira.

Education

Bachelor of Computer Application

Uttaranchal University
Dehradun, India

M.Tech - Software Systems

BITS Pilani
Bengaluru, India

Skills

  • Programming Languages: Python, SQL
  • Distributed Data Processing: Apache Spark, PySpark, Hadoop, HDFS, Databricks, Amazon EMR, EMR Studio
  • Streaming and Real-time: Apache Flink (Python), Amazon Managed Service for Apache Flink, Amazon Kinesis Data Streams, Amazon Kinesis Data Firehose, Redis
  • Cloud Platforms and Services: AWS, Azure, Amazon S3, AWS Lambda, AWS Glue, Amazon EC2, Amazon Athena, AWS IAM, Amazon CloudWatch, Amazon SNS, Amazon SageMaker
  • Databases and Storage: PostgreSQL, MySQL, Amazon Redshift, Neo4j (Graph Database)
  • Data Transformation and Analysis: Pandas, NumPy, SQL
  • Data Formats: Parquet
  • Orchestration and Workflow: Apache Airflow
  • DevOps and CI/CD: Git, GitHub, Azure DevOps (YAML), Maven
  • Scripting and Automation: Bash, Shell scripting
  • Testing and Data Quality: Pytest, Unittest, Data validation frameworks
  • Project and Delivery: Jira, Agile methodologies, Scrum
  • Professional Skills: Continuous process improvement, Performance optimization and tuning, Analytical problem-solving, Reporting and documentation
  • Data Engineering Practices: ETL, ELT, Data Pipelines, Data Ingestion, Data Transformation, Data Quality, Observability, Monitoring

Certification

  • Python 101 for Data Science, IBM Developer Skills Network
  • Data Analysis Using PySpark - Cleaning and Exploring Big Data using PySpark, Coursera Course Certificates
  • Python (Basic) Certificate, HackerRank

Projects

Blue Vector - Data Team, 01/2019 - 04/2019: Built an end-to-end ETL pipeline for customer data using pandas, SQL, and Google Notebook. Developed Python/pandas modules to read tables from a MySQL database, apply business logic, and load the results to target locations.

Awards

  • Lord Of The Code - 2nd Place, Uttaranchal University, 2019
  • Website Designing - 3rd Place, Uttaranchal University, 2019
  • TrailBlazer Award, Atmecs Technologies, 2024, Recognized for exceptional contributions.

Timeline

Lead Consultant

ITC INFOTECH
11.2024 - Current

Senior Software Engineer - Data Engineering Team

Atmecs Technologies
07.2019 - 11.2024

Bachelor of Computer Application

Uttaranchal University

M.Tech - Software Systems

BITS Pilani
BITS Pilani