Apurv Siwach

Gurgaon

Summary

Accomplished Big Data Consultant and Senior Data Analyst with a proven track record at Alveo Technologies and EClerx Services, specializing in deploying scalable big data solutions and enhancing data-driven decision-making. Expert in Python programming and adept at fostering cross-functional collaboration, with a record of significantly optimizing data processing and analytics to drive business intelligence and operational efficiency.

Overview

3 years of professional experience

Work History

Senior Data Analyst

EClerx Services
02.2022 - Current
  • Delivered comprehensive reports highlighting key trends, patterns, and anomalies, and presented findings to senior management to support informed decision-making.
  • Analyzed large volumes of data to identify trends, patterns, signals, and hidden stories.
  • Leveraged text, charts, and graphs to communicate findings in an understandable format.
  • Partnered with IT teams to ensure seamless integration between databases and analytical tools, maximizing system efficiency across departments.
  • Reduced manual data entry errors by designing and deploying automated ETL processes to transform raw data into usable formats.
  • Collaborated with cross-functional teams to define requirements and develop end-to-end solutions for complex data engineering projects.
  • Established standard procedures for version control, code review, deployment, and documentation to ensure consistency across the team's work products.
  • Collaborated on ETL (Extract, Transform, Load) tasks, maintaining data integrity and verifying pipeline stability.

Big Data Consultant

Alveo Technologies
04.2022 - 12.2023
  • Enabled real-time analytics capabilities by designing efficient streaming processes using tools like Kafka.
  • Designed and deployed scalable big data solutions with cloud-based platforms like AWS.
  • Led identification of process improvements and changes to methodology based on project experience.
  • Spearheaded the development of proof-of-concept projects showcasing the potential benefits of integrating big data solutions into existing workflows.
  • Optimized data storage and retrieval for faster processing, utilizing distributed processing frameworks such as Hadoop and Spark.
  • Streamlined ETL processes for seamless integration of various data sources into a unified system.
  • Ran statistical analyses on large datasets using statistical software.

Education

Bachelor of Science - Computer Science and Engineering

Lovely Professional University
Jalandhar, India
08.2022

Skills

  • Power BI
  • Python Programming
  • Data Warehousing
  • Product Management
  • ETL Processes
  • Apache Spark
  • Hadoop Ecosystem
  • SQL and Databases
  • Data Pipeline Design
  • Data Modeling
  • NoSQL Databases
  • Data Analysis
  • Amazon Web Services
  • Linux

Projects

Project Title: Real-Time Algorithmic Trading Using Apache Flink


This project implements a cutting-edge real-time algorithmic trading system leveraging the power of Apache Flink, a robust stream processing framework. The system is designed to analyze financial market data streams, execute trading strategies, and make split-second decisions in the fast-paced world of electronic trading.

Technologies Used:

  • Apache Flink for stream processing
  • Kafka for data ingestion
  • Custom trading algorithms implemented in Python
  • Time series databases for storing historical data
  • RESTful APIs for external integrations

This project demonstrates the application of big data technologies in the financial sector, showcasing how stream processing can be harnessed for real-time decision-making in algorithmic trading.
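
For illustration, below is a minimal PyFlink sketch of the kind of keyed stream logic such a system relies on: a per-symbol moving-average crossover signal. The class name, window lengths, and sample ticks are hypothetical, and a production job would consume from Kafka rather than an in-memory collection.

    # Hypothetical sketch: per-symbol moving-average crossover signals
    # computed over a stream of (symbol, price) ticks with PyFlink.
    from pyflink.common import Types
    from pyflink.datastream import StreamExecutionEnvironment
    from pyflink.datastream.functions import KeyedProcessFunction, RuntimeContext
    from pyflink.datastream.state import ListStateDescriptor


    class CrossoverSignal(KeyedProcessFunction):
        """Emits BUY/SELL when a short moving average crosses a long one."""

        def open(self, runtime_context: RuntimeContext):
            # Keyed state: the most recent prices seen for this symbol.
            self.prices = runtime_context.get_list_state(
                ListStateDescriptor("prices", Types.FLOAT()))

        def process_element(self, value, ctx):
            symbol, price = value
            history = (list(self.prices.get() or []) + [price])[-20:]
            self.prices.update(history)
            if len(history) == 20:
                short_ma = sum(history[-5:]) / 5   # 5-tick average
                long_ma = sum(history) / 20        # 20-tick average
                if short_ma > long_ma:
                    yield symbol, "BUY", price
                elif short_ma < long_ma:
                    yield symbol, "SELL", price


    env = StreamExecutionEnvironment.get_execution_environment()
    ticks = env.from_collection(
        [("AAPL", 190.0 + (i % 7) * 0.3) for i in range(40)],
        type_info=Types.TUPLE([Types.STRING(), Types.FLOAT()]))
    ticks.key_by(lambda t: t[0]) \
         .process(CrossoverSignal(),
                  output_type=Types.TUPLE(
                      [Types.STRING(), Types.STRING(), Types.FLOAT()])) \
         .print()
    env.execute("crossover-signals")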


Project Title: Reddit Data Pipeline Engineering


This project implements a robust and scalable data pipeline for ingesting, processing, and analyzing data from Reddit, one of the world's largest social media platforms. The pipeline is designed to handle the high volume and variety of data generated by Reddit's diverse communities, providing valuable insights for social media analysis, trend spotting, and user behavior research.

Technologies Used:

  • Apache Kafka for data ingestion and message queuing
  • Apache Spark for large-scale data processing
  • Airflow for workflow orchestration
  • PostgreSQL for structured data storage
  • Elasticsearch for fast, full-text search capabilities
  • Python for data processing and API interactions
  • Docker for containerization and easy deployment
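
As a sketch of how the orchestration layer could fit together, the following is a minimal Airflow DAG, assuming the Airflow 2.x taskflow API, Reddit's public JSON endpoint, and a local Kafka broker; the subreddit, topic name, and broker address are illustrative, not the project's actual configuration.

    # Hypothetical sketch: a daily DAG that pulls new posts from Reddit and
    # publishes them to Kafka for downstream Spark consumers.
    from datetime import datetime

    import requests
    from airflow.decorators import dag, task


    @dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
    def reddit_pipeline():

        @task
        def extract(subreddit: str = "dataengineering") -> list:
            # Public JSON endpoint; a production pipeline would use OAuth.
            resp = requests.get(
                f"https://www.reddit.com/r/{subreddit}/new.json?limit=100",
                headers={"User-Agent": "pipeline-sketch/0.1"}, timeout=30)
            resp.raise_for_status()
            return [child["data"] for child in resp.json()["data"]["children"]]

        @task
        def publish(posts: list) -> int:
            # Push each post onto a Kafka topic (kafka-python client).
            import json
            from kafka import KafkaProducer
            producer = KafkaProducer(
                bootstrap_servers="localhost:9092",
                value_serializer=lambda v: json.dumps(v).encode("utf-8"))
            for post in posts:
                producer.send("reddit.posts", post)
            producer.flush()
            return len(posts)

        publish(extract())


    reddit_pipeline()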


Project Title: Enterprise Data Warehouse and Advanced Analytics Platform


This project involves designing and implementing a comprehensive data warehouse and analytics platform to consolidate data from multiple sources across the organization, enabling advanced business intelligence capabilities and data-driven decision-making.


Technologies Used:

  • AWS Redshift for cloud data warehousing
  • Power BI for data visualization and reporting
  • dbt (data build tool) for data transformation and modeling
  • Git for version control
  • Python and R for advanced analytics


Outcomes:

  • Reduced report generation time by 70%
  • Improved data accuracy and consistency across departments
  • Enabled real-time access to critical business metrics
  • Facilitated data-driven decision making through predictive analytics
  • Achieved a single source of truth for enterprise-wide reporting


This project showcases expertise in data modeling, ETL processes, and the effective use of modern BI tools to transform raw data into actionable insights, driving business value across the organization.
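
As an illustration of the advanced-analytics layer, here is a minimal Python sketch that reads a dbt-built mart from Redshift and fits a simple per-region trend forecast. The connection string, schema, and table names are assumptions, and the redshift+psycopg2 dialect requires the sqlalchemy-redshift package.

    # Hypothetical sketch: trend forecasting over a warehouse mart table.
    import pandas as pd
    from sqlalchemy import create_engine
    from sklearn.linear_model import LinearRegression

    engine = create_engine(
        "redshift+psycopg2://analyst:secret@warehouse.example.com:5439/analytics")

    # Monthly revenue per region from an assumed dbt-built mart.
    df = pd.read_sql(
        "SELECT month_start, region, revenue FROM marts.fct_monthly_revenue",
        engine, parse_dates=["month_start"])

    # Fit one linear trend per region and project the next month.
    for region, grp in df.groupby("region"):
        grp = grp.sort_values("month_start")
        X = grp["month_start"].map(pd.Timestamp.toordinal).to_frame()
        model = LinearRegression().fit(X, grp["revenue"])
        next_month = grp["month_start"].max() + pd.offsets.MonthBegin(1)
        X_next = pd.DataFrame({"month_start": [next_month.toordinal()]})
        forecast = model.predict(X_next)[0]
        print(f"{region}: projected revenue {forecast:,.0f} for {next_month:%Y-%m}")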


Project Title: Enterprise-Wide Data Warehouse Implementation


This project involved the design, development, and implementation of a comprehensive data warehouse solution to centralize and optimize data management across the organization. The data warehouse serves as a single source of truth, enabling advanced analytics, reporting, and data-driven decision-making.

Technologies Used:

  • Amazon Redshift for data warehousing
  • AWS Glue for ETL processes
  • Data lakes for raw data staging and processing
  • dbt (data build tool) for data transformation
  • Airflow for workflow orchestration
  • Git for version control
  • Tableau and Power BI for data visualization


Outcomes:

  • Reduced data silos and improved data accessibility across the organization
  • Accelerated reporting time from days to minutes
  • Enabled advanced analytics capabilities, leading to data-driven decision making
  • Improved data quality and consistency
  • Achieved cost savings through optimized data storage and processing
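
For illustration, a minimal sketch of the load step: issuing a Redshift COPY from data-lake files on S3 through the Redshift Data API. The cluster, IAM role, bucket, and table names are all hypothetical.

    # Hypothetical sketch: load Parquet files from S3 into a Redshift
    # staging table via the Redshift Data API (boto3).
    import boto3

    client = boto3.client("redshift-data", region_name="us-east-1")

    copy_sql = """
        COPY staging.orders
        FROM 's3://example-data-lake/orders/2024/'
        IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-load'
        FORMAT AS PARQUET;
    """

    resp = client.execute_statement(
        ClusterIdentifier="analytics-cluster",
        Database="warehouse",
        DbUser="etl_user",
        Sql=copy_sql,
    )
    print("Submitted COPY, statement id:", resp["Id"])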
