Himanshi Thakkar

Mumbai

Summary

Data Automation Engineer with 4+ years of experience in the IT industry, specializing in data automation and business process optimization. Proven expertise in leveraging Azure and Spark to build and optimize automated ETL pipelines, driving significant efficiency gains and cost reductions. Possesses a strong foundation in API Development, Web Scraping, Data Visualization, and Text Analytics, demonstrated through numerous successful projects. Eager to contribute to challenging data engineering projects that leverage cutting-edge technologies.

Overview

5 years of professional experience

Work History

Process Manager - Sr. Consultant

eClerx
Mumbai
04.2024 - Current
  • Building an agentic, AI-driven web scraping system that autonomously monitors and extracts updates from global compliance authorities (e.g., SEBI, FCA, MAS). It combines Selenium, BeautifulSoup, and Python agents with DOM-adaptive logic and LLM-based heuristics to stay resilient as website structures change.
  • Orchestrating scraping tasks on Kubernetes and AWS EC2, and storing structured outputs in S3 and Azure SQL for downstream use.
    - Building resilient retry and deduplication logic within scraping agents, improving compliance data accuracy by 95%.
  • Led a team of 3 to design, develop, and deploy automated data pipelines in Python and SQL, reducing manual KYC efforts by 85%.
  • Migrated legacy data warehouse to Azure Synapse Analytics, enhancing data accessibility and performance for analytical reporting.
  • Developed a cloud-based data pipeline using Azure Data Factory, Azure Databricks, and Azure SQL Database.
  • Created comprehensive automation modules for three clients, improving data processing and delivery.
  • Enhanced KYC document processing with a Document Ingestion API that delivers documents in Base64-encoded format.
  • Developed a high-performance document language detection service using Python multiprocessing and RoBERTa transformers. Containerized the service with Docker and deployed it on AWS ECS, enabling scalable NLP processing for large batches of scanned documents in multiple formats.
  • Leveraged Azure DevOps CI/CD pipelines for deployment and maintenance of automated solutions.
  • Technologies: Python, SQL, Azure (Data Factory, Databricks, Synapse), AWS (EMR, S3, EC2, ECS), Kafka, RoBERTa, Docker, Kubernetes, CI/CD (Azure DevOps, AWS CodePipeline), Selenium, BeautifulSoup.

Associate Process Manager - Consultant

eClerx
Mumbai
10.2020 - 04.2024
  • Orchestrated the development of a Negative News Screening module, utilizing NLP techniques such as Tokenization, Lemmatization, POS Tagging, and Named Entity Recognition to identify and filter negative news articles.
  • Developed, maintained, and automated data processing workflows on Azure to ensure efficient data handling and transformation.
  • Led the development and implementation of automated data pipelines in Python and SQL, reducing manual KYC effort.
  • Created ETL processes for data ingestion and transformation, increasing data accessibility for analytics teams.
  • Implemented monitoring and alerting systems for data workflows, enhancing reliability and reducing downtime.
  • Utilized Selenium and Python to automate web scraping tasks, facilitating efficient data extraction from dynamic websites.

Intern

Continental AG
Bangalore
12.2019 - 06.2020
  • Worked as a Machine Learning Intern on a Continental project in Bangalore.
  • Developed and implemented the 'Sign Recognition' project in Python.
  • Built the project with the TensorFlow and Keras frameworks.
  • Automated the labeling of rectangular bounding boxes around text in a signboard dataset.
  • Trained the TextBoxes++ model on Continental's US signs dataset.

Education

M.Tech - Signal Processing and Communication

Indian Institute of Technology, Mandi
Mandi, HP, India
07.2020

B.Tech - Electronics and Communication

Kurukshetra University, HCTM
Kaithal, India
06.2015

Skills

  • Data Structures
  • SQL
  • Python
  • AWS
  • PySpark
  • Azure
  • ETL
  • Hive
  • Data Modelling
  • Azure Synapse
  • Data Warehouse
  • DevOps
  • CI/CD
  • Selenium
  • Automation
  • Airflow
  • API Development
  • Leadership
  • Analytical Thinking
  • Collaboration
  • Adaptability
  • Learning Agility

Hobbies and Interests

  • Exploring distant lands
  • Getting lost in a good book
  • Capturing moments
  • Feeling the music

Awards

06/01/21, Values Award for integrity and excellence, eClerx

Timeline

Process Manager - Sr. Consultant

eClerx
04.2024 - Current

Associate Process Manager - Consultant

eClerx
10.2020 - 04.2024

Intern

Continental AG
12.2019 - 06.2020

M.Tech - Signal Processing and Communication

Indian Institute of Technology, Mandi

B.Tech - Electronics and Communication

Kurukshetra University, HCTM