Harika Saroja Ivaturi

Denton

Summary

Data Engineer with over 3 years of experience in Business Intelligence Reporting, Google Cloud services, Big Data/Hadoop ETL, and supply chain product development. Proficient in building and managing large-scale data pipelines using Python and PySpark. Skilled in data cleaning, transformation, and visualization with a strong background in Google Cloud Platform (GCP) and container orchestration.

Overview

5 years of professional experience
1 Certification

Work History

AI/ML Intern

Inclined Analytics
01.2025 - Current
  • Handled large Medicare Part B datasets using Pandas and NumPy for data manipulation and cleaning.
  • Designed ML pipelines for healthcare fraud detection using K-Means clustering, Mahalanobis Distance, and Z-score analysis.
  • Engineered features using weighted embeddings based on service volume and charges.
  • Conducted anomaly detection to flag irregular billing patterns and over-utilization of HCPCS codes.
  • Built scalable Python-based data pipelines and machine learning models.
  • Performed statistical analysis to reduce false positives and improve fraud detection accuracy.
  • Created insightful visualizations (scatter plots, box plots) using Seaborn and Matplotlib to present findings to stakeholders.

Data Engineer

Accenture
10.2021 - 12.2022
  • Built end-to-end data engineering workflows using MySQL, Hadoop, Oracle, and NoSQL databases (HBase, Cassandra).
  • Defined and automated ETL pipelines using Oozie; supported MySQL-to-Hadoop migration efforts.
  • Participated in database architecture planning and SQL performance tuning (execution plans, indexing, materialized views).
  • Deployed Oracle databases on AWS EC2 instances.
  • Used PySpark to optimize Hive SQL queries, including non-equi joins.
  • Conducted POCs on Hive table bucketing and partitioning to assess performance.
  • Used Apache Sqoop for data migration and managed datatype handling post-transfer.
  • Utilized Python collections for data manipulation and processing of custom objects.

Data Engineer

TEK Systems Global Services
09.2020 - 10.2021
  • Built and optimized ETL pipelines using Python, PySpark, Hive SQL, and Presto, including data transformation and cleansing to ensure high-quality, reliable data for reporting and analysis.
  • Migrated SAS programs and on-prem data pipelines to Python and cloud platforms (AWS, GCP), improving performance and scalability.
  • Developed backend systems and automated scripts for data aggregation, migration, and ingestion from APIs and flat files.
  • Created SQL and PL/SQL procedures and Oracle Reports, and built dashboards with Matplotlib and Plotly for business insights and forecasts.
  • Used AWS services (S3, Redshift) and Kubernetes for scalable, containerized deployments and data archival strategies.
  • Conducted functional and system testing; implemented logging, monitoring, and documentation for data workflows.
  • Applied data masking, anonymization, and compliance techniques (GDPR, HIPAA) for secure data handling.
  • Improved ETL efficiency through parallel processing, partitioning strategies, and Spark performance tuning.
  • Coordinated with stakeholders to translate business requirements into technical data solutions and reporting logic.
  • Ensured data integrity with reconciliation checks and root cause analysis across diverse data sources.

Education

Master of Science - Advanced Data Analysis

University of North Texas
Denton, TX
05.2024

Bachelor of Technology - Electronics and Communications Engineering

GVPCEW
Visakhapatnam, India
06.2020

Skills

  • Python and SQL
  • PL/SQL and Hive
  • Hadoop and PySpark
  • Cloud platforms (GCP and AWS)
  • Data visualization (Power BI, Seaborn, Matplotlib)
  • Container orchestration (Docker and Kubernetes)
  • Version control (GitLab)
  • Data ingestion (Apache Sqoop)

Certification

  • Architecting with Google Kubernetes Engine
  • AWS Fundamentals
  • Professional Google IT Support
  • Cybersecurity
  • AZ-900: Microsoft Azure Fundamentals
  • Supervised Machine Learning: Regression and Classification

Projects

In-Vehicle Coupon Recommendation System
  • Built a machine learning model using Python and scikit-learn to recommend coupons based on user behavior and real-time location data.
  • Achieved a 20% increase in coupon redemption rates through personalized suggestions.

Customer Churn Prediction
  • Developed a machine learning model using Python and scikit-learn to predict customer churn for a telecom company.
  • Improved retention by 15% using Logistic Regression and XGBoost for classification.
