Arunashree S

Data Engineer

Bengaluru

Summary

Python Data Engineer with over 10 years of experience in designing, developing, and optimizing data pipelines and ETL processes. Proficient in Python and SQL for data manipulation, and experienced with big data technologies such as Apache Spark and cloud platforms such as Azure. Proven track record of implementing robust data solutions that enhance data accessibility and insights across organizations. Strong analytical skills with a focus on improving data quality and integrity. Adept at collaborating with cross-functional teams to gather requirements and deliver data-driven solutions that support business objectives.

Work History

Data Engineer

Microsoft

08.2023 - Current

Built and optimized data pipelines in Azure environments, designing, developing, and deploying Azure Data Factory solutions for data integration and transformation.
Used Azure DevOps for code reviews to keep code high-quality, maintainable, and collaborative.
Wrote queries in SCOPE, a Microsoft query language used primarily within internal data processing systems.
Enhanced data quality by performing thorough cleaning, validation, and transformation tasks.
Hands-on experience in implementing ETL processes and automating data workflows.
Streamlined complex workflows by breaking them down into manageable components for easier implementation and maintenance.
Collaborated with cross-functional teams for seamless integration of data sources into the company's data ecosystem.
Managed identification, protection and use of data assets.
Optimized data processing by implementing efficient ETL pipelines and streamlining database design.

Senior Research Engineer

Nuance Communications

11.2020 - 07.2023

Released data sets of audio files in .nwv and .stm formats
Created mapped TSV files with chats and all information regarding the audio text
Migrated the whole scraping project to Azure cloud
Created virtual machines and ran tasks on Azure
Created Docker images and uploaded them to Azure Container Registry
Learned Azure Data Factory and Azure Databricks
Hands-on experience with Azure DevOps
Wrote shell scripts for parallel processing to minimize the time taken by individual tasks
Mapped entertainment-related data to meta information about a particular program title
Used PySpark to combine, map, and manipulate large data sets
Set up Nutch in a Java environment for crawling
Consistently met short- and long-term targets
Trained, coached, and supervised new team members

Research Engineer

Nuance Communications

06.2018 - 11.2020

Created scrapers for different websites for OTT content
Combined scraped data into a single entity by genre and assigned popularity based on rating, Wikipedia rating, and other ranking factors
Merged scraped data from different platforms into a combined dataset that shows popular OTT content at the top
Created a data pipeline for one of our customers, Gracenote, in which we downloaded data as HTML content and converted it to TSV files with proper data categories
Created backend files for monitoring web pages and TSV files so the results can be shown in charts or other visualizations
Read delimiter-separated files using Pandas and processed the data
Used the Apache Spark MySQL library to read TSV files, load them into SQL, and format the data
Upskilled according to project requirements (learned Azure to use it for scraping and other data downloads)
FTP data downloads and conversions
Mapped data provided by stakeholders into a single entry
Gathered requirements from stakeholders and converted data to meet them
Helped the team whenever an extra resource was needed to deliver
Maintained code integrity by using the Git repository

Software Engineer

Headrun Technologies

06.2010 - 09.2014

Wrote scrapers and crawlers with a focus on OTT (Over the Top) and MVP (Managed Video Platform) websites across the internet
The data we focused on covered entertainment, news, live sports updates, and social media such as Twitter, Facebook, and YouTube
We covered websites across the globe; the major areas I worked on were Latin American and Southeast Asian websites
Experience in handling different types of websites while scraping: HTML, API, POST data, LOGIN
Extracted data from XML pages and created XML pages
Optimized programs using profiling in Python
Extensive experience in writing regular expressions to extract, modify, and clean data
Used encoding and decoding when working with languages other than English
Used the MediaWiki framework to scrape incremental data (newly added pages) from Wikipedia
Wrote code to extract TV show episodes from Wikipedia pages, which follow a particular format that differs from other Wikipedia pages
Mapped images to movies, TV shows, and other scraped data
Scraping various websites yields all the metadata for a record (movie, TV show, etc.), which is mapped against the Wikipedia images maintained as a reference
The best image is chosen across the scraped sources and Wikipedia pages
Client interaction to gather requirements
Led a small team working on scraping and other data-related work
Gained solid experience in managing and leading a team
Worked on merging scraped data across various websites and creating a single entity with popularity
Used a MySQL database to store the scraped data
Hands-on experience with SQL queries
Downloaded and uploaded data on S3; automated and optimized the processes
Strong requirement-gathering and comprehension skills
Learned ownership, leadership, team management, and how to guide a team
Prepared detailed reports on project-specific specifications and activities

Education

Bachelor of Engineering (Computer Science)

Dr Ambedkar Institute of Technology

09.2014 - 06.2018

Skills

Linux

Windows

PySpark

Pandas

NumPy

Python

Shell Scripting

MySQL

GitHub

GitLab

Scrapy

Azure Databricks

Azure Data Factory

Azure DevOps

