Arunashree S

Data Engineer
Bengaluru

Summary

Python Data Engineer with over 10 years of experience designing, developing, and optimizing data pipelines and ETL processes. Proficient in Python and SQL for data manipulation, with experience in big data technologies such as Apache Spark and cloud platforms such as Azure. Proven track record of implementing robust data solutions that improve data accessibility and insight across organizations. Strong analytical skills with a focus on improving data quality and integrity. Adept at collaborating with cross-functional teams to gather requirements and deliver data-driven solutions that support business objectives.

Overview

14 years of professional experience
4 years of post-secondary education

Work History

Data Engineer

Microsoft
08.2023 - Current
  • Building and optimizing data pipelines in Azure environments. Expertise in designing, developing, and deploying Azure Data Factory solutions to facilitate seamless data integration and transformation.
  • Using Azure DevOps for code reviews to keep the codebase high-quality, maintainable, and collaborative.
  • Writing queries in SCOPE, a Microsoft query language used primarily within internal data processing systems.
  • Enhanced data quality by performing thorough cleaning, validation, and transformation tasks.
  • Hands-on experience in implementing ETL processes and automating data workflows.
  • Streamlined complex workflows by breaking them down into manageable components for easier implementation and maintenance.
  • Collaborated with cross-functional teams for seamless integration of data sources into the company's data ecosystem.
  • Managed identification, protection and use of data assets.
  • Optimized data processing by implementing efficient ETL pipelines and streamlining database design.

Senior Research Engineer

Nuance Communications
11.2020 - 07.2023
  • Released datasets of audio files in .nwv and .stm formats
  • Created mapped TSV files containing chats and all information related to the audio text
  • Migrated the entire scraping project to Azure cloud
  • Created virtual machines and ran tasks on Azure
  • Built Docker images and uploaded them to Azure Container Registry
  • Learned Azure Data Factory and Azure Databricks
  • Hands-on experience with Azure DevOps
  • Wrote shell scripts for parallel processing to minimize the time taken by individual tasks
  • Mapped entertainment-related data to meta information for a particular program title
  • Used PySpark to combine, map, and manipulate large datasets
  • Set up Nutch in a Java environment for crawling
  • Consistently met short- and long-term targets
  • Trained, coached, and supervised new team members

Research Engineer

Nuance Communications
06.2018 - 11.2020
  • Created scrapers for different websites for OTT content
  • Combined scraped data into a single entity per genre and assigned popularity based on rating, Wikipedia rating, and other ranking factors
  • Merged scraped data from different platforms into a combined dataset that shows popular OTT content at the top
  • Created a data pipeline for one of our customers, Gracenote, in which we downloaded data as HTML content and converted it to TSV files with proper data categories
  • Created backend files for monitoring webpages and TSV files for display in charts or other visualizations
  • Read delimiter-separated files using Pandas and processed the data
  • Used the Apache Spark MySQL library to read TSV files, load them into SQL, and format the data
  • Upskilled according to project requirements (learned Azure to use it for scraping and other data downloads)
  • FTP data downloads and conversions
  • Mapped data provided by stakeholders into a single entry
  • Gathered requirements from stakeholders and converted data to meet them
  • Helped the team whenever an extra resource was needed to deliver
  • Maintained code integrity using the Git repository

Software Engineer

Headrun Technologies
06.2010 - 09.2014
  • Wrote scrapers and crawlers focused on OTT (over-the-top) and MVP (Managed Video Platform) websites across the internet
  • Focused on entertainment, news, live sports updates, and social media data such as Twitter, Facebook, and YouTube
  • Covered websites across the globe; the major areas I worked on were Latin American and Southeast Asian websites
  • Handled different types of websites while scraping: HTML, API, POST data, LOGIN
  • Extracted data from XML pages and created XML pages
  • Optimized programs using profiling in Python
  • Extensive experience writing regular expressions to extract, modify, and clean data
  • Used encoding and decoding when working with languages other than English
  • Used the MediaWiki framework to scrape incremental data (newly added pages) from Wikipedia
  • Wrote code to extract TV show episodes from Wikipedia pages, which follow a particular format compared to other Wikipedia pages
  • Mapped images to movies, TV shows, and other scraped data
  • Collected all metadata for each record (movie, TV show, etc.) by scraping various websites and mapped it against the Wikipedia images maintained as reference
  • Chose the best image across the scraped sources and Wikipedia pages
  • Client interaction to gather requirements
  • Led a small team working on scraping and other data-related work
  • Have very good experience in maintaining and leading a team
  • Merged scraped data across various websites into a single entity with popularity
  • Used MySQL database to store the scraped data
  • Hands-on experience with SQL queries
  • Downloaded and uploaded data on S3; automated and optimized these processes
  • Strong requirement-gathering and comprehension skills
  • Learned ownership, leadership, team management, and how to guide a team
  • Prepared detailed reports on project-specific specifications and activities

Education

Bachelor of Engineering (Computer Science)

Dr Ambedkar Institute of Technology
09.2014 - 06.2018

Skills

Linux
