MD Arif

Business Intelligence Engineer | Data Analyst | SQL | ETL | Python
Dallas, Texas

Summary

  • Data engineering professional with solid foundational skills and a proven track record of implementation across a variety of data platforms. Self-motivated, with strong personal accountability in both individual and team settings.
  • Experience in data analysis, data profiling, data integration, migration, data governance, metadata management, master data management and configuration management.

Overview

4 years of professional experience
6 years of post-secondary education
1 certification

Work History

Graduate Teaching Assistant

Texas A & M University Commerce
Commerce, Texas
01.2021 - Current
  • Checked assignments, proctored tests and provided grades according to university standards.
  • Documented attendance and completed assignments to maintain full class and student records.
  • Taught ETL and SQL college-level courses for over 50 students.
  • Oversaw classes of up to 30 students in Business Intelligence Course.
  • Prepared lessons according to course outline to convey required material and deepen student understanding of subject matter.
  • Led courses independently with minimal oversight from professors.
  • Used GitHub and Docker as the runtime environment for a CI/CD system to build, test, and deploy.
  • Consumed data from Kafka sources and implemented an analysis model.

Data Engineer

Cognizant Technology Solutions
Hyderabad, Telangana
01.2019 - 11.2020
  • Designed compliance frameworks for multi-site data warehousing efforts to verify conformity with state and federal data security guidelines.
  • Generated detailed studies on potential third-party data handling solutions, verifying compliance with internal needs and stakeholder requirements.
  • Developed, implemented and maintained data analytics protocols, standards and documentation.
  • Analyzed complex data and identified anomalies, trends and risks to provide useful insights to improve internal controls.
  • Designed and built data processing pipelines, tools and frameworks in the Hadoop ecosystem.
  • Worked on interactive guided analytics apps and dashboards; the ER/dimensional model was implemented in Tableau.
  • Constructed product-usage SDK data and Siebel data aggregations using PySpark, Scala and Spark SQL.
  • Developed an ETL tool to load data from a given source to a target using Python, PySpark, Sqoop, Unix and Hive.
  • Participated in requirements gathering and worked closely with the architect and SMEs on design and modeling.
  • Handled data ingestion from various data sources, performed transformations using Spark, and loaded data into HDFS.
  • Used Hive context with partitioned Hive external tables maintained in an AWS S3 location for reporting, data science dashboarding and ad hoc analyses.
  • Worked on API development for client apps to query for the current version.
  • Translated requirements and data into a usable database schema by creating or recreating ad hoc queries, scripts and macros, updating existing queries and writing new ones to consolidate data into a master file.
  • Worked with distributed computing on Hadoop and applied various machine learning techniques to solve data-related challenges.
  • Worked on cloud-readiness changes for applications and implemented Liquibase application changes.

Data Analyst

Minevesta Infotech
Hyderabad, Telangana
03.2018 - 01.2019
  • Identified and documented detailed business rules and use cases based on requirements analysis.
  • Researched and resolved issues regarding integrity of data flow into databases.
  • Identified, analyzed and interpreted trends or patterns in complex data sets.
  • Analyzed transactions to build logical business intelligence model for real-time reporting needs.
  • Built data pipelines using Hive and Apache Spark to calculate core marketing metrics.
  • Sourced data from multiple places into the Hadoop cluster.
  • Created Power BI reports for different metrics using Hive and BigQuery as the source data.
  • Collected, cleaned, transformed and loaded user clickstream data and made it available for downstream pipelines and analyses.
  • Created Dataflow pipelines using Spark-Scala.
  • Migrated existing Teradata and Hive queries to Google Cloud Platform.
  • Created aggregate tables in the data pipeline for the reporting team to present metrics to business users.
  • Scheduled the data pipelines in a scheduler such as UC4 at the required frequency, for example weekly or daily.
  • Migrated from a traditional Hadoop cluster to Google Cloud Platform.
  • Ingested data into the data lake from different sources and performed transformations such as sort, join, aggregation and filter to process various datasets.
  • Automated data flow between the software systems using Apache Airflow.
  • Created ETL jobs using Spark to perform data migrations and data loads into HDFS and Hive from different source systems.
  • Implemented Spark jobs for data preprocessing, validation, normalization, and transmission.
  • Configured multiple Spark jobs to achieve efficient run times.

Education

Master of Science - Computer Science And Programming

Texas A & M University
Texas
07.2020 - 07.2022

Bachelor of Engineering - Computer Science

Deccan College of Engineering And Technology
India
06.2009 - 06.2013

Skills

Python, SQL, PL/SQL, Scala

MongoDB, Amazon DynamoDB, HBase

Hadoop, HDFS, Hive, Spark, PySpark, Sqoop, Kafka

Oracle, DB2, Teradata, SQL Server

AWS Glue, Azure Data Factory, GCP, Airflow, Spark, Sqoop, Flume, Apache Kafka, Spark Streaming

Jira, Rally

BitBucket, Git, GitHub

AWS EC2, S3, Lambda, EMR, GCP BigQuery

Certification

Careerara: Data Science Professional Certificate

Interests

Reading Blogs

Writing Blogs

Travelling

Timeline

Careerara: Data Science Professional Certificate

07-2022

Graduate Teaching Assistant

Texas A & M University Commerce
01.2021 - Current

Master of Science - Computer Science And Programming

Texas A & M University
07.2020 - 07.2022

Data Engineer

Cognizant Technology Solutions
01.2019 - 11.2020

Data Analyst

Minevesta Infotech
03.2018 - 01.2019

Bachelor of Engineering - Computer Science

Deccan College of Engineering And Technology
06.2009 - 06.2013