KARISHMA SHAIK

Fort Wayne, IN

Summary

  • Seven years of experience across the software development life cycle, specializing in system and application architecture design, development, and support. Proficient in Hive, Impala, Sqoop, MapReduce, Spark, and Oozie, with a strong foundation in Apache Hadoop and the Hortonworks distributed platform. Expertise in building data pipelines with distributed technologies and tools, along with experience installing and upgrading Hortonworks HDP clusters. Familiar with Snowflake and skilled in test-driven development and Agile methodologies, including Scrum.

Work History

Program Analyst

Alkes Technology Inc
Fremont
  • Analyzed program data to identify trends that informed decision-making processes.
  • Researched industry best practices to recommend enhancements in project management workflows.
  • Managed documentation and maintained compliance records according to organizational standards.
  • Assisted in developing new program initiatives by creating detailed project plans, timelines, and budgets.

Data Engineer

Cargill
Hopkins
  • Gathered requirements by engaging with end clients to understand system objectives and needs.
  • Utilized Impala for efficient query processing in data handling tasks.
  • Performed data transformations and actions with Spark to enhance performance.
  • Developed comprehensive unit tests through creation of detailed test cases.
  • Created shell scripts for automated end-to-end data management processes.
  • Applied Spark and Scala for extensive data analysis and transformation activities.
  • Wrote complex workflows in Airflow to streamline operational processes.
  • Environment: Hadoop, MapReduce, Impala, HiveQL, Oracle, HDFS, Hive, Sqoop, Cloudera Hadoop Distribution, StreamSets, Spark, Scala.

Data Engineer

Point 72
  • Constructed a Snowflake data warehouse on AWS, including dimension and fact tables.
  • Developed external and internal staging tables for processing transaction data from source systems.
  • Created optimized Snowflake tables to enhance query performance using best practices.
  • Designed views in Snowflake for aggregated and summarized data access.
  • Established data pipelines on AWS EMR utilizing Scala, Spark DataFrame, and Spark SQL.
  • Developed Python scripts to extract data from Oracle Database into CSV format and load into AWS S3.
  • Transformed data in external stage S3 using business-specific rules before loading into Snowflake.
  • Implemented utility scripts with AWS CLI for streamlined interaction with AWS services.
  • Environment: Snowflake, DBeaver, IntelliJ, PyCharm, Scala, Python, AWS EMR, S3, Athena, IAM, AWS CLI, Boto3, Oracle, UNIX, ActiveBatch, Bamboo, Bitbucket.

Big Data Developer

Comcast Cable Company
  • Developed and productionized ETL pipelines using Sqoop and Hive/Tez.
  • Constructed datasets from customer remote-click data for use by the data science team.
  • Leveraged Hortonworks Data Platform components to analyze and build data models.
  • Applied Stinger Initiative features, including Tez, ORC, and vectorization, in Hive 0.13 to optimize data pipelines.
  • Collaborated with data science teams to create suitable datasets for R and SAS modeling.
  • Conducted profiling of various Hive tables to ensure data quality.
  • Exported result sets from Hive to MySQL using the Sqoop export tool for further processing.
  • Created Hive tables, loaded data, and wrote Hive queries that execute internally as MapReduce/Tez jobs.
  • Environment: RHEL 6, CentOS 6, Hortonworks Data Platform (2.2 and 2.3), Apache Hadoop 2.6, Apache YARN 2.6, Apache Hive 0.13, Apache Spark 1.6.x, HBase, Kerberos, Ansible, Jenkins.

Sr. Data Engineer

UPS
  • Designed logical and physical data models for multiple sources in Redshift.
  • Created Hive schemas utilizing partitioning and bucketing for enhanced performance.
  • Developed analytical components leveraging Kafka and Spark Streaming.
  • Implemented Spark applications to load data into Athena tables efficiently.
  • Configured AWS Data Pipeline for seamless data transfers from S3 to Redshift.
  • Utilized JSON schema for precise table and column mapping from S3 to Redshift.
  • Developed PySpark scripts to establish effective data pipelines.
  • Wrote Spark code using Scala and Spark-SQL/Streaming to accelerate data processing.
  • Environment: Hadoop 3.0, Scala, Airflow, Python 2.7, Apache Spark 2.3, Apache Kafka, S3, Athena, Redshift, AWS Data Pipeline.

Education

Master of Science - Information Science

Indiana Institute of Technology
Fort Wayne, IN
12-2024

Krishnaveni Engineering College For Women
India

Skills

  • Data Tools: Excel (PivotTables, Charts, Functions), Power BI (basic), Tableau (intro)
  • Programming Languages: Python, SQL, C
  • Database Management: MySQL
  • Data Analytics: Exploratory Data Analysis (EDA), Data Cleaning, Data Validation
  • Soft Skills: Problem Solving, Communication, Team Collaboration, Detail-Oriented

Certification

  • Python certification
