Seeking a Big Data Developer position utilizing a creative mind and well-tuned skills in developing big data applications with modern development tools. Dedicated to delivering efficient data processing solutions and contributing to data-driven decision-making.
System Engineer / Data Engineer with 2 years of experience designing, implementing, and supporting big data applications using Apache Spark, Hadoop, and AWS, along with Python.
• Implemented data transformation and aggregation processes, optimizing query performance and ensuring efficient data loading.
• Experienced in creating data processing pipelines and implementing data ingestion pipelines for batch data processing, along with debugging and problem solving.
• Good understanding of partitioning and bucketing concepts in Hive and Spark; designed both internal and external tables in Hive to optimize performance (see the sketch below).
• Extensive experience with Spark SQL and data warehousing.
• Experienced in all stages of a project, including requirements gathering, designing and documenting architecture, development, data extraction, and reporting.
• Exposure to AWS cloud technologies such as EC2, S3, EMR, RDS, Redshift, Athena, and Glue.
• Strong experience across the Hadoop and Spark ecosystems, including HDFS, MapReduce, Hive, shell scripting, Sqoop, Databricks, and Spark SQL, along with HBase for big data analysis.
• Proficient in Python.
• Worked on structured and unstructured big data sets and data cleaning.
• Comfortable with Oracle, PostgreSQL, MySQL, and NoSQL databases.
• Worked on orchestration and workflow management with the Airflow scheduler, and version control with Git.
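A minimal sketch of the Hive partitioning, bucketing, and internal/external table design mentioned above, run through Spark SQL; database, table, and column names are hypothetical:

```python
from pyspark.sql import SparkSession

# Hypothetical table and column names, for illustration only.
spark = (SparkSession.builder.appName("hive-table-sketch")
         .enableHiveSupport().getOrCreate())

# External table: Hive tracks only metadata; dropping the table
# leaves the underlying files at LOCATION untouched.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS sales_ext (
        order_id BIGINT, amount DOUBLE
    )
    PARTITIONED BY (order_date STRING)
    LOCATION '/data/raw/sales'
""")

# Internal (managed) table, partitioned by date and bucketed on
# customer_id to prune scans and speed up joins.
spark.sql("""
    CREATE TABLE IF NOT EXISTS sales_managed (
        order_id BIGINT, customer_id BIGINT, amount DOUBLE
    )
    PARTITIONED BY (order_date STRING)
    CLUSTERED BY (customer_id) INTO 16 BUCKETS
    STORED AS ORC
""")
```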
Data Engineer with hands-on experience in AWS services, designing and implementing data pipelines: ingesting raw data from source systems, applying transformations, and processing data to produce meaningful insights per customer needs.
Involved in performance tuning and testing of jobs.
Design and implement robust ETL processes to seamlessly ingest, transform, and deliver high-quality data for analysis (see the sketch after this list).
Collaborate cross-functionally with stakeholders to translate business needs into efficient data models and actionable insights.
Utilize big data tools and distributed computing frameworks (Spark, Hadoop) to shape data into meaningful information.
Implement automated data quality checks and monitoring processes to ensure data integrity and build trust in insights.
Analyze data to uncover hidden trends and patterns, crafting compelling narratives that resonate with non-technical audiences.
Continuously learn and adapt to stay abreast of the evolving data landscape, mastering new technologies and best practices.
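A minimal sketch of such an ETL pipeline in PySpark, assuming hypothetical S3 paths and column names:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("etl-sketch").getOrCreate()

# Ingest: read raw CSV data from S3 (bucket and layout are assumptions).
raw = spark.read.csv("s3://raw-bucket/orders/", header=True, inferSchema=True)

# Transform: drop incomplete records, then aggregate daily totals.
clean = raw.dropna(subset=["order_id", "amount"])
daily = (clean
         .groupBy(F.to_date("created_at").alias("order_date"))
         .agg(F.sum("amount").alias("total_amount"),
              F.count("order_id").alias("order_count")))

# Deliver: write partitioned Parquet for downstream analysis.
(daily.write.mode("overwrite")
      .partitionBy("order_date")
      .parquet("s3://curated-bucket/daily_orders/"))
```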
OEC Data Optimization Script
• Optimized a legacy .NET script by rewriting and implementing it in PySpark.
• Changed the approach from one-by-one record processing to batch processing (see the sketch below).
• Tools used: Apache Spark, PySpark, Python, SQL, PostgreSQL, pandas, Postman collections.
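A minimal sketch of the batch approach, assuming hypothetical table and column names and a PostgreSQL JDBC source:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("oec-batch").getOrCreate()

# Batch approach: read the whole table into a DataFrame in one pass
# instead of fetching and processing records one by one.
orders = (spark.read.format("jdbc")
          .option("url", "jdbc:postgresql://host:5432/oec")
          .option("dbtable", "public.orders")
          .option("user", "etl_user")
          .option("password", "***")
          .load())

# One set-based transformation replaces the per-record loop.
enriched = orders.withColumn("net_amount", F.col("amount") - F.col("discount"))

(enriched.write.format("jdbc")
         .option("url", "jdbc:postgresql://host:5432/oec")
         .option("dbtable", "public.orders_enriched")
         .option("user", "etl_user")
         .option("password", "***")
         .mode("overwrite")
         .save())
```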
MRC Data Pipeline
The consolidated data is made available through the client's API service and downloaded as CSV files.
• Developed a semi-automated data pipeline as a PySpark script.
• The ETL/ELT process extracts data via the API service, loads it into a database, applies transformations, and loads the transformed data to the production service (see the sketch below).
• Performed validation and documentation of the project workflow.
• Tools used: Apache Spark, PySpark, Python, SQL, PostgreSQL, pandas, Postman collections.
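A minimal sketch of the extract-and-stage step, assuming a hypothetical API endpoint, staging table name, and connection string:

```python
import io

import pandas as pd
import requests
from sqlalchemy import create_engine

API_URL = "https://api.example.com/mrc/export"  # hypothetical endpoint

# Extract: download the consolidated data as CSV from the client API.
resp = requests.get(API_URL, params={"format": "csv"}, timeout=60)
resp.raise_for_status()
df = pd.read_csv(io.StringIO(resp.text))

# Load: stage the raw rows into PostgreSQL; transformations then run
# against the staged table before promotion to the production service.
engine = create_engine("postgresql://etl_user:***@host:5432/mrc")
df.to_sql("stg_mrc_raw", engine, if_exists="replace", index=False)
```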
Boodmo Data Extraction
• Developed a data extraction process to pull data from the client's Boodmo API service, transform it with PySpark and Python, and load it into an SQLite database.
• Optimized the PySpark process and performed validation.
• Implemented checkpointing for rollback and automation (see the sketch below).
• Tools used: Apache Spark, PySpark, Python, SQL, SQLite, pandas.
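A minimal sketch of the checkpointing idea, assuming a hypothetical paginated API and response shape; the real checkpoint design may differ:

```python
import sqlite3

import requests

API_URL = "https://api.example.com/boodmo/parts"  # hypothetical endpoint

conn = sqlite3.connect("boodmo.db")
conn.execute("CREATE TABLE IF NOT EXISTS checkpoint (last_page INTEGER)")
conn.execute("CREATE TABLE IF NOT EXISTS parts (id INTEGER, name TEXT)")

# Resume from the last committed page; start at page 1 on the first run.
row = conn.execute("SELECT MAX(last_page) FROM checkpoint").fetchone()
page = (row[0] or 0) + 1

while True:
    resp = requests.get(API_URL, params={"page": page}, timeout=60)
    resp.raise_for_status()
    items = resp.json().get("items", [])
    if not items:
        break
    # Write the page's rows and its checkpoint in one transaction, so a
    # failure rolls back both and the next run restarts cleanly.
    with conn:
        conn.executemany("INSERT INTO parts VALUES (?, ?)",
                         [(i["id"], i["name"]) for i in items])
        conn.execute("INSERT INTO checkpoint VALUES (?)", (page,))
    page += 1
```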
CCDA - Credit Card Data Analysis
• Collaborated with cross-functional teams to understand project requirements and define data analysis goals.
• Developed data ingestion pipelines using PySpark to extract data from an RDBMS (such as MySQL or Oracle) and load it into HBase (see the sketch after this list).
• Designed and implemented data processing workflows to transform and clean credit card transaction data in HBase.
• Utilized Hive for structured querying and analysis of the processed data.
• Conducted data validation and cleansing to ensure data accuracy and consistency.
• Collaborated with Data Scientists to provide them with reliable datasets for advanced analytics and modeling.
• Implemented data security measures and ensured compliance with data privacy regulations.
• Conducted performance optimization and tuning of the data processing pipelines for enhanced efficiency.
• Collaborated with stakeholders to understand their data analysis requirements and deliver actionable insights.
• Performed documentation.
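A minimal sketch of the ingestion step, assuming the Apache HBase-Spark connector is on the classpath and using hypothetical table and column names; exact connector options vary by version:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("ccda-ingest").getOrCreate()

# Extract credit card transactions from the relational source via JDBC.
txns = (spark.read.format("jdbc")
        .option("url", "jdbc:mysql://host:3306/cards")
        .option("dbtable", "transactions")
        .option("user", "etl_user")
        .option("password", "***")
        .load())

# Load into HBase: txn_id becomes the row key; other fields map to
# columns in the 'cf' column family.
(txns.write.format("org.apache.hadoop.hbase.spark")
     .option("hbase.table", "ccda:transactions")
     .option("hbase.columns.mapping",
             "txn_id STRING :key, card_id STRING cf:card_id, "
             "amount DOUBLE cf:amount, txn_ts STRING cf:txn_ts")
     .save())
```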