Results-driven Data Engineer with experience at Wipro Limited, specializing in AWS Glue for seamless data migration and ETL processes. Proficient in Spark and SQL, with a strong record of optimizing data workflows and collaborating effectively with stakeholders to meet business requirements. Committed to delivering high-quality solutions in fast-paced environments.
Project Name: Sales Data Analytics
Project Role: Data Engineer
Client: Nike
Environment: AWS S3 | Snowflake | dbt | SQL | Jinja | GitHub | Scrum
Description: Copy data from AWS S3 into Snowflake's cloud data warehouse and perform ETL operations on top of it using dbt to analyze sales data.
Responsibilities:
Loaded data from AWS S3 into Snowflake, ensuring accuracy and efficiency (see the load sketch after this list).
Executed ETL operations by integrating Snowflake and GitHub through dbt.
Created complex subqueries, joins, and views using Jinja and SQL in the Snowflake environment.
Designed models, macros, and tests, leveraging packages and seeds to boost query performance.
Conducted data modeling by refining data types and resolving duplicates and null entries.
Transformed and loaded semi-structured data formats into AWS S3 in accordance with client specifications.
Assessed business requirements to ensure alignment with project outcomes.
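For illustration only, a minimal sketch of the S3-to-Snowflake load step using the snowflake-connector-python library; the connection parameters, stage, and table names are hypothetical placeholders, not the project's actual objects.

```python
import snowflake.connector

# Hypothetical connection details; real credentials were managed securely by the project.
conn = snowflake.connector.connect(
    account="<account_identifier>",
    user="<user>",
    password="<password>",
    warehouse="LOAD_WH",
    database="SALES_DB",
    schema="RAW",
)

try:
    cur = conn.cursor()
    # Bulk-load CSV files from a pre-created S3 external stage into a raw table.
    cur.execute(
        """
        COPY INTO RAW.SALES_ORDERS
        FROM @SALES_S3_STAGE/orders/
        FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1)
        ON_ERROR = 'ABORT_STATEMENT'
        """
    )
    # COPY INTO returns one result row per staged file: name, status, rows parsed and loaded.
    for row in cur.fetchall():
        print(row)
finally:
    conn.close()
```

Downstream dbt models then transform the raw tables into analytics-ready views and tables.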
Project Name: Data Migration from Legacy to Modern Big Data Platform
Project Role: Data Engineer
Client: Adidas
Environment: AWS S3 | PySpark | Spark SQL | Databricks | MySQL | PostgreSQL | Git | JIRA
Description: Migrate data from on-premises systems to the AWS cloud, perform ETL operations on the data in S3 using Databricks, and store the transformed data in Redshift according to the client's business requirements.
Roles and Responsibilities:
Developed ETL processes integrating AWS S3, Databricks, and Redshift for seamless data migration.
Automated data extraction from the client's server to AWS S3 using Airflow for efficiency.
Extracted data through APIs, ensuring accurate import of raw datasets into AWS S3.
Implemented optimization techniques such as data serialization and predicate pushdown for efficient transformations (see the sketch after this list).
Created Airflow DAGs for query optimization based on event-driven triggers or change data capture (CDC).
Established event-based and scheduled triggers for continuous data migration from S3 to Redshift.
Analyzed datasets in Redshift using the Redshift Query Editor to derive insights.
Participated in business requirements gathering to align data transformations with client needs.
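As an illustrative sketch of the S3 transformation step in Databricks (with hypothetical bucket paths and column names), the snippet below reads Parquet data from S3, applies a filter that Spark pushes down to the Parquet scan, and writes the curated output back to S3 partitioned by date; in the project, the curated data was subsequently loaded into Redshift.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("s3-etl-sketch").getOrCreate()

# Read raw Parquet files landed in S3 (bucket and prefix names are placeholders).
orders = spark.read.parquet("s3a://example-raw-bucket/orders/")

# Predicate pushdown: the filter is evaluated at the Parquet scan, so only
# matching row groups are read from S3.
recent = orders.filter(F.col("order_date") >= "2023-01-01")

# Simple transformation and de-duplication before publishing to the curated zone.
curated = (
    recent
    .withColumn("order_total", F.col("quantity") * F.col("unit_price"))
    .dropDuplicates(["order_id"])
)

# Write back partitioned by date; a downstream job copies this into Redshift.
(
    curated.write
    .mode("overwrite")
    .partitionBy("order_date")
    .parquet("s3a://example-curated-bucket/orders/")
)
```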
Client: Mercedes Benz
Project Role: Data Engineer
Environment: Azure Data Lake Storage | Azure Data Factory | Azure Storage | On-premises data sources
Description: Migrate large volumes of structured and unstructured data from on-premises data sources to Azure Data Lake Storage for centralized data management and analytics.
Roles and Responsibilities:
· Configured and managed data movement activities, such as copy data tasks and data flows.
· Integrated with on-premises data sources and handled various data formats (structured, semi-structured, and unstructured).
· Designed and implemented data transformation logic using Azure Data Factory's mapping data flows or Spark activities.
· Optimized data processing workflows for performance and efficiency.
· Integrated Azure Data Lake Storage with other Azure services for data processing and analysis.
· Implemented data integration patterns with Azure services such as Azure Databricks and Azure Synapse Analytics.
· Collaborated with data analysts and data scientists to facilitate data access and consumption.
· Implemented techniques such as partitioning, compression, or parallel copying to improve transfer speeds (see the sketch after this list).
· Continuously improved and refactored existing data ingestion pipelines for better performance, maintainability, and scalability.
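Purely as an illustration of the parallel-copy idea (the project itself used Azure Data Factory copy activities), the sketch below uploads a set of local export files to an Azure storage container concurrently with the azure-storage-blob SDK; the connection string, container name, and local paths are hypothetical placeholders.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

from azure.storage.blob import BlobServiceClient

# Hypothetical values; real transfers were orchestrated by ADF pipelines.
CONNECTION_STRING = "<storage-connection-string>"
CONTAINER = "raw-zone"

service = BlobServiceClient.from_connection_string(CONNECTION_STRING)
container = service.get_container_client(CONTAINER)

def upload_one(path: Path) -> str:
    """Upload a single local file, mirroring its relative path as the blob name."""
    with path.open("rb") as fh:
        container.upload_blob(name=path.as_posix(), data=fh, overwrite=True)
    return path.as_posix()

files = sorted(Path("exports").rglob("*.parquet"))

# Parallel copy: keep several uploads in flight at once to improve transfer speed.
with ThreadPoolExecutor(max_workers=8) as pool:
    for uploaded in pool.map(upload_one, files):
        print(f"uploaded {uploaded}")
```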
I, Sandra Pradeepa, hereby declare that the above information and particulars furnished are true and correct to the best of my personal knowledge and belief.