I am a highly skilled Data Engineer with 2+ years of experience in data engineering and 10+ years of overall work experience, specializing in data technologies including Snowflake, Azure Databricks, PySpark, and Spark. My expertise lies in data transformation and in data ingestion using Snowpipe and SnowSQL, making me proficient in data sharing, analysis, and reporting. I have a strong background in ETL/ELT processes, data processing, Agile methodologies, and debugging, and I deliver robust solutions. My deep understanding of Snowflake's core features, including Time Travel, Zero-Copy Cloning, Streams & Tasks, and star and snowflake schemas, allows me to design scalable ETL/ELT processes. My problem-solving skills, strong communication, and multitasking abilities enable effective performance tuning, data governance, and translation of business needs into technical solutions. Proficient in SQL, Python, SQL Server, Oracle, data warehousing, and data pipeline infrastructure, I ensure seamless data processing and data management in the Big Data landscape, with an understanding of big data technologies such as Hadoop and Spark.
Migration Analytics in Retail Domain
ETL Development: Developed an ETL pipeline in PySpark to extract data stored in AWS S3 for further processing.
Data Transformation and Cleaning: Developed PySpark and AWS Glue jobs for efficient data transformation and cleansing.
Job Monitoring and Maintenance: Actively monitored and maintained Glue jobs, debugging and resolving issues to ensure job stability and accuracy.
Cross-Functional Collaboration: Collaborated closely with Data Scientists, Analysts, and team members to understand data requirements, ensuring that processed data aligned with their needs.
Thorough Documentation: Created comprehensive documentation for PySpark scripts, SQL queries, and data transformations, fostering transparency and facilitating collaboration.
SQL Query Execution: Executed SQL queries in the Snowflake data warehouse to fulfil client requirements and support business KPIs effectively.
Data Ingestion in Media & Entertainment Domain
Data Retrieval: Fetched API data using Python, applying data cleaning, transformation, and extraction.
Cloud Deployment: Used Terraform to deploy AWS Lambda, automating data retrieval and processing.
Data Integration: Designed a Lambda function to fetch, process, and store data in S3 for analysis.
Monitoring and Debugging: Set up AWS CloudWatch for real-time monitoring and issue resolution.
Data Accessibility: Ensured processed data availability in S3, well organized for downstream use.
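A Lambda function of the kind described above might look like the sketch below. The API endpoint, bucket name, and record fields are hypothetical placeholders, not the production values.

```python
# Hypothetical sketch of a Lambda that fetches API data, cleans it,
# and writes it to S3. Endpoint, bucket, and fields are assumptions.
import json
import urllib.request
from datetime import datetime, timezone

API_URL = "https://api.example.com/v1/titles"   # hypothetical endpoint
BUCKET = "media-ingest-processed"               # hypothetical bucket


def clean_records(records):
    """Drop records without an id and keep only the fields used downstream."""
    return [
        {"id": r["id"], "title": r.get("title", "").strip()}
        for r in records
        if r.get("id") is not None
    ]


def handler(event, context):
    import boto3  # available in the Lambda runtime

    with urllib.request.urlopen(API_URL) as resp:
        records = json.load(resp)

    cleaned = clean_records(records)
    # Date-partitioned key layout keeps the bucket organized for downstream use.
    key = f"titles/{datetime.now(timezone.utc):%Y/%m/%d}/titles.json"
    boto3.client("s3").put_object(
        Bucket=BUCKET, Key=key, Body=json.dumps(cleaned).encode()
    )
    return {"records": len(cleaned), "key": key}
```

Separating `clean_records` from the handler keeps the transformation logic testable without AWS access, while CloudWatch captures the handler's logs and errors at runtime.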