Accomplished Data Engineer with extensive experience at Titan Company Limited, specializing in the design and optimization of ETL pipelines. Proficient in SQL, Python, and data analysis, delivering actionable insights that increased customer retention by 15%. Skilled in stakeholder management and in delivering scalable solutions that improve business outcomes and operational efficiency.
International Business Data Warehouse
- Led the end-to-end implementation of a robust data warehouse for an international business, enabling seamless data integration, advanced analytics, and informed decision-making.
- Collaborated with stakeholders from various business divisions to gather requirements and understand business needs, ensuring the data warehouse design aligned with organizational goals.
- Developed efficient ETL processes using tools such as AWS Glue and IBM DataStage to extract data from multiple sources, transform it to meet analytical needs, and load it into the data warehouse (Amazon Redshift).
- Integrated data from various systems, including POS, CRM, ERP, Marketing, and external market data, ensuring data consistency and accuracy across the business.
- Designed a scalable and flexible data warehouse and data lake architecture using industry-leading technologies (Amazon Redshift and AWS S3) to support diverse data sources, high query performance, and cost-effectiveness.
- Implemented optimization techniques such as indexing, partitioning, and caching to ensure high performance and quick access to critical business data.
- Established data governance practices, including data cataloging, metadata management, and role-based access control, to ensure data quality, security, and compliance across global business units.
- Implemented encryption protocols for sensitive customer data (e.g., mobile numbers and addresses) across the U.S., Singapore, and GCC regions, ensuring compliance with major data protection laws including CCPA/CPRA (U.S.), PDPA (Singapore), and regional GCC privacy frameworks (e.g., UAE PDPL), thereby strengthening data governance and minimizing privacy risk.
- Developed comprehensive documentation and conducted training sessions for end-users to maximize the adoption and effective use of the new data warehouse.
AI-Driven Review Insights Engine
- Built an OpenAI-powered NLP pipeline to process Google My Business and other review data, extracting key features, sentiment, and recurring issues.
- Improved insight accuracy by 30% and reduced manual analysis time by 70%, enabling faster business decision-making.
Implementation of CI/CD for ETL Jobs
- Contributed significantly to the implementation of Continuous Integration and Continuous Deployment (CI/CD) pipelines for AWS Glue and IBM DataStage.
- Collaborated with cross-functional teams to identify current challenges and define requirements for the CI/CD pipeline.
- Acted as the single point of contact (SPOC) for integrating AWS tools such as AWS CodePipeline, AWS CodeBuild, and AWS CodeDeploy to automate build, test, and deployment processes for AWS Glue and IBM DataStage.
- Leveraged AWS CodeCommit for version control, ensuring efficient management of code changes, collaboration, and tracking of development progress.
- Conducted training sessions and created detailed documentation to ensure the development and operations teams were proficient in using the new CI/CD processes.
Metadata & DDL Automation with Schema Validation
- Automated metadata and DDL generation for database tables, incorporating an auto-validation mechanism to ensure schema consistency and prevent pipeline failures.
SQL, Python, Apache Spark (PySpark), Apache Airflow, AWS Glue, AWS Lambda, Amazon API Gateway, IBM DataStage, Tableau, Excel, Descriptive & Exploratory Data Analysis, A/B Testing, Amazon Redshift, MS SQL Server, Amazon DynamoDB, Data Modeling, Performance Tuning, Pandas, NumPy, Regression Analysis, Natural Language Processing, AWS CodeCommit, GitHub, AWS CodePipeline, Requirement Gathering, Data Storytelling, Stakeholder Management, Translating business requirements into data solutions