Certified AWS Data Engineer with over 18 years of experience specializing in data management, data integration, and big data infrastructure. Proven track record of guiding successful AWS Cloud implementation and migration projects. Proficient in both Microsoft Azure and AWS cloud technologies, with deep experience setting up EMR and serverless computing for cloud-based big data environments. Strong data integration skills across RDBMS and NoSQL sources, including Hive, HBase, and Sqoop, and a comprehensive understanding of MapReduce and Hadoop. Built data ingestion pipelines and leveraged PySpark for data science models. Effective communicator who delivers compelling commercial value to stakeholders. Highly skilled in designing, building, and maintaining robust data pipelines and architectures on the Databricks platform, with expertise in processing large-scale datasets, developing and deploying machine learning models, and creating real-time analytics solutions. Proven ability to optimize data processing performance, ensure data quality, and implement data governance practices.
Client: USPTO
● Designed and oversaw data pipelines utilizing AWS Glue, S3, and Redshift.
● Conducted analytical operations on health insurance data, extracting valuable insights.
● Orchestrated a data pipeline using AWS Glue to amalgamate data from diverse sources.
● Integrated Snowflake into the existing infrastructure while analyzing health insurance data, enhancing data processing capabilities.
● Imported files from an on-premises SFTP server to an S3 bucket and coordinated the data pipeline through an AWS Glue
workflow.
● Engineered and deployed lambda-based invocations to trigger Glue workflows.
● Implemented data validation processes and audit-level jobs to ensure impeccable data integrity.
● Maintained all code repositories in AWS CodeCommit.
● Executed seamless code migrations from the DEV environment to STAGE and PROD using AWS CodePipeline.
● Designed and implemented a scalable data lakehouse on Databricks to support business use cases.
● Monitored and optimized Databricks cluster performance, ensuring cost savings and efficient resource utilization.
● Developed efficient Spark pipelines for specific data processing tasks, achieving measurable performance gains.
● Implemented Delta Lake for reliable data management and improved query performance (see the sketch below).
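A minimal PySpark sketch of the Delta Lake pattern mentioned above, assuming a Databricks runtime with Delta Lake available; the table names, columns, and storage path (claims_raw, /mnt/lake/claims_delta) are illustrative placeholders rather than actual project artifacts.

```python
# Minimal PySpark sketch of the Delta Lake pattern described above.
# Table and path names (claims_raw, /mnt/lake/claims_delta) are illustrative only.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("delta-ingest").getOrCreate()

# Read a raw dataset already landed in the lakehouse.
raw_df = spark.table("claims_raw")

# Light standardization before persisting.
clean_df = (
    raw_df
    .dropDuplicates(["claim_id"])
    .withColumn("load_date", F.current_date())
)

# Write as a Delta table, partitioned for selective reads.
(
    clean_df.write
    .format("delta")
    .mode("overwrite")
    .partitionBy("load_date")
    .save("/mnt/lake/claims_delta")
)

# Register and compact the table so downstream queries stay fast.
spark.sql("CREATE TABLE IF NOT EXISTS claims_delta USING DELTA LOCATION '/mnt/lake/claims_delta'")
spark.sql("OPTIMIZE claims_delta")
```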
Cloud/Data Platform Lead for Envision Healthcare
Highlights: Set up a data lake on AWS for the VBC product; served as the enterprise-wide Redshift data warehouse SME for Envision.
Key Result Areas:
● Created a data lake on AWS using S3, Lambda, Glue, Athena, Databricks, and QuickSight.
● Migrated On-premises Oracle database objects and data to Redshift.
● Created data models and ingestion pipelines loading Redshift schemas from RDBMS and DB2 sources (see the sketch after this list).
● Created Redshift models, ELT pipelines using dbt, and the data quality architecture.
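A minimal sketch of one ingestion step of the kind described above: bulk-loading a Redshift staging table from files staged in S3. It assumes the psycopg2 driver; the cluster endpoint, credentials, schema, table, bucket path, and IAM role are placeholders.

```python
# Sketch of an ingestion step loading a Redshift schema from files staged in S3.
# Host, credentials, table, bucket, and IAM role are placeholders.
import psycopg2

COPY_SQL = """
    COPY analytics.claims_staging
    FROM 's3://example-ingest-bucket/claims/2024-01-01/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy-role'
    FORMAT AS PARQUET;
"""

def load_staging_table() -> None:
    conn = psycopg2.connect(
        host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",
        port=5439,
        dbname="analytics",
        user="etl_user",
        password="***",
    )
    try:
        with conn, conn.cursor() as cur:
            cur.execute(COPY_SQL)                              # bulk load from S3
            cur.execute("ANALYZE analytics.claims_staging;")   # refresh planner stats
    finally:
        conn.close()

if __name__ == "__main__":
    load_staging_table()
```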
Project: Data Lake Implementation
● Designed and implemented a data lake leveraging AWS Glue, S3, and Athena, with Python and Spark for development.
● Established a seamless connection with the SFTP server through a Glue job for downloading and processing files.
● Deployed Docker images to the AWS Elastic Container Registry (ECR) repository to fulfill specific requirements.
● Configured the AWS Batch environment to execute all jobs stored in CodeCommit.
● Engineered an AWS Step Functions definition for parallel execution of batch jobs (see the sketch after this list).
● Implemented and configured HP Diagnostics for application monitoring and critical client application call stack analysis.
● Monitored system-level metrics for various virtual server implementations using HP SiteScope and vSphere.
● Reviewed and analyzed Performance Testing deliverables for Performance Testing Projects.
● Developed a test harness tool to monitor and regulate the MQ and EMS queues for optimal performance.
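The Step Functions orchestration above could take roughly the following shape: an Amazon States Language definition with a Parallel state whose branches submit AWS Batch jobs, registered via boto3. This is a sketch under assumptions; the state machine name, job definitions, job queue, and role ARN are placeholders.

```python
# Sketch of a Step Functions definition that fans out AWS Batch jobs in parallel.
# All names and ARNs below are placeholders.
import json
import boto3

definition = {
    "Comment": "Run ingestion and validation batch jobs in parallel",
    "StartAt": "ParallelBatchJobs",
    "States": {
        "ParallelBatchJobs": {
            "Type": "Parallel",
            "End": True,
            "Branches": [
                {
                    "StartAt": "IngestJob",
                    "States": {
                        "IngestJob": {
                            "Type": "Task",
                            "Resource": "arn:aws:states:::batch:submitJob.sync",
                            "Parameters": {
                                "JobName": "ingest",
                                "JobDefinition": "ingest-job-def",
                                "JobQueue": "data-lake-queue",
                            },
                            "End": True,
                        }
                    },
                },
                {
                    "StartAt": "ValidateJob",
                    "States": {
                        "ValidateJob": {
                            "Type": "Task",
                            "Resource": "arn:aws:states:::batch:submitJob.sync",
                            "Parameters": {
                                "JobName": "validate",
                                "JobDefinition": "validate-job-def",
                                "JobQueue": "data-lake-queue",
                            },
                            "End": True,
                        }
                    },
                },
            ],
        }
    },
}

sfn = boto3.client("stepfunctions")
sfn.create_state_machine(
    name="data-lake-batch-orchestrator",
    definition=json.dumps(definition),
    roleArn="arn:aws:iam::123456789012:role/sfn-batch-role",
)
```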
Project: Environmental Social Governance Client: GBS
● Utilized Python for web scraping to extract data on 1500 companies listed at unpri.org/signatories (see the sketch after this list).
● Assisted in transitioning processing from a NoSQL MongoDB architecture to a cloud-based solution.
● Established a distributed computing infrastructure leveraging Apache Spark.
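The scraping work above was done with Python; the sketch below shows one plausible shape of such an extractor using requests and BeautifulSoup. The URL is kept from the bullet, but the page structure, CSS selectors, and field names are assumptions, and the real directory may be paginated or rendered client-side.

```python
# Illustrative scraping sketch; the selectors and markup are assumptions.
import csv
import requests
from bs4 import BeautifulSoup

URL = "https://www.unpri.org/signatories"  # listing page referenced above

def scrape_signatories(url: str) -> list[dict]:
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")
    rows = []
    # Hypothetical markup: each signatory rendered as a list item with name/country spans.
    for item in soup.select("li.signatory"):
        name = item.select_one("span.name")
        country = item.select_one("span.country")
        rows.append({
            "name": name.get_text(strip=True) if name else "",
            "country": country.get_text(strip=True) if country else "",
        })
    return rows

if __name__ == "__main__":
    records = scrape_signatories(URL)
    with open("signatories.csv", "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=["name", "country"])
        writer.writeheader()
        writer.writerows(records)
```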
Project: Blue Shield Florida
● Enabled a data science team with a view of all available datasets by creating a Data Lake based on Hive and Hadoop.
● Created DW data models using Hive (ORC, Avro) with partitioning and bucketing for high performance (see the sketch below).
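A minimal PySpark sketch of a partitioned, bucketed ORC model of the kind described above, written as a Hive metastore table; the schema, table, and column names are illustrative.

```python
# Partitioned, bucketed ORC table registered in the Hive metastore.
# Table, schema, and column names are illustrative.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("dw-models")
    .enableHiveSupport()     # register tables in the Hive metastore
    .getOrCreate()
)

claims = spark.table("staging.claims_raw")

(
    claims
    .select("claim_id", "member_id", "claim_amount", "claim_year")
    .write
    .format("orc")
    .partitionBy("claim_year")        # prune partitions on year filters
    .bucketBy(32, "member_id")        # co-locate rows for member-level joins
    .sortBy("member_id")
    .mode("overwrite")
    .saveAsTable("dw.claims_fact")
)
```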
Project: Pacific Gas Energy
● Created a MapR cluster on AWS EC2 instances and installed Hive, Spark, Pig, and Sqoop.
● Transposed data from Aladdin output files to Eagle.
● Created a uniform Spark layer to process data from different sources, including Hive, MongoDB, and Splunk (see the sketch below).
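A sketch of what such a unified Spark access layer might look like for the Hive and MongoDB sources, assuming the MongoDB Spark Connector (10.x) is installed on the cluster; URIs, databases, tables, and columns are placeholders, and Splunk access is omitted since it usually goes through a separate connector.

```python
# Unified Spark access layer over heterogeneous sources (Hive + MongoDB).
# Assumes the MongoDB Spark Connector 10.x is on the cluster; names are placeholders.
from pyspark.sql import DataFrame, SparkSession

spark = SparkSession.builder.appName("unified-layer").enableHiveSupport().getOrCreate()

def read_hive(table: str) -> DataFrame:
    return spark.table(table)

def read_mongo(uri: str, database: str, collection: str) -> DataFrame:
    return (
        spark.read.format("mongodb")
        .option("connection.uri", uri)
        .option("database", database)
        .option("collection", collection)
        .load()
    )

# Normalize both sources to a common schema before combining them.
meters_hive = read_hive("ops.meter_readings").select("meter_id", "reading_ts", "kwh")
meters_mongo = (
    read_mongo("mongodb://mongo-host:27017", "ops", "meter_readings")
    .select("meter_id", "reading_ts", "kwh")
)

combined = meters_hive.unionByName(meters_mongo)
combined.createOrReplaceTempView("all_meter_readings")
```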
Project: Fitch Ratings
● Migrated data from Oracle to MongoDB collections.
● Built SPARK cluster to process data and integrated MongoDB with Hadoop.
● Automated the migration of the MongoDB cluster to AWS using Chef and CloudFormation.
● Created web services using Informatica Data Services.
Project: Investment Derivatives, Front Office, and Support
● Designed and optimized Informatica mappings for data processing.
● Created reports and reconciliation tools using Tableau.
● Developed .Net web services for data access.
Project: Trading Portfolio Management System
● Implemented ETL strategies and data flows using Informatica.
● Developed reconciliation between Eagle and Aladdin systems.
● Supported .Net applications and Tableau reports.
Databricks:
Dataiku:
● Installation, configuration, and maintenance of the Dataiku application on AWS servers.
● AWS environment management covering patching, security audits, compliance checks, IAM audits, and network configurations.
● Creation and management of projects in a Kubernetes environment.
● Configuration of Dataiku with Kubernetes and resource optimization for offloading machine learning computations to Kubernetes.
● Dataiku application health and lifecycle management.
● Implementing automations using Python and Linux scripts (see the sketch after this list).
● Use of Dataiku Fleet Manager and Ansible scripting to manage the lifecycle of the Dataiku application.
● Managing incidents and issues, with a focus on effective communication and follow-up with stakeholders and other technical teams.
● Analyzing user requirements, request fulfillment, and incident resolution.
● Maintaining documentation and learning resources.
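A minimal automation sketch of the kind referenced above, assuming the Dataiku public API Python client (dataikuapi) is installed and an admin API key is available; the instance URL, key, and the specific health check are placeholders to adapt.

```python
# Simple DSS health check using the Dataiku public API client (assumed installed).
# The instance URL and API key are placeholders.
import sys

import dataikuapi

DSS_URL = "https://dss.example.internal:11200"   # placeholder instance URL
API_KEY = "***"                                  # placeholder admin API key

def main() -> int:
    client = dataikuapi.DSSClient(DSS_URL, API_KEY)
    try:
        project_keys = client.list_project_keys()
    except Exception as exc:  # treat any API failure as an unhealthy instance
        print(f"DSS health check failed: {exc}")
        return 1
    print(f"DSS reachable; {len(project_keys)} projects visible")
    return 0

if __name__ == "__main__":
    sys.exit(main())
```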
● Dataiku Core Designer certified (minimum), with Dataiku DSS Administration certification preferred.
● Good working experience with AWS Cloud services.
● Experience engineering applications to run on Linux servers within an enterprise environment, including integration with security infrastructure.
● Experience delivering AI solutions that create business value.
● Strong sense of ownership, prioritizing outcomes over output.
● Strong intrinsic motivation to learn new skills and concepts.
● Ability to collaborate and align with the team rather than delivering alone.
● Proven experience with Python, shell scripting, Git, Linux, Hadoop, Kubernetes, and AWS services.
● Good communication skills in English and a solid understanding of Scrum/Agile/DevOps ways of working.
AWS Cloud (Compute, Storage, Analytics)