Syed Abdul Kather Shahul Hameed

Bangalore, KA

Summary

As a data professional, I have a deep understanding of data challenges across both batch and real-time processing. My expertise lies in designing and building resilient data pipelines that efficiently handle large data volumes and high-velocity streams.

Overview

15 years of professional experience

Work History

Data Architect

M42
03.2024 - Current
  • Designing and implementing a scalable data platform for life sciences, integrating genomic and health information exchange (HIE) data for the UAE and ensuring compliance with relevant regulations (e.g., HIPAA, GDPR)
  • Provided and managed a secure Trusted Research Environment (TRE), enabling researchers to access and analyze sensitive genomic and HIE data while adhering to strict data governance and security protocols

Principal Data Engineer

NOON
01.2020 - 09.2023
  • Company Overview: THE SOCIAL LEARNING PLATFORM
  • Created the data platform from scratch and operated it with high reliability at minimal expense (40K per quarter)
  • Implemented data ingestion using Kafka Connect, Sqoop, and a custom reconciler to collect data from diverse sources
  • Developed, deployed, and supervised real-time computations with Flink on EMR and batch computations with Spark on EMR
  • Designed a Custom SDK to optimize the onboarding process of Airflow as the Scheduler
  • Introduced Proto for event standardization across the organization, ensuring consistency for backend and client applications
  • Constructed the Entity/Feature Store for enhanced real-time event enrichment capabilities
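The real-time enrichment path an entity/feature store enables can be sketched as a lookup joined onto each incoming event. This is a minimal, hypothetical illustration; the names (`EntityStore`, `enrich_event`) and event shapes are assumptions, not from the actual platform:

```python
# Hypothetical sketch of real-time event enrichment against an entity/feature
# store. Class and field names are illustrative only.

class EntityStore:
    """Minimal in-memory stand-in for a low-latency entity/feature store."""

    def __init__(self):
        self._features = {}

    def put(self, entity_id, features):
        self._features[entity_id] = features

    def get(self, entity_id):
        # Unknown entities enrich with nothing rather than failing.
        return self._features.get(entity_id, {})


def enrich_event(event, store):
    """Attach stored entity features to an incoming event."""
    enriched = dict(event)
    enriched.update(store.get(event["entity_id"]))
    return enriched


store = EntityStore()
store.put("user-42", {"lifetime_orders": 17, "risk_score": 0.12})

event = {"entity_id": "user-42", "type": "checkout"}
print(enrich_event(event, store))
```

In a production stream processor the same lookup would typically run inside the streaming job (e.g. a Flink enrichment operator) against a low-latency store rather than an in-process dict.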

Big Data Lead

TATHASTU
12.2018 - 01.2020
  • Maintained and managed in-house Cloudera Hadoop Cluster
  • Designed a generic and extensible ingestion platform capable of handling both streams (binlog) and batch (JDBC) data
  • Utilized Kafka Connect and Apache Hudi, with modifications made to core components to support Schema Registry
  • Deployed the platform on a Kubernetes cluster
  • Developed data pipelines to collect, cleanse, and process data from multiple sources
  • Used data visualization to present findings to internal stakeholders
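A generic ingestion platform that serves both stream (binlog) and batch (JDBC) sources usually hinges on one common source contract so downstream writers stay source-agnostic. The sketch below is hypothetical (the real platform used Kafka Connect and Apache Hudi; these class names are invented for illustration):

```python
# Hypothetical sketch of a generic ingestion abstraction over stream (binlog)
# and batch (JDBC) sources. All names here are illustrative only.

from abc import ABC, abstractmethod


class Source(ABC):
    """Common contract: every source yields records as dicts."""

    @abstractmethod
    def read(self):
        ...


class BinlogSource(Source):
    """Stand-in for a CDC/binlog change stream."""

    def __init__(self, changes):
        self.changes = changes

    def read(self):
        yield from self.changes


class JdbcSource(Source):
    """Stand-in for a batch JDBC result set."""

    def __init__(self, rows):
        self.rows = rows

    def read(self):
        yield from self.rows


def ingest(source):
    """Normalize records from any source into a common envelope."""
    return [{"payload": rec} for rec in source.read()]


print(ingest(BinlogSource([{"op": "insert", "id": 1}])))
print(ingest(JdbcSource([{"id": 1, "name": "a"}])))
```

The point of the shared interface is that schema handling (e.g. a Schema Registry) and the sink (e.g. Hudi tables) only ever see the normalized envelope, regardless of whether data arrived as a change stream or a batch extract.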

Senior Software Engineer III (Data)

OLA CABS
09.2014 - 12.2018
  • Created a flexible and customizable workflow engine capable of handling both stream and batch computing, featuring event-triggered sliding/tumbling windows, ML model execution (PMML), delay queue, custom HTTP actions, and more
  • Managed 100+ workflows at scale (40 streaming, 85 batch, and 4 APIs), processing around 100 million messages daily on the platform
  • Reduced known fraudulent transactions from 10% to less than 1%, yielding savings of approximately 10L INR per day over two years and around 50L INR per day since launch
  • Developed generic solutions for various problem domains, including Generic Flexible penalization, Customer Scoring, Device Scoring, Centralized actioning, ARC (Automatic Rule builder), Fraud Life cycle management, and more
  • Developed a DSL-based polymorphic data service for serving data in a declarative manner
  • The service is horizontally scalable and currently handles a workload of approximately 2K transactions per second (TPS)
  • Demonstrated the ability to achieve a data enrichment rate of up to 8K TPS without negatively impacting service metrics
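The tumbling-window behaviour mentioned above can be illustrated with a minimal aggregator. This is a sketch under assumed event shapes, not the production workflow engine (which also covered sliding windows, PMML model execution, and delay queues):

```python
# Hypothetical sketch of tumbling-window aggregation over timestamped events.
# A tumbling window partitions time into fixed, non-overlapping intervals.

from collections import defaultdict


def tumbling_window_counts(events, window_size):
    """Count events per (window_start, key).

    events: iterable of (timestamp_seconds, key) pairs.
    window_size: window length in seconds.
    """
    counts = defaultdict(int)
    for ts, key in events:
        # Floor the timestamp to the start of its window.
        window_start = (ts // window_size) * window_size
        counts[(window_start, key)] += 1
    return dict(counts)


events = [(0, "ride"), (3, "ride"), (7, "ride"), (12, "cancel")]
print(tumbling_window_counts(events, window_size=5))
# Windows of 5s: [0,5) has 2 rides, [5,10) has 1 ride, [10,15) has 1 cancel.
```

A sliding window differs only in that each event can fall into several overlapping windows; an event-triggered variant emits results on arrival rather than on a timer.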

Associate Engineer

COGNIZANT
02.2013 - 09.2014
  • Built and deployed ETL pipelines for IDEA, converting binary mainframe data into a queryable format
  • Managed Hadoop clusters (12 nodes and 8 nodes) in TFS & PointCross
  • Integrated Kerberos authentication with the Hadoop clusters
  • Implemented and maintained monitoring and logging systems
  • Participated in code reviews to ensure adherence to best practices and standards

Senior Software Engineer II

POINTCROSS PVT
03.2011 - 02.2013
  • Enterprise Technical Search allows search, navigation, and discovery, with security and fine-grained authorisation access, across text and data
  • Generated user permission sequence file to map between the document index of Solr and Orchestra objects and Built an 'Object map' in the HBase table using MapReduce
  • Provided auth-based access to objects indexed in Solr
  • Viewing nonclinical study data from in-house laboratories is difficult; consolidating disparate data domains into a single viewer and assessing the data at the subject or treatment-group level is a challenge
  • Built a search layer on top of this data, primarily enabling scientists to create models on top of the search layer

Education

Master - Computer Applications

ANNA UNIVERSITY
01.2010

Bachelor - Computer Science

KAMARAJ UNIVERSITY
01.2006

Skills

  • Apache Hadoop
  • Apache Spark
  • Apache Kafka
  • Apache Cassandra
  • Apache Hive
  • Apache Flink
  • Data Governance
  • Big Data
  • ML Ops
  • Data Warehousing
  • Kubernetes
  • Airflow
