Syed Abdul Kather Shahul Hameed

Bangalore, KA

Summary

As a data professional, I have a deep understanding of data challenges across both batch and real-time processing. My expertise lies in designing and building resilient data pipelines that efficiently handle large data volumes and high-velocity streams.

Overview

15 years of professional experience

Work History

Data Architect

M42
03.2024 - Current
  • Designing and implementing a scalable data platform for life sciences, integrating genomic and health information exchange (HIE) data for the UAE and ensuring compliance with relevant regulations (e.g., HIPAA, GDPR)
  • Provided and managed a secure Trusted Research Environment (TRE), enabling researchers to access and analyze sensitive genomic and HIE data while adhering to strict data governance and security protocols

Principal Data Engineer

NOON
01.2020 - 09.2023
  • Company Overview: THE SOCIAL LEARNING PLATFORM
  • Created the data platform from scratch and operated it with high reliability at minimal expense (40K per quarter)
  • Implemented data ingestion using Kafka Connect, Sqoop, and a custom reconciler to collect data from diverse sources
  • Developed, deployed, and supervised real-time computations with Flink on EMR and batch computations with Spark on EMR
  • Designed a Custom SDK to optimize the onboarding process of Airflow as the Scheduler
  • Introduced Proto for event standardization across the organization, ensuring consistency for backend and client applications
  • Constructed the Entity/Feature Store for enhanced real-time event enrichment capabilities
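The real-time enrichment path an entity/feature store enables can be sketched as a lookup joined onto each incoming event. This is a minimal, hypothetical illustration; the names (`EntityStore`, `enrich_event`) and event shapes are assumptions, not from the actual platform:

```python
# Hypothetical sketch of real-time event enrichment against an entity/feature
# store. Class and field names are illustrative only.

class EntityStore:
    """Minimal in-memory stand-in for a low-latency entity/feature store."""

    def __init__(self):
        self._features = {}

    def put(self, entity_id, features):
        self._features[entity_id] = features

    def get(self, entity_id):
        # Unknown entities enrich with nothing rather than failing.
        return self._features.get(entity_id, {})


def enrich_event(event, store):
    """Attach stored entity features to an incoming event."""
    enriched = dict(event)
    enriched.update(store.get(event["entity_id"]))
    return enriched


store = EntityStore()
store.put("user-42", {"lifetime_orders": 17, "risk_score": 0.12})

event = {"entity_id": "user-42", "type": "checkout"}
print(enrich_event(event, store))
```

In a production stream processor the same lookup would typically run inside the streaming job (e.g. a Flink enrichment operator) against a low-latency store rather than an in-process dict.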

Big Data Lead

TATHASTU
12.2018 - 01.2020
  • Maintained and managed in-house Cloudera Hadoop Cluster
  • Designed a generic and extensible ingestion platform capable of handling both streams (binlog) and batch (JDBC) data
  • Utilized Kafka Connect and Apache Hudi, with modifications made to core components to support Schema Registry
  • Deployed the platform on a Kubernetes cluster
  • Developed data pipelines to collect, cleanse, and process data from multiple sources
  • Used data visualization to present findings to internal stakeholders
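A generic ingestion platform that serves both stream (binlog) and batch (JDBC) sources usually hinges on one common source contract so downstream writers stay source-agnostic. The sketch below is hypothetical (the real platform used Kafka Connect and Apache Hudi; these class names are invented for illustration):

```python
# Hypothetical sketch of a generic ingestion abstraction over stream (binlog)
# and batch (JDBC) sources. All names here are illustrative only.

from abc import ABC, abstractmethod


class Source(ABC):
    """Common contract: every source yields records as dicts."""

    @abstractmethod
    def read(self):
        ...


class BinlogSource(Source):
    """Stand-in for a CDC/binlog change stream."""

    def __init__(self, changes):
        self.changes = changes

    def read(self):
        yield from self.changes


class JdbcSource(Source):
    """Stand-in for a batch JDBC result set."""

    def __init__(self, rows):
        self.rows = rows

    def read(self):
        yield from self.rows


def ingest(source):
    """Normalize records from any source into a common envelope."""
    return [{"payload": rec} for rec in source.read()]


print(ingest(BinlogSource([{"op": "insert", "id": 1}])))
print(ingest(JdbcSource([{"id": 1, "name": "a"}])))
```

The point of the shared interface is that schema handling (e.g. a Schema Registry) and the sink (e.g. Hudi tables) only ever see the normalized envelope, regardless of whether data arrived as a change stream or a batch extract.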

Senior Software Engineer III (Data)

OLA CABS
09.2014 - 12.2018
  • Created a flexible and customizable workflow engine capable of handling both stream and batch computing, featuring event-triggered sliding/tumbling windows, ML model execution (PMML), delay queue, custom HTTP actions, and more
  • Managed 100+ workflows at scale (40 streaming, 85 batch, and 4 APIs), processing around 100 million messages daily on the platform
  • Reduced known fraudulent transactions from 10% to less than 1%, yielding savings of approximately 10L INR per day over two years and around 50L INR per day since launch
  • Developed generic solutions for various problem domains, including Generic Flexible penalization, Customer Scoring, Device Scoring, Centralized actioning, ARC (Automatic Rule builder), Fraud Life cycle management, and more
  • Developed a DSL-based polymorphic data service for serving data in a declarative manner
  • The service is horizontally scalable and currently handles a workload of approximately 2K transactions per second (TPS)
  • Demonstrated the ability to achieve a data enrichment rate of up to 8K TPS without negatively impacting service metrics
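The tumbling-window behaviour mentioned above can be illustrated with a minimal aggregator. This is a sketch under assumed event shapes, not the production workflow engine (which also covered sliding windows, PMML model execution, and delay queues):

```python
# Hypothetical sketch of tumbling-window aggregation over timestamped events.
# A tumbling window partitions time into fixed, non-overlapping intervals.

from collections import defaultdict


def tumbling_window_counts(events, window_size):
    """Count events per (window_start, key).

    events: iterable of (timestamp_seconds, key) pairs.
    window_size: window length in seconds.
    """
    counts = defaultdict(int)
    for ts, key in events:
        # Floor the timestamp to the start of its window.
        window_start = (ts // window_size) * window_size
        counts[(window_start, key)] += 1
    return dict(counts)


events = [(0, "ride"), (3, "ride"), (7, "ride"), (12, "cancel")]
print(tumbling_window_counts(events, window_size=5))
# Windows of 5s: [0,5) has 2 rides, [5,10) has 1 ride, [10,15) has 1 cancel.
```

A sliding window differs only in that each event can fall into several overlapping windows; an event-triggered variant emits results on arrival rather than on a timer.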

Associate Engineer

COGNIZANT
02.2013 - 09.2014
  • Built and deployed ETL pipelines for IDEA, converting binary mainframe data into a queryable format
  • Managed Hadoop clusters (12 nodes and 8 nodes) in TFS & PointCross
  • Integrated Kerberos authentication with the Hadoop clusters
  • Implemented and maintained monitoring and logging systems
  • Participated in code reviews to ensure adherence to best practices and standards

Senior Software Engineer II

POINTCROSS PVT
03.2011 - 02.2013
  • Enterprise Technical Search allows search, navigation, and discovery, with security and fine-grained authorisation access, across text and data
  • Generated user permission sequence file to map between the document index of Solr and Orchestra objects and Built an 'Object map' in the HBase table using MapReduce
  • Provided auth-based access to objects indexed in Solr
  • Viewing nonclinical study data from in-house laboratories is difficult; consolidating disparate data domains into a single viewer and assessing the data at the subject or treatment-group level is a challenge
  • Built a search layer on top of this data, primarily enabling scientists to create models on top of the search layer

Education

Master - Computer Applications

ANNA UNIVERSITY
01.2010

Bachelor - Computer Science

KAMARAJ UNIVERSITY
01.2006

Skills

  • Apache Hadoop
  • Apache Spark
  • Apache Kafka
  • Apache Cassandra
  • Apache Hive
  • Apache Flink
  • Data Governance
  • Big Data
  • ML Ops
  • Data Warehousing
  • Kubernetes
  • Airflow
