Projects
Product Analytics/Data foundation
2020 - current
The project involves building streaming ingestion pipelines on Google Cloud to support real-time ingestion of viewing and health metrics events generated by set-top box (STB) devices.
Technologies used - Google Cloud Pub/Sub, Google Cloud Dataflow, Google Cloud Functions, Bigtable, Google Cloud Storage, BigQuery, Cloud Monitoring, Cloud Logging, Java 8
Google Cloud Migration
2018 - 2020
The project involved migrating an on-premises Netezza data warehouse, on-premises Hadoop workloads and a Redshift data warehouse to GCP.
Technologies used - Google Cloud Storage, BigQuery, Cloud Dataflow, Cloud Composer, Google Kubernetes Engine, Google Compute Engine, Google Secret Manager, Google Stackdriver, Cloud Source Repositories, Cloud Build, Cloud Functions, Cloud SQL, Python 2.7, Java 8
Enterprise Data Platform
2016 - 2018
The project involved building a Hadoop-based enterprise data platform (EDP) to consolidate a diverse data ecosystem and serve as a strategic data platform for BI, reporting and advanced analytics. The platform was designed around a layered data architecture model.
Technologies used - Hortonworks Data Platform, Hive, Sqoop, Spark 2.x, HDFS, YARN, Storm 1.1, HBase 1.0, Kafka, Kafka Connect, Java 1.6, Python 2.7
Hadoop Platform Security
2015
The project involved implementing security controls on a Hadoop cluster to support secure storage and processing of PII data as part of a recommendation engine. The solution combined Hadoop's built-in security features, such as Kerberos-based authentication and ACLs, with HDP-provided components such as Ranger and Knox, and Protegrity's data protection features.
Technologies used - Hortonworks Data Platform, Kerberos, Protegrity, Apache Ranger
Projects
Location Content Management System (LCMS)
2009 - 2010, 2013 - 2015
LCMS was built on Hadoop for processing and storing POI data. It replaced the existing Oracle-based data processing and storage solution. POI data processing involved the following stages: cleaning, standardization, validation, geocoding, matching and blending. These stages were implemented as separate Hadoop MapReduce jobs so that the data could be processed in parallel. The system was capable of ingesting and processing millions of records per day and making them available for consumption within a short time. HBase was used for data storage.
Technologies used - Cloudera Hadoop Distribution, HBase, Pig, Hive, Oozie, Drools, Java
Local Business Portal (LBP) API
Jun’11 – Dec’12
Development of a unified REST API layer that provides a consistent, rich set of RESTful APIs for external systems.
Technologies used - RESTful web services, Apache CXF, Spring 2.1, Java
LCS (Location Content System) Relationships
Nov’10 – Jun’11
Implementation of a solution for richer querying of the relationships between entities, such as a location and a point of interest, using Semantic Web technologies such as RDF and OWL.
Technologies used - RDF, OWL, triple store, SPARQL, Jena, SDB, graph data store, RESTful web services, Postgres EnterpriseDB 8.4, Memcached, Squid cache, RESTEasy
Projects
Rich Content Framework
Led the development and maintenance of a Java-based framework for processing XML files supplied by third-party vendors, containing rich data about places of interest such as petrol pumps and restaurants.
Technologies used - Java, XML, XSD, XSLT
Projects
VoIP Order Entry
2006 – 2008
Implementation of an order entry module that is part of the Customer Broadband Solution (CBS). CBS provides an online solution for customers who wish to view the status of their relationship with the client or change the nature of that relationship, and an online solution for customer service representatives (CSRs) who act on behalf of customers to make significant changes to those relationships in a controlled way.
Technologies used - Struts 1.2, EJB 2.1, Hibernate 3.0, J2EE 1.4, JSP 2.0, Java 1.5, Oracle 9i
IdM (Identity Management)
2004 – 2006
Development of a centralized identity and access management solution with features such as role-based access control and single sign-on (SSO). The solution was designed to support Sarbanes-Oxley (SOX) compliance.
Technologies used - EJB 2.0, Sun Access Manager, Sun Identity Manager, Sun Directory Server, LDAP, Struts 1.2, Java
Google Cloud Platform – Google Cloud Storage, BigQuery, Google Cloud Dataflow, Cloud Composer, Dataproc, Pub/Sub, Google Kubernetes Engine, Cloud Functions