MANGESH BORADE

Big Data And Cloud Architect
Pune, India

Summary

  • Big Data and Cloud Architect with 17+ years of experience.
  • Expertise in building cloud-based, cloud-native and serverless solutions for Big Data analytics, including migration of on-premise ETL workloads to the cloud.
  • Expertise in architecting and end-to-end implementation of Hadoop-based data lakes.
  • Experience with building real-time data ingestion and data processing pipelines.
  • Experience with building REST APIs.
  • Familiar with microservices based architectures and service mesh platforms like Istio.
  • Experience with building modern CI/CD pipelines.
  • Strong experience in designing and developing Java/J2EE based applications.
  • Strong hands-on development and implementation skills in addition to architecture experience.
  • Strong experience working in Agile development methodologies.
  • Google Cloud Certified Professional Cloud Architect.

Overview

17 years of professional experience
4 years of post-secondary education
1 certification

Work History

Senior Architect

Datametica Solutions Pvt. Ltd.
Pune, Maharashtra
06.2015 - Current
  • Assess existing Hadoop-based data platforms and define the cloud migration strategy based on the findings.
  • Lead POCs and solution prototyping, and guide teams through the POC implementation.
  • Develop reusable assets, development methods, processes and best practices to accelerate delivery.
  • Keep pace with emerging technologies.

Projects

Product Analytics/Data foundation

2020 - current

The project involves building streaming ingestion pipelines on Google Cloud to support real-time ingestion of viewing and health metrics events generated by STB (set-top box) devices.

  • Designed the streaming ingestion pipeline based on Google Cloud Pub/Sub, Google Cloud Dataflow, Google Cloud Functions and Bigtable (a simplified pipeline sketch follows below).
  • Finalized the Cloud Pub/Sub and Bigtable configuration and performed the Bigtable cluster sizing to ensure optimal throughput and low latency.
  • Established the logging and monitoring processes. Created alerting policies based on built-in and custom metrics using Google Cloud Monitoring.
  • Guided the development team on the implementation of the Cloud Function and Cloud Dataflow jobs.

Technologies used - Google Cloud Pub/Sub, Google Cloud Dataflow, Google Cloud Functions, Bigtable, Google Cloud Storage, BigQuery, Cloud Monitoring and Logging, Java 8
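
For illustration only, a minimal Apache Beam (Java 8) sketch of this kind of Pub/Sub-to-Bigtable streaming pipeline is shown below. The project, subscription, instance, table, column-family and attribute names are placeholders, and the row-key scheme is an assumption; this is not the actual project code.

    import java.util.Collections;

    import com.google.bigtable.v2.Mutation;
    import com.google.protobuf.ByteString;
    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.coders.IterableCoder;
    import org.apache.beam.sdk.coders.KvCoder;
    import org.apache.beam.sdk.extensions.protobuf.ByteStringCoder;
    import org.apache.beam.sdk.extensions.protobuf.ProtoCoder;
    import org.apache.beam.sdk.io.gcp.bigtable.BigtableIO;
    import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
    import org.apache.beam.sdk.io.gcp.pubsub.PubsubMessage;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;
    import org.apache.beam.sdk.options.StreamingOptions;
    import org.apache.beam.sdk.transforms.DoFn;
    import org.apache.beam.sdk.transforms.ParDo;
    import org.apache.beam.sdk.values.KV;

    public class StbEventsToBigtable {

      // Converts one Pub/Sub message into a Bigtable row mutation.
      // The "deviceId" attribute and the row-key scheme are assumptions.
      static class ToBigtableMutation extends DoFn<PubsubMessage, KV<ByteString, Iterable<Mutation>>> {
        @ProcessElement
        public void processElement(ProcessContext c) {
          PubsubMessage msg = c.element();
          ByteString rowKey = ByteString.copyFromUtf8(
              msg.getAttribute("deviceId") + "#" + System.currentTimeMillis());
          Mutation setCell = Mutation.newBuilder()
              .setSetCell(Mutation.SetCell.newBuilder()
                  .setFamilyName("events")
                  .setColumnQualifier(ByteString.copyFromUtf8("payload"))
                  .setValue(ByteString.copyFrom(msg.getPayload()))
                  .setTimestampMicros(System.currentTimeMillis() * 1000))
              .build();
          Iterable<Mutation> mutations = Collections.singletonList(setCell);
          c.output(KV.of(rowKey, mutations));
        }
      }

      public static void main(String[] args) {
        StreamingOptions options =
            PipelineOptionsFactory.fromArgs(args).withValidation().as(StreamingOptions.class);
        options.setStreaming(true);

        Pipeline p = Pipeline.create(options);
        p.apply("ReadStbEvents", PubsubIO.readMessagesWithAttributes()
                .fromSubscription("projects/my-project/subscriptions/stb-events-sub"))
         .apply("ToMutations", ParDo.of(new ToBigtableMutation()))
         .setCoder(KvCoder.of(ByteStringCoder.of(), IterableCoder.of(ProtoCoder.of(Mutation.class))))
         .apply("WriteToBigtable", BigtableIO.write()
                .withProjectId("my-project")
                .withInstanceId("metrics-instance")
                .withTableId("stb_metrics"));
        p.run();
      }
    }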

Google Cloud Migration

2018 - 2020

The project involved migrating an on-premise Netezza data warehouse, on-premise Hadoop workloads and a Redshift data warehouse to GCP.

  • Worked in collaboration with the client operations and networking teams to build the GCP foundation including the GCP resource hierarchy, organization policies, VPC design and IAM policies.
  • Designed the data ingestion processes for ingesting data from various sources such as SFTP server, S3, Azure Blob storage, RDBMS, Google Analytics and 3rd party APIs.
  • Led the platform design and development, covering components such as a library of reusable modules for programmatically interacting with GCP services, a data transfer utility for ingesting data from various source system types into GCS and BigQuery, a job framework, custom Airflow operators and a few Dataflow jobs (a sketch of one such helper follows below).
  • Implemented processes for copying historical data from Netezza, Hadoop and Redshift to GCP.
  • Established data governance based on Google Cloud Data Catalog.
  • Recommended the number of BigQuery slots based on past usage and estimated the sizing of the Cloud Composer cluster.
  • Established a CI/CD process using Cloud Build and Cloud Source Repositories.
  • Established logging and monitoring processes based on Stackdriver.
  • Led the initial POC effort involving evaluation of BigQuery, Cloud Dataflow and Dataproc.

Technologies used - Google Cloud Storage, BigQuery, Cloud Dataflow, Cloud Composer, Google Kubernetes Engine, Google Compute Engine, Google Secret Manager, Google Stackdriver, Cloud Source Repositories, Cloud Build, Cloud Functions, Cloud SQL, Python 2.7, Java 8
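
As an illustration of the reusable GCP helper modules and data transfer utility mentioned above, a minimal Java 8 sketch that loads a delimited file from GCS into BigQuery using the google-cloud-bigquery client could look like the following. The bucket, dataset and table names are placeholders; this is not the project's actual code.

    import com.google.cloud.bigquery.BigQuery;
    import com.google.cloud.bigquery.BigQueryOptions;
    import com.google.cloud.bigquery.CsvOptions;
    import com.google.cloud.bigquery.Job;
    import com.google.cloud.bigquery.JobInfo;
    import com.google.cloud.bigquery.LoadJobConfiguration;
    import com.google.cloud.bigquery.TableId;

    public class GcsToBigQueryLoader {

      private final BigQuery bigQuery = BigQueryOptions.getDefaultInstance().getService();

      // Loads a CSV file from GCS into the given BigQuery table, appending to existing data.
      public void loadCsv(String gcsUri, String dataset, String table) throws InterruptedException {
        CsvOptions csvOptions = CsvOptions.newBuilder()
            .setFieldDelimiter(",")
            .setSkipLeadingRows(1)   // assume the files carry a header row
            .build();

        LoadJobConfiguration loadConfig =
            LoadJobConfiguration.newBuilder(TableId.of(dataset, table), gcsUri)
                .setFormatOptions(csvOptions)
                .setWriteDisposition(JobInfo.WriteDisposition.WRITE_APPEND)
                .build();

        Job job = bigQuery.create(JobInfo.of(loadConfig));
        Job completed = job.waitFor();
        if (completed == null || completed.getStatus().getError() != null) {
          throw new IllegalStateException("BigQuery load failed: "
              + (completed == null ? "job no longer exists" : completed.getStatus().getError()));
        }
      }

      public static void main(String[] args) throws InterruptedException {
        // Placeholder bucket, dataset and table names.
        new GcsToBigQueryLoader().loadCsv("gs://example-bucket/landing/orders.csv", "staging", "orders");
      }
    }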

Enterprise Data Platform

2016 - 2018

The project involved building an enterprise data platform (EDP) based on Hadoop which would consolidate the diverse data ecosystem and act as a strategic data platform supporting BI, reporting and advanced analytics needs. The data platform was designed based on a layered data architecture model.

  • Established the data layer design including the HDFS directory layout, data access and data management rules.
  • Designed and implemented platform tools and components including Java-based data copy tools supporting SFTP server and RDBMS sources, a generic Spark application for the ingestion of flat files (sketched below), a job execution framework and Hive UDFs.
  • Designed and implemented data ingestion solutions for the ingestion of data from various source systems such as SFTP server, S3, Azure Blob Storage, Google Ads, Salesforce, etc.
  • Designed and implemented a streaming pipeline for one of the use cases involving real-time enrichment of Wi-Fi event data stream using Storm, HBase and Kafka.
  • Designed and implemented a data replication process for replicating data changes in near real time from Oracle to Hive using Kafka Connect framework for one of the applications.
  • Configured YARN queues and containers for optimal cluster resource utilization and to ensure the SLAs were met.
  • Carried out performance tuning of the Hive and Spark jobs.
  • Established a CI process including the automation of the deployment process.

Technologies used – Hortonworks Data Platform, Hive, Sqoop, Spark 2.x, HDFS, YARN, Storm 1.1, HBase 1.0, Kafka, Kafka Connect, Java 1.6, Python 2.7
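
A simplified sketch of a generic Spark 2.x flat-file ingestion job of the kind mentioned in the platform-tools bullet above is shown below, written in Java. The argument convention, CSV options and target Hive table are illustrative assumptions, not the project's actual interface.

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SaveMode;
    import org.apache.spark.sql.SparkSession;

    public class FlatFileIngestJob {

      public static void main(String[] args) {
        // Illustrative arguments: <inputPath> <delimiter> <targetDatabase.targetTable>
        String inputPath = args[0];
        String delimiter = args[1];
        String targetTable = args[2];

        SparkSession spark = SparkSession.builder()
            .appName("flat-file-ingest")
            .enableHiveSupport()           // write directly into Hive-managed tables
            .getOrCreate();

        Dataset<Row> input = spark.read()
            .option("header", "true")      // assume the files carry a header row
            .option("delimiter", delimiter)
            .option("inferSchema", "true")
            .csv(inputPath);

        // Append the new load into the target Hive table.
        input.write().mode(SaveMode.Append).saveAsTable(targetTable);

        spark.stop();
      }
    }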

Hadoop Platform Security

2015

The project involved implementing security controls on a Hadoop cluster to support secure storage and processing of PII data as part of a recommendation engine. The solution used Hadoop's built-in security features such as Kerberos-based authentication and ACLs, HDP-provided security features such as Ranger and Knox, and Protegrity's data protection capabilities.

  • Worked with the client's Hadoop administration team to finalize the Hadoop cluster configuration to enable Kerberos based authentication.
  • Defined the Ranger based authorization policies and HDFS and YARN ACLs.

Technologies used - Hortonworks Data Platform, Kerberos, Protegrity, Apache Ranger

Lead Engineer

HERE Solutions India Pvt Ltd
Mumbai, Maharashtra
11.2010 - 05.2015

Projects

Location Content Management System (LCMS)

2009-2010, 2013 - 2015

LCMS was developed based on Hadoop for the processing and storage of POI data, replacing the existing Oracle-based data processing and storage solution. The POI data processing involved the following stages: cleaning, standardization, validation, geocoding, matching and blending. These stages were implemented as separate Hadoop MapReduce jobs for parallel processing of the data. The system was capable of ingesting and processing millions of records per day and making them available for consumption within a short time. HBase was used for data storage.

  • Led the design and development of the validation and extraction modules.
  • Contributed to the initial Hadoop MapReduce Java job implementation and prototyping.
  • Automated the end-to-end extract creation process using Oozie.
  • Carried out performance benchmarking and tuning of Hadoop jobs using techniques such as map-side combiners and the caching feature of the HBase scan operation (illustrated in the sketch below).

Technologies used - Cloudera Hadoop Distribution, HBase, Pig, Hive, Oozie, Drools, Java
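
The combiner and HBase scan-caching techniques mentioned in the last bullet can be illustrated with the following minimal Hadoop MapReduce driver sketch. The table name, placeholder row-count logic and class names are assumptions and not the actual LCMS jobs.

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
    import org.apache.hadoop.hbase.mapreduce.TableMapper;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class PoiCountJobDriver {

      // Minimal mapper over the HBase "poi" table: emits one count per row (placeholder logic).
      public static class PoiCountMapper extends TableMapper<Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);

        @Override
        protected void map(ImmutableBytesWritable rowKey, Result row, Context context)
            throws IOException, InterruptedException {
          context.write(new Text("poi"), ONE);
        }
      }

      // Used both as the combiner (map-side pre-aggregation) and the final reducer.
      public static class PoiCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
          int sum = 0;
          for (IntWritable value : values) {
            sum += value.get();
          }
          context.write(key, new IntWritable(sum));
        }
      }

      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = Job.getInstance(conf, "poi-count");
        job.setJarByClass(PoiCountJobDriver.class);

        Scan scan = new Scan();
        scan.setCaching(500);        // fetch 500 rows per RPC instead of the default
        scan.setCacheBlocks(false);  // avoid polluting the block cache during full scans

        TableMapReduceUtil.initTableMapperJob(
            "poi", scan, PoiCountMapper.class, Text.class, IntWritable.class, job);

        job.setCombinerClass(PoiCountReducer.class);  // map-side pre-aggregation cuts shuffle volume
        job.setReducerClass(PoiCountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileOutputFormat.setOutputPath(job, new Path(args[0]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }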

Local Business Portal (LBP) API

06.2011 - 12.2012

Development of a unified REST API layer providing a consistent and rich set of RESTful APIs to be used by external systems.

  • Contributed to the RESTful API design.
  • Led the development and testing of the API.

Technologies used - RESTful web services, Apache CXF, Spring 2.1, Java
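
For illustration, a minimal JAX-RS resource of the kind such an API layer exposes (Apache CXF publishes annotated classes like this as RESTful endpoints) might look like the following. The resource path, parameters and JSON payloads are hypothetical.

    import javax.ws.rs.DefaultValue;
    import javax.ws.rs.GET;
    import javax.ws.rs.Path;
    import javax.ws.rs.PathParam;
    import javax.ws.rs.Produces;
    import javax.ws.rs.QueryParam;
    import javax.ws.rs.core.MediaType;
    import javax.ws.rs.core.Response;

    // Hypothetical listing resource; a real implementation would delegate to a service layer.
    @Path("/listings")
    @Produces(MediaType.APPLICATION_JSON)
    public class ListingResource {

      // GET /listings/{id} - fetch a single listing by its identifier.
      @GET
      @Path("/{id}")
      public Response getListing(@PathParam("id") String id) {
        return Response.ok("{\"id\": \"" + id + "\"}").build();
      }

      // GET /listings?q=...&limit=... - search listings with an optional result limit.
      @GET
      public Response searchListings(@QueryParam("q") String query,
                                     @DefaultValue("10") @QueryParam("limit") int limit) {
        return Response.ok("{\"query\": \"" + query + "\", \"limit\": " + limit + "}").build();
      }
    }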

LCS (Location Content System) Relationships

11.2010 - 06.2011

Implementation of a solution enabling more powerful querying of the relationships between entities such as a location and a point of interest, using Semantic Web technologies such as RDF and OWL.

  • Contributed to developing the initial ontology defining the relationships using OWL and RDFS.
  • Implemented a Java API layer to interact with and query the triple store using the Jena SDK (see the sketch below).
  • Implemented RESTful web services to create, retrieve and query the relationships.

Technologies used - RDF, OWL, Triple store, SPARQL, Jena, SDB, Graph data store, RESTful web services, Postgres EnterpriseDB 8.4, Memcached, Squid cache, RestEasy
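
A minimal sketch of querying such relationships with SPARQL through the Jena API is shown below. The ontology namespace, property names and RDF file are placeholders, an in-memory model stands in for the SDB-backed triple store, and the current org.apache.jena packages are used for illustration.

    import org.apache.jena.query.Query;
    import org.apache.jena.query.QueryExecution;
    import org.apache.jena.query.QueryExecutionFactory;
    import org.apache.jena.query.QueryFactory;
    import org.apache.jena.query.QuerySolution;
    import org.apache.jena.query.ResultSet;
    import org.apache.jena.rdf.model.Model;
    import org.apache.jena.rdf.model.ModelFactory;

    public class RelationshipQueryExample {

      public static void main(String[] args) {
        // In-memory model loaded from a placeholder RDF file; the real system
        // queried an SDB-backed triple store instead.
        Model model = ModelFactory.createDefaultModel();
        model.read("relationships.rdf");

        // Placeholder ontology namespace and property: find POIs located in a given location.
        String sparql =
            "PREFIX rel: <http://example.com/ontology/relationships#> " +
            "SELECT ?poi WHERE { ?poi rel:locatedIn <http://example.com/location/mumbai> }";

        Query query = QueryFactory.create(sparql);
        try (QueryExecution exec = QueryExecutionFactory.create(query, model)) {
          ResultSet results = exec.execSelect();
          while (results.hasNext()) {
            QuerySolution solution = results.next();
            System.out.println(solution.getResource("poi").getURI());
          }
        }
      }
    }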

Development Lead

Ness Technologies India Pvt Ltd
Mumbai, Maharashtra
12.2008 - 10.2010

Projects

Rich Content Framework

Led the development and maintenance of a Java-based framework used for the processing of XML files provided by 3rd party vendors containing rich data about the places of interest such as petrol pumps, restaurants, etc.

Technologies used - Java, XML, XSD, XSLT

Senior Software Engineer

Persistent Systems Ltd
Pune, Maharashtra
12.2004 - 11.2008

Projects

VoIP Order Entry

2006 – 2008

Implementation of an order entry module that is part of the Customer Broadband Solution (CBS). CBS provides an online solution for customers who wish to view the status of their relationship with the client or change the nature of that relationship, and for customer service representatives (CSRs) who act on behalf of customers to carry out significant changes to those relationships in a controlled way.

  • Implemented Struts action classes and developed JSPs for the order entry flow.
  • Added and modified APIs in existing EJBs to accommodate changes for voice orders.
  • Carried out integration testing to ensure that the existing flows did not break.
  • Provided ongoing support and bug fixes.

Technologies used - Struts 1.2, EJB 2.1, Hibernate 3.0, J2EE 1.4, JSP 2.0, Java 1.5, Oracle 9i

IdM (Identity Management)

2004 – 2006

Development of a centralized identity and access management solution with features such as role-based access control and single sign-on (SSO). The solution was aimed at achieving SOX (Sarbanes-Oxley) compliance.

  • Developed a stateless session bean to interact with Sun Access Manager for performing user and role management operations.
  • Implemented the helpdesk administrator module of the User Admin application using Struts framework.
  • Customized Sun Identity Manager forms, workflows and configuration objects to meet client requirements. Sun Identity Manager is a user provisioning application developed by Sun.
  • Developed resource adapters required by Sun Identity Manager for interacting with various resources for user provisioning.
  • Carried out integration and JUnit tests.

Technologies used - EJB 2.0, Sun Access Manager, Sun Identity Manager, Sun Directory Server, LDAP, Struts 1.2, Java

Education

Bachelor of Technology - Chemical Engineering

Indian Institute of Technology Bombay, Mumbai
06.1997 - 05.2001

Skills

    Google Cloud Platform – Google Cloud Storage, BigQuery, Google Cloud Dataflow, Cloud Composer, Dataproc, Pub/Sub, Google Kubernetes Engine, Cloud Functions

Certification

Google Cloud Certified Professional Cloud Architect (Oct 2018 – Oct 2020)

Timeline

Senior Architect

Datametica Solutions Pvt. Ltd.
06.2015 - Current

Lead Engineer

HERE Solutions India Pvt Ltd
11.2010 - 05.2015

Development Lead

Ness Technologies India Pvt Ltd
12.2008 - 10.2010

Senior Software Engineer

Persistent Systems Ltd
12.2004 - 11.2008

Bachelor of Technology - Chemical Engineering

Indian Institute of Technology Bombay, Mumbai
06.1997 - 05.2001