
Sankara Narayanan S

Trichy

Summary

Experienced data engineering professional with a proven track record of innovation and adaptability. Skilled in optimizing systems and integrating technical solutions to achieve business objectives. Demonstrated expertise in project management, leading initiatives from start to finish to drive organizational growth and success.

Overview

10 years of professional experience
2 certifications

Work History

Staff Engineer | Data Engineering

Altimetrik
01.2025 - Current

Project Name: FGemini – Ford Domain-Aware Conversational AI (POC)


Description:
Designed and implemented a custom conversational AI assistant ("FGemini") using Google Cloud’s Vertex AI, combining Ford internal domain knowledge (e.g., manufacturing, sales, service) with public data via Gemini Pro and Retrieval-Augmented Generation (RAG). The application mimics a ChatGPT-like interface and intelligently answers cross-domain business queries by integrating structured Ford data with unstructured web sources, demonstrating real-world AI integration at enterprise scale.

Responsibilities and Highlights:

  • Developed a dynamic web UI with persistent chat history and contextual understanding using ReactJS and Firebase Hosting.
  • Integrated Google Gemini (via Vertex AI) to handle general NLP tasks and external search augmentation.
  • Built a RAG pipeline (see the sketch after this list) using:
    Text Embedding Gecko to encode domain documents
    Vertex AI Matching Engine as the vector database
    Cloud Storage + BigQuery to ingest and store internal structured and unstructured Ford datasets
  • Implemented MCP (Multi-Context Processing) logic to intelligently route and merge domain knowledge based on user intent.
  • Designed CI/CD and API backend using:
    Cloud Run for REST endpoints
    Firestore for chat history and user state
    Tekton and Cloud Build for deployment automation
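
A minimal sketch of the retrieval step described above, assuming a Matching Engine index has already been built from the embedded domain documents; the project ID, index endpoint, deployed index ID, document lookup, and prompt format are illustrative placeholders, not the actual FGemini implementation.

```python
# Hypothetical RAG retrieval step: embed the user query with Text Embedding Gecko,
# look up the nearest domain documents in a Vertex AI Matching Engine index, and
# pass the retrieved context to Gemini. All IDs and names are placeholders.
import vertexai
from google.cloud import aiplatform
from vertexai.language_models import TextEmbeddingModel
from vertexai.generative_models import GenerativeModel

vertexai.init(project="my-project", location="us-central1")  # placeholder project

embedding_model = TextEmbeddingModel.from_pretrained("textembedding-gecko@003")
index_endpoint = aiplatform.MatchingEngineIndexEndpoint(
    index_endpoint_name="projects/123/locations/us-central1/indexEndpoints/456"  # placeholder
)

DOCUMENT_STORE = {}  # placeholder: datapoint ID -> document text

def lookup_document(doc_id: str) -> str:
    """Hypothetical helper: return the document text for a Matching Engine
    datapoint ID (in practice stored in BigQuery / Cloud Storage)."""
    return DOCUMENT_STORE.get(doc_id, "")

def answer(question: str) -> str:
    # 1. Encode the question with the same embedding model used for the documents.
    query_vector = embedding_model.get_embeddings([question])[0].values

    # 2. Retrieve the closest domain documents from the vector index.
    neighbors = index_endpoint.find_neighbors(
        deployed_index_id="ford_domain_docs",  # placeholder deployed index
        queries=[query_vector],
        num_neighbors=5,
    )
    context = "\n".join(lookup_document(n.id) for n in neighbors[0])

    # 3. Let Gemini answer with the retrieved context prepended to the prompt.
    model = GenerativeModel("gemini-pro")
    response = model.generate_content(f"Context:\n{context}\n\nQuestion: {question}")
    return response.text
```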

Staff Engineer | Data Engineering

Altimetrik
01.2024 - Current

Project Name: Data Platform & Engineering - DP&E

Ford is building a suite of data products tailored to the specific requirements of its portfolio teams. These data products aim to integrate critical Key Performance Indicators (KPIs) into business dashboards, enabling stakeholders to make informed decisions by identifying performance trends and areas that require improvement. The initiative spans multiple business domains, reflecting Ford's 120+ years of operations and its extensive data landscape, including legacy systems and historical records stored in various formats.

Tasks and roles which I was involved in:

● Engaged with various portfolio teams to understand their business KPIs, data needs, and the decision-making insights they required from the dashboards.

● Performed in-depth exploration within the enterprise data warehouse to identify relevant data sources; collaborated with source system owners to understand metadata, data lineage, and any transformation logic required for KPI calculation.

● Developed scripts to extract and process the required data for each KPI, and shared sample outputs with stakeholders for validation (a representative sketch follows the tools list below). In many cases, historical data was maintained in spreadsheets, requiring data ingestion and transformation before integration into the centralized data product.

● Coordinated with multiple domain-specific teams across Ford to gather, validate, and standardize data for consistent reporting and analytics.

● Utilized various GCP services and supporting tools throughout the project lifecycle, including:

  • BigQuery for data processing and analytics
  • Cloud Storage for handling ingested files
  • Dataform for orchestrating scalable data workflows
  • Astronomer (Apache Airflow) for workflow scheduling
  • Alteryx for building and managing ETL pipelines
  • Tekton for CI/CD pipeline integration and validation
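
A minimal sketch of the kind of KPI extraction script mentioned above, assuming a hypothetical dataset, table, and KPI definition; the real scripts were driven by the metadata and transformation logic agreed with each portfolio team.

```python
# Hypothetical KPI extraction: compute a monthly KPI in BigQuery and pull a
# sample for stakeholder validation. Dataset, table, and column names are placeholders.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # placeholder project

KPI_SQL = """
SELECT
  DATE_TRUNC(order_date, MONTH) AS month,
  SUM(net_revenue) AS monthly_revenue          -- placeholder KPI definition
FROM `my-project.sales.orders`
GROUP BY month
ORDER BY month
"""

def extract_kpi_sample(limit: int = 12):
    """Run the KPI query and return the most recent rows for validation."""
    rows = client.query(KPI_SQL).result()
    return list(rows)[-limit:]

if __name__ == "__main__":
    for row in extract_kpi_sample():
        print(row["month"], row["monthly_revenue"])
```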

Staff Engineer | Data Engineering

Altimetrik
06.2023 - 12.2023

Project Name: Ford 3rd Party Data Ingestion and Curation

This project was part of Ford Motor Company's Global Data Insight & Analytics (GDIA) initiative, focusing on the ingestion, standardization, and curation of 3rd-party data sources to ensure consistency, usability, and quality across enterprise-wide analytics platforms. The work involved close coordination between various teams including data stewards, source system experts, and landing/data engineering teams. The objective was to streamline ingestion pipelines, enhance data reliability, and ensure the curated data met analytical and reporting needs.

Tasks and roles which I was involved in:

● Led comprehensive data source analysis by collaborating with data stewards to understand metadata, source-to-target mappings, ingestion timelines, and transformation rules.

● Performed end-to-end assessments of Qlik-based and file-based data sources, distinguishing between direct ingestions into GCP BigQuery and flows that used Qlik as a pipeline tool; validated source structures using data dictionaries and communicated findings to data scientists and engineering teams to ensure alignment with downstream requirements.

● Conducted data cleansing, transformation, and validation tasks on ingested data using GCP BigQuery, ensuring consistency and integrity across curated datasets.

● Delivered a Proof of Concept (PoC) using the Qlik tool to demonstrate performance characteristics, assess server utilization, and evaluate its integration into the broader pipeline architecture.

● Built and managed file-based ingestion workflows using a standardized framework that leveraged multiple services such as:

  • Dataproc for batch processing
  • Cloud Storage (Buckets) for file landing
  • BigQuery for storage and querying
  • Apache Airflow (via Astronomer) for scheduling and orchestration
  • Tekton for CI/CD pipeline automation

Senior Data Engineer

Quantiphi
06.2021 - 06.2023

Project Name: Definity

Definity was a migration project that involved moving the entire project setup from Cloudera to GCP. The client's workflow used multiple services such as Control-M and Pentaho, with 50+ source systems.

Tasks and roles in which I was involved:

  • As this was a migration project, the entire client workflow had to be moved to the new environment. I worked on the "general ledger" source system, where the first step was migrating HiveQL queries to BigQuery while preserving compatibility.
  • On the GCP side, BigQuery was used for data processing; a separate CI/CD team loaded data from the client database into BigQuery tables. Once a SQL script was ready, I performed unit testing for a given date, comparing results against the client's environment.
  • The next step was creating Airflow DAGs to schedule the tasks according to the loading order for the final table (see the sketch after this list).
  • In some cases, the Databricks environment was used to execute logical queries, and the results were used to generate files for subsequent job runs.
  • Created data models and defined transformation logic using dbt's SQL-based modeling language.
  • Ensured data quality and accuracy by writing unit tests and implementing validation checks in dbt.
  • Leveraged dbt's documentation features to maintain comprehensive and up-to-date documentation of data models and transformations.
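
A minimal sketch of the kind of loading-order DAG described above, assuming hypothetical table names, stored procedures, and schedule; the actual DAGs followed the client's loading order and naming conventions.

```python
# Hypothetical Airflow DAG enforcing a loading order before building a final table.
# Table names, SQL, and schedule are illustrative, not the actual Definity objects.
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator

with DAG(
    dag_id="gl_daily_load",
    start_date=datetime(2022, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:

    def bq_task(task_id: str, sql: str) -> BigQueryInsertJobOperator:
        """Run a BigQuery SQL job as one step of the loading order."""
        return BigQueryInsertJobOperator(
            task_id=task_id,
            configuration={"query": {"query": sql, "useLegacySql": False}},
        )

    load_staging = bq_task("load_staging", "CALL staging.load_gl_staging()")
    load_dimensions = bq_task("load_dimensions", "CALL dw.load_gl_dimensions()")
    load_final = bq_task("load_final", "CALL dw.load_gl_final()")

    # The final table is built only after staging and dimension loads succeed.
    load_staging >> load_dimensions >> load_final
```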

Project Name: Telecom Argentina


This project was a consulting (PSO) engagement: the solution was already built in the client environment, but the client was facing performance issues in their Cloud Data Fusion (CDF) pipelines, and our role was to recommend possible ways to improve pipeline performance.

Tasks and roles in which I was involved:

  • My day-to-day work focused on researching the best and most optimized approach for each issue and building POCs based on it. I created POCs with GCP services such as Cloud Data Fusion, Cloud Dataproc, and Cloud Composer for different scenarios.
  • In CDF, created pipelines with different configuration options, such as executor cores and memory, to speed up execution time; further improved pipeline performance by updating the Dataproc cluster configuration, including executor cores, memory, and the scheduler method.
  • In Cloud Composer, created DAGs to trigger one pipeline based on the successful completion of another.

Project Name: Tory Burch


Tory Burch was a migration project that involved moving the entire project setup from Cloudera to GCP. The client's workflow used multiple services such as Control-M, Informatica, and Azure, along with dashboards for visualization.

Tasks and roles in which I was involved:

  • As this was a migration project, the entire client workflow had to be moved to the new environment. To move data from one source to another, the client used bash scripts that called multiple UDFs (user-defined functions) and Hive SQL to store the data in Azure.
  • On the GCP side, BigQuery and Cloud Storage were used. Migrating the bash scripts and Hive SQL to GCP was complex because of query compatibility: many Hive queries and functions do not work in BigQuery SQL, so each had to be replaced with the equivalent BigQuery function and then validated.
  • Worked on validation for the Dataproc cluster, executing jobs on Dataproc in multiple ways: through the CLI, the UI, and the master node (SSH).
  • Worked on a feature in the machine learning pipeline that required triggering a PySpark job on Dataproc automatically; for this scenario, wrote a POC Python script that used the Dataproc client library to submit the job programmatically (see the sketch after this list).
  • Worked on Power BI to Looker dashboard migration.
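
A minimal sketch of the kind of POC trigger script described above, assuming hypothetical project, region, cluster, and job file names; the actual script used the client's own cluster and job arguments.

```python
# Hypothetical POC: submit a PySpark job to an existing Dataproc cluster
# using the google-cloud-dataproc client library. All identifiers are placeholders.
from google.cloud import dataproc_v1

PROJECT_ID = "my-project"                        # placeholder project
REGION = "us-central1"                           # placeholder region
CLUSTER_NAME = "ml-cluster"                      # placeholder cluster
MAIN_PY = "gs://my-bucket/jobs/feature_job.py"   # placeholder PySpark script

def submit_pyspark_job() -> None:
    """Submit the PySpark job and block until it finishes."""
    job_client = dataproc_v1.JobControllerClient(
        client_options={"api_endpoint": f"{REGION}-dataproc.googleapis.com:443"}
    )
    job = {
        "placement": {"cluster_name": CLUSTER_NAME},
        "pyspark_job": {"main_python_file_uri": MAIN_PY},
    }
    operation = job_client.submit_job_as_operation(
        request={"project_id": PROJECT_ID, "region": REGION, "job": job}
    )
    result = operation.result()  # waits for the job to complete
    print(f"Job finished with state: {result.status.state.name}")

if __name__ == "__main__":
    submit_pyspark_job()
```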

Software Engineer

Xerago
07.2018 - 06.2021

Project Name: CATO
This application aims to help banks increase profit by analyzing customer segments and running relevant campaigns to keep those customers engaged. The application was developed using PySpark on GCP.

The application workflow is: Create campaign -> Set filters -> Set orchestration -> Assign creative -> Create contact policy -> Run campaign -> Capture response.


The platform allows enterprises to identify profitable customers, profile and segment them, predict customer response to communications, treat customers based on their profile, communicate with them at a 1-to-1 level, and track and measure responses. Campaigns can be created, executed, and reviewed within a few minutes, and each customer's response to a communication can be tracked.

Tasks and roles in which I was involved:

  • Deployed and administered a multi-node Hadoop cluster using Dataproc with Spark on Google Cloud.
  • Developed Spark Core and Spark SQL jobs to classify customers into segments based on demographic and transactional data for downstream application functionality (see the sketch after this list).
  • Configured an application health monitoring system using Grafana with InfluxDB as the time-series database.
  • Developed a chatbot application using Rasa Core and Rasa NLU.
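
A minimal sketch of the kind of Spark SQL segmentation job described above, assuming hypothetical column names, thresholds, and storage paths; the production jobs used the bank's own demographic and transactional attributes.

```python
# Hypothetical PySpark job: bucket customers into segments from demographic
# and transactional fields. Column names, thresholds, and paths are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("customer-segmentation").getOrCreate()

# Placeholder input: one row per customer with age and 12-month spend.
customers = spark.read.parquet("gs://my-bucket/curated/customers/")

segmented = customers.withColumn(
    "segment",
    F.when(F.col("annual_spend") >= 50000, "high_value")
     .when(F.col("annual_spend") >= 10000, "mid_value")
     .otherwise("low_value"),
).withColumn(
    "age_band",
    F.when(F.col("age") < 30, "young")
     .when(F.col("age") < 55, "mid")
     .otherwise("senior"),
)

# Persist segments for the campaign filters that run downstream.
segmented.write.mode("overwrite").parquet("gs://my-bucket/segments/customers/")
```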

Executive

SPi Global
09.2015 - 06.2018

Project Name: NGMI (Next Generation Math Integration)
McGraw-Hill is a book publishing company. SPi Global plays an important role in McGraw-Hill's business by processing raw data files (book contents) into a complete book format that can be sold in the online market.

Tasks and roles in which I was involved:

  • Was part of the development team creating HTML templates from the given raw data files; experienced in HTML and CSS.
  • Worked on creating CSS style sheets to design books to the specifications received from McGraw-Hill; experienced in testing HTML templates against customer requirements.
  • Coordinated with customers to resolve issues reported in a book's design.

Project Name: Company Data Management Platform(CDMP)


This project dealt with the centralization and processing of data from various facilities and formats, normalizing and standardizing that data so it could be used to create and improve community inventory records. The main purpose of the project was to move a huge volume of structured data from RDBMS to a big data platform to provide fast query results, high availability, data consistency, and security. The application was developed using technologies such as Hadoop, Spark, Sqoop, Oozie, and Hive.

Tasks and roles in which I was involved:

  • Created Sqoop jobs to load data from MS SQL Server to HDFS using the incremental import feature.
  • Experienced in importing and exporting data with Sqoop between HDFS and relational database systems.
  • Developed Hive scripts to load the processed data into Hive to improve query performance.
  • Created sub-queries for filtering and faster query execution.
  • Designed and developed an Oozie workflow to coordinate the above tasks and schedule the workflow daily.

Education

Bachelor of Computer Science and Engineering

Sree Sowdambika College of Engineering
Aruppukkottai
12.2014

Skills

  • Cloud - GCP
  • Data Processing Framework - PySpark, SQL
  • AI - Vertex AI (Gemini Pro, Embedding APIs, etc.), Matching Engine (Vector DB), Firestore, MCP, Retrieval-Augmented Generation (RAG), Semantic Search, NLP
  • Cloud Data Warehouse - BigQuery
  • Data Storage - Google Cloud Storage
  • Data integration - Cloud Data Fusion, Dataflow, Data Build Tool (dbt)
  • Scheduler - Cloud Composer
  • Language - Python
  • Web Development Framework - Flask
  • Web Development Tools - HTML, CSS, Javascript

Certification

Google Cloud Associate Cloud Engineer

Google Cloud Professional Data Engineer
