Experienced Senior Data Engineer with a demonstrated history of working across multiple sectors. Solved data mysteries in domains such as health sciences and banking. Designed scalable, optimized data pipelines that handle petabytes of data in both batch and real-time modes, and enhanced those pipelines to reduce cost and processing time by around 20-30%.
Automated data profiling using PySpark, both for incremental data on existing tables and for on-demand tables based on request.