Experienced and enthusiastic Consultant with a proven track record of success across multiple industries. Combines strong interpersonal, problem-solving, and analytical skills to advise client organizations and drive measurable improvements in business performance. Experienced in leading teams, managing complex projects, and delivering against strategic objectives, with particular strength in building efficient processes, maintaining high standards, and aligning delivery with organizational goals. Known for a collaborative approach, adaptability, and a proactive, quality-focused way of working.
PySpark
Developed and implemented 21 key performance indicators (KPIs) to streamline business reporting and retailer billing, ensuring precise and timely data delivery. KPIs were presented in multiple time frames such as Month to Date, Last Month to Date, File to Date, and Last to Last Month to Date, leveraging CSV files for efficient reporting. Enhanced data processing performance through optimization techniques, including Adaptive Query Execution (AQE), partitioning, and predicate pushdown. Utilized performance tuning best practices by eliminating unnecessary actions, optimizing executor configurations, and adjusting driver settings. Delivered KPIs on a D-1 basis, facilitating timely, data-driven decision-making.
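A minimal PySpark sketch of this KPI pattern, with AQE enabled and an early predicate filter; the file paths, column names, and the Month-to-Date KPI shown here are illustrative assumptions rather than the production code:

```python
from pyspark.sql import SparkSession, functions as F

spark = (
    SparkSession.builder
    .appName("retailer-billing-kpis")
    # Adaptive Query Execution, one of the optimisations noted above
    .config("spark.sql.adaptive.enabled", "true")
    .config("spark.sql.adaptive.coalescePartitions.enabled", "true")
    .getOrCreate()
)

# Read daily billing extracts from CSV; schema and columns are assumed.
billing = spark.read.option("header", "true").csv("/data/billing/daily/*.csv")

# Filter as early as possible (predicate pushdown) so only the current
# month's data is scanned for a Month-to-Date KPI.
mtd = billing.filter(
    F.col("txn_date").cast("date") >= F.trunc(F.current_date(), "month")
)

kpi_mtd = (
    mtd.groupBy("retailer_id")
       .agg(F.sum("billed_amount").alias("mtd_billed_amount"),
            F.countDistinct("invoice_id").alias("mtd_invoice_count"))
)

# Partition the output by reporting date for efficient downstream reads.
(kpi_mtd.withColumn("report_date", F.current_date())
        .write.mode("overwrite")
        .partitionBy("report_date")
        .csv("/reports/kpi/mtd"))
```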
Successfully migrated critical billing data reporting from on-premises systems to Azure Data Factory (ADF) to ensure accurate, error-free compliance reporting for ASP-GSP. Consolidated billing data from various streams and circles on a monthly and daily basis, with reports for VBS and NON-VBS pushed to the ASP portal for final reporting. Addressed key challenges around scheduling and triggering processes based on data availability from the EBPP system. Developed master scripts to automate the trigger process, ensuring data is only processed when available, and skipping unnecessary steps to accommodate varying billing cycles across different circles.
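An illustrative sketch of the master-script trigger logic; the EBPP landing path, circle names, and the pipeline-trigger call are assumptions used only to show the skip-when-data-is-unavailable pattern:

```python
import os
from datetime import date

EBPP_LANDING = "/landing/ebpp"           # assumed EBPP landing folder
CIRCLES = ["circle_a", "circle_b"]       # hypothetical circle identifiers


def data_available(circle: str, run_date: date) -> bool:
    """A circle is processed only if EBPP has dropped its extract for the day."""
    marker = os.path.join(EBPP_LANDING, circle, f"{run_date:%Y%m%d}.done")
    return os.path.exists(marker)


def trigger_billing_pipeline(circle: str) -> None:
    # Placeholder for the real trigger (e.g. an ADF pipeline-run call).
    print(f"Triggering billing pipeline for {circle}")


if __name__ == "__main__":
    today = date.today()
    for circle in CIRCLES:
        if data_available(circle, today):
            trigger_billing_pipeline(circle)
        else:
            # Billing cycles differ across circles, so missing data means
            # the step is skipped rather than failed.
            print(f"Skipping {circle}: EBPP data not yet available")
```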
Project scope was to ingest and integrate four extracts from the upstream system, send six extracts to the downstream PSGL system and one extract to FDW after applying the required transformations, and integrate all four extracts into the enterprise data warehouse tables. As part of this project, we developed PySpark jobs, shell scripts, and Airflow DAGs, and prepared the SQL needed to unit test the data. The project follows an ELT approach: all data is first staged into four staging tables, with checks applied during the load, such as duplicate file checks, reject data checks, tagging value checks, and trigger-file-to-stage validations on counts and amounts, to ensure complete data reaches staging. For the PSGL requirement, data was extracted from the staging layer into a work table with A-side and B-side entries loaded separately, then transformed as required (grouped by segment/business unit) into header, line, and trailer records and sent to the downstream PSGL application through the Electronic Communication Gateway. For the integration requirement, data was loaded into a source repository (where history is preserved), surrogate keys were created to uniquely identify records, and the data was loaded into the common format layer with the required transformations, including an ETL load indicator that flags each record as an insert or an update, before loading into the base tables; a monetary amount reconciliation between the base tables and the source repository confirms no data loss (see the sketch below). A flattened, user-facing Subledger view was built on top of the 3NF enterprise tables, and roles were applied to the PSGL report views to give users read-only access.
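A minimal PySpark sketch of the surrogate-key, ETL load-indicator, and monetary reconciliation steps described above; the table names, key columns, and amounts column are hypothetical:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("cfl-load-indicator").getOrCreate()

staged = spark.table("stg.gl_extract")      # assumed staging table
base = spark.table("base.gl_transactions")  # assumed 3NF base table

# Surrogate key to uniquely identify a record (hash of the natural key).
staged = staged.withColumn(
    "gl_txn_sk",
    F.sha2(F.concat_ws("||", "ledger_id", "journal_id", "line_num"), 256),
)

base_keys = base.select("gl_txn_sk").distinct()

# ETL load indicator: 'I' if the key is not yet in base, 'U' otherwise.
flagged = (
    staged.join(base_keys, "gl_txn_sk", "left_anti")
          .withColumn("etl_load_ind", F.lit("I"))
          .unionByName(
              staged.join(base_keys, "gl_txn_sk", "left_semi")
                    .withColumn("etl_load_ind", F.lit("U"))
          )
)

# Reconciliation pattern: compare monetary amounts between the source
# repository and base tables to confirm no data loss after the load.
src_amt = spark.table("srcrep.gl_transactions").agg(F.sum("monetary_amount")).first()[0]
base_amt = spark.table("base.gl_transactions").agg(F.sum("monetary_amount")).first()[0]
if src_amt != base_amt:
    raise ValueError("Monetary amount mismatch between source repository and base")
```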
Project scope was to eliminate DataStage processing from the data acquisition process, reduce the time for data to become available in the staging area, and lower TCO. For this purpose, we developed the UDW Lake DA framework: the Spark framework picks files from the lake environment and, after basic control checks, loads them into the Teradata database. The Spark jobs are designed so that ingesting a new file format only requires preparing a new config for the existing common Spark job; this capability reduced the time needed to develop and ingest new files while improving quality.
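A config-driven sketch of such a common ingestion job; the config fields, control check, and Teradata connection details are assumptions for illustration, not the actual framework code:

```python
import json
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("udw-lake-da").getOrCreate()

# Example feed config; in the framework each new file format gets its own entry.
config = json.loads("""
{
  "source_path": "/lake/landing/sales/*.csv",
  "delimiter": "|",
  "expected_min_rows": 1,
  "target_table": "UDW_STG.SALES_STG"
}
""")

# Pick the file(s) from the lake environment.
df = (spark.read
      .option("header", "true")
      .option("delimiter", config["delimiter"])
      .csv(config["source_path"]))

# Basic control check before loading (a simple row-count check here).
if df.count() < config["expected_min_rows"]:
    raise ValueError(f"Control check failed for {config['source_path']}")

# Load into Teradata over JDBC (URL and credentials are illustrative).
(df.write.format("jdbc")
   .option("url", "jdbc:teradata://td-host/DATABASE=UDW_STG")
   .option("dbtable", config["target_table"])
   .option("user", "etl_user")
   .option("password", "****")
   .option("driver", "com.teradata.jdbc.TeraDriver")
   .mode("append")
   .save())
```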
Azure Fundamentals (AZ-900)
Received multiple awards and recognition for the quality of projects delivered.