KYC Automation Project – Data Integration and ETL Development
- As part of a broader KYC automation project, we set out to reduce manual effort by automating more than 30 key fields in a form with 100+ inputs. The data came from a third-party vendor (BVD), but direct API access raised compliance challenges. After discovering that the data was already stored on another team's platform, we accessed it through that team's internal API. Using Alteryx, we built an ETL pipeline to clean, transform, and store the data in a tabular format. The JSON payloads required advanced regex techniques to bring the data in line with specific quality and formatting requirements.
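The pipeline itself was built in Alteryx, so the following is only a rough Python analogue of the flatten-then-clean idea, not the actual workflow. The payload, field names, and cleaning rules are hypothetical and chosen purely for illustration.

```python
import json
import re

# Hypothetical vendor-style payload; the field names are illustrative only.
RAW = '{"entity": {"name": "  ACME  Holdings\\tLtd. ", "reg_no": "GB-00 123 456", "incorporated": "2015/03/09"}}'


def flatten(obj, prefix=""):
    """Flatten nested dicts into one level with dotted keys (one tabular row)."""
    row = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            row.update(flatten(value, path))
        else:
            row[path] = value
    return row


def clean_row(row):
    """Apply regex-based normalisation rules to selected columns."""
    cleaned = dict(row)
    # Collapse runs of whitespace (including tabs) and trim the ends.
    cleaned["entity.name"] = re.sub(r"\s+", " ", row["entity.name"]).strip()
    # Keep only the digits of the registration number.
    cleaned["entity.reg_no"] = re.sub(r"\D", "", row["entity.reg_no"])
    # Rewrite YYYY/MM/DD as YYYY-MM-DD.
    cleaned["entity.incorporated"] = re.sub(
        r"^(\d{4})/(\d{2})/(\d{2})$", r"\1-\2-\3", row["entity.incorporated"]
    )
    return cleaned


record = clean_row(flatten(json.loads(RAW)))
print(record)
```

Each cleaned record is a flat dictionary, so a list of them maps directly onto table rows.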
LLM Prompt Engineering for KYC Process Optimization
- In a KYC-related LLM prompt engineering project, we aimed to speed up tasks such as comparing related parties and beneficial owners across lifecycle documents. Initial single-step prompts yielded inconsistent results. Through iterative optimization, we improved outcomes by breaking the process into step-by-step prompts that incorporated outlier scenarios and key business insights. After each step, we instructed the LLM to review and refine its response, treating it as a "student" learning from feedback to generate more accurate answers. This approach consistently delivered refined outputs for both straightforward tasks and complex ones requiring broader business understanding and validation against guidance documents.
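The step-by-step prompting with a review pass after each step can be sketched roughly as below. This is a minimal illustration, not the prompts used in the project: the step wording, the `run_stepwise` helper, and the `fake_llm` stand-in are all assumptions, and a real run would replace `fake_llm` with a call to an actual model.

```python
from typing import Callable, List


def run_stepwise(llm: Callable[[str], str], document: str, steps: List[str]) -> List[str]:
    """Run each step as its own prompt, then ask the model to review its draft."""
    answers = []
    context = document
    for step in steps:
        draft = llm(f"{step}\n\nContext:\n{context}")
        reviewed = llm(
            "Review the draft answer below against the context. "
            "Correct any errors and return the refined answer only.\n\n"
            f"Task: {step}\n\nContext:\n{context}\n\nDraft:\n{draft}"
        )
        answers.append(reviewed)
        # Feed the refined answer forward so later steps can build on it.
        context = f"{context}\n\nPrevious step result:\n{reviewed}"
    return answers


# Stand-in model for demonstration only; it just numbers its replies.
calls = []


def fake_llm(prompt: str) -> str:
    calls.append(prompt)
    return f"answer-{len(calls)}"


# Hypothetical step prompts.
STEPS = [
    "List the related parties named in each document.",
    "List the beneficial owners named in each document.",
    "Compare the two lists across documents and flag differences.",
]
results = run_stepwise(fake_llm, "<lifecycle documents>", STEPS)
print(results)
```

Keeping the review prompt separate from the task prompt makes it easy to add guidance-document excerpts or outlier scenarios to either one independently.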
Source Code Analysis Gen AI
- Developed an interactive Q&A system to simplify understanding of large codebases. The project clones a GitHub repository, processes Python files into structured chunks using LangChain, creates a searchable knowledge base with OpenAI embeddings stored in a Chroma vector database, and integrates a conversational interface using OpenAI's ChatGPT and memory for context-aware responses. The system supports natural-language questions about the code, such as what a class or method does, enhancing developer productivity.
- Tech Stack: Python, Git, LangChain, OpenAI API, Chroma. Highlights: Automated knowledge extraction, semantic search, and conversational AI for efficient codebase exploration.
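The chunk, embed, and retrieve flow can be illustrated with the standard library alone. The sketch below is not the LangChain, OpenAI, or Chroma code from the project: it splits source by top-level definitions with `ast`, uses a toy bag-of-words vector in place of a real embedding model, and searches an in-memory list in place of a vector database. The sample `SOURCE` module and all helper names are hypothetical.

```python
import ast
import math
import re
from collections import Counter

# Hypothetical module standing in for a file from a cloned repository.
SOURCE = '''
class Cart:
    """Shopping cart holding item prices."""
    def total(self):
        return sum(self.items)

def send_email(address, body):
    """Send a notification email to the address."""
    return True
'''


def chunk_source(source):
    """Split a module into one chunk per top-level class or function."""
    chunks = []
    for node in ast.parse(source).body:
        if isinstance(node, (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)):
            chunks.append((node.name, ast.get_source_segment(source, node)))
    return chunks


def embed(text):
    """Toy bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def search(query, index):
    """Return the name of the chunk most similar to the query."""
    q = embed(query)
    return max(index, key=lambda item: cosine(q, item[1]))[0]


index = [(name, embed(text)) for name, text in chunk_source(SOURCE)]
best = search("which function sends an email notification", index)
print(best)
```

In a full system the retrieved chunk would be passed to the chat model as context, together with the conversation memory.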
Customer Segmentation Model for a US Retail Giant
- Performed movement analysis utilizing RFM on various customer segments to analyze their evolution over multiple historical time frames.
- Developed a customer segmentation machine learning model using an RFM approach for the retail client's marketing team, enabling optimization of customer retention and acquisition strategies through predicting future customer segments.
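As a rough illustration of RFM (recency, frequency, monetary) scoring and segment movement between two snapshots, consider the sketch below. The transactions, thresholds, and segment names are hypothetical, and the rule-based `segment` function is a simplification that stands in for, rather than reproduces, the machine learning model described above.

```python
from datetime import date

# Hypothetical transactions: (customer_id, purchase_date, amount).
TRANSACTIONS = [
    ("A", date(2023, 8, 20), 60.0),
    ("A", date(2023, 11, 5), 80.0),
    ("A", date(2023, 12, 20), 120.0),
    ("B", date(2023, 6, 15), 40.0),
    ("C", date(2023, 12, 1), 300.0),
]


def rfm_table(transactions, as_of):
    """Recency (days since last purchase), frequency (count), monetary (sum)."""
    table = {}
    for cust, day, amount in transactions:
        if day > as_of:
            continue  # ignore purchases made after the snapshot date
        row = table.setdefault(cust, {"recency": None, "frequency": 0, "monetary": 0.0})
        days = (as_of - day).days
        row["recency"] = days if row["recency"] is None else min(row["recency"], days)
        row["frequency"] += 1
        row["monetary"] += amount
    return table


def segment(row, recency_cut=60, frequency_cut=2):
    """Assign a coarse segment from fixed, illustrative thresholds."""
    recent = row["recency"] <= recency_cut
    frequent = row["frequency"] >= frequency_cut
    if recent and frequent:
        return "loyal"
    if recent:
        return "new"
    if frequent:
        return "at_risk"
    return "lapsed"


def snapshot(transactions, as_of):
    """Segment every customer as of a given date."""
    return {c: segment(r) for c, r in rfm_table(transactions, as_of).items()}


before = snapshot(TRANSACTIONS, date(2023, 9, 30))
after = snapshot(TRANSACTIONS, date(2023, 12, 31))
# Movement analysis: how each customer's segment changed between snapshots.
movement = {c: (before.get(c, "absent"), after[c]) for c in after}
print(movement)
```

Repeating the snapshot over several historical dates gives the transition history that a predictive model could be trained on.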
Market Basket Clustering for a US Retail Giant
- Conducted unsupervised machine learning analysis on spending patterns for a major retail player, aimed at deriving distinct customer groups and informing targeted marketing strategies. Constructed the model inputs from spending proportions of categorical and non-categorical features. Employed K-means modeling to unveil purchasing preferences and product associations within these clusters.
- Transformed cluster insights into compelling narratives to guide strategic marketing approaches.
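The idea of clustering customers on spending proportions with K-means can be sketched in plain Python as follows. The spend figures, category names, and fixed initial centroids are assumptions made for a small deterministic example; a production analysis would more likely use a library implementation with proper initialisation and a chosen number of clusters.

```python
# Hypothetical spend per customer across three categories:
# grocery, apparel, electronics.
SPEND = {
    "c1": [90.0, 5.0, 5.0],
    "c2": [160.0, 20.0, 20.0],
    "c3": [10.0, 80.0, 10.0],
    "c4": [30.0, 150.0, 20.0],
}


def proportions(row):
    """Convert absolute spend into each category's share of total spend."""
    total = sum(row)
    return [v / total for v in row]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, initial_centroids, iterations=10):
    """Plain Lloyd's algorithm with caller-supplied initial centroids."""
    centroids = [list(c) for c in initial_centroids]
    labels = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: sq_dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


customers = sorted(SPEND)
points = [proportions(SPEND[c]) for c in customers]
# Fixed starting centroids keep this small example deterministic.
labels, centroids = kmeans(points, [points[0], points[2]])
print(dict(zip(customers, labels)))
```

Using proportions rather than raw spend groups customers by what they buy, not by how much they spend overall; the final centroids then describe each cluster's typical basket mix.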
Employee Attrition Prediction Analysis for HR
- Built a predictive model for the organization to forecast employee attrition and surface the key factors associated with it, so that the organization can take corrective or preventive measures to reduce or control it.
More projects - https://github.com/ankitbiswas/GenAI.git