Data Engineer
DIGIQUANTA IT SERVICES LLP
Hyderabad
07.2023 - Current
- Used NumPy, Pandas, Matplotlib, and Seaborn for data preprocessing, feature engineering, and visualization, including IP address clustering.
- Applied machine learning algorithms, including Decision Tree, KNN, Random Forest, and XGBoost, to identify potential fraud in merchant records.
- Applied K-means clustering to Spark DataFrames, using handypandas and PySpark pipelines, to analyze a data leakage attack and conclude that two hackers were involved.
- Built LSTM and GRU sentiment classifiers for IMDb movie reviews, using pre-trained GloVe embeddings to improve classification performance.
- Contributed to open-source initiatives by extracting data from public websites, running OCR with pytesseract, cleansing the extracted data, and organizing it into structured tables.