PROJECTS UNDERTAKEN


German Biography Generator

Project Summary

Developed a summarization tool using a large language model (LLM) to generate concise German biographies by processing text in chunks, refining coherence, and removing redundancies. The tool efficiently reads and summarizes Word, CSV, and PDF documents, with robust exception handling to ensure clear, focused outputs.
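The chunk-then-refine flow described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `summarize` callable stands in for the LLM call, and the names `split_into_chunks`, `summarize_in_chunks`, and the `max_words` limit are assumptions introduced here.

```python
from typing import Callable, List


def split_into_chunks(text: str, max_words: int = 200) -> List[str]:
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def summarize_in_chunks(
    text: str,
    summarize: Callable[[str], str],  # hypothetical stand-in for the LLM call
    max_words: int = 200,
) -> str:
    """Summarize each chunk, drop repeated partial summaries, then refine once."""
    partials = [summarize(chunk) for chunk in split_into_chunks(text, max_words)]
    # Remove exact duplicates while keeping the original order.
    unique = list(dict.fromkeys(partials))
    if not unique:
        return ""
    if len(unique) == 1:
        return unique[0]
    # Final pass over the combined partial summaries to improve coherence.
    return summarize(" ".join(unique))
```

Passing the model call in as a function keeps the chunking logic testable without network access; any real LLM client would be wrapped to match that signature.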

Techstacks Used

NumPy, Pandas, Flask, API, NLP

Hush Hush Recruiter Candidate Selection

Project Summary

Led the project team at Hush Hush Recruiter: defined goals, extracted GitHub data via APIs, applied K-means clustering to filter candidates, stored profiles in SQLite, and implemented automated emails to streamline communication with selected candidates.
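To illustrate the clustering step, here is a small sketch of K-means (Lloyd's algorithm) in plain Python. The project itself lists Scikit-learn, so this standard-library version is only a stand-in for illustration; the feature pairs (public repositories, followers) and the first-k initialization are assumptions, not details from the project.

```python
from math import dist


def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    # Deterministic initialization with the first k points (an assumption for this sketch).
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels, centroids


# Hypothetical features per candidate: (public repositories, followers)
candidates = [(2, 1), (3, 2), (40, 150), (45, 170), (1, 0), (50, 160)]
labels, centroids = kmeans(candidates, k=2)
```

With these made-up numbers the candidates separate into a low-activity and a high-activity group; in practice the features would be scaled first so that one column does not dominate the distance.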

Techstacks Used

Scikit-learn, NumPy, Pandas, SQLite, Vercel

Data-Pipeline-Project

Project Summary

The project builds a data pipeline with Python scripts and Docker that fetches and processes data, stores it in Google BigQuery, and visualizes it in Tableau. Docker provides portability, the Google Cloud SDK handles integration, and Python is used for scripting. Detailed setup and usage instructions are provided.

Techstacks Used

Docker, MySQL, Google BigQuery, Google Cloud SDK, Tableau

Integrated Data Pipeline: Hadoop, Scraping, DB, Testing

Project Summary

Developed a data scraping solution for an anime-related website to efficiently collect and process data for analysis and reporting.

Roles and Responsibilities: Led the implementation of a comprehensive data scraping solution, orchestrated the setup of a scalable Hadoop ecosystem, managed infrastructure using Docker and VirtualBox, and oversaw data transfer from HDFS to SQLite for analysis.

Techstacks Used

Docker, Hadoop, PySpark, Scrapy, MySQL, and GCP

Chatbot using RAG

Project Summary

A chatbot application based on Retrieval-Augmented Generation (RAG). The chatbot uses Llama 3 to help students with their queries by orchestrating a flow through various modules and displaying the results.
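The retrieve-then-generate flow behind a RAG chatbot can be sketched as below. This is an assumption-laden illustration rather than the project's implementation: retrieval here is a simple word-overlap ranking instead of a vector store, `generate` is a hypothetical stand-in for the Llama 3 call, and the function names and sample documents are invented for the example.

```python
from typing import Callable, List


def retrieve(query: str, documents: List[str], top_k: int = 2) -> List[str]:
    """Rank documents by how many query words they share; return the top_k."""
    query_words = set(query.lower().split())
    ranked = sorted(
        documents,
        key=lambda d: len(query_words & set(d.lower().split())),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(query: str, context: List[str]) -> str:
    """Combine the retrieved passages and the question into one prompt."""
    joined = "\n".join(f"- {passage}" for passage in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"


def answer(
    query: str,
    documents: List[str],
    generate: Callable[[str], str],  # hypothetical stand-in for the LLM call
    top_k: int = 2,
) -> str:
    """Retrieve supporting passages, then ask the model to answer from them."""
    return generate(build_prompt(query, retrieve(query, documents, top_k)))


# Made-up sample documents for illustration only.
docs = [
    "The library opens at 9 am",
    "Exams start in June",
    "The cafeteria serves lunch at noon",
]
```

Calling `answer("When do exams start", docs, generate=my_llm)` would pass the exam-related passage to the model as context; a production setup would typically replace `retrieve` with embedding-based search.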

Techstacks Used

LangChain, Python

Los Angeles Crime Data Exploration and Visualization

Project Summary

Cleaned and visualized a complex crime dataset using Tableau Prep and Tableau Desktop, facilitating better decision-making through reliable insights.

Techstacks Used

Tableau Prep, Trifacta Data Prep, GCP BigQuery, Data Profiling, Data Cleaning and Uncleaning processes.

Prime Video Data Analysis Project using Power BI

Project Summary

Developed an interactive Power BI dashboard for analyzing Prime Video content. It provides clear insights into library composition and trends, enabling data-driven decision-making for production strategies and enhancing content selection and audience engagement.

Techstacks Used

Power BI, Data Profiling, Data Cleaning and Uncleaning processes.

Interactive Dashboard with SAS Visual Analytics

Project Summary

Created an interactive dashboard for visualizing and analyzing a dataset containing information about German companies.

Techstacks Used

SAS Visual Analytics, Data Cleaning and Analysis Techniques.