PROJECTS UNDERTAKEN
German Biography Generator
Project Summary
Developed a summarization tool using a large language model (LLM) to generate concise German biographies by processing text in chunks, refining coherence, and removing redundancies. The tool efficiently reads and summarizes Word, CSV, and PDF documents, with robust exception handling to ensure clear, focused outputs.
Techstacks Used
NumPy, Pandas, Flask, API, NLP
Hush Hush Recruiter Candidate Selection
Project Summary
Led the project team at Hush Hush Recruiter in defining goals, extracting GitHub data via APIs, applying K-means clustering for candidate filtering, integrating SQLite for profile storage, and implementing automated email functionality for streamlined communication with chosen candidates.
Techstacks Used
Scikit-learn, NumPy, Pandas, SQLite, Vercel
Data-Pipeline-Project
Project Summary
Built a data pipeline with Python scripts and Docker that fetches and processes data, stores it in Google BigQuery, and visualizes it in Tableau. Docker provides portability, the Google Cloud SDK handles integration, and detailed setup and usage instructions are included.
Techstacks Used
Docker, MySQL, Google BigQuery, Google Cloud SDK, Tableau
Integrated Data Pipeline: Hadoop, Scraping, DB, Testing
Project Summary
Developed a data scraping solution for an anime-related website to efficiently collect and process data for analysis and reporting.
Roles and Responsibilities: Led the implementation of a comprehensive data scraping solution, orchestrated the setup of a scalable Hadoop ecosystem, managed infrastructure using Docker and VirtualBox, and oversaw data transfer from HDFS to SQLite for analysis.
Techstacks Used
Docker, Hadoop, PySpark, Scrapy, MySQL, GCP
Chatbot using RAG
Project Summary
Built a chatbot application based on Retrieval-Augmented Generation (RAG). The chatbot uses Llama 3 to help answer students' queries by orchestrating a flow through various modules and displaying the results.
Techstacks Used
LangChain, Python
Los Angeles Crime Data Exploration and Visualization
Project Summary
Cleaned and visualized a complex crime dataset using Tableau Prep and Tableau Desktop, facilitating better decision-making through reliable insights.
Techstacks Used
Tableau Prep, Trifacta Data Prep, GCP BigQuery, Data Profiling, Data Cleaning and Uncleaning processes
Prime Video Data Analysis Project using Power BI
Project Summary
Developed an interactive Power BI dashboard for analyzing Prime Video content. The dashboard provides clear insights into library composition and trends, enabling data-driven decision-making for production strategies and enhancing content selection and audience engagement.
Techstacks Used
Power BI, Data Profiling, Data Cleaning and Uncleaning processes.
Interactive Dashboard with SAS Visual Analytics
Project Summary
Created an interactive dashboard for visualizing and analyzing a dataset containing information about German companies.
Techstacks Used
SAS Visual Analytics, Data Cleaning and Analysis Techniques