About Me

Greetings, DataNauts 👋! I'm a Data Scientist with 2 years of professional industry experience in data science and product development, plus 4+ years of solid self-learning and hands-on experience developing machine learning and data science projects. I go above and beyond and never stop until I reach the boson of the quark (the problem).

Education: BTech in Information Technology from Government College of Engineering, Karad (2019 - 2023) { CGPA: 8.50 }

Key Strengths: Contributing to the success of the team is one of the core principles of my professional journey. Attention to detail, collaboration, dedication, curiosity, and problem solving are my key strengths.

Technical Skillsets: Generative AI, Large Language Models, RAG, Agents, Machine Learning, Predictive Analytics, Deep Learning, AI, Data Science, AWS, Docker, Git, MLOps, Kubeflow, MLflow, Async, Package Development, Optimization, Automation.

Giving back to society: I talk and write about machine learning, data science, and generative AI on my blog community Teckbaker's, started during my college in 2023, and also on Medium, to learn and share knowledge.

🚀 What sets me apart is not just my technical expertise, but my insatiable curiosity and commitment to continuous learning. I thrive on challenges and am always eager to explore new horizons. Whether it's diving into the latest technological advancements or embracing a leadership role, my enthusiasm knows no bounds.

Extra-Curricular Activities: As a yoga practitioner, I find balance in both mind and body, fostering resilience and focus. A fervent reader, I believe in the transformative power of knowledge. Connecting with diverse minds energizes me, and I actively seek opportunities to collaborate with people from various walks of life.

Experience


Full Time


Katonic AI (February 2023 - Present)

About Katonic.ai: a no-code Generative AI platform-as-a-service startup founded in 2020, with an award-winning MLOps platform. Headquartered in Sydney, Australia, with a 30+ member Agile product development team building a GenAI platform for large enterprises (B2B).

I worked as a Data Scientist cum Founding Engineer at Katonic.ai, as a core contributor to the product development and data science teams. I joined as a full-time intern for 4 months (Feb 2023 - June 2023) and was then converted to a full-time Data Scientist role.

Internship Contribution (February 2023 - June 2023)

1) Researched and studied a past client use case on video labelling.
2) Deployed a custom video labelling model as an API on the MLOps platform using MLflow, FastAPI, and TensorFlow.
3) Worked with the Founder and VP to build demo machine learning use cases using Streamlit.

Key skills
- Machine Learning, Video Preprocessing/Classification, Python, MLflow.

POC Contributions

1) Worked in a team of 4 to build a banking POC for an extraction use case (broker support validation); contributed to the transactions and credit card statement validation pipeline.
2) Worked in a team of 2 to build a RAG chatbot, an AI copilot for the Everest Group internal analytics team; our POC solution converted them into a paid client.
3) Contributed to a data preprocessing pipeline, built with Kubeflow, for a manufacturing company's Text-to-SQL use case: efficiently fetching raw data from their DB, converting it into use-case-specific tables, and storing them in Postgres for an agentic system to use when generating SQL queries.

Key skills
- LLMs, GenAI implementation for real-time use cases, prompt engineering, SQL, Kubeflow ingestion & data preprocessing pipelines.

Product Development Contribution

1) Building and strategizing the development of the fine-tuning product/platform (Adaptive Studio).
2) Added MLflow support for custom LLM model logging and experimentation during LLM fine-tuning, integrated into the fine-tuning pipeline.
3) Improved and optimized Smartcopilot (the RAG-as-API feature) by implementing advanced RAG improvement strategies and iteratively implementing features based on client feedback.

Key skills
- Strong debugging, Agile software development, product-first thinking, competitor research, Python codebase development from scratch, Python, API development (FastAPI), MLflow, PyTorch.

Value Added

1) Created and delivered go-to-market GenAI use cases and demos in Streamlit for clients across domains (e-commerce, healthcare, telecom); together with the sales team, this helped acquire clients for POCs.
2) Worked in a team to create a multimodal telecom use case for the T-Systems challenge; the innovative solution helped the team get selected in the Top 10.
3) Converted a POC into a productionized paid client, Everest Group, by enhancing their copilot. This contribution boosted company growth and added revenue, strengthening client engagement and product adoption.
4) Built the Smartcopilot feature, a knowledge chatbot that provides an API to integrate into your own systems, backed by the Udyog Mitra chatbot for the Maharashtra Government (MSINS).

Feedback Received

- Strong work ethic, solution-oriented approach, attention to detail, innovative and streamlined work, clean coder
- Collaborative team player

Key Learnings

- Thriving in a startup environment with faster delivery.
- Learned resilience and how to handle pressure.
- Ownership of tasks and work.
- End-to-end system and project understanding.
- Developed strong debugging skills.
- A solution-oriented, decision-making mindset for approaching any problem.
- Strong collaboration within the product team to strategize, research, and develop features.


Client: Everest Group (September 2023 - October 2024)


A client use case delivered by Katonic.ai. I worked on it full-time from research through POC, deployment, and support. Our team was able to convert their POC into a paid engagement.

About Usecase
A knowledge-grounded chatbot assistant for the internal analytics team of Everest Group, integrated with their Microsoft Teams.
KPIs set by their team: reduce the analytics team's research time by 50%, improve productivity, and deliver highly accurate responses.

Contribution
1) Built a document preprocessing pipeline specific to the client data and contributed to a multi-class figure classification model using deep learning: fine-tuned a ResNet-50 model to classify 8 figure types, viz. bar_chart, diagram_chart, flow_chart, graph_chart, growth_chart, pie_chart, normal_images, tables_chart.
2) Developed an SDK to preprocess industry research documents, which led to a 20% improvement in retrieval.
3) Researched the best retrieval methods, chunking methods, data loaders, LLMs, and embedding models for their data, which led to a 30% improvement in performance.
4) Researched and added metadata such as source, pages, published year, and authors, which improved response quality and enabled pre-filtering during retrieval.
5) Innovative solution: added source citations and page numbers alongside answers, which made it easy for their analysts to navigate the reports and helped satisfy the client goal.
6) Created a batch ingestion pipeline using Kubeflow to insert daily updated data from PDFs and blogs into the Milvus vector store, including an innovative way to re-insert the latest information from modified PDFs, which increased the accuracy and correctness of responses.
7) Using continuous feedback from their analysts, analysed patterns in their questions and thinking, and developed a question routing approach that calls the respective RAG system based on question type, which improved the quality of the chatbot.
8) For statistics-related questions, developed a Text-to-Python approach that, unlike Text-to-SQL, generates Python code. This fit because the dataset was not very dynamic (non-changing schema), had 3 tables, and the LLM was specialised in Python code generation. It helped fulfil the client goal.
9) Developed an async codebase to reduce response latency and time to first token.
10) Reduced cost by trying out prompt compression techniques: reduced cost per statistics question up to $0.23, and $0.12 for other questions.
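The question routing idea in items 7 and 8 can be illustrated with a minimal sketch. This is not the client implementation: the keyword pattern, the two handler functions, and the route names are all hypothetical stand-ins, and a production router would more likely use an LLM or embedding-based classifier than a regular expression.

```python
import re
from typing import Callable, Dict

# Hypothetical keyword pattern for statistics-style questions (an assumption,
# not the classifier used in the project).
STATS_PATTERN = re.compile(
    r"\b(average|mean|median|sum|total|count|percent|how many)\b",
    re.IGNORECASE,
)


def classify_question(question: str) -> str:
    """Return 'stats' for aggregate/numeric questions, otherwise 'knowledge'."""
    return "stats" if STATS_PATTERN.search(question) else "knowledge"


def answer_with_rag(question: str) -> str:
    # Placeholder for the document-grounded RAG pipeline.
    return f"[rag] {question}"


def answer_with_generated_python(question: str) -> str:
    # Placeholder for the Text-to-Python path over the tabular data.
    return f"[text-to-python] {question}"


# Each question type maps to the system that should handle it.
ROUTES: Dict[str, Callable[[str], str]] = {
    "knowledge": answer_with_rag,
    "stats": answer_with_generated_python,
}


def route(question: str) -> str:
    """Dispatch a question to the handler registered for its type."""
    return ROUTES[classify_question(question)](question)
```

Keeping the classifier separate from the dispatch table means a new question type only needs one new entry in `ROUTES`.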

Outcome delivered
- Saved their research analysts 30 minutes of research time per question.
- Automated the ingestion pipeline to fetch the latest reports and daily blog data for up-to-date responses.
- 1000 analysts used the chatbot daily for their research work.

Key Skills
- Kubeflow, Python, SDK building, R&D, RAG, prompting, prompt compression, embeddings, LLM, VectorDB, retrieval, async, optimization, Agile development.


Open Source Contribution

Kangaroo-LLM (September 2024 - December 2024)


Research
- Research & documentation on fine-tuning approaches.
- Research on synthetic data generation.
- Research on Common Crawl (CC) data for AUS and a metadata storage strategy for Kangaroo-bot.
- Research on NeMo Curator for preprocessing pre-training or CC-type data for building a secure, sovereign model.

Kangaroo-bot Development
- Worked on building a web scraping pipeline for scraping 7 lakh AUS websites.
- Implemented secure scraping approaches that scrape only allowed URLs and internal sub-URLs and remove errored websites.
- Reduced the scraping time from 20 days to 7 days through efficient multiprocessing, a pipeline execution strategy, and Kubeflow capabilities.
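As a rough illustration of scraping only allowed URLs and splitting the work for parallel workers, here is a small sketch built on the standard library's `urllib.robotparser`. The sample robots.txt, the `example-bot` user agent, and the helper names are assumptions made for the example; the project's actual pipeline is not shown in this page.

```python
from typing import List
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content used only for this example.
SAMPLE_ROBOTS = """User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def filter_allowed(urls: List[str], parser: RobotFileParser,
                   user_agent: str = "example-bot") -> List[str]:
    """Keep only the URLs the site's robots.txt permits for this user agent."""
    return [url for url in urls if parser.can_fetch(user_agent, url)]


def batches(items: List[str], size: int) -> List[List[str]]:
    """Split a URL list into fixed-size chunks, one chunk per worker task."""
    return [items[i:i + size] for i in range(0, len(items), size)]
```

Each chunk returned by `batches` could then be handed to a worker process or a pipeline step, so that a failure in one chunk does not force a re-scrape of the rest.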

Key Skills
R&D, reading research papers, effective and secure web scraping, scalable data pipeline building, Kubeflow, PySpark.


Internship

Persistent Systems - Software Engineer Intern
(January 2023 - May 2023)

Selected for the Persistent Systems pre-onboarding Software Development Internship Program, a 4-month expert program led by Persistent leaders.
Gained practical understanding from industry leaders and learned the fundamentals of software development.

Gained practical experience and knowledge in advanced Python, data analysis, object-oriented programming, Git, Docker, Linux, SQL, the software development lifecycle, and Agile development.


Devtown (formerly ShapeAI) - Data Science Intern
(June 2021 - December 2021)

About

A 4-month training and 3-month project-based internship. Worked on building a QA chatbot named HelloBot.ai for the Devtown website using encoder-decoder models.

• Developed an understanding of the data science project lifecycle: data preparation, data cleaning, data preprocessing, predictive analytics, machine learning, and the model development lifecycle.

• Worked on innovative projects that deepened my expertise in model development, model training and validation, imbalanced datasets, hyperparameter optimization, model performance, and optimization.

Selected as one of 10 team members from a batch of 150 for the project.

Project

• Received recognition and an LOR for building a multi-class wine quality prediction case study, with an improved accuracy of 70% on a Random Forest model without overfitting, which helped with selection for the paid project.

• Built the HelloBot.ai project in a team of 10: actively contributed to building the encoder part of the encoder-decoder model from scratch (see the Projects section), and contributed to dataset preprocessing and to research and development of the encoder model for the bot.

Key skills

Machine Learning, NLP, Tensorflow, Predictive Analytics, ML Model Building, Python, OOP, SQL, Object detection, Hypothesis testing, Data-preprocessing, feature engineering, model training, Optimization, Data Scraping.


Smartknower- AI Intern (October 2020 - January 2021)

About

A 2-month training and project-based internship program by Smartknower.

• Developed proficiency in building deep learning models, a deep understanding of the underlying math concepts, and confidence in building deep learning models for a given problem.

• Hands-on experience with CNNs, Keras, TensorFlow, neural network training and development, gradient optimization and regularization techniques, hyperparameter tuning, transfer learning models, and image preprocessing and augmentation using TensorFlow.

• Active Participation in Technological Tasks and Activities.

Project

• Assigned a project to build an image classification model using transfer learning for a kaggle competition, with the constraint of using only data not present in the pretrained models' training set. Worked on improving accuracy while preventing overfitting.

• Developed a Dead Stars Image Classification Model, achieving 90% accuracy using the ResNet50 model. Conducted research and development, including data collection, selecting the best-performing model, expanding the dataset, adding hidden layers, and optimizing the model for improved performance.

Key skills

Deep neural network, Neural Network Training, CNN, Tensorflow, keras, Image Processing, Image classification model building, Streamlit, Transfer Learning, Regularization techniques, accuracy improvement techniques.

Projects

I have worked on various real-world projects ranging from classic machine learning to deep learning, including applications in natural language processing and computer vision. They helped me gain practical experience with the tools and techniques used in the data science industry.

Working on diverse, real-world projects helped improve my problem-solving skills, quick decision-making, and ability to think through and build scalable, impact-driven solutions.

Continuously learning and building, to contribute the best solutions to industry projects.



Generative AI

Project 1: Hybrid RAG- Python Package (August 2024 - Present)


Ingestion Pipeline - End to End support to insert your vector data inside Milvus VectorDB.
Experiment Tracking & Tracing with MLflow – Log experiments, parameters, and traces for every LLM and retrieval step, ensuring efficient latency & cost tracking.
RAG Evaluation with Ragas – Measure performance using faithfulness, answer relevance, and context precision, with future support for MLflow evaluation.
Cost Monitoring – Keep track of API usage by setting LLM pricing inside API parameters to optimize expenses.
Hybrid Search Capability – Semantic (dense) & keyword (sparse) retrieval, query expansion, Milvus-optimized retrieval, self-query retrieval, reranking, and auto-metadata filtering.
NeMo Guardrails (v0.1.1) – Uses vector similarity for question classification, reducing middleware time, preventing prompt injection attacks, and enforcing policy restrictions.
Smart Summarization & Q&A Handling – Supports direct QA over documents, metadata filtering, and map-reduce summarization for extracting insights across document chunks.
Follow-up Question Generation – Auto-generate follow-up questions to improve engagement with users.
Custom PyFunc Hybrid-RAG MLflow Model – Register, deploy, and serve the best model directly as an MLflow API for production-grade scenarios.
Optimized Modules with Async Code – Fully asynchronous support for high-performance execution on Python 3.11+.
Speech-to-Text Model – Supports local multilingual models, Hugging Face Inference API, and custom endpoints for speech-to-text conversion.
Enhanced Logger Support – Detailed success/error logs stored in log/ with timestamped logs for full traceability.
Intelligent Modular Documentation – Well-structured developer-friendly documentation with modular examples.
CI/CD Support – Seamless model integration & deployment with GitHub Actions for Build-Test-Deploy pipelines.
Utility Functions for API/Streamlit apps – Enables response storage on GitHub or AWS S3 for fine-tuning datasets and evaluation tracking.
Poetry, Makefile & Pre-commit Hooks – Ensures best practices with pre-commit checks, packaging support, and agile development workflows.
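
The hybrid search feature listed above combines dense and sparse retrieval. How the package merges the two result lists is not described here, so the sketch below uses reciprocal rank fusion (RRF), a common fusion technique, purely as an illustration. The function name, the document IDs, and the constant `k = 60` are assumptions, not the package's API.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Merge several ranked lists of document IDs into one ranking.

    Each document scores 1 / (k + rank) for every list it appears in,
    so documents ranked well by both retrievers rise to the top.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID for a stable result.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a dense (semantic) and a sparse (keyword) retriever.
dense_hits = ["a", "b", "c"]
sparse_hits = ["b", "d", "a"]
fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
```

RRF only needs ranks, not raw scores, which avoids having to normalise similarity scores that live on different scales.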


Project 1.1: AI Consultant Chatbot (December 2024 - January 2025)

Frontend: Streamlit
I built a chatbot over EY-India blog data scraped from their website (scraping was allowed by their robots.txt) and tested the end-to-end application, which uses Streamlit as the frontend and handles request-response through a REST API built with FastAPI on top of the Hybrid-RAG Python package :)

Backend RestAPI: FastAPI

Built a REST API using FastAPI with a rate limiter on API usage (10 req/s), router support for robust system building, CORSMiddleware to handle cross-origin requests, and Swagger support.
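
To show what a 10 req/s limit means in practice, here is a framework-independent sketch of a sliding-window limiter. It is an illustration only: the class name and the injectable `now` parameter are assumptions, and the project itself may rely on a ready-made FastAPI rate-limiting extension rather than code like this.

```python
import time
from collections import deque
from typing import Deque, Optional


class SlidingWindowLimiter:
    """Allow at most `max_requests` calls within any `window_seconds` span."""

    def __init__(self, max_requests: int = 10, window_seconds: float = 1.0) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits: Deque[float] = deque()  # timestamps of accepted requests

    def allow(self, now: Optional[float] = None) -> bool:
        """Return True if the request is accepted, False if it should be rejected."""
        now = time.monotonic() if now is None else now
        # Drop timestamps that have fallen out of the window.
        while self._hits and now - self._hits[0] >= self.window_seconds:
            self._hits.popleft()
        if len(self._hits) < self.max_requests:
            self._hits.append(now)
            return True
        return False
```

In an API, a rejected request would typically be answered with HTTP 429 (Too Many Requests); passing `now` explicitly makes the limiter easy to test without sleeping.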

Deployment: AWS ECS
- Used GitHub Actions to build and push the Docker image to AWS ECS; the image is then used by AWS EKS to deploy the container.
- To optimize deployment, both Streamlit and FastAPI applications are hosted within a single container, with each service exposing its ports independently. Supervisord is used to manage both processes efficiently.
- The FastAPI application is exposed first, serving as the backend.
- The Streamlit application is integrated within the same container, ensuring seamless interaction with FastAPI.
- The Docker image follows best practices to be lightweight, ensuring minimal memory consumption and faster build times.


Project 2: Search: Multimodal Product Recommendation
(March 2024 - May 2024)

A learning project to understand and get hands-on with building a multimodal use case.

- Built an any-to-image search application where users can input any modality and get product recommendations as images. You can search a collection of images using text, audio, or image input.
- Created a synthetic dataset from the Pinterest fashion dataset, which contains image URLs.
- Approach 1: Created an image descriptions dataset from the Pinterest dataset images using the Gemini Pro Vision model.
- Initially tried this approach for text-to-image only, computing similarity between the question and the image descriptions. It was less accurate and more complex to extend to other modalities.
- Approach 2: Used the ImageBind multimodal model, which supports text, image, and audio modalities, and stored the data in the Pinecone vector DB.
- Created a Gradio application where users can try all 3 modalities and get product recommendations as images.

Key Learnings

- Pinecone, Vector Embeddings, Multimodal, Similarity search, Gradio



Project 3: GenAI in E-commerce (July 2023 - August 2023)

Created use cases to understand the e-commerce domain and how GenAI can be used to solve and automate them.


MLOPS

Project 1: Full-Stack Mlops (November 2023 - Present)

A learning project to get a hands-on understanding of MLOps by building. Currently working on it.


NLP

Project 1: HelloBot.ai (October 2021 - December 2021)

This is a real-world conversational AI NLP chatbot built using encoder-decoder models on Devtown company data.

- Mainly contributed to building the encoder model from scratch using TensorFlow and Python OOP.
- Trained the chatbot on a QA dataset.

Key Learnings
- Tensorflow, Encoder-Decoder, OOP, Python, Text-processing, Model Training, Deep Learning, NLP.



Project 2: Text Classification: Airline Customer reviews Analysis (October 2021 - October 2021)

A learning project: a Twitter airline sentiment analysis and prediction application, built on the airline sentiment dataset from Kaggle.

The methods used to analyse and predict the sentiment are as follows:
- Text Preprocessing: Utilized neattext NLP functions for data cleaning, including removing special characters, stopwords, and performing lemmatization to normalize words.
- Feature Engineering: Converted text into machine-readable formats using Bag of Words counts (CountVectorizer) and word embeddings (Word2Vec).
- Handling Imbalanced Data: Checked class distribution and applied SMOTE (Synthetic Minority Over-sampling Technique) to balance the dataset.
- Dimensionality Reduction & Model Selection: Performed PCA for visualization and selected the best-fit model.
- Model Implementation: Implemented and compared multiple models, including Random Forest, SVC, and Multinomial Naïve Bayes, with Naïve Bayes performing optimally.
- Neural Network Optimization: Used a GRU-based approach to extract key features, flatten input vectors, and feed them into dense layers for improved accuracy.
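
The Bag of Words step can be sketched in a few lines of plain Python. This is a simplified stand-in for what CountVectorizer does (whitespace tokenisation only, no stopword removal), and the sample reviews are made up for the example.

```python
from collections import Counter
from typing import List


def build_vocabulary(docs: List[str]) -> List[str]:
    """Collect every distinct lowercase token, sorted for a stable column order."""
    return sorted({token for doc in docs for token in doc.lower().split()})


def bag_of_words(doc: str, vocabulary: List[str]) -> List[int]:
    """Count how often each vocabulary token occurs in the document."""
    counts = Counter(doc.lower().split())
    return [counts[token] for token in vocabulary]


# Hypothetical, already-cleaned review texts.
reviews = ["great flight great crew", "late flight"]
vocabulary = build_vocabulary(reviews)
vectors = [bag_of_words(review, vocabulary) for review in reviews]
```

Each review becomes a fixed-length count vector, which is the kind of input that models such as Multinomial Naïve Bayes expect.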

Key Learnings
- text-preprocessing, machine learning, ML algorithms, NLP, Neural network, GRU, streamlit.



Machine Learning

Project 1: Hypothesize churning Of Gas and Electricity customers (July 2022 - July 2022)

Created a Random Forest model for the KPIs defined by the hypothetical utility client Powerco (i.e., whether a 20% discount affects churn), as part of the BCG Data Science Job Simulation on Forage, July 2022.
- Completed a customer churn analysis simulation for Powerco Analytics, demonstrating advanced data analytics skills, identifying essential client data and outlining a strategic investigation approach.
- Conducted efficient data analysis using Python, including Pandas and NumPy. Employed data visualization techniques for insightful trend interpretation.
- Completed the engineering and optimization of a random forest model, achieving an 85% accuracy rate in predicting customer churn.
- Completed a concise executive summary for the Associate Director, delivering actionable insights for informed decision-making based on the analysis.



Project 2: Data Analysis on Social Media trends (March 2022)

Accenture North America Data Analytics and Visualization Job Simulation on Forage - March 2022
Completed a simulation focused on advising a hypothetical social media client as a Data Analyst at Accenture.
Cleaned, modelled and analyzed 7 datasets to uncover insights into content trends to inform strategic decisions.
Prepared a PowerPoint deck and video presentation to communicate key insights for the client and internal stakeholders


Project 3: Breast Cancer Prediction using Pyspark
(October 2022 - December 2022)

Created a breast cancer prediction ML model to predict the patient's outcome (alive/dead) using PySpark on Databricks.



Project 4: Best Data Science Youtube Channel-2022
(September 2022 - September 2022)

Analysis to find the best data science YouTube channel. Scraped data using the YouTube API, authenticated through Google Cloud, and performed topic modeling with BERT for the winning channel, 3B1B.



Project 5: Forecast Bank Loan Application
(September 2021)

Created a backend XGBoost model, an advanced fintech ML model, to forecast bank loan applications, and identified income and credit history as the salient features using an RF model. Built the frontend using Streamlit and deployed it via Git on Heroku.

Key Learnings
- Machine learning, model building, model training, Random Forest.



Deep Learning

Project 1: Dead Stars Image Classification Application
(November 2020 - February 2021)

Identify DeadStars is a transfer-learning-based application built using Streamlit. The application classifies dead stars, mainly pulsar stars, black holes, and white dwarfs, with an accuracy of up to 95%. Created a Docker image for the application frontend and deployed it on Amazon EC2.




Project 2: Research Publication & Project: Indian Sign Language Recognition System (January 2023 - May 2023)

- Developed a deep learning-based hand gesture recognition model using CNN, GRU, and LSTM, achieving 97% accuracy in translating Indian Sign Language (ISL) into text or speech. Implemented computer vision techniques to differentiate between static and dynamic gestures, enhancing real-time sign recognition for improved accessibility.



Cloud: AWS


Project 1: Launching WordPress Using AWS EC2 and Load Balancers (July 2022 - August 2022)

An AWS internship task: set up WordPress on AWS.

Configured WordPress with RDS & EC2: Set up an AWS EC2 instance, installed WordPress, and connected it to an RDS database for dynamic content management.
Implemented Load Balancing: Deployed an Application Load Balancer to distribute traffic across multiple EC2 instances, ensuring high availability and scalability.
Infrastructure Management & Security: Configured Apache, PHP, IAM roles, and VPC security for seamless deployment, access control, and efficient resource utilization.

Certifications

Fractal Analytics: 1729 You&AI

A professional hands-on workshop held virtually by top data scientists in India, organised by Fractal Analytics and Analytics Vidhya. I learned about the ethics and current demands of the data industry, along with top-notch projects. Experienced data scientists discussed a business-driven approach to data solutions and how to provide efficient solutions. The tools and techniques that I learned during the workshop are:

1. Hands-on with Low Code Machine Learning using Pycaret.
2. Overview of Building Data Pipeline on AWS using Python and SQL.
3. Hands-on intro to Time-Series Forecasting, Anomaly detection and Recommendation Systems.
4. Built Data Pipeline using Apache Spark.
5. Deploying Models in Production and model maintenance(MLOPS).

AWS Devops


AWS: AWS Machine Learning




AWS: Data Analytics on AWS



Linkedin Learning:Excel




SQL For Data Science



DeepLearning.AI Certifications

DeepLearning.ai Courses Of Generative AI


Hyperparameter Tuning in Tensorflow




Introduction to Tensorflow for AI/ML/DL




CNN in Tensorflow



Blogs

Writing blogs helps me stay updated with the latest trends and learn faster and deeper.


Teckbakers

Teckbaker's is a blog series community started during my college in 2023 to make all technology content available in a single place. I contribute to the Machine Learning and Generative AI series.

Popular Articles

Inside the Mind of ChatGPT
Transformers
AI Consultant Chatbot: Hybrid RAG
Case study: Why Netflix is the OTT Leader




Medium

I used to write short-form and deep-dive research blogs on Medium about data science, machine learning, and GenAI.

Popular Articles

Ensemble Learning, Vote for better performance
Convolutional Neural Networks, Deep Dive
Regularization Techniques in Deep Learning
Low Code Machine Learning using Pycaret
