Available for opportunities

Hi, I'm Dheer Gupta


Building AI-powered solutions that transform raw data into actionable intelligence, from voice AI systems handling 24/7 operations to threat intelligence platforms processing 178K+ CVE and threat-report records.

178K+ Records Processed
98% F1-Score (XSS)
5 Institutions Served
threat_classifier.py
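from sentence_transformers import SentenceTransformer
from sklearn.ensemble import RandomForestClassifier

class ThreatClassifier:
    def __init__(self, encoder_name="all-MiniLM-L6-v2"):  # assumed SBERT checkpoint
        self.encoder = SentenceTransformer(encoder_name)  # SBERT sentence embeddings
        self.model = RandomForestClassifier()

    def fit(self, threats, labels):
        # Train the forest on SBERT embeddings of labeled threat texts
        self.model.fit(self.encoder.encode(threats), labels)
        return self

    def classify(self, threat):
        # encode() on a one-item list yields the 2D array predict() expects
        embedding = self.encoder.encode([threat])
        return self.model.predict(embedding)[0]

# F1-Score: 0.98 (XSS), 0.84 (SQLi)
# Processed: 178,796 records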

Turning Data Into Decisions

I'm a recent Computer Science and Data Science graduate from Knox College with a passion for building systems that make complex data actionable.

My work spans AI-powered property management systems, cyber threat intelligence platforms, and astronomical data pipelines. I specialize in taking messy, real-world data and transforming it into systems that help people make better decisions faster.

Whether it's helping security teams prioritize vulnerabilities with a 7x improvement in high-urgency detection, or enabling astronomers across 5 institutions to search the metadata from ~14GB of nightly observations in under 1 second, I focus on building solutions that deliver measurable impact.

AI/ML Engineering: Voice AI, NLP, Classification
Cybersecurity: Threat Intelligence, CVE Analysis
Data Engineering: ETL Pipelines, Real-time Processing

Where I've Made Impact

Research | Feb. 2025 - Jun. 2025

Data Science Researcher

Knox College

Cyber Threat Intelligence Platform

The Problem

Security teams face information overload: tens of thousands of CVEs are published each year, threat intelligence is scattered across sources, and there is no clear prioritization. Manual triage is slow, which delays patching of critical vulnerabilities.

What I Built

An end-to-end automated threat intelligence pipeline that ingests vulnerability data, classifies threats, detects emerging attacks, and prioritizes CVEs for security teams.

Technical Implementation

Data Ingestion: NVD API 2.0, Web Scraping
NLP Pipeline: NLTK, Regex, Custom Preprocessing
Classification: Random Forest, SBERT, TF-IDF
Anomaly Detection: Isolation Forest, Z-score
Visualization: Streamlit, Plotly Dashboard
Deployment: Docker Containerization
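Here is a minimal offline sketch of the NVD API 2.0 ingestion step. It is not the project's code. It assumes that "incremental" retrieval means asking only for CVEs last modified since the previous run, and the helper names `date_windows` and `page_urls` are hypothetical. The endpoint and the `startIndex`, `resultsPerPage`, `lastModStartDate` and `lastModEndDate` parameters come from the public NVD CVE API 2.0, as do the 2,000-results-per-page cap and the 120-day limit on date ranges. The exact timestamp format shown is an assumption worth checking against the NVD documentation.

```python
from datetime import datetime, timedelta
from urllib.parse import urlencode

# Minimal offline sketch; helper names are hypothetical
NVD_CVE_URL = "https://services.nvd.nist.gov/rest/json/cves/2.0"
PAGE_SIZE = 2000                  # NVD caps resultsPerPage at 2,000
MAX_WINDOW = timedelta(days=120)  # lastMod* date ranges may span at most 120 days
TIMESTAMP_FMT = "%Y-%m-%dT%H:%M:%S.000"  # assumed ISO-8601 form; check NVD docs


def date_windows(start, end, window=MAX_WINDOW):
    """Split [start, end) into consecutive windows no longer than `window`."""
    while start < end:
        stop = min(start + window, end)
        yield start, stop
        start = stop


def page_urls(mod_start, mod_end, total_results):
    """Return one request URL per page of CVEs modified in [mod_start, mod_end].

    `total_results` would normally come from the `totalResults` field of the
    first response; it is passed in here so the sketch runs offline.
    """
    urls = []
    for start_index in range(0, total_results, PAGE_SIZE):
        params = {
            "lastModStartDate": mod_start.strftime(TIMESTAMP_FMT),
            "lastModEndDate": mod_end.strftime(TIMESTAMP_FMT),
            "startIndex": start_index,
            "resultsPerPage": PAGE_SIZE,
        }
        urls.append(f"{NVD_CVE_URL}?{urlencode(params)}")
    return urls


# Incremental sync: only request CVEs modified since the last successful run
last_run, now = datetime(2025, 1, 1), datetime(2025, 6, 1)
for window_start, window_end in date_windows(last_run, now):
    print(window_start.date(), "->", window_end.date())
```

Storing the end of the last completed window as the next run's starting point keeps each sync small. Splitting by window also keeps every request within NVD's date-range limit.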

Actionable Outcomes

  • Automated threat categorization: Auto-labels threats as XSS, SQL Injection, Ransomware, etc., enabling filtered views by attack type
  • Prioritized patching queue: Urgency scores rank vulnerabilities so teams patch the most dangerous first
  • Early warning system: Anomaly detection flags zero-day indicators and mention spikes before widespread exploitation
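The Z-score half of the early-warning idea can be sketched as a rolling check on daily mention counts. This is a hypothetical illustration, not the project's detector. It assumes mentions are counted per day, each day is compared with the 7 days before it, and a z-score of 3 or more counts as a spike. All three choices are illustrative defaults.

```python
from statistics import mean, stdev

# Hypothetical defaults: 7-day baseline, spike when z >= 3


def mention_spikes(daily_counts, baseline=7, threshold=3.0):
    """Return indices of days whose mention count is a z-score outlier.

    Each day is compared with the `baseline` days before it; a day is
    flagged when (count - mean) / stdev >= threshold.
    """
    spikes = []
    for i in range(baseline, len(daily_counts)):
        history = daily_counts[i - baseline:i]
        mu, sigma = mean(history), stdev(history)
        if sigma == 0:
            # Flat history: any increase at all is treated as a spike
            if daily_counts[i] > mu:
                spikes.append(i)
        elif (daily_counts[i] - mu) / sigma >= threshold:
            spikes.append(i)
    return spikes


# Made-up daily mention counts for one CVE identifier
counts = [5, 6, 5, 7, 6, 5, 6, 40, 6]
print(mention_spikes(counts))  # -> [7]
```

Comparing each day only with earlier days keeps a spike from inflating its own baseline.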

Quantified Impact

7x Improvement: high-urgency detection from 4.7% to 32.5%
178,796 Records Processed: CVEs and threat reports
0.98 F1-Score (XSS): 1.00 precision, 0.96 recall
0.84 F1-Score (SQLi): 0.84 precision, 0.84 recall

Classification Performance

Threat Category | Precision | Recall | F1-Score
XSS             | 1.00      | 0.96   | 0.98
Phishing        | 0.99      | 0.73   | 0.84
SQL Injection   | 0.84      | 0.84   | 0.84
Malware         | 0.97      | 0.73   | 0.83
Supply Chain    | 0.95      | 0.72   | 0.82
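Each F1-Score in the table follows from its precision and recall through F1 = 2PR / (P + R). The quick Python check below recomputes every row. It also explains why Phishing and SQL Injection share an F1 of 0.84 despite very different precision. The harmonic mean is pulled toward the weaker value, so phishing's 0.73 recall offsets its 0.99 precision.

```python
def f1_score(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) per category, from the table above
reported = {
    "XSS": (1.00, 0.96, 0.98),
    "Phishing": (0.99, 0.73, 0.84),
    "SQL Injection": (0.84, 0.84, 0.84),
    "Malware": (0.97, 0.73, 0.83),
    "Supply Chain": (0.95, 0.72, 0.82),
}

for category, (p, r, f1_reported) in reported.items():
    print(f"{category:<14} computed {f1_score(p, r):.2f}  reported {f1_reported:.2f}")
```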

Urgency Scoring System

45% CVSS Severity
25% Patch Status
15% Sentiment
10% Exploit Indicators
5% Recency
Distribution: 12.1% Low | 55.3% Medium | 32.5% High
Embedding Speed: 1,447 texts/second
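Because the five weights sum to 100%, the urgency score can be read as a weighted average of components scaled to [0, 1]. The sketch below shows that arithmetic under stated assumptions. It is not the platform's scoring code: how each component is scaled (CVSS divided by 10, a binary patch flag, a 30-day recency half-life) and the Low/Medium/High cut-offs are all hypothetical choices.

```python
# Weights from the urgency scoring breakdown above; they sum to 1.0
WEIGHTS = {
    "cvss": 0.45,       # CVSS severity
    "unpatched": 0.25,  # patch status
    "sentiment": 0.15,  # negative sentiment in threat reports
    "exploit": 0.10,    # exploit indicators
    "recency": 0.05,    # recency
}


def urgency_score(cvss_base, unpatched, negative_sentiment, exploit_seen,
                  age_days, half_life_days=30.0):
    """Weighted urgency in [0, 1]; every component is first scaled to [0, 1].

    The per-component scalings below are hypothetical choices for illustration.
    """
    components = {
        "cvss": cvss_base / 10.0,                          # CVSS base score is 0-10
        "unpatched": 1.0 if unpatched else 0.0,
        "sentiment": min(max(negative_sentiment, 0.0), 1.0),
        "exploit": 1.0 if exploit_seen else 0.0,
        "recency": 0.5 ** (age_days / half_life_days),     # halves every half_life_days
    }
    return sum(WEIGHTS[name] * value for name, value in components.items())


def urgency_level(score, medium_cutoff=0.4, high_cutoff=0.7):
    """Map a score to Low / Medium / High (cut-offs are hypothetical)."""
    if score >= high_cutoff:
        return "High"
    if score >= medium_cutoff:
        return "Medium"
    return "Low"


score = urgency_score(cvss_base=9.8, unpatched=True, negative_sentiment=0.6,
                      exploit_seen=True, age_days=3)
print(round(score, 3), urgency_level(score))
```

Putting 70% of the weight on severity and patch status means an unpatched critical CVE lands in High even when sentiment and recency signals are weak.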
Python · Machine Learning · NLP · SBERT · Docker

Collaboration | Mar. 2025 - Jun. 2025

Data Engineer

MACRO Consortium

Astronomy Image Management System

Macalester, Augustana, Coe, Knox College, University of Iowa

The Problem

The Robert L. Mutel Telescope generates hundreds of astronomical images nightly (~14GB per night). Without a centralized system, researchers across 5 institutions could not efficiently search, filter, or access observation data.

What I Built

An automated astronomical data pipeline that ingests FITS images in real time, extracts 40+ metadata parameters, scores image quality, and provides a searchable interface for researchers across all consortium institutions.

Technical Implementation

Backend API: FastAPI, Python 3.12
Database: MySQL 8.0, SQLAlchemy
File Processing: Astropy (FITS extraction)
Auth: Google OAuth 2.0, JWT
Monitoring: Watchdog (real-time detection)
Infrastructure: Docker, Nginx, 5 microservices

Actionable Outcomes

For Astronomers/Researchers
  • Instant data access: Query observations by date, filter type, temperature, quality score, or target object in under 1 second
  • Quality filtering: Automated scoring flags research-grade images so astronomers focus on usable data
  • Cross-institutional access: Students and faculty from all 5 MACRO schools query the same centralized database
For Observatory Operations
  • Zero-touch ingestion: New observation files detected and processed automatically
  • Continuous operation: Designed for a robotic telescope running unattended, multi-night campaigns
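A multi-parameter search like the one described above usually comes down to building a WHERE clause from only the filters a user supplied, always with bound parameters, and indexing the columns queries filter on most. The system runs on MySQL 8.0 through SQLAlchemy. This sketch swaps in Python's built-in `sqlite3` so it runs anywhere. The table, its column names and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical schema: a handful of the 40+ FITS-derived columns
# (sqlite3 stands in for the project's MySQL 8.0 database)
SCHEMA = """
CREATE TABLE observations (
    id INTEGER PRIMARY KEY,
    date_obs TEXT NOT NULL,   -- ISO-8601 timestamp from the FITS header
    filter_band TEXT,         -- e.g. 'R', 'G', 'B', 'L'
    ccd_temp REAL,            -- detector temperature in deg C
    quality_score REAL,
    target_name TEXT
);
-- Index the columns most queries filter on first
CREATE INDEX idx_obs_date_band ON observations (date_obs, filter_band);
"""


def search(conn, date_from=None, date_to=None, filter_band=None,
           max_ccd_temp=None, min_quality=None, target_name=None):
    """Combine only the filters that were supplied, always with bound parameters."""
    conditions = [
        ("date_obs >= ?", date_from),
        ("date_obs < ?", date_to),
        ("filter_band = ?", filter_band),
        ("ccd_temp <= ?", max_ccd_temp),
        ("quality_score >= ?", min_quality),
        ("target_name = ?", target_name),
    ]
    clauses = [sql for sql, value in conditions if value is not None]
    params = [value for _, value in conditions if value is not None]
    query = "SELECT id, date_obs, filter_band, target_name FROM observations"
    if clauses:
        query += " WHERE " + " AND ".join(clauses)
    return conn.execute(query + " ORDER BY date_obs", params).fetchall()


def demo_connection():
    """In-memory database with three made-up observations."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO observations (date_obs, filter_band, ccd_temp, quality_score, target_name)"
        " VALUES (?, ?, ?, ?, ?)",
        [
            ("2025-04-01T03:12:00", "R", -20.0, 80.0, "M51"),
            ("2025-04-01T03:20:00", "B", -20.0, 40.0, "M51"),
            ("2025-04-02T02:05:00", "R", -15.0, 70.0, "M13"),
        ],
    )
    return conn


print(search(demo_connection(), filter_band="R", min_quality=60))
```

Storing timestamps as ISO-8601 strings keeps them sortable as text, so the date filters and ORDER BY work without conversion.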

Quantified Capabilities

~14GB Nightly Data: 400-500 FITS images per night
40+ Parameters: metadata extracted per image
<1s Query Time: multi-parameter search response
100% Accuracy: metadata extraction

Quality Scoring Algorithm

+20 Exposure time >30s
+20 Airmass <1.5
+30 Science frame type
+10 Standard filter (R,G,B,L)

Quality flag = True when score ≥ 60%
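The scoring rules above can be sketched as a function over FITS header values. This is a hypothetical illustration. The header keyword names (`EXPTIME`, `AIRMASS`, `IMAGETYP`, `FILTER`) and the set of values that mark a science frame are assumptions. In the real pipeline the header would come from Astropy (for example `astropy.io.fits.getheader`); here a plain dict stands in. The listed criteria add up to 80 points, so the sketch reads "score ≥ 60%" as 60% of those 80 points. If the actual scale is 100 points, the threshold should be 60 points instead.

```python
# Points per criterion, from the quality scoring table above
STANDARD_FILTERS = {"R", "G", "B", "L"}
# Hypothetical IMAGETYP values that denote a science (light) frame
SCIENCE_FRAME_TYPES = {"LIGHT", "LIGHT FRAME", "SCIENCE", "OBJECT"}
MAX_POINTS = 20 + 20 + 30 + 10  # 80: the maximum from the four listed criteria


def quality_points(header):
    """Score a FITS header passed as a dict; keyword names are assumed."""
    points = 0
    if header.get("EXPTIME", 0) > 30:        # exposure time > 30 s
        points += 20
    airmass = header.get("AIRMASS")
    if airmass is not None and airmass < 1.5:
        points += 20
    if str(header.get("IMAGETYP", "")).strip().upper() in SCIENCE_FRAME_TYPES:
        points += 30
    if str(header.get("FILTER", "")).strip().upper() in STANDARD_FILTERS:
        points += 10
    return points


def quality_flag(header, threshold=0.60):
    # Assumption: "score >= 60%" is read as 60% of MAX_POINTS (48 points).
    # If the real scale is out of 100 points, compare points >= 60 instead.
    return quality_points(header) / MAX_POINTS >= threshold


frame = {"EXPTIME": 120.0, "AIRMASS": 1.21, "IMAGETYP": "Light Frame", "FILTER": "R"}
print(quality_points(frame), quality_flag(frame))
```

The two readings disagree for a long, science-frame exposure at high airmass through a non-standard filter (50 points). That case is flagged under the 80-point reading but not under a 100-point one.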

FastAPI · MySQL · Astropy · Docker · OAuth2

Contract | Dec. 2025 - Jan. 2026

Digital Annotation Expert

Mercor

AI Training Data & Model Evaluation

What I Did

Evaluated and annotated multimodal AI outputs (audio, images, video) to improve model accuracy and reliability for production AI systems.

AI/ML Significance

Evaluate AI-generated outputs → Model quality assessment
Compare results, select best → RLHF pipeline contribution
Tag/annotate multimedia → Training data creation
Follow consistency guidelines → Data quality assurance

Actionable Outcomes

  • Training data creation: Tagged and annotated multimedia content used to train multimodal AI systems
  • Model evaluation: Compared AI-generated results and selected highest quality outputs, contributing to RLHF feedback loops
  • Quality assurance: Maintained consistency in AI training data following project guidelines
  • Multimodal AI improvement: Directly contributed to improving accuracy and reliability of production AI systems
RLHF · Data Annotation · Multimodal AI · Quality Assurance

Part-time | Sep. 2022 - Jun. 2025

ITS Lab Assistant

Knox College

Information Technology Services

What I Did

Provided technical support across 5 campus computer labs for nearly 3 years, troubleshooting hardware, software, and network issues for students and faculty.

Responsibilities

  • Technical troubleshooting: Diagnosed and resolved hardware, software, and network connectivity issues across Windows and Mac systems
  • User support: Assisted users with printing, login issues, application problems, and general IT inquiries
  • Network troubleshooting: Identified and resolved connectivity issues, escalating complex problems to network administrators
  • Access management: Routed IAM and access control issues to IT administrators; directed users to Help Desk for ticket generation

Skills Demonstrated

Technical Troubleshooting · Customer Service · Problem Escalation · IT Ticketing Workflows · Multi-platform Support
Technical Support · Problem Solving · Windows · Mac

What I've Built

Cyber Threat Intelligence

Automated pipeline for gathering and analyzing cybersecurity vulnerability information from the NVD API and security news sources.

  • Data collection with incremental CVE record retrieval
  • HTML cleaning, PII masking, CVSS severity extraction
  • Multi-vector threat categorization with urgency metrics
  • Zero-day detection and statistical anomaly detection
  • Streamlit dashboard for threat visualization
Python · Docker · NLP · Streamlit
View on GitHub
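The cleaning, PII-masking and severity steps listed above might look roughly like the following. The regex patterns and the choice of what counts as PII (e-mail and IPv4 addresses) are illustrative assumptions, not the repository's rules. The severity mapping follows the qualitative rating ranges in the CVSS v3.x specification. That mapping is useful when a record carries only a numeric base score.

```python
import html
import re

# Illustrative patterns only; the project's actual PII rules are not shown here
TAG_RE = re.compile(r"<[^>]+>")
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
# Naive IPv4 pattern: it will also mask four-part version strings like 1.2.3.4
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def clean_text(raw_html):
    """Drop HTML tags, then decode entities and collapse whitespace.

    Entities are decoded after tags are stripped, so an escaped payload
    such as &lt;script&gt; survives as text for the XSS classifier.
    """
    text = html.unescape(TAG_RE.sub(" ", raw_html))
    return re.sub(r"\s+", " ", text).strip()


def mask_pii(text):
    """Replace e-mail addresses and IPv4 addresses with placeholders."""
    return IPV4_RE.sub("[IP]", EMAIL_RE.sub("[EMAIL]", text))


def cvss_severity(base_score):
    """Qualitative rating for a CVSS v3.x base score (0.0-10.0)."""
    if base_score == 0.0:
        return "None"
    if base_score < 4.0:
        return "Low"
    if base_score < 7.0:
        return "Medium"
    if base_score < 9.0:
        return "High"
    return "Critical"


raw = "<p>Reflected XSS via &lt;script&gt; in search; reported by alice@example.com</p>"
print(mask_pii(clean_text(raw)), "|", cvss_severity(6.1))
```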

Astronomy Image Management System

Microservices architecture for processing FITS astronomical images for the MACRO Consortium (5 universities).

  • 5 Docker containers: Frontend, API, Database, File Watcher, Ingestion
  • FastAPI REST service with MySQL 8.0 database
  • FITS file processing extracting 40+ astronomical parameters
  • Google OAuth 2.0 + JWT with role-based access control
  • Advanced filtering by date, temperature, quality thresholds
Python · FastAPI · Docker · MySQL · OAuth2
View on GitHub
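In a setup like this, Google OAuth 2.0 proves who the user is, and the service then issues its own JWT that carries the user's role. The stdlib-only sketch below shows HS256 signing and verification plus a role check, so the moving parts are visible. It is not the project's auth code. The claim names, the role hierarchy and the one-hour lifetime are hypothetical, and a production service should use a maintained JWT library instead of hand-rolled code.

```python
import base64
import hashlib
import hmac
import json
import time

# Stdlib-only illustration of HS256 JWTs; claim names and roles are hypothetical


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(segment: str) -> bytes:
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def _sign(secret: bytes, signing_input: str) -> str:
    return _b64url(hmac.new(secret, signing_input.encode("utf-8"), hashlib.sha256).digest())


def issue_token(secret: bytes, email: str, role: str, ttl_s: int = 3600) -> str:
    """Create an HS256 JWT once the user has signed in (e.g. via Google OAuth)."""
    header = {"alg": "HS256", "typ": "JWT"}
    claims = {"sub": email, "role": role, "exp": int(time.time()) + ttl_s}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    return f"{signing_input}.{_sign(secret, signing_input)}"


def verify_token(secret: bytes, token: str) -> dict:
    """Check signature and expiry; the header's "alg" is never trusted."""
    signing_input, _, signature = token.rpartition(".")
    if not hmac.compare_digest(signature.encode(), _sign(secret, signing_input).encode()):
        raise PermissionError("invalid signature")
    claims = json.loads(_b64url_decode(signing_input.split(".")[1]))
    if claims["exp"] <= time.time():
        raise PermissionError("token expired")
    return claims


# Hypothetical role hierarchy for consortium users
ROLE_RANK = {"student": 1, "faculty": 2, "admin": 3}


def require_role(claims: dict, minimum: str) -> None:
    if ROLE_RANK.get(claims.get("role"), 0) < ROLE_RANK[minimum]:
        raise PermissionError(f"requires {minimum} role or higher")
```

For example, a handler would call `require_role(verify_token(secret, token), "faculty")` before serving an admin-only route. Always recomputing the signature with HS256, rather than reading the algorithm from the token header, closes off the classic "alg: none" downgrade.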

Yelp Data Mining Project

Data mining and analysis of Yelp reviews to discover cuisines, popular dishes, restaurant recommendations, and hygiene predictions.

  • Exploratory Topic Modeling with LDA/PLSA
  • Cuisine Similarity Analysis using TF-IDF and clustering
  • Dish Recognition with phrase-mining algorithms
  • Hygiene Prediction using logistic regression, SVM, ensemble methods
  • Restaurant Recommendations based on food preferences
Python · Machine Learning · NLP · Data Mining
View on GitHub
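The TF-IDF side of the cuisine-similarity analysis can be sketched as follows. The sketch assumes each cuisine is treated as one document made of its pooled review tokens. The token lists are made up, and the clustering step is not shown. Terms shared by only some cuisines get positive weight, terms found in every cuisine get zero weight, and cosine similarity then compares the resulting sparse vectors.

```python
import math
from collections import Counter


def tfidf(docs):
    """docs: {name: [tokens]} (each non-empty) -> {name: {term: tf-idf weight}}."""
    n_docs = len(docs)
    doc_freq = Counter(term for tokens in docs.values() for term in set(tokens))
    vectors = {}
    for name, tokens in docs.items():
        counts = Counter(tokens)
        vectors[name] = {
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Made-up token lists standing in for each cuisine's pooled review text
cuisines = {
    "Japanese": "sushi ramen miso sushi tempura".split(),
    "Korean": "bibimbap kimchi miso ramen bulgogi".split(),
    "Italian": "pasta pizza risotto pasta tiramisu".split(),
}
vectors = tfidf(cuisines)
for other in ("Korean", "Italian"):
    print("Japanese vs", other, round(cosine(vectors["Japanese"], vectors[other]), 3))
```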

Netflix Recommendation System

Content-based recommendation system analyzing 8,800+ Netflix titles with ensemble ranking methods.

  • Feature Engineering: Genre encoding, numerical normalization
  • Algorithms: Cosine similarity, k-NN, Gaussian Naive Bayes (84% accuracy)
  • K-Means clustering and association rule mining
  • Google Sheets integration for real-time personalization
  • Ensemble ranking combining multiple methods for top-10 recommendations
Python · Scikit-learn · Jupyter · ML
View on GitHub
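The project description does not say how its ensemble combines the individual rankers. Reciprocal rank fusion (RRF) is one standard way to merge several best-first lists into a single top-10, and the sketch below swaps it in as an illustration, not as the project's method. Each method contributes 1 / (k + rank) for every title it ranks, so titles that several methods place near the top rise to the front. The title lists are hypothetical.

```python
from collections import defaultdict

# Reciprocal rank fusion, swapped in for illustration; the project's own
# ensemble method is not specified


def reciprocal_rank_fusion(rankings, k=60, top_n=10):
    """Fuse best-first ranked lists into one top-`top_n` list.

    A title at position r (1-based) in a list earns 1 / (k + r); scores are
    summed across lists. k = 60 is the constant commonly used for RRF.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, title in enumerate(ranking, start=1):
            scores[title] += 1.0 / (k + rank)
    # Sort by fused score, breaking ties by title for a deterministic order
    return sorted(scores, key=lambda title: (-scores[title], title))[:top_n]


# Hypothetical best-first outputs from three of the methods listed above
by_cosine = ["title_1", "title_2", "title_3"]
by_knn = ["title_2", "title_1", "title_4"]
by_naive_bayes = ["title_2", "title_3", "title_1"]
print(reciprocal_rank_fusion([by_cosine, by_knn, by_naive_bayes]))
# -> ['title_2', 'title_1', 'title_3', 'title_4']
```

RRF only needs ranks, not comparable scores. That suits a mix like cosine similarity, k-NN distance and Naive Bayes probabilities, whose raw outputs live on different scales.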

Data Breaches Severity Research

Research paper investigating factors that determine the severity of data breaches in companies.

  • Analysis of variables influencing breach severity
  • Statistical analysis of breach data patterns
  • Identification of key risk factors for organizations
  • Recommendations for breach prevention strategies
Research · Cybersecurity · Data Analysis
View on GitHub

Solar System 3D

Interactive 3D model of the Solar System built with WebGL, featuring realistic planetary textures and orbits.

  • Real-time 3D rendering with WebGL
  • Accurate planetary textures and relative sizing
  • Interactive camera controls and orbit visualization
  • Custom sphere geometry and shading
JavaScript · WebGL · 3D Graphics
View on GitHub

Vapi Webhook Handler

Backend webhook handler for Vapi API services, deployed on Vercel for voice AI integrations.

  • Node.js and Express 5.1.0 backend
  • Webhook event processing and routing
  • Deployed on Vercel for serverless execution
  • Integration with voice AI services
Node.js · Express · Vercel · API
View on GitHub

Midnight

2D game built with Godot Engine featuring AI-driven enemy behaviors. Team project for CS 292.

  • Godot Engine game development
  • AI-driven enemy pathfinding and behaviors
  • Collaborative team development (4 members)
  • Scene-based architecture design
Godot · GDScript · Game Dev
View on GitHub

Minesweeper

Classic Minesweeper game implementation with clean UI and game logic.

  • Complete Minesweeper game mechanics
  • Recursive cell reveal algorithm
  • Difficulty levels and timer
Game · Algorithms
View on GitHub
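The recursive cell reveal is a flood fill. Revealing a cell with zero adjacent mines recursively reveals its neighbours, and the recursion stops at cells that border a mine. The project's implementation language is not stated, so this is a Python sketch. The board representation (a set of mine coordinates plus a set of revealed cells) is a hypothetical choice.

```python
# Hypothetical board representation: mines and revealed cells as (row, col) sets


def neighbours(r, c, rows, cols):
    """Yield the up-to-8 in-bounds cells around (r, c)."""
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            nr, nc = r + dr, c + dc
            if (dr or dc) and 0 <= nr < rows and 0 <= nc < cols:
                yield nr, nc


def adjacent_mines(mines, r, c, rows, cols):
    return sum((nr, nc) in mines for nr, nc in neighbours(r, c, rows, cols))


def reveal(mines, revealed, r, c, rows, cols):
    """Reveal (r, c); if it touches no mines, recursively reveal its neighbours.

    Clicking a mine (game over) is assumed to be handled by the caller.
    """
    if (r, c) in revealed or (r, c) in mines:
        return
    revealed.add((r, c))
    if adjacent_mines(mines, r, c, rows, cols) == 0:
        for nr, nc in neighbours(r, c, rows, cols):
            reveal(mines, revealed, nr, nc, rows, cols)


# 3x3 board with a single mine in the top-left corner
opened = set()
reveal({(0, 0)}, opened, 2, 2, rows=3, cols=3)
print(len(opened))  # -> 8
```

Recursion depth grows with the size of the open area. On very large boards, an explicit stack avoids Python's recursion limit.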

Haskell Tic-Tac-Toe

Tic-Tac-Toe game implemented in Haskell, demonstrating functional programming paradigms.

  • Pure functional implementation
  • Pattern matching and recursion
  • Immutable game state management
Haskell · Functional
View on GitHub

Simple HTTP Server

Basic HTTP server implementation demonstrating networking and protocol handling.

  • HTTP request parsing and response handling
  • Socket programming fundamentals
  • Static file serving capabilities
Networking · HTTP
View on GitHub
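The core of a basic HTTP server is parsing the request line and headers, then framing a response with a status line, headers, a blank line and the body. The repository's language is not stated, so this Python sketch covers only those two steps. The helper names are hypothetical. In a full server, they would sit inside a socket loop that calls `accept()`, `recv()` and `sendall()`.

```python
# Hypothetical helpers; the socket accept/recv/sendall loop is not shown


def parse_request(raw: bytes):
    """Split a raw HTTP/1.1 request into (method, target, version, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    request_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    method, target, version = request_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    return method, target, version, headers, body


def build_response(status: int, reason: str, body: bytes,
                   content_type: str = "text/plain; charset=utf-8") -> bytes:
    """Frame a minimal HTTP/1.1 response with an explicit Content-Length."""
    head = (
        f"HTTP/1.1 {status} {reason}\r\n"
        f"Content-Type: {content_type}\r\n"
        f"Content-Length: {len(body)}\r\n"
        "Connection: close\r\n"
        "\r\n"
    )
    return head.encode("iso-8859-1") + body


request = b"GET /index.html HTTP/1.1\r\nHost: localhost:8080\r\n\r\n"
method, target, version, headers, _ = parse_request(request)
print(method, target, headers["host"])  # -> GET /index.html localhost:8080
```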

PentestGPT

A GPT-empowered penetration testing tool for automated security assessments (fork).

  • AI-assisted vulnerability discovery
  • Automated penetration testing workflows
  • Integration with security tools
Security · AI · Pentesting
View on GitHub

Technical Expertise

Data Science & ML

  • Multi-label Classification (Random Forest, One-vs-Rest)
  • Anomaly Detection (Isolation Forest, Z-score)
  • NLP Pipelines (TF-IDF, SBERT, Text Preprocessing)
  • Feature Engineering (Hybrid Text + Numeric)
  • Model Evaluation & Benchmarking
  • AI Training Data Annotation (RLHF)

Cybersecurity

  • CVE/NVD Data Analysis
  • Threat Categorization (XSS, SQLi, Ransomware)
  • Vulnerability Prioritization
  • Emerging Threat Detection
  • Security Risk Management

Data Engineering

  • ETL Pipelines at Scale
  • Parallel Processing (ThreadPoolExecutor)
  • Real-time Data Ingestion
  • Database Optimization (Pooling, Indexing)
  • API Design (REST, FastAPI)

AI/ML Engineering

  • Voice AI Integration (Vapi)
  • NLP for Entity Extraction
  • Classification Pipelines
  • Sentiment Analysis
  • Multimodal AI Evaluation

Tools & Technologies

Python
TypeScript
Node.js
PostgreSQL
MySQL
Docker
React/Next.js
FastAPI
Express.js
Scikit-learn
Pandas
Jupyter
Streamlit
WebGL
Godot
Haskell
C#
Git
Vercel
OAuth2/JWT
Bash
Java
NumPy
NLTK
Matplotlib
Plotly
Astropy
LangChain
OpenAI API

Certifications

Deloitte Cyber Job Simulation (Forage)
Data Mining Specialization (UIUC)
Foundations of Cybersecurity (Google)
Ethical Hacking from Scratch (Udemy)

Let's Connect

I'm currently open to full-time opportunities in Data Science, AI/ML Engineering, and Cybersecurity. If you're looking for someone who can turn complex data into actionable systems, let's talk.