Data Science · Statistical Modeling · Real-Time ML Systems · Public-Interest Analytics — Chicago, IL

Your data is an asset.
Let's make it act like one.

Data scientist who builds end-to-end, from PDF scraping and ETL pipelines to deployed ML systems and production web applications. Graduate-level coursework in econometrics, computational statistics, and Bayesian reasoning. Ships real things: a 6.4M-row campaign finance platform in production, a real-time Whisper-powered police scanner with async architecture, interactive crime mapping in a live newsroom, and an honors thesis analyzing 484K Indian villages.

Get a Free Consultation   ·   See Past Work   ·   View Follow The Money IL
6.4M+ Receipts Rows Ingested
530k+ D2 Totals Processed
1.3M+ Donor Analytics Rows
2 State + Federal Data Sources
Live: Production Scheduled Syncs

What I do

Six core services, all built around the same idea: your data should be working for you, not the other way around.

📊

Dashboards & Reporting

Replace the weekly scramble of pulling numbers from five different places. I build clean, automated dashboards and reports that update themselves — so you always know where things stand.

🧹

Data Cleaning & Analysis

Got years of messy spreadsheets, duplicate records, or data you've never been able to make sense of? I'll clean it, structure it, and surface the patterns that matter for your decisions.

🔍

Web Scraping & Data Collection

The information you need often exists online — in PDFs, government sites, or competitor pages — but there's no download button. I build custom scrapers that collect and structure it for you automatically.

🗳️

Political Data & Voter Targeting

Voter file analysis, precinct-level targeting, historical performance modeling, and field strategy recommendations. Data-driven campaigns win — I help you build the infrastructure for it.

🧪

Simulation & Causal Inference

Need to know what's actually driving your results — not just what correlates? I build Monte Carlo simulations to stress-test assumptions and use causal inference methods to isolate real effects from noise, so you can make decisions based on evidence, not guesswork.

🎙️

Real-Time ML & NLP

Live audio transcription, keyword detection, and alerting systems powered by Whisper speech-to-text with quantized inference, voice activity detection, and fuzzy text matching — fully async architectures built to run in production.

Selected work

A few examples of real projects and their outcomes.

Real-Time ML

DeKalb Scanner Alerts

Real-time police scanner transcription and alerting system. Ingests live Broadcastify audio via FFmpeg, runs Whisper STT with Int8 quantization, detects critical keywords via Aho-Corasick + fuzzy matching, and pushes alerts via email/Slack/WebSocket dashboard. Fully async architecture packaged as a standalone macOS app.

→ 14 modules, ~800 LOC, fully async architecture
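
The keyword-detection step described above can be sketched roughly as follows. This is a minimal illustration, not the project's code: it swaps in the standard library's `difflib.SequenceMatcher` for a dedicated fuzzy-matching library and a plain sliding window for an Aho-Corasick automaton. The keyword list, the 0.85 threshold, and the name `find_keywords` are all hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical keyword list and threshold, chosen only for illustration.
KEYWORDS = ["shots fired", "structure fire", "pursuit"]
THRESHOLD = 0.85


def find_keywords(transcript, keywords=KEYWORDS, threshold=THRESHOLD):
    """Return (keyword, matched_text, score) tuples for keywords found in a transcript.

    Exact substring hits score 1.0. Otherwise every window with the same
    word count as the keyword is scored with difflib's similarity ratio,
    and the best window is kept if it reaches the threshold.
    """
    text = transcript.lower()
    words = text.split()
    hits = []
    for kw in keywords:
        if kw in text:
            hits.append((kw, kw, 1.0))
            continue
        n = len(kw.split())
        best_score, best_window = 0.0, ""
        for i in range(len(words) - n + 1):
            window = " ".join(words[i:i + n])
            score = SequenceMatcher(None, kw, window).ratio()
            if score > best_score:
                best_score, best_window = score, window
        if best_score >= threshold:
            hits.append((kw, best_window, round(best_score, 2)))
    return hits


if __name__ == "__main__":
    # A transcription error ("fyred") still triggers the "shots fired" keyword.
    print(find_keywords("Units responding, shots fyred near Lincoln Highway"))
```

The fuzzy pass is what lets an alert fire when speech-to-text output is slightly misspelled. A production system would likely want a faster multi-pattern matcher for the exact pass, which is the role Aho-Corasick plays in the description above.
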
Campaign Finance Platform

Follow The Money Illinois  Live Site ↗

Built a public campaign-finance platform that ingests Illinois state bulk filings + FEC federal data, powers searchable dashboards, and surfaces donor/network analytics.

→ 6.4M+ receipts rows, 530k+ D2 totals, and 1.3M+ donor analytics rows; production deployment with scheduled syncs.
Political Data

Voter File Targeting — COD Board Campaign

Analyzed a 600,000-row Illinois voter file to compute each precinct's deviation from the Democratic baseline. Built field targeting recommendations from the findings.

→ Campaign earned 40,000+ votes
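
A precinct-level deviation of this kind could be computed along the following lines. The sketch rests on assumptions the description does not confirm: the baseline is taken to be the Democratic share across the whole file, and each voter row carries hypothetical `precinct` and `dem` fields. The actual project may define both differently.

```python
from collections import defaultdict


def precinct_deviation(rows):
    """Return {precinct: democratic_share - baseline}.

    rows: iterable of dicts with hypothetical keys 'precinct' (str) and
    'dem' (bool). The baseline is the Democratic share over all rows,
    which is an assumed definition used only for illustration.
    """
    totals = defaultdict(int)
    dems = defaultdict(int)
    for row in rows:
        totals[row["precinct"]] += 1
        if row["dem"]:
            dems[row["precinct"]] += 1
    n = sum(totals.values())
    if n == 0:
        return {}
    baseline = sum(dems.values()) / n
    return {p: dems[p] / totals[p] - baseline for p in totals}


if __name__ == "__main__":
    sample = (
        [{"precinct": "A", "dem": True}] * 3
        + [{"precinct": "A", "dem": False}]
        + [{"precinct": "B", "dem": True}]
        + [{"precinct": "B", "dem": False}] * 3
    )
    print(precinct_deviation(sample))
```

Precincts with a positive value sit above the baseline and those with a negative value sit below it, which is the kind of ranking a field plan can be built from.
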
Reporting Automation

Daily Operating Reports — Healthcare Org

Designed Power Query pipelines and VBA-driven Excel tools for class-booking, patient-management, and daily operating reports. Built R-based automated reporting infrastructure and dashboards.

→ Cut ~10–15 hours of weekly manual work
Web Scraping

Election Data Pipeline — University Research

Scraped and standardized presidential primary election data from 100+ state-party PDF documents into a single, clean, reproducible dataset for ongoing academic research.

→ 100+ PDFs → 1 clean dataset
Data Journalism

Crime Data Scraper & Interactive Map ↗

Built a Python scraper to extract data from 250 PDF crime logs, then analyzed and visualized findings in an interactive R-based crime map ↗ for a university newsroom.

→ 155% increase in daily page views
Simulation & Causal Inference

Monte Carlo Study — ANOVA Robustness

Designed a 700-iteration Monte Carlo simulation in R to evaluate how ANOVA holds up under varying effect sizes and non-normal error distributions. Analyzed Type I error rates and statistical power across conditions.

→ Quantified method reliability under real-world violations
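
The Type I error part of such a study can be sketched as below. The original simulation was written in R; this is an independent Python illustration using only the standard library, not a port of that code. The 700 iterations come from the description above, while the group count, group size, error distributions, and function names are assumptions. The critical value is an approximate figure from standard F tables and is valid only for three groups of 20.

```python
import random
from statistics import fmean

# Approximate 5% critical value of F with (2, 57) degrees of freedom,
# taken from standard F tables. It matches k=3 groups of 20 and must be
# changed if the group count or group size changes.
F_CRITICAL = 3.16


def f_statistic(groups):
    """One-way ANOVA F statistic for a list of groups of observations."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = fmean([x for g in groups for x in g])
    means = [fmean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def type_i_error_rate(draw, iterations=700, k=3, n_per_group=20, seed=0):
    """Share of simulated datasets in which ANOVA rejects a true null.

    Every group is drawn from the same distribution, so each rejection
    is a false positive.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(iterations):
        groups = [[draw(rng) for _ in range(n_per_group)] for _ in range(k)]
        if f_statistic(groups) > F_CRITICAL:
            rejections += 1
    return rejections / iterations


def normal_error(rng):
    """Standard normal errors: the textbook ANOVA assumption."""
    return rng.gauss(0.0, 1.0)


def skewed_error(rng):
    """Exponential(1) errors shifted to mean zero: skewed, null still true."""
    return rng.expovariate(1.0) - 1.0


if __name__ == "__main__":
    print("normal errors:", type_i_error_rate(normal_error))
    print("skewed errors:", type_i_error_rate(skewed_error))
```

Comparing the two rates against the nominal 5% level shows how far a violated normality assumption moves the false-positive rate. Estimating power would follow the same loop with a shift added to one group's mean.
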
Full-Stack Dashboard

Legislative Bill Stats  Live App ↗

Full-stack dashboard aggregating U.S. Congress and Illinois General Assembly bill activity, with search, CSV export, and D3.js network visualizations.

→ FastAPI + SQLite + Chart.js + D3.js
Honors Capstone

Electrification & Welfare in Rural India

Cross-sectional analysis of 484,630 Indian villages (12 merged datasets). Multiple linear regression (MLR) testing the effects of domestic, commercial, and agricultural electrification on consumption. R² = 0.82. 11 academic citations (JPE, World Development, Energy Economics).

→ Found 1.6% welfare increase from domestic electricity access

About

I'm Devin Oommen, a data scientist based in the Chicago area. I graduated from Northern Illinois University in 2025 with honors in Political Science and completed graduate-level coursework in econometrics (OLS, 2SLS/IV, causal inference), computational statistics (MLE, Monte Carlo, bootstrap, KDE), and Bayesian reasoning.

I build end-to-end — from PDF scraping and ETL pipelines to deployed ML systems and production web applications. Recent work includes a 6.4M-row campaign finance platform, a real-time Whisper-powered police scanner with fully async architecture, interactive crime mapping in a live newsroom, and an honors thesis analyzing 484K Indian villages.

I'm especially interested in working with small businesses that know they're sitting on useful data but don't have the time or tools to do anything with it — and with political campaigns and organizations that want to make sharper, evidence-based decisions.

Tools & Technologies
Python · R · SQL · JavaScript · Bash · Pandas / NumPy / SciPy · Scikit-learn · Flask · FastAPI · faster-whisper / WebRTC VAD · asyncio / aiosqlite · Pydantic / RapidFuzz · SQLite · R Shiny / Leaflet · tidyverse / sf / ggplot2 · D3.js / Chart.js · Tableau · Excel / VBA · Git / GitHub · Azure DevOps · Jupyter · Quarto / R Markdown / LaTeX · BeautifulSoup / Playwright / Selenium · pdfplumber / PDF Parsing · FFmpeg / Real-Time Audio · Data Pipelines / ETL · Geocoding / Google Maps API · FEC / OpenFEC APIs · OLS / 2SLS/IV / Econometrics · MLE / Bayesian Inference · Monte Carlo / Bootstrap / KDE · Causal Inference · Whisper STT / NLP Pipelines · Geospatial Visualization · Jira
Recognition

MPSA Conference Presenter · ICPA News Story of the Year · ICCJA Reporter of the Year · Peters Scholarship for Public Service · Mortar Board Honor Society

Let's talk about your data.

Free 15-minute consultation. No pitch, no pressure — just an honest look at whether I can help.

Email Me
LinkedIn ↗   ·   GitHub ↗