Justin Jonany

Machine Learning Research Engineer

Justin at a pumpkin patch surrounded by pumpkins
Website was created by vibe coding and several cups of tea

Hey Friends 👋

I'm Justin, a tea-obsessed Machine Learning Research Engineer blending AI with the real world, all fueled by milk oolong.

My mornings: tea + research papers. My afternoons: shipping ML systems that actually work.

Waterloo, ON

About Me

Currently at Thema AI creating end-to-end pipelines, revenue models, and multi-agent systems that help investors discover their next great acquisition.

University of Waterloo seal

Education

University of Waterloo

BMath Data Science

Graduating June 2027

GPA

86.27%

Standing

Term Distinction (All Terms)

Clubs & Leadership

Basketball Intramurals, Volleyball Intramurals, Indonesian Student Association PR Lead, Board Game Designer

Scholarship

University of Waterloo President's Scholarship of Distinction (2023)

Experience

Current

Machine Learning Research Engineer

Thema AI

Ottawa, ON · Apr 2025 – Present
Python · LanceDB · Clustering · HTTPX · cuML · Reranking · Polars · Jinja
  • Engineered end-to-end data enrichment pipelines using SERP APIs, web scraping, and LLMs for 6M+ product and company records
  • Led development of AI-powered acquisition target ranker using multi-agent systems
  • Performed topic modeling and dimensionality reduction (UMAP) to standardize 10M+ data points
  • Developed statistical revenue prediction models whose 95% confidence bounds are 20% tighter than those of traditional providers
  • Built concurrent company search engine that identifies 3x more relevant companies than OpenAI while being 4x faster and 10x cheaper

Data Science Intern

Ontario Securities Commission (OSC)

Toronto, ON · May 2024 – Dec 2024
Python · Llama3 · OpenAI · Big Data · Azure · Prompt Engineering · PySpark
  • Developed graph-based algorithm to identify 500+ suspicious trader groups among 20M+ individuals
  • Engineered hybrid pipeline combining NLP, OCR, LLM, and RAG with 95% accuracy
  • Led development of LLM evaluation frameworks based on academic papers

Remote Web Developer

PT Alto Sentosa

Remote · Apr 2023 – Jul 2023
Python · React · Web Development · Front-End · HTML/CSS · JavaScript
  • Increased customer engagement by 30% in three months by leading design and implementation of a responsive 10+ page React web experience.
  • Partnered with creative and project teams to iterate on UX flows, driving a noticeable rise in customer inquiries and conversions.

Featured Projects

Spotlight

Forward-Looking Active Retrieval Augmented Generation (FLARE)

July 2024 – Oct 2024

Python · NLP · OCR · LLM · OpenAI · RAG Architecture
  • Developed advanced RAG architecture based on research paper, specializing in long-form text generation
  • Reduced LLM hallucinations by 80% through statistical analyses
  • Increased accuracy of traditional LLM PDF extraction from 91% to 97%

SciDigest

Nov 2023 – Dec 2023

Python · NLP · Deep Learning · Bi-LSTM · RNN · TensorFlow · PyTorch
  • Multi-input deep learning model for structuring scientific abstracts based on research paper
  • 40% faster training on PubMed 200k RCT dataset with 2M sentences through optimization.
  • 90% accuracy with modified Bi-LSTM with token, character, and positional embeddings.
SciDigest architecture diagram

Phylogenetic Analysis of COVID-19 Variants

Feb 2022

Bioinformatics · MEGA X · NCBI GenBank · Statistical Analysis · Literature Study
  • Constructed phylogenetic trees from Southeast Asian SARS-CoV-2 genomes using NCBI GenBank datasets and MEGA X workflows.
  • Applied the Kimura 2-parameter model with maximum likelihood inference and 1,000 bootstrap replications to validate evolutionary relationships.
  • Mapped viral mutation patterns and geographic spread, revealing that Vietnam's variant was most similar to the Wuhan reference, while Thailand's diverged over eight months.
  • Recognized with a Gold Award at LIVI 2022.
Bioinformatics visualization of COVID-19 variants
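
For readers unfamiliar with the Kimura 2-parameter model mentioned above, here is a minimal sketch of its pairwise distance formula in Python. It is not the MEGA X workflow used in the project (which fits the model by maximum likelihood); the function name `k2p_distance` and the toy sequences are hypothetical, and the sketch assumes the two sequences are already aligned.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned DNA sequences.

    P is the proportion of transitions (A<->G, C<->T) and Q the proportion
    of transversions; the distance is -1/2 * ln((1 - 2P - Q) * sqrt(1 - 2Q)).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")

    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases
        compared += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1

    if compared == 0:
        raise ValueError("no comparable sites")

    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError if the sequences are too divergent
    # for the correction to be defined (argument <= 0).
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# Toy example: one transition in ten sites.
d = k2p_distance("AAAAAAAAAA", "GAAAAAAAAA")
```

The corrected distance comes out slightly larger than the raw 10% difference, because the model accounts for substitutions that may have occurred more than once at the same site.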

IndoFoodNet

Aug 2023 – Sept 2023

Python · Computer Vision · Transfer Learning · TensorFlow
  • Achieved a 94% F1-score on Indonesian Padang food image classification.
  • Boosted baseline accuracy by 4% after fine-tuning TensorFlow Hub's EfficientNetV2 feature extractor.
  • Engineered a data augmentation pipeline to stretch a training set of only sixty images per class.
IndoFoodNet model performance dashboard

AutoPartner

Sept 2023

Python · PyTorch · Deep Learning · Ensemble Models
  • Built a car sales forecasting system that blends classical ML models with a custom neural network.
  • Implemented forward/backward passes and gradient descent entirely from scratch, without TensorFlow or PyTorch, to create a blending model.
  • Achieved 7.6% MAPE, 35% better than the best scikit-learn models, using MLP/DNN architectures with ReLU/Leaky ReLU.
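
To illustrate what "from scratch" means here, below is a minimal sketch of a blending model trained with hand-written forward and backward passes in plain Python. It is a deliberate simplification, not the AutoPartner code: it uses a single linear layer rather than an MLP with ReLU, and the names `train_blender`, `mape`, and all numbers are hypothetical toy values.

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    total = sum(abs((t - p) / t) for t, p in zip(y_true, y_pred))
    return 100.0 * total / len(y_true)


def train_blender(base_preds, y_true, lr=0.1, epochs=2000):
    """Fit weights w and bias b so that sum_j w[j] * base_preds[i][j] + b
    approximates y_true[i], using gradient descent on mean squared error."""
    n, k = len(base_preds), len(base_preds[0])
    w, b = [1.0 / k] * k, 0.0  # start from an equal-weight average
    for _ in range(epochs):
        # Forward pass: blended prediction for every row.
        y_hat = [sum(wj * xj for wj, xj in zip(w, row)) + b for row in base_preds]
        # Backward pass: gradients of MSE with respect to w and b.
        err = [p - t for p, t in zip(y_hat, y_true)]
        grad_w = [
            2.0 / n * sum(e * row[j] for e, row in zip(err, base_preds))
            for j in range(k)
        ]
        grad_b = 2.0 / n * sum(err)
        # Gradient descent update.
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Toy data (hypothetical): scaled sales figures and two base models' predictions.
y_true = [1.0, 1.2, 0.9, 1.5, 1.1]
model_a = [1.1, 1.3, 1.0, 1.6, 1.2]  # consistently over-predicts
model_b = [0.8, 1.1, 0.9, 1.3, 1.0]  # tends to under-predict
base_preds = [list(pair) for pair in zip(model_a, model_b)]

w, b = train_blender(base_preds, y_true)
blended = [sum(wj * xj for wj, xj in zip(w, row)) + b for row in base_preds]
blend_mape = mape(y_true, blended)
```

On this toy data the blend ends up with a lower MAPE than either base model alone, which is the basic motivation for blending. In practice the blender would be fit on held-out predictions to avoid leakage.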

Current Focus

Where I'm investing deep work hours right now, from decentralized data integrity research to better ways of validating unsupervised models in production.

Blockchain

Currently diving into blockchain technology, decentralized systems, and how distributed ledgers plus consensus mechanisms can solve data integrity problems.

A big part of this is studying smart contracts, Web3 infrastructure, and how decentralized rails could be integrated with data pipelines to make them more transparent and trustworthy.

Unsupervised ML Evaluation

The core challenge with unsupervised learning is the lack of ground truth, so I am researching creative, out-of-the-box methods for evaluating unsupervised models.

This exploration grows out of my insider trader detection work at OSC and my current work evaluating company search engine results.

The goal is to build statistically sound evaluation frameworks that make unsupervised models production-ready without relying on eyeballing the results.

Coursework Snapshot

Core math, statistics, and computer science classes.

Fall 2022 · 1A

Term GPA 89.60%
  • CS 135 Designing Functional Programs
  • MATH 135 Algebra for Honours Mathematics
  • MATH 137 Calculus 1 for Honours Mathematics

Winter 2023 · 1B

Term GPA 86.40%
  • CS 136 Elementary Algorithm Design and Data Abstraction
  • MATH 136 Linear Algebra 1 for Honours Mathematics
  • MATH 138 Calculus 2 for Honours Mathematics
  • STAT 230 Probability

Spring 2023 · 2A

Term GPA 85.00%
  • CS 246 Object-Oriented Software Development
  • MATH 235 Linear Algebra 2 for Honours Mathematics
  • MATH 237 Calculus 3 for Honours Mathematics

Winter 2024 · 2B

Term GPA 85.40%
  • CS 241 Foundations of Sequential Programs
  • CS 245 Logic and Computation
  • CS 251 Computer Organization and Design
  • STAT 231 Statistics

Winter 2025 · 3A

Term GPA 84.75%
  • CS 240 Data Structures and Data Management
  • CS 348 Introduction to Database Management
  • MATH 239 Introduction to Combinatorics
  • STAT 331 Applied Linear Models

Winter 2026 · 3B

Term GPA TBD
  • STAT 332 Sampling and Experimental Design
  • STAT 341 Computational Statistics and Data Analysis
  • CS 341 Algorithms
  • STAT 333 Stochastic Processes 1

Beyond The Keyboard

When I am not ML-ing, you can usually find me steeping new teas, chasing endorphins at the gym or on long runs, and experimenting in the kitchen with bold flavors and recovery-friendly meals. I also love Korean TV shows and alternative music!

Tea Enthusiast

My pantry is a rotating library of lapsang souchong, pu-erh bricks, gyokuro, genmaicha, smoked Earl Grey... (the list keeps going)

Gym, Running, and Hiking

Me and my friend hiking in Cold Springs, New York

Cooking

Weekends are for reverse-engineering restaurant dishes and perfecting post-run recovery meals.