Week of 3/19 – 3/25: SFI ML Search, Security Hardening, and Deployment Prep
A recap of this week's progress — building out the ML-powered spec search for the SFI Foundation project, researching vulnerability mitigation tools for opencodingsociety.com, and preparing for deployment.
- Overview
- SFI Foundation: ML-Powered Spec Search
- Security Hardening: Mitigating Vulnerabilities in opencodingsociety.com
- What’s Next: Deployment
- Summary
Overview
This week was a mix of feature development, security research, and deployment planning. The SFI Foundation capstone project hit a major milestone with the ML-powered spec search going from concept to working implementation, and I spent significant time investigating how to harden opencodingsociety.com against real-world vulnerabilities using tools like Shannon and Brin.
Deployment is the next step — more on that at the bottom.
SFI Foundation: ML-Powered Spec Search
The biggest deliverable this week was building out the machine learning pipeline for the SFI Foundation’s spec search system. The SFI Foundation certifies safety equipment for motorsport drivers — helmets, fire suits, roll cages, fuel cells, harnesses — and each product is tied to a numbered SFI Specification (e.g., Spec 3.3 for Driver Accessories, Spec 16.1 for Roll Bars). Currently, finding the right spec means clicking through 8 separate static HTML pages and scanning dense tables. Our goal is to replace all of that with a single, intelligent search bar.
What I Built
Data Pipeline: I wrote a Python extraction script (scripts/extract_sfi_specs.py) that parses the raw SFI Foundation HTML source files across all 7 product categories (Auto Racing, Drag Racing, Drag Racing Chassis, Fuel Related, Boat Racing, Tractor Pulling & Chassis, and Personal Protective Gear) and outputs a unified JSON dataset at _data/sfi_specs.json. Each entry captures the product name, category, subcategory, spec numbers, associated PDFs, and effective dates. This structured data is the foundation everything else builds on.
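The extraction script itself isn't shown in this post, but the shape of each record can be sketched. This is a minimal illustration of the unified schema described above — the field and helper names here are my own, not necessarily those in `scripts/extract_sfi_specs.py`:

```python
import json

def make_entry(name, category, subcategory, spec_numbers, pdfs, effective_date):
    """Normalize one parsed HTML table row into a unified JSON record.

    Field names mirror the post's description (product, category,
    subcategory, spec numbers, PDFs, effective dates); the real script
    may use different keys.
    """
    return {
        "product": name.strip(),
        "category": category,
        "subcategory": subcategory,
        "specs": [s.strip() for s in spec_numbers],
        "pdfs": pdfs,
        "effective_date": effective_date,
    }

entries = [
    make_entry("Driver Suits", "Personal Protective Gear", "Suits",
               ["3.2A "], ["spec-3.2a.pdf"], "2024-01-01"),
]

# The real script writes the full dataset to _data/sfi_specs.json.
dataset_json = json.dumps(entries, indent=2)
```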
TF-IDF Search with Cosine Similarity: Inspectors don’t always search by spec number — they might type “fire suit” when the actual name is “Driver Suits,” or “roll cage” when the official term is “Roll Bars and Padding for Oval Track Racing Cars.” Simple keyword search fails here. I implemented a TF-IDF vectorization pipeline combined with cosine similarity scoring. The model builds a vocabulary from every product name, category, and subcategory in the dataset, then ranks incoming queries against that vocabulary. Using ngram_range=(1, 2) lets the model understand two-word phrases like “roll bar” or “fuel cell” as single concepts.
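The core of that pipeline can be sketched in a few lines with scikit-learn. This is a toy version — the corpus here is three made-up strings standing in for the product name + category + subcategory text built from `_data/sfi_specs.json`, and the function name is illustrative:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy corpus standing in for the real spec dataset's searchable text.
docs = [
    "Driver Suits Personal Protective Gear",
    "Roll Bars and Padding for Oval Track Racing Cars Auto Racing",
    "Fuel Cells Fuel Related",
]

# ngram_range=(1, 2) indexes two-word phrases like "fuel cell" as
# single vocabulary entries, not just their individual words.
vectorizer = TfidfVectorizer(ngram_range=(1, 2), lowercase=True)
doc_matrix = vectorizer.fit_transform(docs)

def search(query, top_k=3):
    """Rank documents by cosine similarity against the query vector."""
    q_vec = vectorizer.transform([query])
    scores = cosine_similarity(q_vec, doc_matrix).ravel()
    ranked = scores.argsort()[::-1][:top_k]
    return [(docs[i], float(scores[i])) for i in ranked if scores[i] > 0]

results = search("fuel cells")
```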
Synonym Expansion: Motorsport terminology is full of colloquialisms. A “Hans device” is officially a “Head and Neck Restraint,” a “cage” is a “Roll Bar,” and a “fire suit” is a “Driver Suit.” I built a synonym map that expands user queries before they hit the TF-IDF model, handling the most common mismatches without needing a full embedding model.
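A synonym map like this is simple enough to sketch directly. The entries below are the examples from this post; the real map covers many more colloquialisms, and the longest-phrase-first ordering is one reasonable way to keep "roll cage" from being mangled by the shorter "cage" rule:

```python
# Illustrative subset of the synonym map; keys are colloquial terms,
# values are the official SFI terminology.
SYNONYMS = {
    "hans device": "head and neck restraint",
    "hans": "head and neck restraint",
    "roll cage": "roll bar",
    "cage": "roll bar",
    "fire suit": "driver suit",
}

def expand_query(query: str) -> str:
    """Rewrite colloquialisms before the query reaches the TF-IDF model.

    Longer phrases are replaced first so "roll cage" matches before "cage".
    """
    q = query.lower()
    for phrase in sorted(SYNONYMS, key=len, reverse=True):
        if phrase in q:
            q = q.replace(phrase, SYNONYMS[phrase])
    return q
```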
Naive Bayes Category Classifier: When a query is ambiguous (e.g., “clutch” appears in Auto Racing, Drag Racing, and Tractor Pulling), a Naive Bayes classifier trained on the spec data predicts the most likely category and re-ranks results accordingly.
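A minimal sketch of that classifier, again with scikit-learn. The training rows below are invented for illustration — the real classifier trains on product names and categories from the spec dataset:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy training data: (text, category) pairs standing in for the real
# spec names and their SFI product categories.
texts = [
    "driver suits fire retardant", "helmets head protection",
    "clutch assemblies drag racing", "parachutes braking drag",
    "tractor clutch shields pulling", "engine shields tractor",
]
labels = [
    "Personal Protective Gear", "Personal Protective Gear",
    "Drag Racing", "Drag Racing",
    "Tractor Pulling", "Tractor Pulling",
]

clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(texts, labels)

# An ambiguous query like "clutch" gets a predicted category, which can
# then be used to re-rank the TF-IDF results toward that category.
predicted = clf.predict(["drag racing clutch"])[0]
```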
Spec Number Detection: A regex-first, ML-fallback approach — pattern matching catches clean spec numbers like “3.2A” or “SFI 16.5” instantly (< 10ms), and the TF-IDF pipeline handles everything else.
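The regex half of that approach can be sketched like this — the pattern below is my approximation of what "clean spec number" matching looks like, not the exact expression in the project:

```python
import re

# Matches clean spec numbers like "3.2A" or "SFI 16.5" (case-insensitive).
# Anything that doesn't fully match falls through to the TF-IDF search.
SPEC_RE = re.compile(r"(?:SFI\s*)?(\d+\.\d+[A-Z]?)", re.IGNORECASE)

def detect_spec(query: str):
    """Return the normalized spec number if the whole query is one, else None."""
    m = SPEC_RE.fullmatch(query.strip())
    return m.group(1).upper() if m else None
```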
Why Not an LLM?
With only ~100 specs in the dataset, a TF-IDF model trains in milliseconds and runs searches in under 50ms. An LLM API call takes 1–2 seconds and requires internet access — tracks don’t always have great connectivity. The entire model can run client-side in the browser via a pre-computed index, and when results are wrong, I can inspect the TF-IDF weights and synonym map to understand why. For this problem, simpler ML is the right ML.
Results
| Metric | Before (Static Pages) | After (ML Search) |
|---|---|---|
| Spec lookup time | 30–60 sec | < 5 sec |
| Pages to navigate | 8 | 1 |
| Handles synonyms | No | Yes |
| Works on mobile | Barely | Fully responsive |
| Fuzzy matching | None | TF-IDF + cosine similarity |
The full technical writeup is available at /sfi-ml.
Security Hardening: Mitigating Vulnerabilities in opencodingsociety.com
Beyond feature work, I spent a significant portion of this week researching and evaluating security tools to protect opencodingsociety.com from common web vulnerabilities. As an educational platform serving students and teachers, the site needs to be hardened against the kinds of attacks that target web applications — SQL injection, cross-site scripting (XSS), server-side request forgery (SSRF), authentication bypasses, and more.
I investigated two tools in particular that are well-suited to our stack and threat model:
Shannon — AI-Powered Autonomous Pentesting
Shannon is an autonomous AI pentester developed by Keygraph that combines source code analysis with live exploitation to identify and validate vulnerabilities. What sets Shannon apart from traditional static analysis tools is that it doesn’t just flag potential issues — it attempts to actually exploit them and only reports vulnerabilities that come with a working proof-of-concept. This means no false positives cluttering the report.
Why Shannon matters for opencodingsociety.com:
- White-box analysis: Shannon can read our Flask backend source code and understand the application’s logic — how routes are defined, how user input flows through the system, where database queries are constructed. This lets it identify injection points that surface-level scanners would miss.
- OWASP coverage: It targets the full OWASP Top 10 categories — SQL injection, XSS, SSRF, broken authentication, and authorization bypasses — which are exactly the vulnerability classes most relevant to a Flask application that handles user authentication, student data, and API endpoints.
- Autonomous operation: Shannon handles complex testing workflows independently, including navigating authentication flows, browser automation, and generating detailed reports — all initiated with a single command. This means we can integrate it into our development workflow without needing dedicated security expertise on the team.
- Proven track record: In testing against OWASP Juice Shop (a deliberately vulnerable application), Shannon identified 20+ vulnerabilities including authentication bypasses and data exfiltration issues.
Shannon is available as an open-source tool (Shannon Lite, AGPL-3.0) for local testing, which means we can run it against our own deployment without any cost or data leaving our infrastructure.
Brin — Universal Security Allowlist for AI Agents and Dependencies
Brin takes a different approach to security. Instead of scanning our application code for vulnerabilities, Brin pre-scans the external dependencies and contexts that our application consumes — packages, repositories, web resources, and third-party integrations — before they are executed or trusted.
Why Brin matters for opencodingsociety.com:
- Dependency scanning: Brin detects install-time attacks, credential harvesting, and typosquatting in packages. Given that our project relies on numerous Python and JavaScript dependencies, this is a critical layer of defense against supply chain attacks — one compromised dependency could expose the entire platform.
- Repository scanning: It scans for malicious commits, agent config injection, and compromised dependencies within repositories. As an open-source educational project with many student contributors submitting pull requests, this helps us catch threats that code review alone might miss.
- Contributor analysis: Brin flags sleeper accounts, identity spoofing, and anomalous behavior patterns among contributors. With a large student contributor base, this is valuable for detecting compromised or malicious accounts.
- Sub-10ms response times: Brin returns trust scores via a single HTTP GET request with no authentication required. This means we can integrate it directly into our CI/CD pipeline without adding meaningful latency to builds or deployments.
- Catches what LLMs miss: According to Brin’s benchmarks, frontier language models miss 57% of threats because they cannot detect graph-based signals like dependency chains and publisher reputation. Brin fills that gap.
The combination of Shannon (offensive testing against our own code) and Brin (defensive scanning of our dependencies and external inputs) gives us a defense-in-depth strategy — we are testing our code from both the inside out and the outside in.
Planned Security Workflow
- Pre-merge: Run Brin against any new dependencies introduced in a PR to verify they are safe before merging.
- Post-merge: Run Shannon against the staging deployment to validate that no exploitable vulnerabilities were introduced.
- Ongoing: Periodic Brin scans of our full dependency tree to catch newly discovered supply chain threats.
What’s Next: Deployment
With the ML search pipeline built and the security research done, the next step is deployment. The plan is to get the full SFI Foundation project — ML-powered spec search, mobile-first redesign, and all — deployed and accessible to real users. This includes:
- Deploying the Flask backend with the ML search endpoints live
- Deploying the updated frontend with the single search bar replacing the 8 static pages
- Integrating the security tools (Shannon and Brin) into the deployment pipeline so that every release is automatically scanned
- Monitoring and iteration — tracking which search results users actually select to improve ranking over time, and watching for any security alerts from Brin on our dependency tree
Deployment is the bridge between “it works on my machine” and “it works for the people who need it.” That is where the focus goes next.
Summary
| Area | What Got Done |
|---|---|
| SFI ML Search | TF-IDF pipeline, synonym expansion, Naive Bayes classifier, spec number detection, data extraction script |
| Security Research | Evaluated Shannon (autonomous pentesting) and Brin (dependency/supply chain scanning) for opencodingsociety.com |
| Blog & Documentation | Published technical writeup on ML approach, updated capstone page |
| Next Step | Deployment — getting the full stack live and integrating security scanning into the pipeline |