Case Study

Codeword Pipeline

A 6-stage real-time ML pipeline that monitors live audio, transcribes speech, classifies content, and extracts structured data — running unattended in production.

Python scikit-learn TF-IDF Vosk FFmpeg

500+

Identified

Missed

False Positives

6/6

Self-Caught

Failures

The Build Story

The tension between training confidence and production accuracy — and why the metrics you optimise for during development aren't always the metrics that matter.

ML Pipeline · Production Results

I wanted high confidence scores.

The pipeline wanted room to operate.

The tension between training metrics and production accuracy nearly killed a system that turned out to be effectively perfect.

ADH DIGITAL

1 / 6

The System

A 6-stage real-time ML pipeline that monitors a live audio stream, transcribes speech to text, and extracts specific data points — contest codewords announced on a public radio station.

The classifier needs to distinguish true announcements from false positives in noisy, garbled transcription output.

The model configurations that produced the highest accuracy scores during training weren't the ones that produced the most useful results in production.

ADH DIGITAL

2 / 6

The Tension

Tighter constraints improved training scores but degraded real-world results.

Training confidence Production accuracy

Conceptual — based on observed behaviour during development

ADH DIGITAL

3 / 6

Production Results

500+

Codewords identified

Missed announcements

Application failures

False identifications

6 / 6

Caught by pipeline review

Hands off operation

Cron-scheduled to contest hours · automated capture archiving

ADH DIGITAL

4 / 6

The Lesson

The metrics you optimise for during development aren't always the metrics that matter in production.

Functional accuracy — does this system do the thing it's supposed to do, reliably, in the real world — sometimes requires relaxing the numbers that make you feel safe during training.

If your instinct is to tighten everything until the scores look perfect, that instinct might be the problem.

ADH DIGITAL

5 / 6

The system I almost didn't trust has been effectively perfect in production.

How many of your engineering decisions are optimised for your comfort instead of the outcome?

ADH DIGITAL

6 / 6

Full Case Study

The complete case study — covering the 6-stage architecture, classifier design decisions, the training-vs-production tension in detail, and what I'd do differently — is in progress.

In the meantime, the carousel above covers the core story: why the system I almost didn't trust turned out to be effectively perfect.

See CipherRank Case Study All Projects

Codeword Pipeline

I wanted high confidence scores.The pipeline wanted room to operate.

I wanted high confidence scores.

The pipeline wanted room to operate.