CipherRank
A gamified CompTIA exam prep platform with three certification tracks — Security+, Network+, and SecAI+ — scenario-based missions, RPG progression, and an AI generation pipeline with two-stage validation and cost-modelled token economics.
Passing the CompTIA Security+ exam requires mastering dozens of complex topic areas — cryptography, network security, threat analysis, access control, and more. Existing study tools are flashcard decks and sterile practice tests that offer no narrative, no stakes, and no sense of progress. Learners study in isolation, disengage quickly, and either cram ineffectively or abandon the certification entirely.
The deeper problem is pedagogical. Security is a decision-making discipline, but every exam prep tool on the market tests recall. Pocket Prep, Jason Dion courses, and Professor Messer are linear, passive, and detached from real-world context. They ask "which protocol uses port 443?" when the real exam — and the real job — asks "your network was breached and here are the logs, what do you do next?"
CipherRank transforms Security+ exam preparation into a progressive RPG experience. Learners are field operatives advancing through a career arc — from Recruit to Command Sentinel — by completing scenario-based missions that map directly to the exam blueprint. Every session is 3–7 minutes. Every wrong answer has consequences and teaches through scenario-contextual feedback. Study stops feeling like work and starts feeling like progress.
Event-sourced architecture: Mission Attempts are the append-only source of truth. All player state (XP, rank, mastery, streak) is derived from them. The server stores nothing — it proxies AI generation and validates receipts.
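The derived-state model can be sketched as a pure fold over the attempt log. This is a minimal TypeScript sketch — the field names, rank names beyond those mentioned in the text, and XP thresholds are illustrative, not the production schema:

```typescript
// Player state is never stored; it is derived by folding over the
// append-only Mission Attempt log. Field names and rank thresholds
// here are illustrative, not the production schema.
interface MissionAttempt {
  missionId: string;
  subdomainId: string;
  passed: boolean;
  xpEarned: number;
  completedAt: string; // ISO 8601
}

interface PlayerState {
  xp: number;
  rank: string;
  mastery: Record<string, number>; // passed attempts per subdomain
}

// Hypothetical rank curve: highest threshold at or below total XP wins.
const RANK_THRESHOLDS: Array<[number, string]> = [
  [0, "Recruit"],
  [1000, "Analyst"],
  [5000, "Command Sentinel"],
];

function rankFor(xp: number): string {
  let rank = RANK_THRESHOLDS[0][1];
  for (const [threshold, name] of RANK_THRESHOLDS) {
    if (xp >= threshold) rank = name;
  }
  return rank;
}

function derivePlayerState(attempts: MissionAttempt[]): PlayerState {
  const xp = attempts.reduce((sum, a) => sum + a.xpEarned, 0);
  const mastery: Record<string, number> = {};
  for (const a of attempts) {
    if (a.passed) mastery[a.subdomainId] = (mastery[a.subdomainId] ?? 0) + 1;
  }
  return { xp, rank: rankFor(xp), mastery };
}
```

Because the attempts are the only source of truth, replaying the same log always reproduces the same state, which is what allows the server to stay stateless.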
Six-step design-before-build
Before writing any code, I completed six sequential design phases: data model and progression, UX flow and screen map, XP simulation, backend architecture, content authoring, and AI generation system — plus visual design as a parallel track. Each phase produced a versioned spec document. Each step's output became the input for the next.
This meant I had 120 curated missions validated against the schema before the AI generation prompt was designed — because the AI system needed those missions as few-shot training examples. I reordered the original step sequence when I recognized this dependency. The engineering phase then executed against locked specs rather than discovering design questions mid-build.
Tradeoff: weeks of design work before any visible output vs. zero rework during engineering and no discovered dependencies blocking progress
Three few-shot examples at $0.051 per generation
The AI generation pipeline includes three dynamically selected few-shot examples in every prompt — one per difficulty tier, at least one with branching logic. The examples add roughly 7,150 input tokens per call. Dropping to two would have saved about 30% of those tokens, but testing showed that going below three degraded schema compliance on complex branching missions.
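A selector satisfying those two constraints might look like the sketch below. The tier names, the `CuratedMission` shape, and the pick-first strategy are assumptions for illustration, not the real selection logic:

```typescript
// Pick one curated few-shot example per difficulty tier, then enforce
// the "at least one with branching logic" constraint. Tier names and
// the CuratedMission shape are illustrative.
interface CuratedMission {
  id: string;
  difficulty: "novice" | "operative" | "elite";
  hasBranching: boolean;
}

const TIERS = ["novice", "operative", "elite"] as const;

function selectFewShot(pool: CuratedMission[]): CuratedMission[] {
  // One example per tier (first match stands in for a smarter picker).
  const picks = TIERS.map((tier) => {
    const pick = pool.find((m) => m.difficulty === tier);
    if (!pick) throw new Error(`no curated mission for tier: ${tier}`);
    return pick;
  });
  // If no pick demonstrates branching, swap one in from the same tier.
  if (!picks.some((m) => m.hasBranching)) {
    for (let i = 0; i < TIERS.length; i++) {
      const alt = pool.find((m) => m.difficulty === TIERS[i] && m.hasBranching);
      if (alt) {
        picks[i] = alt;
        break;
      }
    }
  }
  return picks;
}
```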
At $0.051 per generation with Sonnet 4, the worst-case monthly cost per subscriber is $12.75 — a subscriber who exhausts the 250-generation monthly cap. That leaves a 19% margin on the $29.99 Threat Director tier even at maximum usage. If average usage exceeds 150/month after 90 days, the system can swap to Haiku 4.5 ($0.014/gen) without an app update — the provider abstraction layer makes this a config change on the Worker.
Tradeoff: higher per-generation cost vs. reliable schema compliance on the hardest mission type, with a modelled fallback path
Two-stage validation with retry
Every AI-generated mission passes through two validation stages before reaching the user. Stage 1 checks schema compliance — correct JSON structure, valid subdomain IDs, decision point counts within range. A Stage 1 failure triggers one retry with a diagnostic hint appended to the prompt. Stage 2 checks quality and safety — content safety blocklist (no working exploit code, no real entities in defamatory roles), text-length floors, placeholder detection, and XP recalculation.
The Worker recalculates and overwrites XP on every mission regardless of what the LLM returns. The LLM's XP output is treated as unreliable by design. If both attempts fail validation, the device falls back to a curated mission — the user never sees a broken AI response.
Tradeoff: added latency and complexity vs. guaranteed content quality with a graceful degradation path
Zero free-text input to the LLM
The AI generation endpoint accepts exactly two parameters from the device: subdomain_id and difficulty. Both are whitelist-validated — 28 valid subdomain values and 3 valid difficulty values. No free-text user input is ever injected into the prompt. This eliminates the entire category of prompt injection attacks by design rather than by detection.
Three failed whitelist validation attempts from the same device within 10 minutes trigger a 1-hour block. App Attest verifies device authenticity. StoreKit 2 JWS verifies subscription entitlement. The attack surface for the AI endpoint is effectively zero after parameter validation.
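A minimal sketch of the whitelist-plus-strikes logic follows. The subdomain IDs and tier names are placeholders (production has 28 and 3 respectively), and the in-memory `Map` stands in for whatever store the Worker actually uses:

```typescript
// Whitelist validation plus the 3-strikes / 10-minute / 1-hour block rule.
const VALID_SUBDOMAINS = new Set(["1.1", "1.2", "2.4"]); // 28 in production
const VALID_DIFFICULTIES = new Set(["novice", "operative", "elite"]);

const FAILURE_WINDOW_MS = 10 * 60 * 1000;
const BLOCK_MS = 60 * 60 * 1000;
const MAX_FAILURES = 3;

interface DeviceRecord {
  failures: number[]; // timestamps of recent failed validations
  blockedUntil: number;
}
const devices = new Map<string, DeviceRecord>();

function validateRequest(
  deviceId: string,
  subdomainId: string,
  difficulty: string,
  now: number,
): "ok" | "invalid" | "blocked" {
  const rec = devices.get(deviceId) ?? { failures: [], blockedUntil: 0 };
  devices.set(deviceId, rec);
  if (now < rec.blockedUntil) return "blocked";
  if (VALID_SUBDOMAINS.has(subdomainId) && VALID_DIFFICULTIES.has(difficulty)) {
    return "ok"; // only whitelisted values ever reach the prompt
  }
  // Count this failure within the sliding 10-minute window.
  rec.failures = rec.failures.filter((t) => now - t < FAILURE_WINDOW_MS);
  rec.failures.push(now);
  if (rec.failures.length >= MAX_FAILURES) {
    rec.blockedUntil = now + BLOCK_MS;
    rec.failures = [];
  }
  return "invalid";
}
```

Note that the user-supplied strings are only ever compared against the sets; they are never concatenated into anything, which is the property that makes injection structurally impossible.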
Tradeoff: users can't request custom topics (only their weakest subdomain is targeted) vs. an AI endpoint that cannot be manipulated through its inputs
Cost and token economics
The AI generation system was designed with a complete cost model before any code was written. The question wasn't "can we generate missions with AI" — it was "can we generate missions profitably at every usage level, with a documented fallback if costs exceed projections."
The rate cap (250/month, 10/day) ensures costs can never exceed $12.75 per subscriber under any circumstance with Sonnet 4. The 90-day review cycle was built into the spec: if average usage exceeds 150/month, the Haiku 4.5 migration path drops the worst-case cost to $3.50 and pushes the minimum margin to 54%. The provider abstraction layer in the Worker means this migration requires a config change, not a code change.
Monitoring is designed into the system: every generation logs provider, model, token counts, estimated cost, validation result, retry count, quality flags, and latency. A Cloudflare alert fires if daily AI cost exceeds $50 — signalling unexpected traffic or abuse before it becomes a cost problem.
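The per-generation log record and the worst-case arithmetic can be sketched like this, using the figures quoted in the text; the log shape and field names are illustrative:

```typescript
// One record per generation, plus the $50/day cost alert check.
interface GenerationLog {
  provider: string;
  model: string;
  inputTokens: number;
  outputTokens: number;
  estimatedCostUsd: number;
  validationResult: "pass" | "retry_pass" | "fallback";
  retryCount: number;
  qualityFlags: string[];
  latencyMs: number;
}

const DAILY_ALERT_THRESHOLD_USD = 50;

function dailyCostAlert(todaysLogs: GenerationLog[]): boolean {
  const total = todaysLogs.reduce((sum, log) => sum + log.estimatedCostUsd, 0);
  return total > DAILY_ALERT_THRESHOLD_USD;
}

// Worst case = per-generation cost x monthly rate cap, rounded to cents.
function worstCaseMonthlyCost(perGenUsd: number, monthlyCap: number): number {
  return Math.round(perGenUsd * monthlyCap * 100) / 100;
}

// worstCaseMonthlyCost(0.051, 250) -> 12.75 (Sonnet 4)
// worstCaseMonthlyCost(0.014, 250) -> 3.5  (Haiku 4.5)
```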
Session continuity across 34 sessions
CipherRank was designed and built across 34 working sessions with Claude as an engineering partner. The challenge: Claude has no memory between sessions. Every new conversation starts cold. A project this complex — with interdependent specs, design decisions that ripple across documents, and engineering work that depends on locked design choices — cannot survive context loss.
The solution was a structured session briefing system. Every session closes with an updated briefing document that carries the full project state: current status of every component, what was done this session, next priorities, a cumulative design decisions log, a discrepancy tracker for cross-document conflicts, and a ready-to-paste opening prompt for the next session. The briefing is versioned (v1.0 through v3.4) and every section carries forward — the iron rule is that sections are updated but never dropped.
This is a context architecture problem. The briefing system is to multi-session AI collaboration what a well-maintained internal wiki is to a distributed engineering team — the institutional memory that prevents decisions from being revisited and dependencies from being missed.
The design-before-build approach worked well for architectural coherence, but it front-loaded all design decisions into a period when I had the least context about how the system would actually feel in use. Some decisions made during the UX Flow phase (Step 2) were later contradicted by implementation reality — the free-tier weekly cap, the streak multiplier values, and the feedback mode naming all changed during engineering. The discrepancy tracker in the session briefing caught these, but each one was a small rework.
The XP simulation (Step 3) was valuable for validating progression pacing, but I'd run it again after the content library hit 278 missions instead of only against the original 120. The difficulty distribution shifted as I authored more content, and the simulation was calibrated against the earlier mix.
If starting again, I'd keep the phased approach but build a minimal playable prototype after Step 2 — even before the backend architecture. Playing through two or three missions in a real UI would have surfaced the feedback mode naming issue, the weekly cap friction, and the sequential ordering need months earlier.