Working with AI
A practitioner's methodology for treating AI as an engineering partner — not a code generator, not an autocomplete, not magic. A partner that requires specification, context, evaluation, and the domain expertise only you can bring.
Everything on this site was designed and built with Claude as an engineering partner. Three iOS apps. A production ML pipeline. A Cloudflare Worker. Design specs, content libraries, cost models, and this website. All in under a month.
The natural question is: did the AI build it, or did you? The answer is both, and the distinction matters. The AI handled syntax, pattern application, and the mechanical work of translating intent into code. I handled architecture, specification, product decisions, domain expertise, quality evaluation, and every decision that required understanding what the system is for, not just how it works.
That split — human as architect and evaluator, AI as implementer and pattern engine — is not a workaround for lacking coding skills. It is the emerging model for how software gets built. The skill is not prompting. The skill is knowing what to build, specifying it precisely enough that an AI can execute it, evaluating whether the output meets the bar, and diagnosing failures when it doesn't.
What follows are the six principles I've developed through building real systems this way. Each one was learned through a specific failure or breakthrough on a real project. None of them are about prompt engineering.
Specify, don't prompt
The difference between getting useful output and getting noise is not in your choice of words — it's in your clarity of intent. "Build me a customer support agent" gets you a generic chatbot. Specifying that you want an agent that handles password resets, order status inquiries, and return initiations, with escalation based on a defined sentiment scoring system, and logging every escalation with a reason code — that gets you a system.
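Specified that way, the brief is nearly executable. A minimal sketch of it as a routing policy, in Python, with illustrative intents, thresholds, and reason codes (none taken from a real system):

```python
from dataclasses import dataclass

# Hypothetical spec expressed as data the agent must satisfy exactly.
# All values below are invented for illustration.
HANDLED_INTENTS = {"password_reset", "order_status", "return_initiation"}
ESCALATION_SENTIMENT_THRESHOLD = -0.5   # illustrative scale: -1.0 (hostile) .. 1.0 (happy)
REASON_CODES = {"negative_sentiment": "ESC-SENT", "unknown_intent": "ESC-INTENT"}

@dataclass
class Ticket:
    intent: str
    sentiment: float  # produced by the defined sentiment scoring system

def route(ticket: Ticket) -> dict:
    """Return a routing decision; every escalation carries a reason code."""
    if ticket.intent not in HANDLED_INTENTS:
        return {"action": "escalate", "reason_code": REASON_CODES["unknown_intent"]}
    if ticket.sentiment < ESCALATION_SENTIMENT_THRESHOLD:
        return {"action": "escalate", "reason_code": REASON_CODES["negative_sentiment"]}
    return {"action": "handle", "intent": ticket.intent}
```

The point is not the code; it is that every term in the specification ("handles", "escalation", "reason code") resolves to something checkable.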
This is not a new skill. Technical writers, lawyers, and QA engineers have done this kind of precise specification their entire careers. The only difference is the audience: instead of specifying for a human who fills in the blanks intuitively, you're specifying for an agent that takes your words literally.
When I designed the AI generation pipeline, the spec didn't say "generate Security+ practice questions." It specified: 3 dynamically selected few-shot examples per prompt, one per difficulty tier, at least one with branching logic. Whitelist-validated inputs only — 28 subdomain IDs and 3 difficulty values. Two-stage validation with retry on schema failure. XP recalculated and overwritten by the server regardless of LLM output. That level of specificity is why the system works reliably at production quality.
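What "whitelist-validated inputs only" means can be sketched in a few lines; the subdomain IDs below are stand-ins, not the real values:

```python
# Hypothetical input gate: a generation request is rejected outright unless
# both fields come from a fixed whitelist. "sub-01".."sub-28" are stand-ins
# for the 28 real subdomain IDs.
SUBDOMAIN_IDS = {f"sub-{i:02d}" for i in range(1, 29)}
DIFFICULTIES = {"easy", "medium", "hard"}

def validate_request(subdomain_id: str, difficulty: str) -> None:
    """Raise before any prompt is built; raw user input never reaches the model."""
    if subdomain_id not in SUBDOMAIN_IDS:
        raise ValueError(f"unknown subdomain: {subdomain_id!r}")
    if difficulty not in DIFFICULTIES:
        raise ValueError(f"unknown difficulty: {difficulty!r}")
```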
Feasibility before features
Before committing to a direction, establish what's actually possible within your constraints. The AI can help you evaluate this — but only if you ask before you start building. Failed AI projects typically begin with a solution and discover the constraints mid-build. The fix is to lead with the constraints and let the solution emerge from what's viable.
This applies equally to technical feasibility ("can iOS do raw packet capture?"), market feasibility ("is there an underserved niche here?"), and economic feasibility ("can we generate AI content profitably at this price point?"). Ask all three before writing code.
The original concept was a full network security toolkit for iOS. My first prompt wasn't "build me a network scanner." It was: "Given iOS platform constraints, how plausible is this?" The AI identified that iOS blocks raw sockets, packet capture, and nmap-style scanning — but gives full access to BLE via CoreBluetooth. I pivoted to BLE auditing before any code was written. That pivot, based on a five-minute feasibility check, became the entire product identity.
Build context architecture
AI has no memory between sessions. Every new conversation starts cold. For a project that spans 34 working sessions — with interdependent specs, design decisions that ripple across documents, and engineering work that depends on locked design choices — this is a critical limitation. The solution is to treat context as an engineering problem, not an afterthought.
I maintain a structured session briefing system: a versioned document that carries the full project state from one session into the next. Every section carries forward; sections are updated but never dropped.
This isn't overhead — it's infrastructure. The briefing is to multi-session AI work what an internal wiki is to a distributed engineering team: the institutional memory that prevents decisions from being revisited and dependencies from being missed.
CipherRank's session briefing went from v1.0 to v3.4 across 34 sessions. Every version contains: project context, a status table for every component, a detailed record of what was done this session, next priorities with dependencies, a cumulative design decisions log (append-only — decisions are never removed), open questions with deadlines, a discrepancy tracker for cross-document conflicts, a file inventory, working style notes, and a session opening prompt.
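The carry-forward rule can be enforced mechanically. A minimal sketch with invented field names: every section is a required field, so a version that drops one fails to construct, and the decisions log only ever grows:

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class SessionBriefing:
    # Every section is a required field; omitting one raises at construction.
    version: str
    project_context: str
    component_status: dict     # component name -> status
    done_this_session: tuple
    next_priorities: tuple
    design_decisions: tuple    # append-only log
    open_questions: tuple
    discrepancies: tuple       # cross-document conflicts awaiting resolution
    file_inventory: tuple
    opening_prompt: str

def next_briefing(prev: SessionBriefing, new_version: str,
                  new_decisions: tuple = (), **updates) -> SessionBriefing:
    """Carry every section forward; decisions are appended, never removed."""
    return replace(prev, version=new_version,
                   design_decisions=prev.design_decisions + new_decisions,
                   **updates)
```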
The discrepancy tracker alone caught 13 cross-document conflicts — places where a decision made in one session contradicted a spec written in an earlier session. Without it, those contradictions would have silently propagated into the codebase.
For long technical conversations, proactively check context window health before starting a new major phase. After building BLEKit's entire UI layer — 15 new files, 25+ project files read, multiple design discussions — I asked whether to continue into code review in the same session. The right answer was to save everything and start fresh: a review done in a degraded context window misses issues that a fresh conversation with all files loaded simultaneously would catch. Knowing when to stop and start fresh is a collaboration skill, not a limitation.
Evaluate everything
AI is fluently wrong. It produces output that reads well, is structured correctly, and is confidently presented — but may be subtly incorrect. The failure mode is different from human failure: when humans are wrong, they tend to stumble, hesitate, or hedge. AI doesn't. The response looks identical whether it's right or wrong.
The skill here is resisting the temptation to read fluency as correctness. Review every output as if your name is on it. Build automated validation where possible. Audit before shipping, not after. And when you find edge cases — places where the core is correct but the boundaries are wrong — that's where you demonstrate genuine understanding of the domain.
Before attempting compilation on the BLEKit codebase, I ran a systematic review of every file — individually and as an integrated system. Thirteen issues were identified and fixed in one pass: a Sendable protocol contradiction, missing imports, incorrect access modifiers, a byte-ordering assumption, mock code leaking into production targets. The compiler would have caught only a few of these; the rest would have surfaced as App Store review rejections or subtle runtime bugs. The audit-before-build pattern has become standard practice on every subsequent project.
The AI generation pipeline doesn't trust its own output. Every generated mission passes through schema validation (hard fail, triggers retry) and then quality/safety validation (content blocklist, text-length floors, placeholder detection). The server recalculates and overwrites XP on every mission regardless of what the LLM returns — because the LLM's arithmetic is unreliable by design. If both validation stages fail, the user gets a curated mission instead. They never see a broken AI response.
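That control flow can be sketched as follows; the stage checks, retry count, and XP formula are illustrative stand-ins, not the production values:

```python
# Hypothetical two-stage validation wrapper. Stage 1 failures retry generation;
# stage 2 failures fall through to a curated mission. The XP value is always
# recomputed server-side, whatever the LLM returned.
MAX_RETRIES = 2
BLOCKLIST = {"lorem ipsum", "[placeholder]"}
MIN_TEXT_LEN = 40

def passes_schema(mission: dict) -> bool:
    return isinstance(mission.get("question"), str) and isinstance(mission.get("xp"), int)

def passes_quality(mission: dict) -> bool:
    text = mission["question"].lower()
    return len(text) >= MIN_TEXT_LEN and not any(b in text for b in BLOCKLIST)

def server_xp(mission: dict) -> int:
    # The server's own formula wins; the LLM's arithmetic is never trusted.
    return {"easy": 10, "medium": 25, "hard": 50}[mission["difficulty"]]

def produce_mission(generate, curated_fallback: dict) -> dict:
    for _ in range(MAX_RETRIES + 1):
        mission = generate()
        if not passes_schema(mission):
            continue                        # hard fail: retry generation
        if not passes_quality(mission):
            break                           # soft fail: serve curated content
        mission["xp"] = server_xp(mission)  # overwrite regardless of LLM output
        return mission
    return curated_fallback
```

The design choice worth noting is the failure path: a validation failure degrades to curated content, never to an error the user can see.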
Bring what the AI can't
The AI knows patterns, syntax, and documented best practices. It does not know your users, your market, your operational constraints, or what failure costs in your specific context. The highest-value contributions a human makes in AI collaboration are product decisions, domain expertise, and quality judgment that comes from real-world experience.
This is where non-traditional backgrounds become an advantage, not a liability. Understanding what technology failure costs in aviation, manufacturing, or regulated environments gives you a perspective on trust boundaries and blast radius that no amount of engineering training provides. The AI applies patterns; you supply the judgment about which patterns matter.
The AI built the BLE scanning engine and the threat scoring algorithm. But the decision to add a fifth scoring dimension — reappearance detection — came from thinking about how real-world tracking actually works. A device that appears, disappears, and reappears is the strongest single indicator of deliberate following. The AI wouldn't have proposed this because it requires understanding the adversary's behaviour, not just the signal data. Similarly, ephemeral-by-default data handling was a product ethics decision: a tool that audits others' devices should not create a surveillance record of its own.
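As an illustration of what a reappearance dimension could look like (the gap threshold and cap below are invented for the example, not the shipped values):

```python
# Illustrative sketch: a device that is seen, drops out, and returns scores
# higher than one seen continuously.
GAP_SECONDS = 120   # a silence longer than this counts as a disappearance

def reappearance_score(sighting_times: list[float]) -> int:
    """Count distinct reappearances separated by gaps; more reappearances = higher score."""
    if not sighting_times:
        return 0
    times = sorted(sighting_times)
    reappearances = sum(
        1 for prev, cur in zip(times, times[1:]) if cur - prev > GAP_SECONDS
    )
    return min(reappearances, 5)   # cap so one dimension can't dominate the total
```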
The original build sequence had AI Generation (Step 6) before Content Authoring (Step 5). I reordered them because the AI generation system needed curated missions as few-shot training examples — a dependency the AI wouldn't have flagged because it didn't know the full project plan. Recognising dependencies between workstreams is a project management skill, not a coding skill, and it's exactly the kind of human contribution that makes AI collaboration work.
Decompose, sequence, right-size
AI agents work differently from people. You can hand a human team a loosely defined brief and they'll figure it out. You cannot do that with an AI. The work needs to be broken into clearly defined units with explicit inputs, outputs, and success criteria. The decomposition itself is one of the highest-value skills in the AI job market — and it transfers directly from project management, systems engineering, and operational planning.
Right-sizing matters too. A task that's too large for the context window produces degraded output. A task that's too small wastes context on overhead. Learning to size work units for the capabilities of the agent you're working with — and knowing when to break a session and start fresh — is as important as the specification itself.
Building two apps on a shared engine could have been approached as one continuous effort. Instead, I decomposed it into three phases: shared foundation first (engine modules, UI components, test targets), then the simpler app (SentinelScan), then the harder app (Overwatch). Each phase stress-tested the layer beneath it. A boundary bug caught during SentinelScan's build was fixed before Overwatch needed the same component. The simpler-first sequencing wasn't obvious — the AI recommended it when asked "which approach produces the highest-quality result?"
CipherRank's design phase was decomposed into six sequential steps with dependencies flowing downhill: data model → UX flow → XP simulation → backend architecture → content authoring → AI generation. Each step produced a versioned spec document. Each step's output became the input for the next. The engineering phase then executed against locked specs across 27 build sessions — and the total rework from design-phase decisions was minimal because the dependencies had been mapped before building started.
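"Dependencies flowing downhill" can itself be made mechanical with a standard topological sort. The step names below follow the chain described; Python's standard library does the sequencing:

```python
from graphlib import TopologicalSorter

# The six design steps, each depending on the step before it. A straight
# chain here, but the same mapping handles branching dependencies, and the
# sorter raises CycleError if two specs depend on each other.
deps = {
    "ux_flow": {"data_model"},
    "xp_simulation": {"ux_flow"},
    "backend_architecture": {"xp_simulation"},
    "content_authoring": {"backend_architecture"},
    "ai_generation": {"content_authoring"},
}
order = list(TopologicalSorter(deps).static_order())
```

Writing the dependency map down before building is the work; the ordering then falls out for free.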
None of these principles are about AI specifically. Specification precision, feasibility assessment, context management, rigorous evaluation, domain expertise, and task decomposition are fundamental engineering and management skills. They predate large language models by decades.
What's changed is that these skills have become the primary bottleneck. The mechanical work of writing code, generating content, and implementing patterns is no longer the constraint — AI handles that. The constraint is the human ability to specify what should be built, evaluate whether it was built correctly, and bring the domain knowledge that the AI doesn't have.
If you're a technical writer, a QA engineer, a project manager, a lawyer, an auditor, or anyone who has spent a career specifying, evaluating, and decomposing complex work — you already have most of these skills. The gap is shorter than you think.