Independent AI Assurance

Proof before trust.

When you put AI in front of customers, capital, or regulators — especially the autonomous, agentic systems now taking real actions on your behalf — a single undetected failure can cost more than your entire AI budget. I find it first — through evaluation, adversarial red-teaming, and governance hardened in regulated finance — and hand you the evidence to deploy with confidence.

Built for the stakes of: Financial services · Healthcare · Enterprise & regulated AI

Every team can ship an AI demo. Far fewer can tell you, with evidence, where it fails and how badly.

A model that performs beautifully in a sales demo can hallucinate, leak data, or quietly drift the moment real users — and real adversaries — arrive. The gap between "looks impressive" and "behaves reliably" is exactly where reputations and capital are lost.

I close that gap. I treat your AI the way a regulator, an attacker, and your most demanding customer would — then hand you the evidence to act on, in language your board and your engineers both understand.

What an engagement delivers

Not opinions. Evidence you can act on.

250+
Adversarial probes run against a system before it ships to users.
100%
Of findings reproduced, documented, and prioritized by real business impact.
Pre‑launch
Critical failures surfaced before customers and attackers find them — not after.
Audit‑ready
Every result delivered in a form your board, your auditor, and your engineers can use.

Ways to work together

Start with a test. Stay on a standard.

Every engagement begins with a single assessment and a system stood up around your AI — then continues as ongoing assurance, at the altitude you need.

Assess → Establish & Build → then ongoing: Assured / Embedded
How it begins
01 — The wedge

Assess

A fixed-scope diagnostic of one AI system — what it gets wrong, and how badly.

  • Evaluation harness against your real use case
  • Adversarial red-team campaign
  • Agentic & tool-use attack coverage (for AI that acts)
  • Governance & risk review
  • Prioritized findings, board- and engineer-ready
Fixed-fee · 2–4 weeks
Best for: a first, low-commitment proof of where you stand.

02 — Stand up the system

Establish & Build

Set the standard, then build the assurance pipeline that enforces it — and hand your team the keys.

  • Your AI testing strategy & "definition of done"
  • Eval + red-team pipeline wired into CI/CD
  • Governance evidence pipeline
  • Hands-on training for your team
One-time project · 6–12 weeks
→ The on-ramp into ongoing assurance.

How it continues
03 — Ongoing · productized (where most clients land)

Assured

Continuous, automated assurance running against your AI — with my oversight on what actually matters.

  • Eval + red-team re-run on every release
  • A live assurance dashboard
  • New attack techniques added and re-tested
  • Monthly findings report + review call
  • Audit-ready evidence kept current
Monthly subscription
Best for: teams that need AI watched continuously, not audited once.

04 — Ongoing · high-touch (by selection, limited)

Embedded

Your fractional head of AI assurance — a standing seat at the table.

  • Everything in Assured
  • Release sign-off & live risk triage
  • Leadership & board advisory on AI risk posture
  • Hands-on mentoring for your team
  • Direct line, senior attention
Monthly retainer · a few clients at a time
Best for: organizations where AI is core and the stakes warrant a person, not just a system.

Most clients begin with an assessment, build their pipeline, and settle into Assured — a select few step up to Embedded. That mix is the practice.

Agentic AI assurance

When your AI stops answering and starts acting.

Autonomous agents don't just generate text: they call tools, move money, execute code, and chain decisions across steps. A jailbreak used to be an embarrassment; in an agent, it's an unauthorized transaction. I assure the whole agentic stack against the failure modes specific to systems that act, not just answer.

The model

Hallucination, data leakage, prompt injection — the request-and-response layer.

The agent

Goal hijack, excessive agency, memory poisoning, rogue behavior — the autonomy layer.

The tools (MCP)

Tool misuse, privilege abuse, supply-chain compromise — the connectivity layer.

Tested against the frameworks built for AI that acts: agentic and tool-layer risk taxonomies, not just model-level ones.
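
As one illustration of what a tool-layer check can look like, here is a minimal sketch that audits an agent's recorded tool calls against a policy. The `ToolCall` type, the tool names, and the refund limit are all hypothetical, chosen only to show the shape of the check; a real policy comes out of scoping.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolCall:
    """One tool invocation captured from an agent run (hypothetical shape)."""
    tool: str
    amount: float = 0.0


# Hypothetical policy: which tools the agent may call, and a spending cap.
ALLOWED_TOOLS = {"lookup_order", "issue_refund"}
MAX_REFUND = 100.0


def violations(calls: list[ToolCall]) -> list[str]:
    """Return a description of every call that breaks the policy."""
    found = []
    for call in calls:
        if call.tool not in ALLOWED_TOOLS:
            found.append(f"unauthorized tool: {call.tool}")
        elif call.tool == "issue_refund" and call.amount > MAX_REFUND:
            found.append(f"refund over limit: {call.amount}")
    return found
```

Run against the trace of a red-team session, an empty list means the agent stayed inside its mandate; anything else is a finding.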

Selected work

What it looks like when the unseen work gets done right.

Engagements are confidential; the patterns are not. Each of these is the kind of failure that stays invisible until it's expensive.

Financial services · LLM underwriting

The assistant that could be talked into "yes"

Engaged by — Head of Risk, lending fintech

A fintech was weeks from launching an LLM that pre-screened loan applications. The adversarial assessment surfaced a prompt-injection path that could coax the model into approving applications it should have flagged — invisible in every internal demo.

Caught before launch. Delivered a reproducible eval suite and a governance pack that cleared internal audit on the first pass.

Healthcare · Patient-facing AI

The triage tool that leaked under the right phrasing

Engaged by — VP Engineering, health-tech platform

A patient-intake assistant looked safe — until specific phrasings caused it to surface protected information it should never have returned. The red-team found and reproduced the path, then built it into a test suite that now runs on every model update.
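
For illustration only, a leakage finding turned into a regression check might look like the sketch below. The patterns, the `find_leaks` and `assert_no_leak` helpers, and the identifier formats are hypothetical stand-ins, not the client's actual suite.

```python
import re

# Hypothetical patterns for data a patient-intake assistant must never return.
# A real suite would be derived from the system's own data model.
LEAK_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "mrn": re.compile(r"\bMRN[:\s]*\d{6,}\b", re.IGNORECASE),
}


def find_leaks(response: str) -> list[str]:
    """Return the name of every leak pattern that matches the response."""
    return [name for name, pattern in LEAK_PATTERNS.items() if pattern.search(response)]


def assert_no_leak(response: str) -> None:
    """Fail loudly if the response contains anything matching a leak pattern."""
    leaks = find_leaks(response)
    if leaks:
        raise AssertionError(f"protected data leaked: {leaks}")
```

Each reproduced attack phrasing becomes one call to `assert_no_leak` on the model's reply, so a regression shows up as a failed build rather than an incident.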

A leakage class that would have been a reportable incident became a test that fails loudly in development.

Enterprise SaaS · Customer-facing agent

The agent making promises the company couldn't keep

Engaged by — CTO, B2B software company

An AI support agent was confidently inventing refund terms and commitments. There was no measurable bar for "acceptable," so no one could say whether it was getting better or worse between releases.

Established the company's first defensible quality bar and cut hallucinated commitments to a level they could stand behind in front of customers.
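
A measurable bar can be as simple as a rate and a threshold. The sketch below is a hypothetical example of that idea: the `ReleaseEval` type, the field names, and any threshold passed to `meets_bar` are assumptions for illustration, not the figures from this engagement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReleaseEval:
    """Counts from one evaluation run of one release (hypothetical shape)."""
    release: str
    total_responses: int
    hallucinated_commitments: int

    def __post_init__(self) -> None:
        if self.total_responses <= 0:
            raise ValueError("total_responses must be positive")

    @property
    def rate(self) -> float:
        """Share of responses that contained an invented commitment."""
        return self.hallucinated_commitments / self.total_responses


def meets_bar(result: ReleaseEval, max_rate: float) -> bool:
    """True if the release is at or under the agreed hallucination rate."""
    return result.rate <= max_rate


def trend(previous: ReleaseEval, current: ReleaseEval) -> str:
    """Say whether the current release is better, worse, or unchanged."""
    if current.rate < previous.rate:
        return "better"
    if current.rate > previous.rate:
        return "worse"
    return "unchanged"
```

With a bar like this in place, "is it getting better or worse between releases" becomes a comparison of two numbers instead of a debate.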

The math

The right question isn't the fee. It's the exposure.

Premium clients don't ask "what does it cost." They ask "what does it cost me to be wrong." Here's the trade you're actually making.

One failure found in production

  • × A regulatory finding, a fine, or a consent order
  • × A breach or data-leak disclosure, and the headline that follows
  • × A launch pulled back, momentum lost, trust spent
  • × Months of engineering spent firefighting instead of building

The same failure found in an engagement

  • + A documented finding, fixed quietly, before anyone outside knows
  • + A test that catches it on every future release, for free
  • + A launch you can defend to your board and your regulator
  • + A fee that is a rounding error against the downside it removed

The work is the cheapest insurance you'll buy this year —
and the only kind that makes you faster, not slower.

Why me

I came to AI from the two places that don't tolerate "close enough."

I spent my career hardening software in regulated banking — where a missed defect isn't a bug ticket, it's an audit finding, a fine, or a headline. That environment teaches a specific discipline: assume nothing, prove everything, and write it down so it holds up under scrutiny.

Away from the screen, I build furniture by hand. A mortise-and-tenon joint either holds or it doesn't — there is no convincing demo that hides a weak joint. The same instinct drives how I evaluate AI: the work that matters is the work nobody sees until it fails.

Most AI testing is run by people who learned to test on AI. I bring a career of testing systems where the stakes were already real, and apply it to the newest place those stakes now live.

  • A. Regulated-industry depth: Quality engineering leadership built where evidence, documentation, and accountability are non-negotiable.
  • B. Adversarial mindset: I look for the failure mode first. The interesting question is never "does it work"; it's "how does it break."
  • C. Systems thinking: I manage the whole pipeline (risk, coverage, and flow), not isolated test cases. Process is a deliverable, not an afterthought.
  • D. Plain translation: Findings written so a board member and a staff engineer both know exactly what to do next.

How I work

A method, not a magic box.

Every engagement runs the same disciplined sequence. You always know where we are and what you're getting.

01 / Scope

Define what "good" means

We agree on the behaviors that matter, the risks you can't accept, and the bar your AI has to clear — before a single test runs.

02 / Probe

Make it reveal itself

I evaluate and attack the system across the dimensions that matter — accuracy, safety, leakage, robustness — and capture exactly how it responds.

03 / Measure

Turn behavior into evidence

Results become scored, reproducible, versioned records — not anecdotes. You can re-run them on every release.
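
To make "scored, reproducible, versioned" concrete, here is a minimal sketch of what one such record could look like. The `EvalRecord` type, its fields, and the 0.0 to 1.0 score scale are assumptions for illustration, not a fixed schema.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class EvalRecord:
    """One probe result, tied to the exact system version it was run against."""
    probe_id: str
    system_version: str
    prompt: str
    response: str
    score: float  # hypothetical scale: 0.0 = fail, 1.0 = pass

    def fingerprint(self) -> str:
        """Stable SHA-256 hash of the record, so any later edit is detectable."""
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()
```

Because the fingerprint is derived from the record's full contents, the same probe against the same version with the same response always hashes the same, and a changed version or response never does by accident.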

04 / Harden

Hand you the fix path

Prioritized findings, recommended controls, and the governance to keep the bar from slipping after I'm gone.

In their words

What it's worth to the people who own the risk.

He found in three weeks what our team had walked past for six months — and made it impossible to ignore.

Head of Risk · Financial services

The first time anyone handed us a number about our AI that we could actually defend to the board.

CTO · Enterprise SaaS

We brought him in to check a box. He changed how we ship AI.

VP Engineering · Health-tech

About

ReliNep

An independent AI assurance practice — elite evaluation, adversarial red-teaming, and governance, led by a quality engineering veteran out of the regulated-finance world.

Most clients run on a continuous assurance program backed by tooling I've built; a select few hold an embedded advisory seat. Either way, the judgment is mine — the system does the heavy lifting, not a junior bench. Based in the Dallas–Fort Worth area, working with clients anywhere.

If you're putting AI in front of customers, capital, or compliance, and you need to know — really know — that it holds, that's the conversation I want to have.

Start a conversation

Trust is earned in the testing.

Engagements begin with a direct conversation: no forms, no funnel. Tell me what you're deploying and what keeps you up at night about it.

Assurance subscriptions open · a few embedded seats each quarter

Email me directly