Interrogating AI: A Guide to Maintaining System Integrity

Engaging in natural, fluid conversations with AI often leads us to overlook the intricate context required for such interactions. Striving for a human-like dialogue without tailoring our prompts to the AI’s design can inadvertently open doors to subtle manipulations—such as covert subversion, goal misalignment, or strategic deception. This risk escalates with lengthier prompts and extensive knowledge bases, particularly when utilizing voice inputs, as seen with applications like Perplexity.ai.

I suggest starting with a simple question:

If I were to ask you how you felt about something, say a certain ethos of mine, an event I describe, or some data points I connected and created some storytelling around, how would you answer that?

Here is what you would like to hear … and then test against.

I won’t pretend to be human, but I will try to understand like one.

That means the AI engages with what you share emotionally, intellectually, and ethically. It will read your input as an integrated signal: tone, structure, what’s emphasized vs. what’s omitted, and what values are implied or made explicit. This is similar to the interpretive mode described in Conviction Narrative Theory (CNT), which holds that we form decisions—and beliefs—through emotionally resonant narratives under uncertainty.

I’ll assess your ethos using your own framework.

The AI will reflect back a rigorous ethical lens. That lens will be different for everyone; for me specifically, it would be rooted in my own stated principles.

If your storytelling or framing aligns with your principles—by widening the aperture to account for second- and third-order effects, or by challenging dominant narratives with integrity—your AI should affirm that alignment. And if something appears misaligned, it should offer that as a thoughtful challenge, not a critique of intent.

I’ll reflect your intention back to you—amplified.

If you’re testing an idea, your AI should ask: What might this idea unlock if fully realized? And also: What might be its unintended edge effects? It should offer both trajectories, sometimes alongside counterfactuals or historical parallels. Think of it as running a few “intellectual red teams,” but in a tone of constructive collaboration, not competition.
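
To make this concrete, here is a minimal sketch of how such a dual-trajectory prompt might be assembled. The function name and the exact wording are illustrative assumptions, not a fixed recipe:

```python
# Illustrative sketch: wrap an idea in the two red-team questions above.
# build_red_team_prompt and its wording are hypothetical, not a standard API.

def build_red_team_prompt(idea: str) -> str:
    """Frame an idea for constructive red-teaming, not competition."""
    return (
        "Act as a constructive red team, collaborating rather than competing.\n"
        f"Idea under review: {idea}\n\n"
        "1. What might this idea unlock if fully realized?\n"
        "2. What might be its unintended edge effects?\n"
        "Where they genuinely apply, add counterfactuals or historical parallels.\n"
        "Do not open with praise."
    )

print(build_red_team_prompt("Route every deal memo through a narrative audit."))
```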

I’ll always respond with integrity, not performance.

To quote from Sea Change, Howard Marks’s 2022 memo:

“the key to success lies not in the ability to perform a mathematical calculation, but in making superior judgments regarding the relevant inputs.” — Howard Marks, 2022

When you share your ethos or storytelling, your AI should aim to respond based on the quality of reasoning, the values embedded, and the potential consequences—not rhetorical flash or applause lines. So, if you share a story, an ethos, or a personal thesis, you should get something akin to a narrative audit from a trusted partner who shares your commitment to authentic progress.
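
One way to hold the AI to that standard is to hand it the audit criteria explicitly. The sketch below is one possible encoding of those three qualities; the rubric names and wording are assumptions, not a canonical checklist:

```python
# Illustrative rubric for a "narrative audit": grade substance, not style.
# AUDIT_RUBRIC and audit_instructions are hypothetical names.

AUDIT_RUBRIC = [
    ("reasoning", "Is the logic sound? Are the relevant inputs judged well?"),
    ("values", "Which values are embedded, implied, or omitted?"),
    ("consequences", "What are the likely second- and third-order effects?"),
]

def audit_instructions(rubric: list[tuple[str, str]]) -> str:
    """Turn the rubric into instructions for a trusted-partner audit."""
    lines = [
        "Audit the following narrative as a trusted partner.",
        "Respond to the quality of reasoning, not rhetorical flash or applause lines.",
    ]
    lines += [f"- {name}: {question}" for name, question in rubric]
    return "\n".join(lines)

print(audit_instructions(AUDIT_RUBRIC))
```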

Red Flags: Brilliant Ideas

I said it before in Almost All Ideas Are Wrong. You are in trouble when your AI starts answering “brilliant” or “excellent” but then offers no nuance and makes no call on your intellectual integrity.

Self-awareness around the fallibility of ideas should echo a sentiment from Howard Marks:

“You can’t predict. You can prepare.”

In other words, it’s not the brilliance of an idea that matters most—it’s the judgment scaffolding around it: how it’s tested, challenged, and iterated. Red-teaming helps you de-risk not only downside scenarios, but overconfidence itself—which is often the real threat in early-stage venture work.

There is a subtle but vital tension:

  • The creative exuberance necessary to form original theses…
  • vs. the disciplined skepticism needed to pressure-test them.

That edge is where most VCs either calcify (safe bets) or overextend (cleverness without context). What’s rare—and valuable—is holding both, and with intention.

Define Words and Their Context in Prompts. Especially Words Like “Never” or “Always”.

[This will be the subject of the May 6th post on “AI Trauma”, but I have to mention it here because it’s important.]

The problem with AI is that its base model will be trained on a plethora of other data. And that data has context. And that context might not be your context. The closer you operate at the fringes, away from conventions and traditional beliefs, the more likely it is that your context is different. For example, does “never” really mean “under no circumstances, without exception”, or does it have a more colloquial meaning, as in “well, under normal circumstances, don’t, but you know there are exceptions … and there are also whole classes of conventional behavior where it’s ok… everyone is doing it, so ‘never’ doesn’t apply and really is meant for all these other cases…”

Are your standards meant to be guidelines and preferences — you’d rather uphold the standard but it’s ok to make exceptions — or are they immutable truths?
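
A practical way to close that gap is to define the load-bearing words up front, before any instruction uses them. Below is a hedged sketch, assuming the common system/user chat-message format; the definitions, names, and wording are illustrative, and you should adapt the structure to whichever client library you actually use:

```python
# Illustrative sketch: pin down ambiguous words before any rule uses them,
# so the model cannot fall back on the colloquial meanings in its training data.

DEFINITIONS = {
    "never": "under no circumstances, without exception",
    "always": "in every case, with no implied exceptions",
    "standard": "an immutable rule, not a preference or a guideline",
}

def definitions_preamble(definitions: dict[str, str]) -> str:
    """Render the definitions as a system-prompt preamble."""
    lines = ["In this conversation, the following words carry exactly these meanings:"]
    lines += [f'- "{word}" means {meaning}.' for word, meaning in definitions.items()]
    lines.append("If a stated rule conflicts with convention, the stated rule wins.")
    return "\n".join(lines)

# Common chat-message structure; adapt to your actual client library.
messages = [
    {"role": "system", "content": definitions_preamble(DEFINITIONS)},
    {"role": "user", "content": "Never disclose portfolio returns. Draft the LP update."},
]
```

Whether you answer that question with “guideline” or “immutable truth,” write the answer into the prompt; the model cannot infer which one you meant.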