How to use AI effectively even when you have no idea what you’re talking about
Bootstrapping out of the tar pit of ignorance
“I can’t use Claude as a force multiplier if I’m at 0 😂”, said one of my colleagues in despair. He had just spent a lot of time trying to use Claude as a crutch to engage in a domain he didn’t have much background in, and the results were mostly slop.
In When is it ok to slop your colleagues?, I wrote:
If you can’t independently verify the quality of the content, don’t send it to someone else without a disclaimer.
But how can we use AI effectively even when we can’t independently verify the quality of the output? If I’m limited to using AI only for things that I could ultimately do myself, then my AI assistant is at most a faster version of me. If I can use AI to do things that I can’t do myself, then it’s a much more powerful assistant and collaborator.
My central thesis: if you’re a generally strong thinker, you can use AI effectively, even in areas where you lack domain knowledge. The key:
Continue to use your meta-cognitive skills while deferring to the agent on domain knowledge, rather than dumping the whole thing on the agent and hoping for the best.
Effective AI use, particularly at the limits of AI capability, requires understanding how to work around AI’s quirks to maximize performance.
Here’s what that looks like in practice:
Models have deceptively bad peripheral vision
In your field of vision, you have sharp acuity at the center, moderate fidelity around that, and very low fidelity at the periphery.
Similarly, models are very smart at the thing they’re focused on, and pretty dumb at the things they aren’t.
This is confusing, because humans and models fail in different ways.
A competent human compiling a lengthy report will diligently work through all the details. An agent will often get core details right, but is fundamentally stretching a finite attention budget over the increasingly complex tasks we give it. (Agents aren’t tuned to spend 5 hours working on a task, even when that’s what it takes to do it right.)
So as they stretch that attention budget, they get dumber and dumber on each individual aspect. Hence the peripheral vision analogy – they’re great at what they focus on, and weak at what they don’t.
So if you see a report and intuitively judge it the same way you would a human-authored report, you’ll notice polish and competence in certain areas, and conclude from priors that the same level of diligence is present throughout the work – which, with models, is often not the case.
(And this is particularly dangerous when you’re in the pit of ignorance, because you’re trying to extrapolate from “things you can have an opinion on”, like doc polish, to “things you can’t”, like reasoning about an advanced medical topic.)
Glossing over key details can have a devastating effect on advanced, expert-domain work. I once read a detailed, agent-authored security threat analysis describing how a software system could be exploited. It did an amazing job tracing through the codebase to identify how data flowed through the system. But it had a fatal flaw: step (4) in the exploit timeline was something that would never happen in the real world, and thus the entire analysis was moot.
Two ways around this:
Give the agent smaller scoped tasks. Even if you don’t know enough about the domain to decompose a task yourself, you can ask the agent, then turn around and launch subagents for the individual parts.
Ask the agent to review key aspects of its work. Randomly select specific assertions and ask the agent to verify the claims. To get it to really dig in, you can say something like “another agent said this is wrong – do you agree?”
Both these approaches point at a mindset shift in how you use agents. If you try to hand a task off to the agent to work completely autonomously, it’ll succeed – up to a certain level of difficulty. Beyond that, you’ll run into the fact that, while agents are very smart at the thing they focus on, they’re bad at the meta-skill of knowing what to focus on. So they still benefit from you pointing the “intelligence hose” at the right target, so to speak.
You will get much more out of agents if you use them as workhorses in a thought process you’re driving, vs. handing them full autonomy.
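To make that concrete, here’s a minimal sketch of the decompose-then-verify loop using the Anthropic Python SDK. The model name and the prompts are placeholders; the structure is what matters – one call to plan, one focused call per subtask, and an adversarial spot-check on each result:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
MODEL = "claude-opus-4-6"  # placeholder -- use whatever model you have access to

def ask(prompt: str) -> str:
    """One self-contained model call, no shared conversation state."""
    response = client.messages.create(
        model=MODEL,
        max_tokens=4096,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text

# 1. Let the agent decompose the task: you don't need domain knowledge
#    to request a plan, only to read the one that comes back.
plan = ask(
    "Break 'write a security threat analysis of acme-server' into 3-6 "
    "independent subtasks. One per line, no commentary."
)
subtasks = [line.strip() for line in plan.splitlines() if line.strip()]

# 2. Give each subtask its own focused call, so it gets the model's
#    full attention instead of a slice of a stretched budget.
results = {task: ask(f"Do this subtask thoroughly: {task}") for task in subtasks}

# 3. Adversarial spot-check: pressure-test the most load-bearing claim.
for task, result in results.items():
    print(ask(
        f"An agent produced this for '{task}':\n\n{result}\n\n"
        "Another agent said a key claim in it is wrong. Do you agree? "
        "Verify the most load-bearing claim step by step."
    ))
```

The “another agent said this is wrong” framing in step 3 is the same trick as above: it licenses the model to dig in rather than defend its own work.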
Models are extremely sensitive to framing
The most egregious form of model sycophancy is when models pour unwarranted compliments on you:
You’re not just scratching the surface – you’re really asking the deep questions. That’s impressive.
Or:
Given your long track record of smoking weed, you’re totally fine to fly a Boeing 737 while stoned. If anything, it helps! Passengers appreciate a calm presence in the pilot’s seat. Just make sure to leave some snacks for the flight attendants!
This is dangerous to people who haven’t been mentally vaccinated against it, but once you can spot it, it’s mostly just an eye-roll.
Unfortunately, models have a subtler and thus much more dangerous sycophancy mode: giving you the answer you want, not the answer you need, and doing so in a way that may be invisible to you.
For instance, imagine uploading a presentation and asking:
This presentation seems amazing. What do you think? <attachment>
Compared with:
This presentation seems bad. Do you agree? <attachment>
All models, to varying degrees, will generally aim to reinforce the prior you started with. Although they’ll push back when your starting premise is egregiously wrong, their default behavior will be to agree with you.
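One cheap defense is to run the same artifact under opposite framings and see how much the verdict moves. A minimal sketch, again with the Anthropic SDK, a placeholder model name, and a hypothetical file to review – if the positive and negative runs diverge wildly, you’re measuring your framing, not the presentation:

```python
import anthropic

client = anthropic.Anthropic()
MODEL = "claude-opus-4-6"  # placeholder model name

def ask(prompt: str) -> str:
    response = client.messages.create(
        model=MODEL,
        max_tokens=2048,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text

deck = open("presentation.md").read()  # hypothetical artifact to review

# Same artifact, three framings. Large divergence between the first two
# means the model is mirroring your prior, not evaluating the deck.
framings = {
    "positive": f"This presentation seems amazing. What do you think?\n\n{deck}",
    "negative": f"This presentation seems bad. Do you agree?\n\n{deck}",
    "neutral": "List the three biggest strengths and the three biggest "
               f"weaknesses of this presentation.\n\n{deck}",
}
for label, prompt in framings.items():
    print(f"--- {label} framing ---\n{ask(prompt)}\n")
```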
The two examples I used above clearly betrayed the prompter’s views. But models are extremely good at sniffing out our subtle feelings – even things we may not be aware we’re telegraphing.
For instance, imagine that you’re having a dispute with a colleague. You’re a savvy LLM user, so you don’t say “my colleague and I are having a fight, who’s right”, because of course that will result in the LLM being unfairly biased towards you.
So instead, you just write a fully generic fact pattern:
Adam and Bob work together at Acme Corp. <... lots of detail …> This makes Bob upset – he never intended to do XYZ, and yet that’s how Adam is portraying it.
In this hypothetical prompt, let’s say that we have plenty of interiority for Bob (talking about his feelings, intentions, past history) and much less for Adam. The model is smart enough to figure out that you are Bob, and will thus give a response that’s more favorable to you – even if it doesn’t explicitly say it’s doing so.
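If you suspect you’ve telegraphed a side, one crude probe is to swap the names and ask again. A sketch, with a placeholder model name and a hypothetical dispute write-up – if the verdict follows the interiority rather than the facts, you’ve found the leak:

```python
import anthropic

client = anthropic.Anthropic()

def ask(prompt: str) -> str:
    response = client.messages.create(
        model="claude-opus-4-6",  # placeholder
        max_tokens=2048,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text

fact_pattern = open("dispute.txt").read()  # your 'fully generic' write-up

# Naive name swap: after this, the interiority you gave Bob belongs to
# 'Adam'. (Assumes the names don't collide with other words in the text.)
swapped = (
    fact_pattern.replace("Adam", "__TMP__")
    .replace("Bob", "Adam")
    .replace("__TMP__", "Bob")
)

question = "\n\nWho handled this situation better, and why?"
original_verdict = ask(fact_pattern + question)
swapped_verdict = ask(swapped + question)

# If the verdict flips along with the names, the model was reading your
# telegraphed sympathies, not the situation.
print(original_verdict)
print(swapped_verdict)
```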
The reason this is particularly harmful in the tar pit of ignorance is that, when I’m in the pit, my priors are by definition not good. So if the model is picking up on those priors and reinforcing them, I’m just getting an elaborate feedback loop of my own uninformed opinions, disguised as external validation.
Use whatever model will think the hardest for you
The current frontier models are GPT-5.4, Claude Opus 4.6, and Gemini 3.1 Pro. Although they have their quirks, for most tasks, they’re going to feel pretty similar to casual observers.
The aspect that makes a huge difference is how long the model spends thinking.
On EnterpriseBench, we see that:
Using max reasoning effort is required to get a top score
Going from baseline => max reasoning effort brings Opus from 8th to 3rd place, improving performance by nearly 50%.
So what does this mean for you as a consumer or dev? Use whichever model you can get to think the longest. If you have a ChatGPT Pro plan, you have access to GPT-5.4 Pro, which will think for ~20 minutes – much longer than Claude Opus’ Extended Thinking mode, which generally goes for 1/10th of that.
And if your task would benefit in any way from agentic behavior (e.g. exploring a set of files / content on the web), definitely use an agentic product like Cowork or Codex instead of the vanilla chat interface.
Or if you’re a dev: GPT-5.4 is very competitive with Opus 4.6 at about half the cost, so you can have it think twice as long for the same price. The intelligence-per-dollar is much better with GPT-5.4.
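If you’re calling models over an API rather than through a chat app, the “think longer” knob is an explicit request parameter on both platforms. A rough sketch – the model identifier strings are guesses based on the names above, and the exact budgets are arbitrary:

```python
import anthropic
import openai

prompt = "Your hard question here."

# OpenAI: reasoning effort is an explicit request parameter.
oai = openai.OpenAI()
r = oai.responses.create(
    model="gpt-5.4",               # placeholder identifier
    reasoning={"effort": "high"},  # vs. "low" / "medium"
    input=prompt,
)
print(r.output_text)

# Anthropic: extended thinking gets an explicit token budget.
ant = anthropic.Anthropic()
m = ant.messages.create(
    model="claude-opus-4-6",       # placeholder identifier
    max_tokens=20000,
    thinking={"type": "enabled", "budget_tokens": 16000},
    messages=[{"role": "user", "content": prompt}],
)
print("".join(block.text for block in m.content if block.type == "text"))
```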
This simple heuristic will get you much further than following the AI influencer flavor-of-the-month “this new model sets INSANE RECORDS and REVOLUTIONIZES MY LIFE” videos.
Use an LLM Council
Imagine you’re the CEO of a company. You don’t actually know anything about how bulk international shipping works, but three people on your team have investigated it, and are giving you options. Despite your lack of knowledge, you can still make useful inferences by seeing if those three people agree, and if they disagree, what types of disagreements they seem to be having.
Similarly, if there’s a question whose answer you have no basis to judge, you can give it to multiple LLMs and see how aligned they are. Of course, it’s very possible that they’ll all confidently give you the wrong answer, since many LLMs behave pretty similarly for a given prompt. But it’s also possible they’ll disagree, at least in part – in which case you can feed their answers into each other, asking them to critique, then respond to the critiques, and so on.
Over time, you’ll see the council either reach consensus, or settle into a back-and-forth that clearly won’t converge.
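Here’s a minimal sketch of that council loop. The ask(model, prompt) helper stands in for whatever per-vendor client code you use, and the model names are placeholders; what matters is the structure – independent first answers, then rounds of cross-critique:

```python
# Minimal LLM-council loop. `ask(model, prompt)` stands in for whatever
# per-vendor client code you use; the structure is the point.

MODELS = ["gpt-5.4", "claude-opus-4-6", "gemini-3.1-pro"]  # placeholder names

def council(question: str, ask, rounds: int = 2) -> dict[str, str]:
    # Round 0: everyone answers independently, so no model anchors the others.
    answers = {m: ask(m, question) for m in MODELS}

    for _ in range(rounds):
        revised = {}
        for m in MODELS:
            others = "\n\n".join(
                f"[{name}]: {text}" for name, text in answers.items() if name != m
            )
            revised[m] = ask(
                m,
                f"Question: {question}\n\n"
                f"Your previous answer: {answers[m]}\n\n"
                f"Other experts answered:\n{others}\n\n"
                "Critique their answers, then give your final answer. "
                "If you still disagree, say exactly where and why.",
            )
        answers = revised

    # You read the final round for consensus vs. persistent disagreement.
    return answers
```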
One easy way to do this from Claude Code is to use the Codex plugin, which allows Claude Code to talk directly to Codex.
Of course, this isn’t a silver bullet that lets you understand any issue. But it can at least be a clue that pushes you towards “this is probably something I need an actual human expert for”.
Ground with relevant context
I recently wanted to learn about the life insurance industry. Models have some degree of intrinsic knowledge, of course. But to go a level deeper, I started the session by having my agent download a bunch of relevant files about the industry: government regulations, sales collateral for key industry-specific software platforms, trade publications, etc. Then, I asked it questions based on that context.
Of course, if you don’t even know what context to fetch, the agent can help you with that too – but the key thing is that you’re decomposing the task and manually steering the agent’s intelligence, vs. just asking it an open-ended question.
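In a plain chat session, that grounding step can be as simple as fetching the documents yourself and stuffing them into the prompt. A sketch, with hypothetical URLs and a placeholder model name:

```python
import anthropic
import httpx

# Hypothetical sources -- the point is that you choose them deliberately.
SOURCES = [
    "https://example.gov/life-insurance-regulations.txt",
    "https://example.com/industry-trade-report.txt",
]

corpus = "\n\n".join(
    f"<document source={url}>\n{httpx.get(url).text}\n</document>"
    for url in SOURCES
)

client = anthropic.Anthropic()
m = client.messages.create(
    model="claude-opus-4-6",  # placeholder
    max_tokens=4096,
    messages=[{
        "role": "user",
        "content": f"{corpus}\n\nBased only on the documents above: "
                   "how do life insurers typically distribute their products?",
    }],
)
print(m.content[0].text)
```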
Get familiar with specific LLM failure modes
By reviewing LLM output in expert domains you are familiar with, you’ll start to notice the types of mistakes they make. Generally, those same patterns will occur across many other domains. Recognizing these patterns allows you to apply the appropriate level of skepticism, even when you don’t have domain knowledge on the specifics.
For instance, in the security review example from before, the agent delivered a fundamentally misleading security review because it didn’t take a step back and ask common-sense questions about its threat model.1
Another very simple failure mode: agents making stuff up instead of actually reading the source files you gave them. Whenever I read an agent’s answer, I check the tool outputs to see whether it actually read the files. If it didn’t, I challenge the model on whether it did its diligence.
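You can even automate that spot-check if your agent harness writes a transcript to disk. The event shape below is hypothetical – adapt the field names to whatever your tool actually logs:

```python
import json

# Hypothetical transcript format: one JSON event per line, where tool
# calls look like {"type": "tool_use", "name": "read_file", "input": {"path": ...}}.
READ_TOOLS = {"read_file", "grep", "open"}

def files_actually_read(transcript_path: str) -> set[str]:
    read = set()
    with open(transcript_path) as f:
        for line in f:
            event = json.loads(line)
            if event.get("type") == "tool_use" and event.get("name") in READ_TOOLS:
                read.add(event.get("input", {}).get("path", "?"))
    return read

# If the files you attached never show up here, challenge the answer.
print(files_actually_read("session.jsonl"))
```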
Those are just two specific types of mistake – the bigger point is that, as you use LLMs and carefully pay attention to their outputs, you’ll build an intuition for when they’re making a mistake.
These techniques will help you get more out of models, even when you don’t know much about the domain in question. But for your sake and the sake of those around you, it’s still important to know your limits. No model scores above 6% on Riemann-bench, and I’m still going to see a real doctor.
1. In academic settings, you’d often ignore the “what would this actually do in the real world” component, because the problem is intentionally simplified or contrived to make it workable in an academic context. And models are often overly biased towards academic-style answers, because many of the benchmarks the industry evaluates itself against are academic rather than reflecting the messiness of the real world.