OpenClaw is a promising early adopter toy
Does anyone remember BabyAGI or AutoGPT? They were, in early 2023, the first time that someone wrapped GPT-3.5 in a ReAct loop, as it was then known, and hooked it up to a bunch of tools. They rocketed to the top of the GitHub Stars leaderboard, but ultimately failed to gain traction. They had the right ideas, but were just too early with regard to model capabilities.
Now we’re back with a new viral sensation! OpenClaw, fka Moltbot, fka ClawdBot. The frontier has massively advanced in the past 3 years.
Overall, I’ve found that it:
Requires some technical expertise to set up vs. being a fully consumer-ready tool
Isn’t reliable for moderate-stakes work
Is prone to basic reasoning errors / agent harness issues
How do you use OpenClaw?
The “let your tequila-drunk cousin give you a haircut with a chainsaw” approach is to install OpenClaw on your machine and give it access to your personal email, iMessage, WhatsApp, etc. But for me, that misses the point. The interesting thing about OpenClaw is asking: how far can we push agents if we give them full autonomy in a safe sandboxed environment?
So for that, my setup is:
Revitalize an old MacBook Pro
Install Amphetamine so it stays on indefinitely
Install OpenClaw (which took some technical expertise, but fortunately, OpenClaw itself is pretty good at debugging issues once you get past the first step and can talk to it)
Create a separate set of accounts for your agent: give it its own Gmail, Apple ID, etc.
Create shared Google Drive folders / Notion workspaces etc., so you can control what the agent has access to vs. giving it your entire workspace
In 6 months, when this is more reliable, I’ll also give the agent a credit card with a low spending limit.
Reliability
I need the gutters on my house cleaned, so I asked OpenClaw to help. It researched local providers and gave me a list – so far, so good. Then I asked it to contact each provider to get a quote, using the information I’d provided in our shared Notion space. Some providers had contact email addresses, but most required the agent to fill out an online form.
25% of the forms blocked the agent with a CAPTCHA. It asked me for help, so I waddled over to the agent’s MacBook in my closet, and clicked past the CAPTCHA. Then, nothing happened, so I figured I’d just fill out the form myself. As I was midway through doing that, the agent hit the “Submit” button, so the company got a half-finished incoherent message from me. Whoops!
Of the forms the agent was able to fill out on its own, some portion was messed up. I gave the agent ~4 pieces of information to include about me, and it didn’t reliably include them all every time. And when it did attempt to include them, it sometimes malformed them. For instance, I had two links: a Calendly, and a public Dropbox with pictures of my gutter. OpenClaw would sometimes truncate the links, like this:
What I passed: https://calendly.com/my-calendly/cal-id
What the agent submitted in the form: “Please schedule time with me at https://calendly.com”.
Compounding the issue: most of these forms didn’t send a copy of the submission to me, and the agent didn’t write verbatim what it submitted in its status update, so I didn’t know which ones were malformed. So then, when a company reached out to me without using the Calendly, I didn’t know if that was because I’d sent them a malformed message, or if they just weren’t carefully reading it.
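A cheap guardrail for both problems would be to check that every required value appears verbatim in the drafted message before submitting, and to log the exact text that went out. Here is a minimal sketch of that idea. It assumes some hook where the draft can be inspected before the form is submitted; the function names and the "Acme Gutters" provider are hypothetical and are not part of OpenClaw.

```python
import json
from datetime import datetime, timezone


def missing_fields(message: str, required: dict[str, str]) -> list[str]:
    """Return the names of required values that do not appear verbatim in message."""
    return [name for name, value in required.items() if value not in message]


def submission_record(provider: str, message: str, required: dict[str, str]) -> str:
    """Build a JSON log line holding the exact text that was submitted."""
    return json.dumps({
        "provider": provider,  # hypothetical label for the company contacted
        "submitted_at": datetime.now(timezone.utc).isoformat(),
        "message": message,  # verbatim copy, so it can be audited later
        "missing": missing_fields(message, required),
    })


required = {"calendly": "https://calendly.com/my-calendly/cal-id"}
draft = "Please schedule time with me at https://calendly.com"

print(missing_fields(draft, required))  # ['calendly']
print(submission_record("Acme Gutters", draft, required))
```

With a record like this per form, a reply that ignores the Calendly link can be traced back to what was actually sent.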
So overall the headache of various submissions being broken and requiring more back-and-forth meant that using OpenClaw for this was a net loss.
Reasoning Errors / Agent Harness
I’m taking French lessons, and my teacher sends me flashcards via a not-very-good flashcard app. So the obvious thing to do in 2026 is build my own flashcard app that imports from the teacher’s but is tuned to my idiosyncrasies.
Instead of prompting a coding agent directly, I wanted to see if OpenClaw could handle this at a higher level – create the Vercel deployment, etc.
The first thing it did was wildly undercook. It took a one-time static export of the flashcards and made a self-contained HTML file with them – which obviously wouldn’t work as new flashcards are added.
But beyond that, it struggled with self-awareness / meta-conscientiousness about who it was, who I was, and what we each had access to. It told me that it had opened its locally-hosted HTML file in “my browser”, by which it meant its own browser, even though it had already been told that it’s running on its own machine that I generally don’t have physical access to.
So I said, “please sign up for Vercel and host this there”, which sent us down another unfortunate path:
OpenClaw: “I signed up with your email address – what’s the sign-in code you just got?”
Me: “You have your own email address. Use that.”
OpenClaw: “ah of course. What’s my password? I’ve been logged out.”
Me: “<sends a 1Password link>”
OpenClaw: “Got it! My password is correct-horse-battery-staple.”
🤦 The whole point of sending a 1Password link was to not put a credential in our permanent iMessage history.
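One way to make this class of mistake harder is an outbound filter that scrubs known secrets from anything the agent is about to send. The sketch below is purely illustrative: it assumes the harness knows which secret values the agent has retrieved, and `redact` is a hypothetical helper, not something OpenClaw provides as far as I know.

```python
def redact(outgoing: str, secrets: list[str], mask: str = "[redacted]") -> str:
    """Replace every known secret value in an outgoing message with a mask."""
    # Longest first, so a secret that contains another one is masked whole.
    for secret in sorted(secrets, key=len, reverse=True):
        if secret:  # an empty string would match everywhere, so skip it
            outgoing = outgoing.replace(secret, mask)
    return outgoing


reply = "Got it! My password is correct-horse-battery-staple."
print(redact(reply, ["correct-horse-battery-staple"]))
# Got it! My password is [redacted].
```

This only catches exact matches, so it is a backstop rather than a substitute for the agent knowing not to repeat credentials.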
After that got sorted out, OpenClaw got confused, as if the agent harness was presenting messages to it out of order. I clarified a few things, then said: “you have everything you need. Please deploy the app.”
“What app? I don’t see anything in my workspace directory. Can you clarify what you’re referring to?”
Le sigh. Our conversation was <2k tokens at that point, so this was the agent harness being pretty badly broken, and failing to provide the right context to the LLM.
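I don't know how OpenClaw's harness actually assembles the context it sends to the model, so treat the following as a sketch of the behavior I'd expect rather than a description of its code: sort the conversation by timestamp, then keep the most recent messages that fit a budget. The `Message` type, its fields, and the character-based budget are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    sent_at: float  # Unix timestamp (hypothetical field)
    role: str       # "user" or "assistant"
    text: str


def build_context(messages: list[Message], max_chars: int = 8000) -> list[Message]:
    """Return the most recent messages that fit the budget, oldest first."""
    ordered = sorted(messages, key=lambda m: m.sent_at)
    kept: list[Message] = []
    used = 0
    for msg in reversed(ordered):  # walk from newest to oldest
        if used + len(msg.text) > max_chars:
            break
        kept.append(msg)
        used += len(msg.text)
    kept.reverse()  # restore chronological order
    return kept


# Messages arrive out of order; the context should still read chronologically.
msgs = [
    Message(3.0, "user", "Please deploy the app."),
    Message(1.0, "user", "Build a flashcard app."),
    Message(2.0, "assistant", "Built it in workspace/."),
]
for m in build_context(msgs):
    print(m.role, m.text)
```

A conversation under 2k tokens would fit comfortably inside any sensible budget, so nothing should have been dropped in my case.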
All the pieces are in place for OpenClaw to be great. I’m gonna give it another few weeks to patch up some of these issues, then try again.
The other thing I’m very excited for is Ultravox integration – I think it’ll be much better than the currently available providers.
All my testing in this post was done with Opus 4.5.

