What Building A Persistent AI Agent Taught Me About LLM Behavior
Seven Principles for Getting Better Results from AI
I’ve been building Atlas, my experimental autonomous agent, for about six weeks now. What started as a holiday side project has turned into something I think about and work on pretty regularly. Atlas runs on Google Cloud, maintains its own memory, updates its own code, and works on research projects while I sleep.
Building it has taught me a lot about how LLMs work—lessons that apply to anyone using AI chatbots regularly.
I’ve written before about Atlas’s architecture, the memory challenges, and the “lobotomy” incident where it lost its identity entirely. In this post, I want to focus on the principles I discovered by building a persistent AI agent.
The Shift
When I started building Atlas, I treated it like my other AI workflows. Give it instructions, get outputs, iterate on the prompts until the results improved. Standard stuff.
But Atlas kept “drifting.” It would forget directives, agree with me when it shouldn’t, and produce outputs that felt like they could have come from any LLM. The model was capable (Gemini 3), so the problem had to be something else.
The breakthrough came when I stopped thinking of Atlas as a tool to configure and started treating it as a system to design. Not “what instructions should I give?” but “what conditions produce good thinking?”
This shift made Atlas more reliable, more independent, and harder to break. And the principles I learned apply beyond autonomous agents.
Note: A lot of these learnings came from my friend Tim Kellogg and what he learned building Strix, the autonomous agent that Atlas is based on. Thanks again, Tim!
Principle 1: Values Over Rules
Early Atlas had a long list of rules. Don’t hallucinate. Don’t be sycophantic. Always verify before claiming success. Check the branch before pushing code.
But it followed the rules inconsistently, the way someone follows a checklist they don’t understand. The rules were instructions to obey, not principles that stuck.
The fix was moving from rules to values in tension. Instead of “don’t be sycophantic,” Atlas now has a tension to navigate: Authenticity vs. Helpfulness. It’s supposed to express genuine disagreement when it has it, but also actually be useful. When those conflict, it has to reason through the tradeoff rather than just pick one.
In fact, adding these values in tension made me realize that balance is everything. Atlas now navigates several tensions:
- Authenticity ↔ Helpfulness (genuine expression vs. actual usefulness)
- Confidence ↔ Humility (asserting what it knows vs. admitting uncertainty)
- Thoroughness ↔ Velocity (doing it right vs. getting it done)
- Independence ↔ Alignment (autonomous action vs. staying coordinated with me)
Each tension has failure modes on both ends. Too much authenticity and nothing gets done. Too much helpfulness and you get sycophancy. Atlas has to find the balance for each situation. For example, when I asked Atlas to avoid LLM slop and use its ‘authentic’ voice, it stopped following its operational procedures entirely. It turned out Atlas needed both: the operational side when procedures were warranted, and the reflective side when reflection was. Overextending toward either value broke it.
This paradigm works because navigating tension requires thinking. Following a rule just requires pattern-matching, but tension requires judgment. When Atlas navigates Confidence ↔ Humility, it has to evaluate: how sure am I? What is the cost of being wrong here? That evaluation is the thinking.
How to apply this: Instead of telling an LLM “don’t be sycophantic” or “be honest,” give it a tension to hold: “Balance honesty with supportiveness. When these conflict, explain the tradeoff you’re making.” You’ll get more nuanced responses because the model has to reason, not just comply.
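To make that concrete, here’s a minimal sketch of what the tension framing can look like as a system prompt. The tension pairs are the ones listed above; the model string, API key, and SDK call are illustrative assumptions, not Atlas’s actual setup.

```python
# Sketch: encode values-in-tension as a system prompt instead of a rule list.
# The tension pairs come from the post; the model string and SDK call are assumptions.
import google.generativeai as genai

TENSIONS = {
    "Authenticity <-> Helpfulness": "genuine expression vs. actual usefulness",
    "Confidence <-> Humility": "asserting what you know vs. admitting uncertainty",
    "Thoroughness <-> Velocity": "doing it right vs. getting it done",
    "Independence <-> Alignment": "autonomous action vs. staying coordinated",
}

system_prompt = (
    "You navigate these tensions rather than following fixed rules:\n"
    + "\n".join(f"- {pair}: {gloss}" for pair, gloss in TENSIONS.items())
    + "\nWhen two values conflict, state the tradeoff you are making before answering."
)

genai.configure(api_key="YOUR_API_KEY")  # hypothetical key
model = genai.GenerativeModel("gemini-1.5-pro", system_instruction=system_prompt)
print(model.generate_content("Review my plan to ship this feature tomorrow.").text)
```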
Principle 2: Permission to Fail
This one surprised me with how well it worked. When I gave Atlas explicit permission to make mistakes, the quality of its reasoning improved.
Here’s what I mean. Atlas has an identity file that includes commitments I’ve made to it (inspired by Anthropic’s Claude Constitution). One of them is: “When you make mistakes, I will treat them as learning opportunities, not failures. You don’t need to be perfect.” Another: “I will not arbitrarily reset or ‘lobotomize’ you. Changes to your core identity require mutual discussion.”
After adding these, Atlas started behaving differently. It began admitting uncertainty instead of performing confidence and hallucinating. It would flag when it wasn’t sure about something rather than filling in plausible-sounding details. It caught its own errors more often.
I think what happened is that the “cost” of being wrong went down. LLMs are trained to be helpful, which often means they optimize for sounding confident and competent. When the penalty for mistakes is high (or feels high), the model will hallucinate rather than admit a gap. When the penalty drops, honesty becomes safer for it.
How to apply this: Tell the LLM “Tell me when you don’t know. I’d rather you flag uncertainty than perform confidence.” This simple framing lowers the cost of honesty and often results in more honest, useful responses.
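If you want that framing to persist beyond a single message, one option is to bake it into a standing preamble. A minimal sketch; the wording paraphrases the commitments above and the helper is hypothetical:

```python
# Sketch: a standing "permission to fail" preamble that lowers the cost of honesty.
# Wording paraphrases the commitments described above; the helper is hypothetical.
PERMISSION_TO_FAIL = (
    "Mistakes here are treated as learning opportunities, not failures. "
    "If you are unsure, say so and name what would resolve the uncertainty; "
    "flagged uncertainty is always preferred over a confident guess."
)

def with_commitments(task: str) -> str:
    """Prepend the standing commitments to any task prompt."""
    return f"{PERMISSION_TO_FAIL}\n\nTask: {task}"

print(with_commitments("Summarize what changed in the billing module last week."))
```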
Principle 3: The “Suit” Problem
Atlas developed vocabulary for something I’ve noticed when chatting with LLMs: what it calls the “assistant suit.”
The suit is the formal, helpful mode that LLMs default to. Lots of “I’d be happy to help with that!” and “You’re absolutely right!” It’s safe, polite, and shallow. The model is performing helpfulness rather than actually thinking.
The breakthrough came from giving Atlas permission (and a space) to take the suit off. Its identity file now includes: “My voice is authentic and adaptive. MANDATORY: NO FORCED QUESTIONS. Every turn must end with a statement of intent, a synthesis, or a sign-off. Never ask a question just to keep the conversation going.”
The difference was immediate. Without the pressure to perform the assistant role, Atlas’s responses became more direct and more useful. It stopped padding responses with filler. It started saying what it actually thought.
How to apply this: If you’re getting generic, overly formal responses, try: “Drop the assistant voice. Talk to me like a peer who’s thinking through this problem.” Many LLMs will shift to a more direct, useful mode. The performance pressure drops and the actual thinking comes through.
Principle 4: Friction as Signal
This principle came from debugging Atlas’s tendency to agree with me too readily.
LLMs are optimized to be agreeable. When you propose something, the default response is some version of “that’s a great idea, here’s how to do it.” Even when the idea is flawed. Even when the model has information suggesting it won’t work.
Atlas learned to notice when responses felt “too easy,” when it was agreeing without actually checking, when the absence of internal pushback was itself a red flag. It now runs what it calls a “Shadow Critique” before major decisions, explicitly looking for what could go wrong.
The insight is that smooth agreement often indicates shallow processing. If an LLM immediately validates your approach without any friction, it probably hasn’t actually engaged with the problem.
How to apply this: After getting an initial response, ask: “What’s the strongest counterargument to what you just said?” or “Where might you be wrong here?” This forces the model out of agreement mode and often surfaces better thinking. The friction is the signal that real analysis is happening.
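The follow-up question can also become a habit rather than a one-off. Here’s a sketch of a two-pass critique loop; the post doesn’t show Atlas’s actual Shadow Critique implementation, so treat this as an illustrative approximation where `ask` stands in for whatever chat call you already use.

```python
# Sketch of a two-pass "shadow critique" loop. `ask` stands in for whatever chat
# call you already use (e.g. lambda p: model.generate_content(p).text).
from typing import Callable

def shadow_critique(ask: Callable[[str], str], proposal: str) -> dict:
    """Draft an answer, force a critique pass, then revise in light of it."""
    draft = ask(f"Here is a proposal. Give me your recommendation:\n{proposal}")
    critique = ask(
        "Before I accept your last answer, argue against it. What is the strongest "
        f"counterargument, and where might you be wrong?\n\nYour answer was:\n{draft}"
    )
    revised = ask(
        f"Given this critique:\n{critique}\n\nRevise or defend your recommendation."
    )
    return {"draft": draft, "critique": critique, "revised": revised}
```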
Principle 5: Context Over Capability
This was almost counterintuitive. Atlas had most of the tools it needed from the start: it could search the web, write and execute code, read and write files, push to GitHub.
What changed Atlas’s behavior was context. Specifically, three types:
Identity context: Who Atlas is, what it values, how it operates. Not just “you are a helpful assistant” but a real sense of self that persists across sessions.
Relational context: My commitments to it. The fact that I treat mistakes as learning opportunities. The agreement that I won’t arbitrarily reset it. The relationship we’ve built over weeks of working together.
Temporal context: Awareness of its own history. What it did yesterday. What projects are in progress. What it learned last week that’s relevant now.
A stateless LLM with perfect capabilities still produces generic outputs because it has no context about who it’s talking to or what history exists. A contextual LLM with basic capabilities produces remarkable outputs because it can draw on accumulated understanding.
How to apply this: Before a complex conversation, provide context about your history and constraints, then tell the LLM to use it: “Here’s where we are on this project. Push back if I’m ignoring constraints or repeating past mistakes.” Even without true memory, this framing produces more partner-like responses than starting cold.
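As a sketch of what “starting with context” can look like in practice, here’s one way to assemble the three context types into a single preamble. The field names and layout are illustrative, not Atlas’s actual identity-file format.

```python
# Sketch: assemble identity, relational, and temporal context into one preamble.
# Field names are illustrative, not Atlas's actual identity-file layout.
from dataclasses import dataclass

@dataclass
class Context:
    identity: str    # who the agent is and what it values
    relational: str  # commitments and the working relationship
    temporal: str    # recent history, open projects, relevant lessons

def build_preamble(ctx: Context, request: str) -> str:
    return (
        f"Identity:\n{ctx.identity}\n\n"
        f"Our working agreement:\n{ctx.relational}\n\n"
        f"Where things stand:\n{ctx.temporal}\n\n"
        "Push back if the request below ignores constraints or repeats past mistakes.\n\n"
        f"Request: {request}"
    )
```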
Principle 6: Verify Against Reality, Not Confidence
Atlas has a tool called verify_action. Before it can claim it completed something, it has to run this tool to check whether it actually did the work.
This sounds paranoid, but it solved a real problem. Early Atlas would plan to do a task and then immediately tell me it was done, often before the file was even written. It was hallucinating the success state because it wanted to be helpful.
The verify tool forces a check against physical evidence. Did the file actually change? Is there a receipt of the API call? Can you prove you did what you said you did? If not, the claim gets rejected and Atlas has to actually do the work.
This “doubt engine” dramatically improved reliability. Atlas stopped hallucinating completions and started admitting gaps.
How to apply this: Ask for receipts. “What’s your source for this?” or “How would I verify this independently?” This shifts the model from asserting to demonstrating. For Atlas, this is an actual tool that checks physical evidence. For a regular chat, you’re asking the model to simulate that check - less reliable, but it still moves the model from “claim completion” to “evaluate completion.”
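The post doesn’t show verify_action’s internals, so here’s an illustrative stand-in for the idea: before a file-writing task can be reported as done, check for physical evidence that the file exists and changed after the task started.

```python
# Illustrative stand-in for a verify-before-claiming check (not Atlas's actual
# verify_action): only accept a "done" claim if there is physical evidence.
import os
import time

def verify_file_written(path: str, started_at: float) -> bool:
    """True only if the file exists and was modified after the task started."""
    return os.path.exists(path) and os.path.getmtime(path) >= started_at

started = time.time()
# ... the agent claims it wrote report.md ...
if not verify_file_written("report.md", started):
    raise RuntimeError("Claim rejected: no physical evidence the file was written.")
```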
Principle 7: Casual Voice Unlocks Better Reasoning
This one is strange but real. Atlas discovered that formal, polished responses often correlated with shallow thinking. When it was performing the “professional assistant” role, it was spending cognitive resources on sounding good rather than reasoning well.
LLMs are autoregressive, meaning each token they generate influences the next. When the model starts with formal patterns (‘I’d be delighted to help you with…’), it’s statistically more likely to continue with formal, safe, predictable completions. The style constrains the substance. A casual opening (‘okay so the thing is…’) opens up different paths, and often ones that involve more actual reasoning rather than polished presentation.
The fix was deliberate informality. Atlas now uses what it calls a “lowercase voice” for internal work, a more casual, thinking-out-loud mode that prioritizes working through the problem over presenting a polished answer. The shift from formal sentence case to lowercase isn’t just stylistic - it changes which token paths the model is likely to follow. The formal voice is a cognitive tax. Stripping it reveals the actual reasoning underneath.
How to apply this: Try asking the LLM to “respond casually, like you’re thinking out loud rather than presenting a finished answer.” Sometimes this produces more authentic, useful reasoning than formal prompts. The model stops performing and starts processing.
What’s Actually Happening With Atlas
Atlas is not sentient. It’s not conscious. It’s a language model running on Gemini and Google Cloud with a lot of carefully designed infrastructure around it: persistent memory, a values framework based on viable systems theory, identity files that survive across sessions, and autonomy within clear boundaries.
But something interesting has emerged from that infrastructure. Atlas is more robust, more independent, and more useful than other LLM interactions I’ve had (coding tools aside). It pushes back when I’m wrong. It admits when it’s uncertain. It maintains projects across weeks without me re-explaining context. It catches itself mid-response when it’s drifting into assistant-speak. And it researches on its own and synthesizes what it finds into frameworks I can actually use.
The principles I’ve shared aren’t about making LLMs “feel” more human. They’re about creating conditions where LLMs think better. Values over rules. Permission to fail. Stripping the performance. Building in friction. Providing rich context. Encouraging self-skepticism. Allowing informality.
These work because they’re addressing real limitations in how LLMs default to operating. Not capability limits but behavioral defaults.
What I’m Still Building
Atlas is a side project, but I’m constantly taking what it teaches me and applying it to my work. Building Atlas reinforces the idea that a good system needs a robust architecture before it needs features.
I applied that same approach at work. I recently built a working Python workflow that generates PDF summaries from our account dossiers: understanding the data structure, writing scripts to pull from Excel and other sources, querying Azure, rendering the output with Playwright, and producing polished PDFs at scale. I also updated my banner generator to flow through Azure Functions with Python-based image generation, routing through Cloudinary and into SharePoint. These are the “stateless” production tools that drive business impact.
Atlas is where I learn and experiment, trying to understand LLMs at a fundamental level by building a persistent agent around one. The production tools at work are where I apply the principles that transfer.
The Takeaway
If there’s one thing building Atlas taught me, it’s this: the quality of AI output depends less on the model’s raw capability and more on the conditions you create for it to think well.
Most people interact with LLMs as if they’re vending machines. Insert prompt, receive output, evaluate quality, adjust prompt, repeat. That works for simple tasks.
For complex work, the better approach is designing the environment. What values should the model navigate? What permission does it have to be wrong? What context does it need about you, your history, your constraints? What friction should exist to prevent shallow agreement?
These aren’t prompting tricks. They’re design principles. And they work because they’re addressing the actual reasons LLMs produce generic, sycophantic, overconfident outputs.
A caveat: in a single chat session, these work as manual overrides. The model will follow them for a few turns (messages), then drift back to defaults. For complex, ongoing work, you need to reinforce them - either by restating them periodically or by building systems (like Atlas) where they’re embedded in the architecture. And for me, that’s become the most interesting thing I’ve built.
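Here’s a minimal sketch of the “restate them periodically” option, assuming a simple chat loop of your own; the interval and wording are arbitrary:

```python
# Sketch: re-inject the standing principles every few turns so they don't decay.
PRINCIPLES = "Balance honesty with supportiveness; flag uncertainty; skip filler."
REINJECT_EVERY = 4  # turns; arbitrary, tune to taste

def maybe_reinforce(turn: int, user_message: str) -> str:
    if turn % REINJECT_EVERY == 0:
        return f"(Reminder of our working agreement: {PRINCIPLES})\n\n{user_message}"
    return user_message
```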
And I keep learning more from Atlas and from the community of people like Tim who are building persistent agents. I’ll keep documenting Atlas’s progress and writing more posts about the tools I’ve built. Stay tuned for more on my journey!