2026-01-14 - Northzone Presentation

  • Software engineer making videogames and the Xbox platform
  • Non-profit co-founder
  • Big tech: software engineering, business development & strategy, program management
  • Office of the CTO:
    • Strategic technology partnerships: OpenAI, AMD, Anthropic, University of Washington, etc.
    • Cross-company AI programs
  • I will say things here that will undoubtedly be shown to be wrong
  • Model training isn’t that interesting, either as a business or an investment. Model capabilities are not a linear function of training resources: they scale with the log of the resources used to train and run them
  • The cost of AI drops 10x per year on a ‘capability-equivalent’ basis (see the worked example after this list)
  • Arthur C. Clarke: the 2 fallacies of prophecy (the failure of nerve and the failure of imagination).
  • What you shouldn’t be surprised by: improving capabilities.
  • Don’t bet on being better than the frontier labs at general capabilities (e.g., models that can reason over tables)
  • Cost goes down. Benchmark results go up.
  • Synthetic benchmarks are basically useless. But that’s OK, because labour markets aren’t affected by synthetic benchmarks. They’re affected by changes to business processes.
  • There are only 2 real AI native apps: ChatGPT (2022) and Claude Code (2025)
  • Companies buy solutions, not benchmark results (except devs, who vacillate between hardcore benchmark chasing and ghostly programming vibes)
  • Measuring the performance of frontier models is a sort of wave-particle duality experiment: at any one point in time there’s a leader, but squint your eyes and they’re all much of a muchness
  • Agents:
    • nobody has shown any useful multi-agent scenarios
  • Capability overhang: models can do way more than they’re being used for.
  • Coding is the perfect use for LLMs?
  • I would not invest in ‘neoclouds’.
  • Perhaps the most creative part of the AI industry is the infra companies’ financial engineering
  • Training your own models is mostly interesting for domains with specialised expertise or data → things that won’t be subsumed by frontier models.
  • Better predictions increase the return to judgment (i.e., determining the relative payoffs from different outcomes). Understand the objective, not just the prediction.
  • In 30 years’ time, AI for scientific R&D will have the greatest positive impact on society and the world in which we live
  • An agent is a piece of software you can delegate a task to.
  • There is no such thing as an AI strategy; there is only business process optimization. Go back to the whiteboard, map out your value chain - especially the messy, human-centric parts involving unstructured data that you previously ignored - find the bottlenecks, identify the waste. Once you have a streamlined, logical, and robust business process, then apply AI.
  • There will be job losses; we can’t predict what the new jobs will be
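
A worked example of the scaling and cost bullets above (the functional form is my own illustrative assumption, not something from the talk): if capability is roughly logarithmic in resources,

$$ C(R) \approx a + b \log_{10} R $$

then multiplying training resources $R$ by 10 buys only a fixed increment $b$ of capability. Run in reverse: if efficiency improves by an order of magnitude per year, the resources needed to hit a fixed capability target fall 10x annually, which is exactly the ‘capability-equivalent’ cost drop.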

Segue: Jobs, Tasks, Decisions, the Labour Market

  • Insert diagram about the interplay between workflows/tasks/decisions/jobs
  • Explain how most current work is about solutions to replace specific tasks / help with specific decisions. But we need to redesign workflows.
  • Shifts in the labour market will come from changes to business processes
  • Most of the current changes are AI-washing of poor historical management decisions, and are disproportionately affecting the early-in-career job market.
  • Look at this in the context of software development: the cost of writing code has dropped to zero.
  • Reasoning is table-stakes
  • The standard model training process stabilised (again): pre-training → supervised fine-tuning → reinforcement learning from human feedback → reinforcement learning from verifiable rewards (a toy schematic follows this list)
    • was: 1. Pretraining (GPT-2/3 of ~2020) / 2. Supervised Finetuning (InstructGPT ~2022) / 3. Reinforcement Learning from Human Feedback (RLHF ~2022)
    • add: 4. Reinforcement Learning from Verifiable Rewards (RLVR)
  • Models have got better at reasoning over longer tasks: both 1-shot and within an agent framework
  • The US Administration retreated from any sort of AI safety oversight.
  • Vibe coding entered the mainstream
  • Claude Code really happened: the cost of experimentation dropped to zero (at least for code generation)
  • People settled into whether they should be using prompt engineering (always), fine-tuning (occasionally), or training from scratch (rarely)
  • We’ve made surprisingly good progress on the issue of hallucination, but hallucination is simply inherent to these artifacts
  • Pattern of 3 broad areas of enterprise use: organizational efficiency, development of AI-powered products, human effectiveness.
  • Meta dropping the open source ball, China picking it up and running with it.
  • Export controls: win the battle, lose the war?
  • AI-attributed downsizing: the acceptable face of job losses?
  • A surprising number of de facto standards emerged (Anthropic’s Model Context Protocol, AGENTS.md, the Agentic AI Foundation)
  • In July 2025, OpenAI said what their priorities are: “knowledge, health, creative expression, economic freedom, time, and support”.
  • The L-word: liability. Internalizing the currently un-priced risk of AI into products. AI models are designed to be stochastic; can we price the associated risk like a financial option? (A toy pricing sketch follows this list.)
  • I’ve mostly thought of sovereign AI as scaremongering designed to sell more GPUs. But that is predicated on the global order of trade not collapsing, so… ¯\_(ツ)_/¯
  • Embodied AI: David Silver’s “Era of Experience”.
  • Connection of learned models with laws of physics. Maybe this comes through tool use.
  • Entitlements: super thorny issue.
  • The Agentic Web (an agent being a piece of software you can delegate a task to)
  • Synthetic people in real people scenarios: polling, mystery shoppers, etc.
  • Personal, portable memory
  • Codifying the product-driven research loop (insert image)
  • Resolution of the myriad of legal cases in flight in the US with regard to AI
  • Solving continual learning.
  • Device form factors.
  • Robots
  • We’re going to learn a lot more about the importance of profit per token
  • We’re going to learn what a modern software development team looks like: one human for 8 hours per day + N software agents for 24 hours per day
  • IPOs
  • Companies that redesign business processes: OpenAI is still a product company focused on ChatGPT. If you can live where the work is done, access and record proprietary data and systems of record, improve with customer use, or capture distribution before incumbents bundle you as a feature, then that’s interesting.
  • Real multi-player experiences
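
A toy schematic of the stabilised training recipe described above, as runnable Python; every stage here is a stub that only records its name, since the point is the order of the stages, not the implementations (all names are illustrative, not a real training API):

```python
# Schematic of the now-standard frontier training recipe.
# Each stage is a stub that records its name and passes the "model" through;
# a real stage would return updated weights.

STAGES = []

def stage(name):
    def run(model):
        STAGES.append(name)
        return model
    return run

pretrain = stage("1. pre-training: next-token prediction at web scale")
sft      = stage("2. supervised fine-tuning on curated instruction data")
rlhf     = stage("3. RLHF: RL against a learned human-preference reward model")
rlvr     = stage("4. RLVR: RL against verifiable rewards (unit tests, proofs, maths)")

model = rlvr(rlhf(sft(pretrain(None))))
print("\n".join(STAGES))
```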
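
To make the liability bullet above concrete, here is a toy actuarial sketch (my own illustrative assumption, not an established pricing method): charge each model call a premium equal to the expected cost of a harmful error plus an uncertainty loading, much as one would price a simple insurance contract.

```python
# Toy actuarial pricing of stochastic model risk (illustrative only):
# premium per call = expected loss per call, plus a multiplicative loading
# to cover uncertainty in the error-rate estimate.

def risk_premium_per_call(p_error: float, cost_of_error: float, loading: float = 0.2) -> float:
    """Expected loss per call, with a safety loading on top."""
    expected_loss = p_error * cost_of_error
    return expected_loss * (1.0 + loading)

# Example: a 0.1% chance per call of an error that costs $500 to remediate.
premium = risk_premium_per_call(p_error=0.001, cost_of_error=500.0)
print(f"risk premium per call: ${premium:.2f}")  # -> $0.60
```

A real option-style treatment would also need a view on tail correlation across calls; this sketch prices each call independently.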

What I Can Help with / Mistakes I See People Make

  • Experimentation: just run the damn experiment.
  • People don’t realise that OpenAI cares almost exclusively about ChatGPT: their models are designed to support their primary product, and the fact it’s useful for you as a developer is a happy coincidence
  • Believing AI is any different from other technologies when it comes to organizational change.
  • AI is about driving the cost of prediction to zero; don’t stop there, work out what the complements (data, judgment, action) and substitutes (human prediction) are.
  • Mistaking the rising tide of frontier model improvements for sustainable competitive advantage
  • Organizations’ data and experiences across stakeholders are scarce and valuable.
  • Set a North Star, develop a set of evals, use AI capabilities to run experiments, and measure progress towards your North Star (see the sketch below)
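
A minimal sketch of the North Star → evals → experiments loop above, in Python; the metric, eval cases, and candidate variants are all placeholders I’ve invented for illustration:

```python
# Minimal experiment loop: score candidate process changes against a fixed
# eval set and keep whatever moves the North Star metric. All names here
# are illustrative placeholders.

from statistics import mean

def run_eval(variant, eval_cases):
    """Score one candidate workflow variant against every eval case (0..1 each)."""
    return mean(variant(case) for case in eval_cases)

def improve_towards_north_star(baseline, candidates, eval_cases):
    best, best_score = baseline, run_eval(baseline, eval_cases)
    for candidate in candidates:
        score = run_eval(candidate, eval_cases)
        if score > best_score:  # keep only measured improvements
            best, best_score = candidate, score
    return best, best_score

# Usage with trivial stand-in variants: each variant is a callable that
# scores one case; in practice it would invoke a model-backed workflow.
baseline   = lambda case: 0.5
candidates = [lambda case: 0.4, lambda case: 0.7]
best, score = improve_towards_north_star(baseline, candidates, eval_cases=range(10))
print(f"best North Star score: {score:.2f}")  # -> 0.70
```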

  • High Priority:
    • Big tech: SW engineering, Biz Dev & Strategy, Program Management.
    • Office of the CTO: Strategic partnerships (OpenAI, AMD, Anthropic, etc.) and cross-company AI programs.
  • Low Priority:
    • Software engineer making videogames/Xbox (mention as the “foundation”).
    • Non-profit co-founder.
    • Disclaimer: “I will say things here that will undoubtedly be shown to be wrong.”
  • High Priority:
    • Model capabilities are a log function of training resources (not linear).
    • Cost of AI drops 10x per year on a “capability-equivalent” basis.
  • Low Priority:
    • Model training isn’t that interesting as a business/investment anymore.
    • Cost goes down, benchmark results go up (the general trend).
  • High Priority:
    • AI Strategy = Business Process Optimization (whiteboard the value chain, find bottlenecks).
    • Better predictions increase the return to judgment (determining payoffs vs. just outcomes).
    • AI is about driving cost of prediction to zero; identify complements (data/action) and substitutes.
  • Low Priority:
    • OpenAI is a product company (ChatGPT); build where the work is done/systems of record.
    • Understand the objective, not just the prediction.
  • High Priority:
    • Synthetic benchmarks are useless; labor markets are affected by business process changes, not scores.
    • Companies buy solutions, not benchmark results.
  • Low Priority:
    • The “Wave-particle duality” of frontier models (they are all “much of a muchness” when you squint).
    • Devs vacillate between benchmarks and “ghostly programming vibes.”
  • High Priority:
    • The standard training process stabilized (Pre-training → SFT → RLHF → RLVR).
    • Reasoning is now table-stakes.
  • Low Priority:
    • Progress on hallucinations (still inherent artifacts).
    • De facto standards (MCP, AGENTS.md).
    • Arthur C. Clarke: 2 fallacies / What you shouldn’t be surprised by.
  • High Priority:
    • The cost of experimentation/code generation has driven to zero.
    • “Vibe Coding” entered the mainstream.
  • Low Priority:
    • Only 2 real AI-native apps: ChatGPT (2022) and Claude Code (2025).
    • Coding is the “perfect” use case for LLMs.
  • High Priority:
    • Internalizing un-priced risk: Can we price stochastic risk like a financial option?
    • Resolution of US legal cases (copyright/fair use).
  • Low Priority:
    • US Administration retreat from safety oversight.
    • AI-attributed downsizing: “The acceptable face of job losses.”
    • The “Entitlements” issue.
  • High Priority:
    • The shift from “Growth at all costs” to Profit Per Token.
    • Creative financial engineering of infra companies (and why to avoid “Neoclouds”).
  • Low Priority:
    • The 2026 IPO landscape for AI-native firms.
  • High Priority:
    • The Modern Dev Team: 1 Human (8hrs) + N Agents (24hrs).
    • Workflow Redesign: Move from replacing tasks to redesigning how work flows.
  • Low Priority:
    • Definition: An agent is software you can delegate a task to.
    • Nobody has shown useful multi-agent scenarios yet.
    • Job losses vs. unpredictable new jobs.
  • High Priority:
    • Don’t bet on general capabilities vs. frontier labs.
    • Moats: Domain-specific data/expertise that won’t be subsumed.
    • Mistaking a “rising tide” of model improvement for a sustainable moat.
  • Low Priority:
    • Meta vs. China (Open Source vs. Export Controls).
    • Sovereign AI (predicated on global trade stability).

11. The Advisor’s Checklist (Common Mistakes)

  • High Priority:
    • Mistake: Treating AI as different from other tech regarding org change.
    • Solution: Set a North Star → Develop Evals → Run experiments.
    • Capability Overhang: Models can do way more than companies are currently using them for.
  • Low Priority:
    • “Just run the damn experiment.”
    • Knowing when to Prompt vs. Fine-tune vs. Train (Rarely).
    • The 3 broad areas: Org Efficiency, Product, Human Effectiveness.
  • High Priority:
    • Vet the “AI Strategy”: Is it just a process fix? (Commercial Acumen).
    • Evaluating “Evals”: Ensuring portfolio companies aren’t chasing “ghostly vibes.”
    • Understanding the OpenAI roadmap: Why their focus on ChatGPT is a risk/opportunity for you.
  • Low Priority:
    • Vision: Scientific R&D impact in 30 years.
    • Future bets: Personal portable memory, Embodied AI, Multiplayer experiences.
    • OpenAI’s stated priorities (Knowledge, Health, etc.).


The Narrative: AI is a tool, not a destination. A founder claiming to have an “AI Strategy” is often masking a lack of understanding of their own value chain. As an advisor, I help you see through the “AI-washing” to find the actual process optimization.

When a founder pitches an “AI-first” solution, here are the questions you should use to test their commercial depth:

  1. “Can you draw your value chain and circle exactly where the bottleneck was 12 months ago?”

    • Why: If they can’t identify the specific human-centric or data-heavy bottleneck that existed before they added AI, they are a solution looking for a problem.
  2. “As the cost of this prediction drops to near-zero, what happens to the value of the human judgment in this loop?”

    • Why: This tests their understanding of Prediction Machines. If AI does the predicting, the person (or system) that makes the final decision (Judgment) becomes the most valuable part of the stack.
  3. “Which part of this workflow involves the ‘messy’ unstructured data your competitors can’t access?”

    • Why: Standard AI on standard data is a commodity. Moats are built on the “ignored” data that was previously too expensive to process manually.

| Red Flag (The Hype) | Green Flag (The Acumen) |
| --- | --- |
| “Our model is 20% more accurate than GPT-4.” | “We reduced the time-to-decision from 4 days to 4 minutes.” |
| “We are an AI-native company.” | “We redesigned the underwriting workflow to capitalize on unstructured PDF data.” |
| “We use multi-agent swarms.” | “We identified that 60% of our cost was manual data entry; we’ve automated 90% of it.” |
| “Our strategy is to be the AI layer for [Industry].” | “Our strategy is to own the system of record and use AI to increase the ‘Profit per User’.” |

I often use this framework to evaluate if a company is actually creating value. As prediction becomes cheap, the complements—Data, Judgment, and Action—become the areas where you should actually be investing.

The Insight to Leave Them With:

“Don’t invest in the prediction; invest in the company that owns the Judgment and the Action. The prediction is a commodity that drops 10x in price every year. The decision-making power and the ability to execute on that decision are where the durable margins live.”

