Building Agents for Framer

Koen Bok

Published Jun 16, 2026

A detailed background article for product builders behind the Framer Agents release, covering what Framer learned about canvas-native AI, CMS, analytics, code, quality, collaboration, speed, costs, community, and external agents.

This is the technical companion to our Framer Agents announcement. If you want the high-level version, read the announcement post.

Here’s the context. Over the past year, we watched agents change programming more than the previous thirty years combined. We’re convinced every creative tool is heading for the same shift, so rather than wait for it to happen to us, we built it ourselves. What follows is everything we figured out along the way.

A quick disclaimer before we start: this is our first design agent. It’s very advanced, and it’s also only the beginning. Six months from now, the agent we’re shipping today will feel slow, dumb, and overpriced. But give it a week and you’ll find it strange that you ever designed without one. This is the biggest change to creative tool interfaces since MacPaint, PageMaker, and Photoshop set the standard back in the 80s. The GPT moment for creative tools is happening right now.

Why a canvas, not a pile of code

Models are already excellent at writing code. The problem is that a giant pile of code is hard to edit, hard to understand, and hard to own if you’re not a programmer. Design is different. For an agent to be useful to a designer, it has to work directly on a canvas, the same place you work, so you can collaborate with it, adjust what it makes, debug what breaks, and ship the result with confidence.

That turns out to be a much harder problem than it sounds, because three very different skills all have to live in the same tool. An agent needs to create, iterate, and debug, and each of those demands a different mindset and workflow. It also has to be collaborative, since the best work comes out of teams, not individuals.

All of that only works with good guardrails. And a canvas worth designing on is connected to everything around it: your CMS, your data, your code, analytics, hosting. To be genuinely useful, the agent has to understand that whole system, not just the visual layer.

Teaching a model to see a canvas

Expressing the tree and its edits comes first. Project trees are very verbose, a lot like HTML, and a single design change might require updating just one property on hundreds of nodes at varying depths. Streaming diffs the way coding agents do is an inefficient way to express that, because about half the tokens go to restating the values being replaced. Instead, the model emits a stream of patch commands in which every token is put to direct use. The downside is that we have to teach the model this format, which makes the system prompt a bit longer. But the savings are much bigger than the extra instructions, so it’s a net win.
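To make the patch-command idea concrete, here is a minimal sketch. The node shape and the two command types (`set` and `remove`) are assumptions for illustration, not Framer’s actual format, which this article doesn’t spell out.

```typescript
// Hypothetical node and command shapes, for illustration only.
type DesignNode = {
  id: string;
  props: Record<string, string | number>;
  children: DesignNode[];
};

type PatchCommand =
  | { op: "set"; id: string; prop: string; value: string | number }
  | { op: "remove"; id: string };

// Index every node by id once so each command is a direct lookup.
function indexTree(
  root: DesignNode,
  index: Record<string, DesignNode> = Object.create(null)
): Record<string, DesignNode> {
  index[root.id] = root;
  for (const child of root.children) indexTree(child, index);
  return index;
}

// Detach the node with the given id from wherever it sits in the tree.
function removeNode(parent: DesignNode, id: string): boolean {
  for (let i = 0; i < parent.children.length; i++) {
    if (parent.children[i].id === id) {
      parent.children.splice(i, 1);
      return true;
    }
    if (removeNode(parent.children[i], id)) return true;
  }
  return false;
}

// Apply commands in order. A command names only the new value,
// never the old one that a textual diff would have to repeat.
function applyPatches(root: DesignNode, commands: PatchCommand[]): DesignNode {
  const index = indexTree(root);
  for (const cmd of commands) {
    if (cmd.op === "set") {
      const node = index[cmd.id];
      if (node) node.props[cmd.prop] = cmd.value;
    } else {
      removeNode(root, cmd.id);
      delete index[cmd.id];
    }
  }
  return root;
}
```

Changing one property on a deeply nested node costs one short command here, regardless of how much markup surrounds that node.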

The right context is everything. We can’t send the whole project to the agent, so we make a smart selection based on the page you’re editing, your selection, the context tool, and previous edits. Sometimes, though, you want to do advanced things like running multiple operations across the entire project. In that case we switch to another mechanism, where the agent queries and updates the tree with snippets of JavaScript in the background. That way it can programmatically modify your project faster than any model could update it using text.
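As a rough illustration of that second mechanism, the sketch below shows the kind of short script that could replace hundreds of individual text edits. The tree shape and the helper names `queryNodes` and `updateAll` are hypothetical. They are not Framer’s API.

```typescript
// Hypothetical tree shape, for illustration only.
type TreeNode = {
  id: string;
  type: string;
  props: Record<string, string | number>;
  children: TreeNode[];
};

// Walk the whole tree and collect every node that matches the predicate.
function queryNodes(root: TreeNode, match: (n: TreeNode) => boolean): TreeNode[] {
  const found: TreeNode[] = [];
  const stack: TreeNode[] = [root];
  while (stack.length > 0) {
    const node = stack.pop() as TreeNode;
    if (match(node)) found.push(node);
    for (const child of node.children) stack.push(child);
  }
  return found;
}

// Set one property on every matching node and report how many changed.
function updateAll(
  root: TreeNode,
  match: (n: TreeNode) => boolean,
  prop: string,
  value: string | number
): number {
  const targets = queryNodes(root, match);
  for (const node of targets) node.props[prop] = value;
  return targets.length;
}
```

A project-wide font swap then becomes a single call such as `updateAll(page, (n) => n.type === "text" && n.props.font === "Arial", "font", "Inter")`, and its cost does not grow with the size of the tree the model would otherwise have to rewrite as text.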

The hard part is the feedback loop. To design well, the agent has to predict how elements will actually render from its understanding of what properties do in combination, and that gets genuinely tricky with something like inline layout. We close that gap three ways.

The agent gets layout information back as rectangles showing where everything landed on screen. A linter runs a suite of hard design rules covering layout, type, contrast, and accessibility, and hands back instant, absolute feedback on whatever the agent just produced. And when it needs to, the agent can see: we render what it made into actual pixels using a browser on a server. That last one is the slowest path, so it only happens on request, though you can always ask for it, or just paste a screenshot yourself.
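The first two feedback paths can be sketched in a few lines. This is a simplified illustration, not the actual linter: it covers just two rules, text contrast (using the WCAG relative-luminance formula and the 4.5:1 AA threshold for normal text) and a rectangle check for elements that land outside their parent. The `TextBox` shape is an assumption.

```typescript
type Rect = { x: number; y: number; width: number; height: number };

// Hypothetical shape for a rendered text element.
type TextBox = { id: string; color: string; background: string; rect: Rect };

// WCAG relative luminance of a 6-digit hex color such as "#1a2b3c".
function luminance(hex: string): number {
  const value = parseInt(hex.replace("#", ""), 16);
  const channels = [(value >> 16) & 255, (value >> 8) & 255, value & 255];
  const linear = channels.map((c) => {
    const s = c / 255;
    return s <= 0.03928 ? s / 12.92 : Math.pow((s + 0.055) / 1.055, 2.4);
  });
  return 0.2126 * linear[0] + 0.7152 * linear[1] + 0.0722 * linear[2];
}

// WCAG contrast ratio, from 1 (identical colors) to 21 (black on white).
function contrastRatio(foreground: string, background: string): number {
  const a = luminance(foreground);
  const b = luminance(background);
  return (Math.max(a, b) + 0.05) / (Math.min(a, b) + 0.05);
}

// True when any edge of the child falls outside the parent rectangle.
function overflows(child: Rect, parent: Rect): boolean {
  return (
    child.x < parent.x ||
    child.y < parent.y ||
    child.x + child.width > parent.x + parent.width ||
    child.y + child.height > parent.y + parent.height
  );
}

// Return hard, deterministic findings the agent can act on immediately.
function lintTextBox(box: TextBox, parent: Rect): string[] {
  const issues: string[] = [];
  const ratio = contrastRatio(box.color, box.background);
  if (ratio < 4.5) issues.push(`${box.id}: contrast ${ratio.toFixed(2)} is below 4.5`);
  if (overflows(box.rect, parent)) issues.push(`${box.id}: extends outside its parent`);
  return issues;
}
```

Checks like these are cheap to run after every edit, which is why they can come back instantly while the pixel render stays on request.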

Beyond seeing its own output, the agent reads the patterns already in your project. It looks at the layouts, styles, and colors you’re using across the site and biases toward the ones you’ve defined strictly as styles and swatches. If you don’t have those yet, it’ll help you create them or clean up the ones you have. It also understands how real designers keep a project maintainable, pushing repeating patterns into components and navigation templates and offering to set those up so things stay clean as the site grows.

One small but real tip: the agent loves images. Paste a screenshot of the thing you’re pointing at and your results jump noticeably.

Everything the canvas connects to

The CMS behaves mostly like a database, but one tuned hard for websites, which means it’s optimized for reading. Under the hood it gets fairly advanced, with multi-references, column types, unique slugs, fast index-based querying, and search. The agent handles all of it, so you can hand off CMS management entirely. Because it’s strong at understanding and referencing data, it can suggest good titles and slugs based on what an article actually says, and since it knows the rest of Framer and a fair amount of SEO, it’ll set up redirects for old slugs and tune new titles while it’s at it.
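The slug and redirect bookkeeping is mechanical enough to sketch. The code below is an illustration under assumptions: the function names, the `Redirect` shape, and the `/blog` base path are all hypothetical, and the real CMS enforces uniqueness on its own terms.

```typescript
// Turn a title into a URL-safe slug: fold accents, lowercase,
// and collapse anything that is not a letter or digit into one dash.
function slugify(title: string): string {
  return title
    .normalize("NFKD")
    .replace(/[\u0300-\u036f]/g, "")
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Append a numeric suffix until the slug no longer collides.
function uniqueSlug(base: string, taken: string[]): string {
  if (taken.indexOf(base) === -1) return base;
  let n = 2;
  while (taken.indexOf(`${base}-${n}`) !== -1) n++;
  return `${base}-${n}`;
}

type Redirect = { from: string; to: string };

// Retitle an item: compute its new slug and, if the URL changed,
// emit a redirect so links to the old slug keep working.
function renameSlug(
  oldSlug: string,
  newTitle: string,
  taken: string[],
  basePath: string = "/blog" // hypothetical collection path
): { slug: string; redirect: Redirect | null } {
  const others = taken.filter((s) => s !== oldSlug);
  const slug = uniqueSlug(slugify(newTitle), others);
  const redirect =
    slug === oldSlug ? null : { from: `${basePath}/${oldSlug}`, to: `${basePath}/${slug}` };
  return { slug, redirect };
}
```

The useful property is that a retitle never strands an old URL, since the redirect is produced in the same step as the new slug.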

It also ships with tools to fetch other sites, download and parse data, and read CSVs, so turning messy content into clean structured data your whole team can maintain is one of the things it’s best at. A few weeks back we rebuilt the CMS UI around a grid, mostly so you could select cells, rows, and columns and pass them straight to the agent as context. It works better than we expected.

Analytics is where this gets fun. The Framer platform already tracks a lot about your visitors, done in a GDPR-friendly way, and the dashboard gives you a solid overview. But a dashboard can only answer the questions someone thought to build into it, which is exactly why we think the agent is the future of analytics. Built on our ClickHouse setup, it can query anything we track, any way you want, in near real time.

Ask about a specific referral or campaign cut by country, or how one page’s popularity shifted over time for a particular group of visitors, or just for the broad trends. It gets genuinely powerful when you point it at a single page and ask how to improve the CTA, the conversion rate, or the SEO. The suggestions come back grounded in your real numbers instead of vibes.

Settings is the unglamorous one, but it matters. The agent can control most of your site settings, which helps when you’re wiring up a domain through some awkward DNS provider, chasing down an optimization error, or generating JSON-LD metadata from your CMS for specific pages. All of that is now something you can just ask for.

Code is where the ceiling disappears. Framer sites are full React and can render any React component, so anything you can’t do visually you can write. Custom login state in the nav, a dynamic pricing calculator, hand-rolled WebGL effects, whatever you need. What makes this nicer in Framer is that components can expose controls you configure visually in the UI, and you can reuse them anywhere. We put real effort into making the agent good at writing these components. It adds the controls automatically, debugs its own errors, optimizes for performance, and stays SSR-compatible so your pages still pre-render.

There’s also a quieter capability worth calling out. Agents love terminals, because a terminal hands them every unix tool at once: pull something off the internet, write to disk, transform data with a quick script. It turns out the browser can do most of that on its own through fetch, eval, and OPFS, the browser file system. So the agent can scrape data from a site, filter it, format it, and drop it into your CMS without ever leaving the browser.

Getting the quality right

The hardest part of this whole project wasn’t capability. It was quality, and quality has a few separate problems hiding inside it.

The first is creativity, and it’s a strange one to complain about. Claude tends to produce a fairly narrow band of creative outcomes, all of them extremely polished. The work looks great, but spend enough time with it and you’ll notice everything starts to rhyme. Our biggest lever here was starting from a design plan, basically the plan mode you’ve seen in Claude or Codex, where we help the model choose layout, fonts, colors, and what content to generate before it commits. That alone produces more variety and far fewer half-finished results.

We also got the agent good at recreating and remixing images and whole sites. We debated quietly seeding it with background images so you’d get more surprising output by default, but we’re a pro tool, so we decided designers should learn to drive this themselves and keep full control. The practical takeaway: if you want more creativity out of the model, show it images of the direction you want to explore.

The second is taking direction, because humans jump between levels of abstraction constantly and the agent has to keep up. “Make exactly this thing red” is just the property panel in words. “Fix the layout” means first working out which layout you mean, then proposing a change you’ll actually like. “Make it nice” forces the agent to get creative, ask for direction, or float a few options. And “fix this bug” drops it out of creative mode entirely and into problem-solving mode. Same tool, four completely different jobs.

Underneath the direction-following sits a skill-based system. The agent figures out what kind of task you’re attempting and dynamically loads the detailed manual for it. We’re planning to open that skill system up to everyone soon.

Then there’s stability, which is genuinely hard because agents are non-deterministic. Ask the same thing twice and you get two different answers. So every night we run a large set of evals, each one multiple times. Some are trivial, like checking whether a prompt actually changed an element’s height. Others are full end-to-end runs judged by a second model. The point is to prove the agent got better or worse instead of trusting our gut after three good sessions in a row, which is a very easy trap with agents.
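Because a single run proves little with a non-deterministic agent, the useful number is a pass rate over repeated runs, compared between two versions of the harness. The sketch below assumes a simple record per run. The type names and the 0.1 tolerance are hypothetical.

```typescript
type EvalRun = { evalName: string; passed: boolean };
type EvalSummary = { evalName: string; runs: number; passRate: number };

// Collapse repeated runs of each eval into one pass rate.
function summarize(runs: EvalRun[]): EvalSummary[] {
  const grouped: Record<string, { runs: number; passes: number }> = Object.create(null);
  for (const run of runs) {
    const entry = grouped[run.evalName] || { runs: 0, passes: 0 };
    entry.runs += 1;
    if (run.passed) entry.passes += 1;
    grouped[run.evalName] = entry;
  }
  return Object.keys(grouped).map((evalName) => ({
    evalName,
    runs: grouped[evalName].runs,
    passRate: grouped[evalName].passes / grouped[evalName].runs,
  }));
}

// Flag evals whose pass rate dropped by more than the tolerance.
// Small dips are ignored because they are expected run-to-run noise.
function findRegressions(
  before: EvalSummary[],
  after: EvalSummary[],
  tolerance: number = 0.1
): string[] {
  const baseline: Record<string, number> = Object.create(null);
  for (const s of before) baseline[s.evalName] = s.passRate;
  return after
    .filter((s) => s.evalName in baseline && baseline[s.evalName] - s.passRate > tolerance)
    .map((s) => s.evalName);
}
```

With only a handful of runs per eval the rates are still noisy, so the tolerance and the run count have to be chosen together.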

We’ve also made the harness semi self-improving. Each day a sample of real sessions gets analyzed for quality to surface things like “here’s the goal, but it got stuck.” Another model proposes a fix, usually some mix of system prompt, skills, and tool changes, and we re-run the evals to confirm it actually helps. We can run that loop fast enough that the harness gets visibly better day over day.

One thing to note: creatives are right to worry about AI stealing their work, so let’s be clear. We don’t train the agent on your designs unless you explicitly allow it. The model only knows what you give it, and that context won’t show up anywhere else. In return, respect other people’s work too. The agent can take inspiration from public design, but you still need to use judgment. Don’t copy someone’s design. It backfires, it can get you in legal trouble, and AI doesn’t change those rules.

What it feels like to use

Framer is built for pros, so we expose all the controls and stay transparent. Pick the model you want, prompt however you like, and pull up full debug info on any message.

We’re launching with a small set of frontier models rather than all of them, and the reason is more interesting than it sounds. Models today differ in ways that go well beyond their output. They call tools differently and they stream differently, so each one needs the harness tuned for it before it’s worth shipping. We treat models as tools, and the plan is to widen that toolbox steadily over time.

For streaming, the feeling we’re chasing is looking over a designer’s shoulder. Fast enough to be fun, with the option to follow along without being forced into full spectator mode, and real-time progress on the canvas as it happens. That last requirement shaped the architecture: instead of dropping one giant edit, the model makes many small edits that add up to the final design.
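One way to get canvas updates while the model is still generating is to apply each small edit as soon as it is complete. The sketch below assumes, purely for illustration, that edits arrive as newline-delimited text commands. The command syntax in the usage note is made up.

```typescript
// Buffers streamed text and releases only complete lines, so each
// finished command can be applied the moment it arrives.
function createCommandStream(onCommand: (line: string) => void) {
  let buffer = "";
  return {
    // Feed one network chunk. A chunk can end in the middle of a command.
    push(chunk: string): void {
      buffer += chunk;
      let newline = buffer.indexOf("\n");
      while (newline !== -1) {
        const line = buffer.slice(0, newline).trim();
        buffer = buffer.slice(newline + 1);
        if (line.length > 0) onCommand(line);
        newline = buffer.indexOf("\n");
      }
    },
    // Flush whatever is left when the model stops generating.
    end(): void {
      const line = buffer.trim();
      buffer = "";
      if (line.length > 0) onCommand(line);
    },
  };
}
```

Pushing the chunks `"set a bg"`, `" #f00\nset b"`, and `" w 100\n"` delivers `set a bg #f00` as soon as the second chunk arrives, while the half-finished `set b` waits in the buffer until its newline shows up.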

Collaboration works about how you’d hope. Take turns with the agent, or keep working while it works, even with several people in the project at once. You can also run multiple agents on the same design simultaneously, though if you want to work across branches you’ll need a second tab.

Branching is also our answer to the trust problem. Agents can undo their changes, but they’re new and they make a lot of edits at once, so people don’t yet trust them the way they trust normal editing. So we added rollback on every message, and then went further. You and your agents always work on a copy of your project, never the live version. Happy with the result? Merge it. Not happy? Throw it away. Copies are cheap and switching between them is instant, which means you can experiment as recklessly as you want without ever putting your live site at risk.

The last piece is errors, and here we’ll be honest. There isn’t enough compute to go around, and inference is unreliable compared to almost any other kind of computing. That’s fine for some products, but in Framer a stall knocks you straight out of flow. So we automatically switch providers based on speed and availability and lean on well-tuned retry logic to keep the agent moving whenever it’s possible.
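The shape of that logic can be sketched as two small decisions: which provider to try first, and how long to wait between retries. Everything here is an assumption for illustration, including the `Provider` fields, the retry count, and the backoff constants. A real implementation would make the actual inference calls asynchronously.

```typescript
// Hypothetical health snapshot for one inference provider.
type Provider = { name: string; available: boolean; tokensPerSecond: number };

// Drop unavailable providers and try the fastest ones first.
function rankProviders(providers: Provider[]): Provider[] {
  return providers
    .filter((p) => p.available)
    .sort((a, b) => b.tokensPerSecond - a.tokensPerSecond);
}

// Exponential backoff with a cap, so retries never stall for long.
function retryDelayMs(attempt: number, baseMs: number = 250, capMs: number = 4000): number {
  return Math.min(capMs, baseMs * Math.pow(2, attempt));
}

// The full sequence of attempts: a first try plus a few retries on each
// provider before falling through to the next one in the ranking.
function attemptPlan(
  providers: Provider[],
  retriesPerProvider: number = 2
): { provider: string; waitMs: number }[] {
  const plan: { provider: string; waitMs: number }[] = [];
  for (const p of rankProviders(providers)) {
    for (let attempt = 0; attempt <= retriesPerProvider; attempt++) {
      plan.push({ provider: p.name, waitMs: attempt === 0 ? 0 : retryDelayMs(attempt - 1) });
    }
  }
  return plan;
}
```

Keeping the plan as plain data makes the retry behavior easy to test without touching a network.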

Making it fast

Project trees are verbose, very much like HTML, and verbose means many tokens, which means expensive. We shrink them two ways. First we shorten property names, so backgroundColor becomes bg. Second we drop any value that already matches the default, so there’s no reason to spell out a white background when white is the default anyway. The cost is a longer system prompt to teach the model this format, but the token savings dwarf the extra instructions, so it nets out clearly in our favor. Trees and JSON are both heavy enough that we ended up inventing our own language to express the tree and its edits, one that strips every character it can and leans on those same shorthand names and known defaults.
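Both tricks fit in a few lines. In the sketch below only the `backgroundColor` to `bg` mapping comes from the text above. The other short names and the default values are made up for illustration.

```typescript
type Props = Record<string, string | number>;

// Only backgroundColor -> bg is from the article; the rest are hypothetical.
const SHORT_NAMES: Record<string, string> = {
  backgroundColor: "bg",
  width: "w",
  height: "h",
  borderRadius: "r",
};

// Hypothetical defaults that both sides agree on ahead of time.
const DEFAULTS: Props = { backgroundColor: "#fff", borderRadius: 0 };

function has(obj: object, key: string): boolean {
  return Object.prototype.hasOwnProperty.call(obj, key);
}

// Drop values that match the default, then shorten the remaining names.
function compressProps(props: Props): Props {
  const out: Props = {};
  for (const key of Object.keys(props)) {
    if (has(DEFAULTS, key) && DEFAULTS[key] === props[key]) continue;
    out[has(SHORT_NAMES, key) ? SHORT_NAMES[key] : key] = props[key];
  }
  return out;
}

// Reverse the process: restore long names and fill the defaults back in.
function expandProps(compact: Props): Props {
  const longNames: Record<string, string> = {};
  for (const long of Object.keys(SHORT_NAMES)) longNames[SHORT_NAMES[long]] = long;
  const out: Props = {};
  for (const key of Object.keys(DEFAULTS)) out[key] = DEFAULTS[key];
  for (const key of Object.keys(compact)) {
    out[has(longNames, key) ? longNames[key] : key] = compact[key];
  }
  return out;
}
```

With these tables, `{ backgroundColor: "#fff", width: 320, borderRadius: 8 }` compresses to `{ w: 320, r: 8 }`, and expanding that restores the white background from the defaults.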

A separate speed question is tool calling versus inline completion. Models usually have both a tool-calling path and a normal completion path. Tool calling forces the output into a specific JSON schema, which makes it far more reliable, but it’s typically slower because the model has to stay inside that structure and the workflow adds overhead. You can also just ask the completion endpoint for JSON and it’ll usually comply, which is faster, at the cost of the occasional malformed response or missing field. If you can repair those mistakes the way a browser repairs broken HTML, the speed gain is worth it, and that’s the route we took for some models.
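A lenient parser for completion output might look like the sketch below. It is deliberately simple and rests on assumptions: the `Edit` shape and its `"px"` default are hypothetical, and the repair covers only three common failures (prose or fences around the JSON, trailing commas, and a missing optional field).

```typescript
// Hypothetical shape of one edit the model is asked to return.
type Edit = { id: string; prop: string; value: string | number; unit: string };

// Best-effort cleanup of almost-JSON.
function repairJson(raw: string): string {
  let text = raw.trim();
  // Keep only the outermost JSON value, discarding surrounding prose or fences.
  const start = text.search(/[\[{]/);
  const end = Math.max(text.lastIndexOf("}"), text.lastIndexOf("]"));
  if (start !== -1 && end > start) text = text.slice(start, end + 1);
  // Remove trailing commas. Caveat: this naive pattern would also alter
  // a string value that happens to contain a comma followed by a bracket.
  return text.replace(/,\s*([}\]])/g, "$1");
}

// Parse one edit, filling in defaults for optional fields.
// Returns null when the response is beyond repair, so the caller can retry.
function parseEdit(raw: string): Edit | null {
  let data: unknown;
  try {
    data = JSON.parse(repairJson(raw));
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  const obj = data as Record<string, unknown>;
  if (typeof obj.id !== "string" || typeof obj.prop !== "string") return null;
  if (typeof obj.value !== "string" && typeof obj.value !== "number") return null;
  const unit = typeof obj.unit === "string" ? obj.unit : "px";
  return { id: obj.id, prop: obj.prop, value: obj.value, unit };
}
```

The trade is explicit in the return type. Most malformed responses are recovered silently, and the few that are not surface as `null` instead of as a corrupted edit.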

Thinking depth surprised us. For coding, high versus low thinking matters a lot. For our design evals it made no material difference, almost certainly because design isn’t really a logic problem. We also realized that if we shipped a thinking setting, everyone would crank it to max anyway, which here just burns money for no benefit, so we cut it entirely for now.

Caching does a lot of quiet work. When the same tool output gets sent back to the model more than once, it pays to keep it stable and cache-friendly, since cached input tokens are usually cheaper and often faster, especially for large repeated context. Keeping earlier messages untouched lets us cache more than 90% of the tokens in a typical session.
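A back-of-the-envelope model shows why an append-only history caches so well. The sketch assumes every request resends the full history plus one new turn, and that the untouched history is always a cache hit. The token counts in the usage note are invented.

```typescript
// Fraction of all input tokens in a session that were served from cache,
// given the number of new tokens each turn adds to the history.
function cachedFraction(turnTokens: number[]): number {
  let prefix = 0; // tokens already in the history
  let totalInput = 0;
  let totalCached = 0;
  for (const tokens of turnTokens) {
    totalInput += prefix + tokens; // resend the history plus the new turn
    totalCached += prefix; // the unchanged history is a cache hit
    prefix += tokens;
  }
  return totalInput === 0 ? 0 : totalCached / totalInput;
}
```

For a session that opens with 20,000 tokens of prompt and context and then adds 20 turns of 500 tokens each, this model gives a cached fraction of about 94%. Rewriting an early message would invalidate everything after it, which is why earlier messages stay untouched.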

As for raw speed, most of the best models run around 50 tokens per second, which works out to a full page in roughly 40 to 100 seconds. That’s already decent. Once you’ve watched it run at 150 tokens per second, though, there’s no going back, and we’ll have that soon.
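The arithmetic behind those numbers is worth spelling out. This sketch assumes generation time is dominated by output tokens, which ignores network and tool latency.

```typescript
// Tokens produced in a given time at a given throughput.
function tokensFor(tokensPerSecond: number, seconds: number): number {
  return tokensPerSecond * seconds;
}

// Time needed to produce a given number of tokens.
function secondsFor(tokens: number, tokensPerSecond: number): number {
  return tokens / tokensPerSecond;
}

// 40 to 100 seconds at 50 tokens per second implies a page of
// roughly 2,000 to 5,000 output tokens.
const smallPage = tokensFor(50, 40); // 2000
const largePage = tokensFor(50, 100); // 5000

// The same pages at 150 tokens per second.
const fastSmall = secondsFor(smallPage, 150);
const fastLarge = secondsFor(largePage, 150);
```

Under that assumption, tripling throughput cuts the 40 to 100 second range to roughly 13 to 33 seconds.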

What it actually costs

Let’s not dance around it: tokens are expensive right now, especially on frontier models, and unfortunately we can’t get around charging for those. As of writing, generating a single page with GPT-5.5 runs about $3, and a medium edit lands around $0.50. Some early testers have spent up to $300 in tokens building a complete site to the point where they were happy shipping it.

Whether that’s expensive depends entirely on what you’re doing. For the person who spent $300, the agent saved two to three weeks of work, so it was an easy trade. If you’re just tinkering on a personal site, it’ll start to feel pricey fast.

Costs vary by model, and you can see the differences in the menu, but the honest reality is that design is harder than simple code, so for now great results mean frontier models. Models also make mistakes, and paying for a mistake feels wrong, even if it mirrors how the real world works, so if you’re unhappy with a result for any reason you’re covered (just click “Mark as Bad” in the changes menu and we’ll refund your credits).

All that said, here’s what we actually believe: expensive tokens are a temporary blip. We think tokens with real design intelligence are going to be very cheap, or even free, and sooner than most people expect. We have concrete plans to get there. More on that in the next few weeks.

A place for the people using this

A lot more people are about to be building in Framer, and they need somewhere to do more than browse a list of resources. They need one place to share ideas, find talent, show off work, get distribution, and make money. For years we leaned on a patchwork of other platforms to cover that. Starting today, we’re shipping our own.

We also think the chance to earn recurring revenue from templates and resources is about to grow a lot, for a few connected reasons. Agents lower the bar to building a Framer site. The old path often meant picking a template and hiring a freelancer to customize it, and that customization and maintenance work can now go to an agent, which widens the audience for the platform considerably.

At the same time, agents are only as good as the context you feed them, so we expect templates to become design systems for agents, functioning as a style guide or brand book that other pages get generated from. That packages the real human design skill behind a great site into something reusable, which makes templates more valuable, not less, and might even open the door to paying for uniqueness, the guarantee that not too many other people are running the same system.

Big releases coming there. We can also see the marketplace filling out with paid AI resources like advanced skills, tools, and plugins, especially for larger teams that are already paying.

When you’d reach for your own agent instead

Our built-in agent is probably the best way to design in Framer, but there are real reasons to bring your own local harness like Codex or Claude Code. A local agent can touch everything on your machine, from heavy-duty scraping tools to local files and emails you might want to pull from when publishing. It can run on a server to automate things, like converting one system’s data into another’s format or refreshing your site with the latest numbers on a schedule. And it spends the tokens you may have already paid for through a Claude or Codex plan.

So we expose the exact same Framer agent capabilities to any external agent. Install the @framer/agent package and you’re set. It works differently from MCP under the hood, but it gives you the same functionality an MCP server would, done better.

The upsides versus the in-app agent are real, but so are the trade-offs. We can’t hook into a local agent’s streaming, so the design updates in big blocks instead of in real time, which is noticeably more jarring, even though the design capability itself is just as strong. And without a canvas open on our side, we can’t feed the model our rich canvas context, so it leans harder on guessing your intent, which usually means you’ll need to prompt more specifically.

In closing

Building design agents has been a blast for the whole team, and we’ve got a long list of improvements queued up for the rest of the year. Pair those with models that keep getting smarter, faster, and cheaper, and design as we know it is going to change completely. This is just the start.