zkVMs, Circuits, and the Optimization Game
Mauro Toscano, Aligned co-founder and CTO, explores the engineering tradeoffs between ZK circuits and zkVMs and how to think about designing and building these systems

Over the last year, in online discussions and at zk conferences, the same questions keep coming up.
“Should we build circuits or zkVMs?”
“If zkVMs, should they use popular CPU instructions like RISC-V?”
These debates often get framed as binary choices, where we’re forced to pick a side with little data and without fully weighing the trade-offs of each alternative.
In practice, optimization, engineering constraints, and security issues all complicate the picture. In this post I’ll argue that the right answer isn’t just about chasing raw theoretical speed. It’s about where the bottlenecks are, how fast teams can iterate, and how resilient the ecosystem becomes.
What follows is an exploration of both discussions: circuits vs. zkVMs, and the future of zkVM architectures, whether custom-designed for zero knowledge or based on general-purpose RISC designs.
There will be some high level overview of how the systems work along the way, but not a deep dive into the math or cryptography. The focus is on scenarios and trade-offs, with an eye toward how time and resources shape the decisions ahead.
ZK Circuits vs zkVMs
The first question that always comes up is whether circuits or VMs are the way to go.
The trivial answer is “circuits can be faster.” That’s true, but it hides most of the story.
Circuits are to zkVMs what assembly is to higher level languages like Rust or C. You can usually squeeze more performance out of assembly, but it’s harder, more error-prone, and takes more time.
Think about building a full Ethereum execution client. What are the odds you make a mistake doing it all in assembly versus in Rust? And what’s the cost in time, money, and people?
Sure, if time were infinite, a great team could probably outdo the compiler. But perfect is the enemy of shipped. By the time you've hand-crafted every optimization, the market may have moved on, and the code, while fast, may be unmaintainable.
So the VM versus circuit question is a lot like this. Do we use higher level tools that let us move faster, or do we build everything from scratch chasing raw speed?
Of course, I still haven't given enough information to answer this. If circuits gave you twenty times the performance for three times the cost, you might go for them.
So, this brings us to the question: is the performance gain worth the additional complexity?
The fact that many teams are moving from custom circuits for zk-EVMs to simpler architectures points to how teams are currently weighing these factors, and over the next sections, we will explore how those trade-offs can play out.
Optimizing What Matters
In practice, engineering choices are rarely all or nothing. Like with assembly and high level languages, sometimes the right move is to combine both.
In zkVMs this usually means adding custom circuits for the heavy operations, and then letting the general VM handle the rest. To see why that mix can make such a difference, it helps to look at optimizations more carefully, and that is where Amdahl’s law comes in.
Back in 1967 Gene Amdahl came up with a very simple rule of thumb for computer systems. It states that the speedup you get from optimizing one part of the system depends on how much time you actually spend there. If 80 percent of your time is spent in one place, that is where optimizations matter. If it is only 5 percent, it almost does not matter how much faster you make it.
Now think of a zkVM. Let's imagine we have one that is ten times slower to prove than a custom circuit, and we discover that 80 percent of proving time is spent on keccak.
So we focus our optimization efforts on this bottleneck. After a few weeks of work, we have an optimized circuit just for keccak, which we embed in our VM. With this change, the VM is no longer 10 times slower; it's only about three times slower than the custom circuit.
Our work continues, and we find another hotspot, modexp. We add a circuit for that too. We repeat this process, and after a couple of optimizations, our VM is no longer 10x slower. Now it's competitive and the overhead is not as relevant as before.
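Amdahl's law makes this arithmetic concrete. A minimal sketch, using the hypothetical numbers from the example above (the helper name `amdahl_speedup` is mine, not a library function):

```python
def amdahl_speedup(fraction: float, local_speedup: float) -> float:
    """Overall speedup when `fraction` of total time is made
    `local_speedup` times faster; the rest is unchanged."""
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)

# Hypothetical numbers from the text: the VM is 10x slower than a
# custom circuit, and 80% of proving time is spent in keccak.
# If a keccak circuit makes that part 10x faster:
overall = amdahl_speedup(0.80, 10.0)   # ~3.57x overall speedup
remaining_overhead = 10.0 / overall    # = 2.8, "about three times slower"
```

Note how the untouched 20 percent caps the win: even an infinitely fast keccak circuit would only yield a 5x overall speedup here, which is why each new hotspot (modexp, and so on) becomes the next target.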
If we ship this machine in one year instead of several, we win.
In practice, this is exactly what we've seen. Custom circuits, like the ones used to represent the EVM, looked better on paper, but in reality they are being replaced by zkVMs.
zkVMs are easier to build and easier to optimize, and because teams can iterate so much faster, in the end they also run faster.
What used to feel unbearably slow is now fast enough to prove Ethereum execution in real time. And that change did not come from one clever trick, but from steady optimizations on a simple and general design.
A small focused team with a simple design outperforms most specialized projects.
The Hidden Cost: Trace Generation
Up to this point, I've focused mainly on proving performance. But there's more to the story. When you actually build a zkVM, proving is just one stage of a larger pipeline, which may look like this:
- Generate the trace, or witness, that you want to prove
- Split that trace into parts and prove each part
- Join the proven parts together
Of course this is simplified, but the key point is that before you can prove anything you first have to generate the trace.
Generating the trace basically means running the computation step by step and writing down a log with enough information to verify the state at each step. For something like an EVM, that just means executing the blocks and saving a bit of extra data.
This part turns out to matter a lot. Once the prover itself is well optimized, trace generation often becomes the bottleneck. The main reason is that trace generation is inherently sequential. You have to execute each instruction in order to know what the next state will be.
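As a toy illustration of the first two pipeline steps above, here is a sketch of a trace generator for a made-up single-accumulator machine (the machine, the `Step` record, and both helper names are hypothetical, chosen only to show why the loop is sequential):

```python
from dataclasses import dataclass

@dataclass
class Step:
    pc: int
    op: str
    arg: int
    acc_before: int
    acc_after: int

def generate_trace(program, acc=0):
    """Execute a toy accumulator machine, logging every step.
    Each step reads the state the previous step produced, so this
    loop cannot be parallelized: that is the sequential bottleneck."""
    trace = []
    for pc, (op, arg) in enumerate(program):
        before = acc
        if op == "add":
            acc += arg
        elif op == "mul":
            acc *= arg
        trace.append(Step(pc, op, arg, before, acc))
    return trace

def split_trace(trace, chunk_size):
    """Split the finished trace into chunks; unlike generation,
    proving each chunk can happen in parallel."""
    return [trace[i:i + chunk_size] for i in range(0, len(trace), chunk_size)]

trace = generate_trace([("add", 3), ("mul", 4), ("add", 5)])
chunks = split_trace(trace, 2)   # two chunks: steps [0, 1] and [2]
```

The asymmetry is the point: splitting and proving parallelize, but the trace itself has to be produced in order, one instruction at a time.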
This bottleneck explains why many of the strongest teams working on zkVMs are now putting significant effort into optimizing trace generation. However, this effort is only straightforward if the trace always has the same structure. If you build a perfect circuit for a system, you also need to build an equally optimized trace generator that matches it.
That is why the pipeline as a whole matters, not just the prover itself.
The Instruction Set Question
What about a machine with a custom instruction set? Could we build a virtual machine that's optimized specifically for zero knowledge proofs? Would that give us the best of both worlds?
This approach presents a different set of tradeoffs than the circuits versus VMs debate.
While a zkEVM circuit is usually more complex than a RISC-V zkVM, a custom zk-friendly instruction set could potentially be even simpler and faster than RISC-V. The challenge is that we'll need some way for people to actually build applications in our VM.
Today that usually means one of two things:
- Creating a custom language with a custom compiler
- Creating a custom compiler backend for an existing language like Rust that targets the special instruction set
There's also a third path that's technically viable, though not widely pursued today. Rather than choosing between standard RISC-V or a completely custom solution, you could target RISC-V first and then translate that into your own more efficient instruction set.
This is similar to how many modern CPUs work. The assembly we see is more of an API than the real instructions. The CPU translates those into a simpler internal set that actually gets executed. In the same way, we could have a translator that converts RISC-V into custom instructions.
The hard part with this design is the translation step. Either it has to automatically figure out how to make use of the native features of your specialized VM, or the developer needs some way to pass those hints down through the high level language.
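To make the translation idea concrete, here is a deliberately simplified sketch. Everything in it is hypothetical: the instruction tuples, the made-up zk-native `MULADD` op, and the single fusion rule, which mirrors how a CPU fuses visible assembly into internal micro-ops:

```python
# Hypothetical translator from a RISC-V-like instruction stream into a
# made-up zk-native instruction set. Instructions are tuples of
# (opcode, dest, src1, src2). The one rule here fuses a multiply
# followed by a dependent add into a single MULADD.
def translate(riscv_ops):
    zk_ops = []
    i = 0
    while i < len(riscv_ops):
        op = riscv_ops[i]
        nxt = riscv_ops[i + 1] if i + 1 < len(riscv_ops) else None
        # MUL rd, a, b ; ADD rd2, rd, c  ->  MULADD rd2, a, b, c
        if op[0] == "MUL" and nxt and nxt[0] == "ADD" and nxt[2] == op[1]:
            zk_ops.append(("MULADD", nxt[1], op[2], op[3], nxt[3]))
            i += 2
        else:
            zk_ops.append(op)
            i += 1
    return zk_ops

ops = [("MUL", "t0", "a", "b"), ("ADD", "t1", "t0", "c"), ("XOR", "t2", "t1", "d")]
zk = translate(ops)  # MUL + ADD fused into one MULADD; XOR passes through
```

A real translator faces exactly the problem described above: a peephole pass like this only catches local patterns, while exploiting the native features of a specialized VM may require hints passed down from the high level language.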
Beyond the tooling complexity, we have to weigh the performance benefits. Some estimates suggest a custom zk-friendly VM could be three to six times faster than RISC-V.
Still, more measurements are needed to guide these decisions. And once again, the balance between engineering complexity and potential benefit will be one of the key considerations when choosing between the alternatives.
Why ZK Diversity Matters
Up to this point we have talked mostly about engineering complexity and efficiency. But those are not the only things that matter. Once these systems are securing real value, we also have to think about security.
The simpler the system, the easier it is to spot issues and fix them. The more moving parts, the bigger the attack surface.
If we are going to use these verifiable machines to secure Ethereum and our financial systems, the code needs to be something people can actually check. But even so, bugs will happen.
This is why diversity is key. Even if one design proves to be the fastest, Ethereum and the wider ecosystem will be safer if there is no single point of failure. We do not want a future where one bug in a compiler or a shared library is enough to break everything.
As a community, we need multiple teams building different designs. Diversity in architectures, languages, and implementations is what keeps the system resilient.
Having multiple competitive options is what creates real security.