Everything in this article is my personal opinion. I have never talked with anyone from Vercel.
Since I last wrote about deploying Next.js on Cloudflare’s edge infrastructure, Next.js has evolved, and version 13 has just been released with many shiny new features. But not much has changed in terms of edge support.
Can we deploy Next.js at the edge today? Yes. But strictly speaking, only on Vercel. So if you are fine with only being able to deploy your Next.js website on Vercel, you are good to go.
For those who don’t like vendor lock-in, or who need something Vercel doesn’t provide at the edge (for me, that’s Cloudflare KV and D1), this article explains why it’s hard to deploy Next.js at the edge on platforms other than Vercel.
A little background
What is deploying a site at the edge?
Generally speaking, it means rendering the page at the data center that’s closest to the visitor. In this way, you get the best TTFB globally, instead of waiting for a server in Washington DC to generate a response for a visitor in Cape Town. Edge computing is supported mostly by CDN providers since they already have many data centers across the globe.
Because our site is served from so many data centers, each one sees only a fraction of the traffic. That makes an edge model like AWS Lambda@Edge, which supports the full Node runtime, problematic for most sites with an ordinary amount of traffic: instances are often cold, and the Node runtime has a long cold start time. We need a much lighter runtime with little cold start time. And here it is: the edge runtime.
Just think of it as a browser JavaScript runtime without the DOM. It cannot access the filesystem or spawn processes, but it’s super fast and cheap to create.
You deploy your server by uploading a script or a binary that exports a function. The function takes an HTTP request, does whatever you want, and returns a response. The infrastructure handles the rest for you. But your function can only do what the edge runtime allows. Simply put: no Node.js built-in modules. To be clear, you can use any npm package you want as long as it doesn’t depend on Node built-in modules.
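To make this concrete, here is a minimal sketch of such a function in the module style used by Cloudflare Workers (an object with a `fetch` method; other edge platforms use slightly different export shapes). The `/api/hello` route and its response are made up for illustration. The handler relies only on web-standard APIs like `Request`, `Response`, and `URL`, which is what an edge runtime provides; pulling in `fs` or `child_process` here is exactly what the runtime would not allow.

```typescript
// A minimal edge-style handler: it receives a web-standard Request and
// returns a Response. It only uses web APIs (URL, Response, JSON), so it
// does not depend on any Node.js built-in module.
const worker = {
  async fetch(request: Request): Promise<Response> {
    const url = new URL(request.url);

    // Hypothetical example route.
    if (url.pathname === "/api/hello") {
      const name = url.searchParams.get("name") ?? "world";
      return new Response(JSON.stringify({ message: `Hello, ${name}!` }), {
        headers: { "content-type": "application/json" },
      });
    }

    return new Response("Not found", { status: 404 });
  },
};

// On Cloudflare Workers this object would be the module's default export:
// export default worker;
```

The platform calls `fetch` for every incoming request; everything else (TLS, scaling, picking the nearest data center) is the infrastructure’s job.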
Is Next.js edge compatible?
To understand this, we first need to look at the deployment model of Next.js, which is extremely simple. You run either `next build && next export` or `next build && next start`. The former generates a static website that you can deploy with any HTTP server. The latter starts a Next.js server that serves the content built by `next build` and does all kinds of Next.js magic for you.
This is fundamentally different from many other modern meta-frameworks like Remix, SvelteKit, or Qwik. In those frameworks, when you build your website, the output is a full server that takes requests, matches them in the router, finds the right handler, and generates the response.
Next.js, on the other hand, only generates a handler for each endpoint (a page or an API route). It expects someone else to take responsibility for figuring out which handler should take a request and routing it there. Locally, that is usually the server provided by Next.js.
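Here is a rough sketch of that missing “someone”, assuming a hypothetical route table that maps Next.js-style route patterns to per-endpoint handlers (the `Handler` signature is made up, not what `next build` actually emits). Static segments are tried before dynamic `[param]` segments, loosely mirroring the precedence Next.js documents for predefined vs. dynamic routes; catch-all routes and everything else the real router does are left out.

```typescript
// Hypothetical per-endpoint handler, standing in for a built page or API route.
type Handler = (request: Request, params: Record<string, string>) => Promise<Response>;

interface Route {
  pattern: string; // e.g. "/posts/[id]"
  handler: Handler;
}

// Match a pathname against a pattern such as "/posts/[id]".
// Returns the captured params, or null if the pattern does not match.
function matchRoute(pattern: string, pathname: string): Record<string, string> | null {
  const patternParts = pattern.split("/").filter(Boolean);
  const pathParts = pathname.split("/").filter(Boolean);
  if (patternParts.length !== pathParts.length) return null;

  const params: Record<string, string> = {};
  for (let i = 0; i < patternParts.length; i++) {
    const part = patternParts[i];
    if (part.startsWith("[") && part.endsWith("]")) {
      params[part.slice(1, -1)] = decodeURIComponent(pathParts[i]);
    } else if (part !== pathParts[i]) {
      return null;
    }
  }
  return params;
}

// Routes with fewer dynamic segments are tried first,
// so "/posts/new" wins over "/posts/[id]".
function dynamicCount(pattern: string): number {
  return pattern.split("/").filter((p) => p.startsWith("[")).length;
}

// Pick the right handler for a request and call it.
function createRouter(routes: Route[]) {
  const sorted = [...routes].sort((a, b) => dynamicCount(a.pattern) - dynamicCount(b.pattern));
  return async (request: Request): Promise<Response> => {
    const { pathname } = new URL(request.url);
    for (const route of sorted) {
      const params = matchRoute(route.pattern, pathname);
      if (params) return route.handler(request, params);
    }
    return new Response("Not found", { status: 404 });
  };
}
```

In other frameworks, logic like this ships inside the build output; with Next.js, it lives in the Next.js server (or in Vercel’s infrastructure).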
Now, which part is edge-compatible?
For other frameworks, it’s the output, i.e., the full server. You just need a light wrapper that passes the request to the server’s handler and returns the response. That’s why these frameworks support almost every edge platform out of the box.
For Next.js, it’s also the output, but the output is only the handlers. The Next.js server itself is not edge-compatible.
So here’s the deal: to deploy a Next.js website (at the edge or not), you either need an environment that can run the Next.js server (i.e., a Node runtime), or you need to do everything the Next.js server does on your target platform.
Then how does Vercel host our Next.js websites at the edge? Well, Vercel takes the output of `next build` and puts it on its infrastructure, which can do everything the Next.js server does. Vercel’s infrastructure runs at the edge (if configured to), doing what the Next.js server does: finding the right handler for each request.
What’s missing when using Vercel at the edge?
It depends on your needs.
There are a lot of things you can do at the edge with Vercel. For example, you can rewrite the URL based on whether the user is on a mobile device, so that two versions of each page, rendered on your web server in the US, can be cached in Cape Town.
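A minimal sketch of the decision such a rewrite makes, assuming mobile and desktop variants of each page live under hypothetical `/mobile` and `/desktop` path prefixes. The user-agent regex is a deliberately crude heuristic. In a real Next.js middleware, the computed path would be handed to `NextResponse.rewrite(new URL(path, request.url))`; that part is left out so the sketch runs without Next.js installed.

```typescript
// Crude, illustrative mobile detection based on the User-Agent header.
// Real projects usually rely on a dedicated library or client hints.
function isMobileUserAgent(userAgent: string | null): boolean {
  if (!userAgent) return false;
  return /Mobi|Android|iPhone|iPad/i.test(userAgent);
}

// Map a public URL path to an internal, cacheable variant path.
// The "/mobile" and "/desktop" prefixes are hypothetical conventions:
// because each variant has its own URL, a CDN node (say, in Cape Town)
// can cache both versions separately.
function variantPath(pathname: string, userAgent: string | null): string {
  const prefix = isMobileUserAgent(userAgent) ? "/mobile" : "/desktop";
  return pathname === "/" ? prefix : `${prefix}${pathname}`;
}
```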
However, when you try to render pages at the edge, you will most likely hit a big problem: where is your DB? If your DB is not at the edge, the server in Cape Town still has to call your DB server in Washington DC to get the data, and then there is little point in rendering pages at the edge.
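A back-of-the-envelope model makes the point. The numbers below are made up purely for illustration: a 20 ms round trip between a visitor and a nearby data center, a 250 ms round trip between Cape Town and Washington DC, about 1 ms between a server and a DB in the same region, three sequential DB queries per page, and 10 ms of rendering. Only the shape of the calculation matters.

```typescript
// Very rough TTFB model: one round trip between the visitor and the server
// that renders the page, plus one round trip to the DB per sequential query,
// plus render time. Connection setup (TCP/TLS) and bandwidth are ignored.
interface Deployment {
  visitorToServerRttMs: number;
  serverToDbRttMs: number;
  sequentialQueries: number;
  renderMs: number;
}

function estimateTtfbMs(d: Deployment): number {
  return d.visitorToServerRttMs + d.sequentialQueries * d.serverToDbRttMs + d.renderMs;
}

// Hypothetical numbers for a visitor in Cape Town.
const renderInWashington = estimateTtfbMs({
  visitorToServerRttMs: 250, serverToDbRttMs: 1, sequentialQueries: 3, renderMs: 10,
});
const renderInCapeTownDbInWashington = estimateTtfbMs({
  visitorToServerRttMs: 20, serverToDbRttMs: 250, sequentialQueries: 3, renderMs: 10,
});
const renderInCapeTownDbAtEdge = estimateTtfbMs({
  visitorToServerRttMs: 20, serverToDbRttMs: 1, sequentialQueries: 3, renderMs: 10,
});
// renderInWashington = 263, renderInCapeTownDbInWashington = 780,
// renderInCapeTownDbAtEdge = 33
```

With the DB far away, every query pays the long round trip, so edge rendering ends up roughly three times slower than rendering next to the DB in this model. It only wins once the data is at the edge too.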
That’s the problem Cloudflare KV and D1 try to solve.
Netlify, Fastly, and Cloudflare all claim support for Next.js deployment, don’t they?
First, I’m only talking about the edge support of these platforms in this article.
It should be clear by now: to run Next.js at the edge, you either use Vercel or implement a Next.js server that is compatible with the edge runtime.
Let’s take a look at them and I’ll explain why I don’t recommend any of them.
Fastly: Fastly implements a full Next.js server (code). Take a look at the 1k+ lines of code and answer this question: how do you keep this server’s behavior in sync with the official implementation? I’ll explain later why this is so hard, because I have done the same thing.
Cloudflare and Netlify: they provide a very light server that supports a small subset of Next.js features. The good news is that this server is powerful enough for most use cases (I hope). The bad news is that it’s hard to tell what is supported and what’s not. Things may work fine locally (with `next dev`) and break after deployment.
Problems with writing your custom Next.js server
A full server
Writing a custom Next.js full server is hard, super hard.
Why? For me, the most difficult part is the custom headers/redirects/rewrites/basePath/locale configurations, which Next.js chose to support, I guess, mostly because it lacked middleware support back in the day. Next.js does pay the price for that decision (IMHO a disastrous one, though some of these options may be necessary to support static site generation): these configurations are probably not used by most users and could be implemented easily and cleanly with middleware, yet they cause a tremendous number of issues that are hard to track down and fix. Here is a recent issue to give you a feel for it.
So how does the official Next.js server handle these configurations?
Well, the Next.js server implementation is, emmm…, not very clean. It’s a base server of 2k+ lines plus a node server, derived from the base server, of another 2k+ lines, not counting all the imported modules.
The logic is intertwined. Lots of code parses the URL and headers in different ways, again and again, in many different places, and since comments are sparse, most of the time you can’t tell which edge case each of these parses is handling. And `git blame` got messed up two months ago when they renamed the original node server and split it into these base/node servers.
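To give a feel for why this parsing spreads everywhere, here is a sketch of just one step: stripping `basePath` and detecting a locale prefix, assuming sub-path i18n routing where the default locale has no prefix. The config shape mirrors the `basePath` and `i18n.locales`/`i18n.defaultLocale` options in `next.config.js`, but `normalizePathname` itself is hypothetical. The real server also has to deal with things like locale cookies and `Accept-Language`, domain-based locales, trailing slashes, and `_next/data` URLs.

```typescript
// Subset of next.config.js options that affect how a pathname is interpreted.
interface RoutingConfig {
  basePath: string; // e.g. "/docs", or "" when unused
  locales: string[]; // e.g. ["en", "fr"]
  defaultLocale: string; // e.g. "en"
}

interface NormalizedPath {
  pathname: string; // path as the page router sees it
  locale: string;
}

// Strip basePath, then a locale prefix if present. Returns null when the
// URL is outside basePath (a real server would 404 or fall through).
function normalizePathname(rawPathname: string, config: RoutingConfig): NormalizedPath | null {
  let pathname = rawPathname;

  if (config.basePath) {
    if (pathname !== config.basePath && !pathname.startsWith(config.basePath + "/")) {
      return null;
    }
    pathname = pathname.slice(config.basePath.length) || "/";
  }

  const [, firstSegment] = pathname.split("/");
  // Locale matching is case-insensitive in this sketch; the casing from config is returned.
  const locale = config.locales.find((l) => l.toLowerCase() === firstSegment?.toLowerCase());
  if (locale) {
    pathname = pathname.slice(locale.length + 1) || "/";
    return { pathname, locale };
  }

  return { pathname, locale: config.defaultLocale };
}
```

The order matters: basePath has to come off before the locale can be seen, and every later step (redirects, rewrites, route matching) has to agree on which form of the path it is looking at.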
The original node server was split into base and node servers so that a web server compatible with the edge runtime could be derived from the base server. But don’t get excited: this web server only serves a single endpoint. When you run `next build`, each endpoint is built into its own web server that serves only that endpoint, without doing any routing. You still need a server to do routing at the edge.
So basically, to have a Next.js server at the edge, you need to write a server that implements all the routing logic of the node server.
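For a sense of scale, here is a heavily simplified skeleton of the order such a routing server has to follow, based on my reading of the execution order in the Next.js middleware docs: `headers` and `redirects` from `next.config.js`, then middleware, `beforeFiles` rewrites, filesystem routes, `afterFiles` rewrites, dynamic routes, and finally `fallback` rewrites. The stage functions are hypothetical stubs supplied by the caller. In the real server, each stage has its own matching rules (`has`/`missing` conditions, basePath, locales, and so on), and some rewrites loop back to re-check files and routes; this sketch just moves on to the next stage.

```typescript
// Outcome of one routing stage: stop with a response, continue with a
// rewritten pathname, or continue unchanged.
type StepResult =
  | { kind: "respond"; response: Response }
  | { kind: "rewrite"; pathname: string }
  | { kind: "continue" };

interface RoutingContext {
  request: Request;
  pathname: string; // current (possibly rewritten) pathname
  extraHeaders: Headers; // headers that should end up on the final response
}

type Step = (ctx: RoutingContext) => Promise<StepResult> | StepResult;

// Stage names in the documented order; the implementations are supplied by the caller.
const STAGES = [
  "configHeaders",
  "configRedirects",
  "middleware",
  "beforeFilesRewrites",
  "filesystem",
  "afterFilesRewrites",
  "dynamicRoutes",
  "fallbackRewrites",
] as const;

type Stage = (typeof STAGES)[number];

async function route(request: Request, stages: Partial<Record<Stage, Step>>): Promise<Response> {
  const ctx: RoutingContext = {
    request,
    pathname: new URL(request.url).pathname,
    extraHeaders: new Headers(),
  };

  for (const name of STAGES) {
    const step = stages[name];
    if (!step) continue;
    const result = await step(ctx);
    if (result.kind === "rewrite") {
      ctx.pathname = result.pathname;
    } else if (result.kind === "respond") {
      // Copy the response so its headers are mutable, then apply collected headers.
      const original = result.response;
      const response = new Response(original.body, {
        status: original.status,
        statusText: original.statusText,
        headers: original.headers,
      });
      ctx.extraHeaders.forEach((value, key) => response.headers.set(key, value));
      return response;
    }
  }
  return new Response("Not found", { status: 404 });
}
```

Even this toy version has to thread state (the rewritten path, accumulated headers) through every stage. The real node server does the same across thousands of lines.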
Now you can see why Fastly’s solution is difficult to maintain. They need to keep their code in sync with the 2k+ lines of the node server as it fixes all kinds of bugs and adds new features. Otherwise, your dev and prod environments will behave differently.
I did try to implement a full Next.js edge server, and the result is ignext. I’ll explain what I did later. But let’s take a look at another approach.
A light server
If implementing a full server is hard, how about just implementing part of it? For example, get rid of custom headers/redirects/rewrites so things are simpler and users can still do them with middleware manually.
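As an example of doing it manually, here is a sketch of a redirect table applied in a middleware-style function instead of through `next.config.js`. The rule format (exact `source` paths plus a `permanent` flag) is a hypothetical, much smaller cousin of Next.js’s `redirects()` config, which also supports path parameters, regexes, and `has`/`missing` conditions. In an actual Next.js middleware you would return `NextResponse.redirect(...)`; the plain `Response.redirect` used here keeps the sketch runnable anywhere.

```typescript
// Hypothetical, minimal redirect rule: exact-path matches only.
interface RedirectRule {
  source: string;
  destination: string;
  permanent: boolean;
}

// Example rules, made up for illustration.
const redirectRules: RedirectRule[] = [
  { source: "/old-blog", destination: "/blog", permanent: true },
  { source: "/promo", destination: "/sale", permanent: false },
];

// Middleware-style function: return a redirect Response if a rule matches,
// or null to let the request continue to the page handler.
function redirectMiddleware(request: Request, rules: RedirectRule[] = redirectRules): Response | null {
  const url = new URL(request.url);
  const rule = rules.find((r) => r.source === url.pathname);
  if (!rule) return null;

  // Next.js uses 308/307 for permanent/temporary redirects, which preserve
  // the request method; this sketch follows the same convention.
  const target = new URL(rule.destination, url);
  target.search = url.search; // carry the query string over
  return Response.redirect(target.toString(), rule.permanent ? 308 : 307);
}
```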
The problem is that users won’t be sure what is supported and what isn’t. Since the prod and dev (`next dev`) environments are fundamentally different, the project becomes super hard to test. You are creating a flavor of Next.js without providing a dev server for it!
Let’s be honest: most websites don’t have good test coverage because it’s just too hard to test them. To make things worse, in this case the only meaningful way to test the website is on prod/staging, which means integration tests that require an extra test setup, usually with a costly test service like Checkly. Developers are used to testing during development, but that’s no longer reliable, since you can’t be sure that what you see in dev is what you’ll get in prod.
That’s why I don’t recommend Netlify’s or Cloudflare’s solution.
By the way, the tests for all three adapters, just like the tests for your website, are quite sparse. So I’m not sure whether their adapters actually work correctly with the features they claim to support.
Can Next.js change?
Then, is it possible for Next.js to maintain a server compatible with the edge runtime, as other frameworks do, so that we just need a super light adapter for each platform?
Well, probably yes.
That was my plan with ignext: if I could come up with a good abstraction that supports both the node and edge servers with most of the logic shared, maybe we could convince Next.js to adopt it.
To begin with, I copied the code of the Next.js base and node servers and refactored it into smaller modules so that navigating through it became possible. Then I created an `IgnextServer` that supports most features of a full Next.js server (I left some obvious implementations as TODO) in an edge-compatible way.
To spare you the details, let’s just say there are a lot of challenges, but nothing is blocking us from moving it to the edge. If you are also working on this and want to know more, I can share more details.
Since we’ve proved it’s possible, the next step would be to find the right abstraction and propose it to the Next.js team.
But that’s when I found the biggest problem with deploying Next.js at the edge.
It’s not the server, but the fundamental design.
Before I talk about which part of the design conflicts with the edge runtime, let’s talk about another important part of all frameworks nowadays, developer experience (DX).
So far, I haven’t said much about how a dev server should work with the edge runtime. But the rule of thumb is that development and production environments should be the same whenever possible. Generally speaking, a dev server should behave the same as a production server; it is just a little more verbose and does more checks and assertions to ensure that both the server implementation and your website are correct.
If you run Next.js with `next dev`, it uses a dev server derived from the node server, so the two share almost the same logic for serving a request. The most significant difference you’ll notice is that the dev server injects extra code for hot reloading. So when we deploy the website with `next build`, we can be confident that if things work in dev, they work in production.
Now, if we are deploying on Vercel, we expect Vercel to behave the same as a Next.js server, since Vercel leads Next.js development, and everything is fine.
What about, say, Cloudflare? If we have a fully working Next.js server for edge runtime now, how do we run a dev server?
`next dev` starts the dev node server in a Node environment (the node server executes edge-only code, i.e., middleware and edge API routes, in a sandbox with the edge runtime, the same as on Vercel), while in production the whole server would run in the edge runtime. Even if the edge and node servers shared all of their routing logic, we would still be running dev and production in totally different environments. It’s just like developing a page, testing it only on Firefox, and expecting it to work perfectly on Chrome and Safari. We hope it’s true, but it’s not.
You may say this is the risk you are willing to take since if the edge and node servers share most code, you are confident the behavior would be the same.
Here comes the next problem: if you are deploying on Cloudflare Pages or Workers, you probably want to use KV, D1, Durable Objects, etc. to fully utilize the potential of edge computing (otherwise you should just use Vercel). The problem is that they are only available in Cloudflare’s edge runtime, which means they’re available either after you deploy or when the code is run with `wrangler` (Cloudflare’s local dev tool). So if you try to access KV in `getServerSideProps`, it won’t work with `next dev`, since the code runs in a sandboxed edge runtime created by the Next.js node server.
Don’t blame Cloudflare for this. Maybe they could provide a library that handles this in any JS development environment, but they are just enforcing the right thing here: you should develop in the same environment you deploy to.
So the right thing to do is to run a dev edge server in the edge runtime when we are developing. Can we do that?
Things are quite complicated with Next.js. The dev server does too many things, including instructing webpack to compile on the fly. When a request comes in, it first checks whether webpack has that entrypoint and updates the webpack config if not. With the release of Next.js 13, they claim the new Turbopack is even faster in development because “Turbopack only bundles the minimum assets required in development, so startup time is extremely fast.” I’m guessing (I haven’t checked the code) that this means all modules are compiled lazily, so if a request never hits the dev server, that endpoint will never be compiled.
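A toy model of that behavior, assuming lazy, on-demand compilation as described above: an entry is compiled only when a request first asks for it. `LazyDevCompiler` and the `Compile` function are hypothetical stand-ins, not a webpack or Turbopack API. The consequence is that an entry nobody requested through this dev server simply has no build output for another runtime to pick up.

```typescript
// Hypothetical stand-in for the bundler: turns an entry name into "compiled" code.
type Compile = (entry: string) => Promise<string>;

class LazyDevCompiler {
  private readonly compiled = new Map<string, Promise<string>>();
  private readonly compileEntry: Compile;

  constructor(compileEntry: Compile) {
    this.compileEntry = compileEntry;
  }

  // Called from the dev server's request handler: compile on first request,
  // then reuse the cached result. Concurrent requests share one compilation.
  getEntry(entry: string): Promise<string> {
    let output = this.compiled.get(entry);
    if (!output) {
      output = this.compileEntry(entry);
      this.compiled.set(entry, output);
    }
    return output;
  }

  // What an outside runtime (e.g. one started by wrangler) could pick up:
  // only the entries that were requested through this dev server.
  builtEntries(): string[] {
    return Array.from(this.compiled.keys());
  }
}
```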
Here is the problem: compilation is bound to requests. We need `wrangler` to recreate the production environment locally so we can run our edge server inside it, but our code won’t get compiled unless requests hit the Next.js dev server, which cannot serve them.
It’s nice that Next.js tries to make life easier, but sometimes that is exactly the problem. I learned the same lesson with Spring Boot: either it works perfectly, or you are doomed.
So what about the other way around? What if we keep running `next build` whenever things change and use the built artifacts with `wrangler` to run our edge server locally?
Well, that works. But you lose the super fast compilation/recompilation, HMR, and all the other DX goodies that Next.js provides with `next dev`. I’ve been there, and I cannot go back. Imagine an SPA where a simple CSS class name change causes the whole page to refresh and all state to be lost.
Someone may say this is still fixable: Next.js just needs to add a config option to make the dev server compile everything eagerly (maybe that’s already possible with some config, or maybe it’s already the current behavior and I got it wrong when reading the 1k+ lines of the dev server). There are also some manifest files that aren’t persisted to disk during development, but that’s also fixable.
What is the real issue here?
So all of these are fixable, what’s the problem?
The problem is that these things are actually hard to fix, since they contradict Next.js’s design. Some people may suggest this is all because Vercel wants Next.js to be deployable exclusively on Vercel. I don’t really buy this cynical idea.
The real issue making our goal hard is, IMHO, that Next.js is designed to be a full solution for building your website, while many of its newer competitors are not, at least not yet.
Let’s take a look. Who handles caching for you? Who optimizes images for you? Who manages different runtimes for different handlers for you? Next.js does it all. Remix does none of it. Remix (and SvelteKit, SolidStart, Qwik City, and many others) just generates web pages for a website. Next.js is a full solution that works out of the box, with all the best practices for building a website packed in. They are not exactly trying to solve the same problem.
It makes sense. Six years ago, Next.js was so refreshing, but the only way to deploy it was with a Node server. I don’t think people had edge runtimes back then. Then the Next.js server evolved into the Vercel service, which does everything the Next.js server does for you as a PaaS. All of this was a super natural progression.
Now, Next.js still aims to provide a full solution, so it manages the runtime for you instead of letting you do it yourself.
Asking Next.js to provide an edge server is like asking it to provide only a subset of its features to users. There is no way Next.js can guarantee that things work as documented on other platforms. A good example of how stubborn Next.js is: before Next.js 12.3, if you used `next export`, you couldn’t even `next export` images used in `<Image>` without configuring an image optimization service. (I understand the reasoning, but I still find it quite ridiculous.)
So from Next.js’s perspective, you get all or you get none; they don’t want you to be stuck in the middle with unexpected behavior or suboptimal performance.
I will post this on the Next.js discussion board to see if I get any response once they survive the Next.js 13 bug bombardment. But personally, I see very little point in trying to get Next.js working on other edge platforms. It’s just not how it works.
Acknowledgements: Wendell Misiedjan helped me a LOT with ignext. He is not involved in writing this article.