Apr 15, 2025·8 min read

Node.js rate limiting libraries for public APIs compared

Compare Node.js rate limiting libraries for public APIs, from simple middleware to Redis-backed options that hold up behind proxies.

Why one client can exhaust a public API

A public API shares the same pool of CPU, database time, cache space, and network bandwidth across everyone who calls it. That sounds obvious, but the failure mode is easy to miss: one noisy client can eat that pool fast, even when total traffic still looks "normal" on a dashboard.

The damage often starts on one expensive route. A simple GET /search or POST /export endpoint might touch the database, build a response, and hold a worker for a second or two. If one client sends 30 requests at once, that single route can crowd out lighter requests from everyone else. Users who did nothing wrong start seeing slow responses or timeouts.

A few patterns cause this over and over:

Burst traffic from a buggy app release, a cron job, or many browser tabs opening at once
Retry loops that keep firing after timeouts, 500 errors, or dropped connections
Simple bot abuse such as scraping, signup spam, or brute force probing on public endpoints

Short spikes and steady abuse hurt in different ways. A short spike fills queues, burns connection slots, and may pass in a minute, but it still creates a wave of failures. Steady abuse is quieter and often worse. A bot that sends just enough traffic to stay under the radar can keep your API busy all day and raise costs without causing one obvious outage.

Auth does not fix this by itself. An authenticated user can still flood a route. API keys get leaked. A partner integration can ship bad retry logic. Some attackers use many accounts instead of one. If the server accepts the request and starts work, overload can happen whether the caller is anonymous or logged in.

This is one reason rate limits need to match real capacity, not just block obvious attacks. Teams that run lean systems learn this quickly. Oleg Sotnikov often writes about reducing waste at the architecture level, and the same idea applies here: protect the shared parts first, especially the routes that do the most work per request.

Without that guardrail, one client does not need to "take down" the whole API to cause real damage. It only needs to keep the busiest path busy long enough that everyone else pays for it.

What to compare in Node.js rate limit tools

Two packages can both say "rate limiter" and still behave very differently under real traffic. For a public API, the first check is simple: what does the tool count, and where does it store that count?

Some tools count every request by IP. That works for open endpoints, but it can punish shared networks in offices, schools, or mobile carriers. Others can count by API key, user ID, or even by route, which gives you more control. A login endpoint, a search endpoint, and a paid data endpoint often need different rules.

The rule itself matters just as much. A fixed window is easy to reason about, but it can allow a burst right before and right after the window resets, so a client can briefly get close to twice the stated limit. A sliding window smooths that out, though it often costs more in storage or logic. A token bucket is a good middle ground when you want to allow short bursts but stop steady abuse.
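
To make the token bucket idea concrete, here is a minimal sketch in plain TypeScript. It is not taken from any of the packages discussed here. The class name, the capacity of 5, and the refill rate of 1 token per second are assumptions chosen for illustration, and time is passed in by hand so the behavior is easy to follow.

```typescript
// Token bucket: holds up to `capacity` tokens and refills at `refillPerSec`.
// Every request spends tokens; an empty bucket means the request is limited.
class TokenBucket {
  private readonly capacity: number;
  private readonly refillPerSec: number;
  private tokens: number;
  private lastRefillMs: number;

  constructor(capacity: number, refillPerSec: number, nowMs: number) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
    this.tokens = capacity; // start full so a first burst is allowed
    this.lastRefillMs = nowMs;
  }

  // Returns true if the request may proceed, false if it should be limited.
  tryConsume(nowMs: number, cost = 1): boolean {
    const elapsedSec = Math.max(0, nowMs - this.lastRefillMs) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.lastRefillMs = nowMs;
    if (this.tokens < cost) return false;
    this.tokens -= cost;
    return true;
  }
}

// Six requests arrive at the same instant: five pass, the sixth is limited.
const bucket = new TokenBucket(5, 1, 0);
const burst = [1, 2, 3, 4, 5, 6].map(() => bucket.tryConsume(0));
// One second later a single token has refilled, so one more request passes.
const afterOneSecond = bucket.tryConsume(1000);
```

The capacity sets how large a burst can be, and the refill rate sets the steady pace a client can sustain. Tuning those two numbers separately is what makes this rule a middle ground.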

Storage decides whether the limits stay accurate when your app grows. In-memory counters are fine for one small server or local testing. They break down once you run multiple Node.js instances, containers, or regions. For shared limits, Redis is the usual choice because every app instance can read and update the same counter.

When you compare Node.js rate limiting libraries, check these points:

what identity they can track: IP, API key, user ID, session, or route
which algorithm they use and whether you can tune burst behavior
whether storage stays correct across clusters and restarts
how they send 429 responses, rate limit headers, and retry timing
how they read client addresses behind proxies and with IPv6

That last point causes a lot of trouble. If you run behind Cloudflare, nginx, or a load balancer, the app might only see the proxy IP unless you configure trusted forwarded headers correctly. IPv6 can also create surprises because one client often controls a whole block of addresses (commonly a /64 prefix) and can appear under many of them, so counting each full address separately undercounts that client. A package that handles proxy-friendly throttling well will save you from both false blocks and missed abuse.
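
One way to handle the IPv6 case is to count by network prefix instead of by full address. The sketch below is a hand-written helper, not part of any package named here. The function name `ipLimitKey` and the choice of a /64 prefix are assumptions, and it expects an address that is already syntactically valid, such as one taken from the socket or a trusted proxy.

```typescript
// Returns the string to count for an IP address. IPv4 addresses are used
// as-is; IPv6 addresses are collapsed to their first 64 bits.
function ipLimitKey(ip: string): string {
  const addr = ip.split("%")[0].toLowerCase(); // drop any zone id such as %eth0
  if (!addr.includes(":")) return addr; // plain IPv4

  // IPv4-mapped IPv6 (::ffff:203.0.113.7) is really an IPv4 client.
  const mapped = /^::ffff:(\d+\.\d+\.\d+\.\d+)$/.exec(addr);
  if (mapped) return mapped[1];

  // Expand a single "::" into the zero groups it stands for.
  const [head, tail = ""] = addr.split("::");
  const headGroups = head ? head.split(":") : [];
  const tailGroups = tail ? tail.split(":") : [];
  const missing = addr.includes("::") ? 8 - headGroups.length - tailGroups.length : 0;
  const groups = [...headGroups, ...new Array<string>(missing).fill("0"), ...tailGroups];

  // Keep the first four 16-bit groups (64 bits), without leading zeros.
  const prefix = groups.slice(0, 4).map((g) => parseInt(g, 16).toString(16));
  return prefix.join(":") + "::/64";
}

// Two addresses from the same /64 end up in the same bucket.
const keyA = ipLimitKey("2001:db8:abcd:12::1");
const keyB = ipLimitKey("2001:0db8:abcd:0012:ffff:ffff:ffff:ffff");
```

Here `keyA` and `keyB` are both `2001:db8:abcd:12::/64`, so a client that rotates through addresses inside its own prefix still hits one counter. Whether /64 is the right width depends on how your users' networks hand out addresses, so treat it as a starting point.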

Middleware options for Express and Fastify

If your API already runs on Express or Fastify, framework middleware is often the fastest way to add abuse control. You can wire it in at the app level for broad limits, then add stricter rules only on routes that attract spam, like search, signup, or token creation.

In Express, express-rate-limit is the usual starting point. It is simple, readable, and good for hard caps such as "100 requests per minute per IP". For a small public API on one server, that may be enough to stop one impatient client from eating capacity meant for everyone else.

express-slow-down fits a different job. Instead of blocking right away, it adds delay after a client crosses a threshold. That works well on noisy routes where you want to make abuse expensive before you return a 429. A common setup is soft friction first, then a hard block for clients that keep pushing.

A practical Express pattern looks like this:

use express-rate-limit globally with a moderate cap
add express-slow-down on routes that get hammered
place a stricter hard limit on the same noisy routes

That mix feels better for normal users. Someone who refreshes a page a few extra times may only see a small delay, while a script that keeps firing gets stopped.
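
The decision behind that mix can be sketched without any framework. This is a simplified model of "delay first, block later", not the API of express-slow-down or express-rate-limit. The policy fields and the numbers in the example are hypothetical.

```typescript
type Decision =
  | { action: "allow" }
  | { action: "delay"; delayMs: number }
  | { action: "block" };

interface FrictionPolicy {
  delayAfter: number; // hits allowed per window before delays start
  blockAfter: number; // hits allowed per window before a hard 429
  stepMs: number;     // extra delay added for each hit past delayAfter
  maxDelayMs: number; // upper bound on the added delay
}

// `hits` is how many requests this client has made in the current window,
// including the one being decided.
function decide(hits: number, policy: FrictionPolicy): Decision {
  if (hits > policy.blockAfter) return { action: "block" };
  if (hits > policy.delayAfter) {
    const delayMs = Math.min((hits - policy.delayAfter) * policy.stepMs, policy.maxDelayMs);
    return { action: "delay", delayMs };
  }
  return { action: "allow" };
}

// Hypothetical policy for a noisy route.
const searchPolicy: FrictionPolicy = { delayAfter: 30, blockAfter: 60, stepMs: 100, maxDelayMs: 500 };
const normalUser = decide(12, searchPolicy);  // { action: "allow" }
const eagerUser = decide(33, searchPolicy);   // { action: "delay", delayMs: 300 }
const script = decide(61, searchPolicy);      // { action: "block" }
```

A middleware would apply the result by waiting `delayMs` before calling the handler, or by returning a 429 for the block case.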

If you run Fastify, use @fastify/rate-limit. It fits Fastify's plugin model, so it is easy to apply limits per route or across the whole app. That keeps the code clean, especially when different endpoints need different rules.

Framework middleware stays easy to manage because it lives close to the routes. You do not need a separate gateway just to test a simple policy, and your team can usually understand the setup in one file. That matters when you want to tune limits quickly after real traffic starts.

The catch is the default store. Most middleware starts with in-memory counters, and that breaks once you run more than one app instance. Each replica tracks its own requests, so a client can spread traffic across nodes and slip past the intended cap. Restarts also wipe the counters, which gives abusers a fresh start.

That is why in-memory limits are fine for local testing, low-risk internal tools, or a single tiny deployment. For a public API with replicas, shared counters matter. Redis or another central store turns separate app instances into one consistent rate limiter.

Packages for shared counters and Redis

If your API runs on more than one Node.js instance, in-memory limits break fast. A client can send requests to instance A, then B, then C, and each server thinks the client is still under the cap. Shared counters fix that. Every instance checks the same count, so the limit stays consistent.
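
A small simulation shows how large the gap can get. In this sketch each `Map` stands in for one counter store, requests are spread round-robin the way a load balancer might spread them, and a single window is assumed for brevity. The function name and the numbers are illustrative only.

```typescript
// Sends `requests` requests for one client across the given stores and
// returns how many were allowed under a per-store limit.
function countAllowed(
  stores: Map<string, number>[],
  key: string,
  requests: number,
  limit: number,
): number {
  let allowed = 0;
  for (let i = 0; i < requests; i++) {
    const store = stores[i % stores.length]; // round-robin across instances
    const hits = (store.get(key) ?? 0) + 1;
    store.set(key, hits);
    if (hits <= limit) allowed++;
  }
  return allowed;
}

// Three instances, each with its own in-memory counter and a limit of 10.
const perInstance = countAllowed([new Map(), new Map(), new Map()], "client-1", 30, 10); // 30
// One shared counter, as with a central store, and the same limit of 10.
const shared = countAllowed([new Map()], "client-1", 30, 10); // 10
```

With separate counters the client gets all 30 requests through, three times the intended cap. With one shared counter only 10 pass.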

rate-limiter-flexible is a common pick when you need that shared view. It works well with Redis and gives you more control than basic middleware stores. That extra control matters when abuse is uneven, like one IP that suddenly spikes, calms down, then spikes again five minutes later.

A simple Redis-backed setup helps with four things:

one limit across all app instances
blocks that survive app restarts
custom penalties for repeat offenders
cleaner handling for short bursts

The gap between packages usually shows up in how they punish and recover. With rate-limiter-flexible, you can set a normal allowance, then add a block duration when a client crosses the line. You can also add penalty points after bad behavior, which is useful for failed login attempts or scraping patterns. Some lighter Express middleware setups can count requests in Redis too, but they often stop at "too many requests" and do not give you much room for adaptive rules.
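
The sketch below models the two ideas in that paragraph, a block duration and heavier penalty points, using a plain in-memory map. It is not the rate-limiter-flexible API, and a real deployment would keep this state in Redis. The class name, option names, and numbers are assumptions.

```typescript
interface Entry {
  points: number;
  windowEndsMs: number;
  blockedUntilMs: number;
}

class PenaltyLimiter {
  private readonly entries = new Map<string, Entry>();
  private readonly points: number;
  private readonly durationMs: number;
  private readonly blockMs: number;

  constructor(opts: { points: number; durationMs: number; blockMs: number }) {
    this.points = opts.points;
    this.durationMs = opts.durationMs;
    this.blockMs = opts.blockMs;
  }

  // Adds `cost` points for `key`. Returns 0 when the request is allowed,
  // otherwise the number of milliseconds the client should wait.
  consume(key: string, nowMs: number, cost = 1): number {
    let entry = this.entries.get(key);
    const expired =
      entry !== undefined && nowMs >= entry.windowEndsMs && nowMs >= entry.blockedUntilMs;
    if (entry === undefined || expired) {
      entry = { points: 0, windowEndsMs: nowMs + this.durationMs, blockedUntilMs: 0 };
      this.entries.set(key, entry);
    }
    if (nowMs < entry.blockedUntilMs) return entry.blockedUntilMs - nowMs;
    entry.points += cost;
    if (entry.points > this.points) {
      // Crossing the line starts a block that outlasts the counting window.
      entry.blockedUntilMs = nowMs + this.blockMs;
      return this.blockMs;
    }
    return 0;
  }
}

// 10 points per minute, and a 10 minute block once the allowance is exceeded.
const loginLimiter = new PenaltyLimiter({ points: 10, durationMs: 60_000, blockMs: 600_000 });
const FAILED_LOGIN_COST = 5; // a failed login weighs more than a normal request
const first = loginLimiter.consume("user-7", 0, FAILED_LOGIN_COST);     // 0, allowed
const second = loginLimiter.consume("user-7", 1_000, FAILED_LOGIN_COST); // 0, allowed
const third = loginLimiter.consume("user-7", 2_000, FAILED_LOGIN_COST);  // 600000, blocked
```

Charging more points for bad behavior means a client that fails logins runs out of allowance much faster than one sending ordinary requests, without a separate rule for each case.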

Burst handling matters more than many teams expect. Public APIs often get small, legitimate spikes from mobile reconnects, retries, or batch jobs. A harsh limit can block normal users. A better setup allows a short burst, then slows or blocks sustained overuse. Shared counters help because every server sees the same burst instead of treating it as three smaller ones.

Redis is worth the extra setup when limits need to stay accurate under load. That usually means multiple app instances, long block periods, or any case where a restart should not wipe the slate clean. If you run one small internal service, memory is often enough. If you run a public API behind a load balancer, quick setup should not win over consistent enforcement.

This is the tradeoff many teams miss: the easy option saves 20 minutes on day one, then leaks capacity every day after that.

Making limits work behind proxies and load balancers

Many Node.js rate limiting libraries look fine in local testing, then fail the moment you put the API behind Cloudflare, nginx, or a load balancer. The usual bug is simple: the app counts the proxy IP, not the real client. Then one busy office, mobile carrier, or gateway can make unrelated users hit the same limit.

Trust the right headers

Your app should read the client address from trusted proxy headers, but only from proxies you control. If you trust every forwarded header from the open internet, anyone can fake their IP and walk around the limit.

In practice, set your proxy trust rules first, then make the limiter use the resolved client IP. Test it with a real request path, not just localhost. A request that goes through Cloudflare and nginx can look very different from a request that hits Express directly.
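
As a rough illustration of what "resolved client IP" means, the sketch below walks the X-Forwarded-For list from right to left and stops at the first address that is not one of your own proxies. It is a simplified stand-in for what a framework's proxy trust setting does. The function name is hypothetical, and the exact-match `Set` is an assumption made to keep it short, since real setups usually trust CIDR ranges.

```typescript
// remoteAddr is the address of the socket that connected to the app.
// Entries to the left of the first untrusted hop are client-supplied
// and are ignored.
function resolveClientIp(
  remoteAddr: string,
  xForwardedFor: string | undefined,
  trustedProxies: Set<string>,
): string {
  // A direct connection from an unknown address: ignore the header entirely.
  if (!trustedProxies.has(remoteAddr)) return remoteAddr;

  const hops = (xForwardedFor ?? "")
    .split(",")
    .map((hop) => hop.trim())
    .filter((hop) => hop.length > 0);

  for (let i = hops.length - 1; i >= 0; i--) {
    if (!trustedProxies.has(hops[i])) return hops[i];
  }
  // Every hop was one of our proxies, so use the earliest one we know about.
  return hops[0] ?? remoteAddr;
}

const proxies = new Set(["10.0.0.1", "10.0.0.2"]);
// The client prepended a fake address; the limiter still sees 203.0.113.7.
const viaProxy = resolveClientIp("10.0.0.2", "198.51.100.99, 203.0.113.7, 10.0.0.1", proxies);
// Someone hits the app directly and sends a forged header; it is ignored.
const direct = resolveClientIp("198.51.100.5", "203.0.113.7", proxies);
```

Reading from the right matters because each trusted proxy appends the address it saw, while anything further left can be written by the caller.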

Use an identifier that fits real traffic

IP alone is often too blunt. A company may send hundreds of users through one outbound gateway. A mobile network may put many people behind the same address. If your API uses keys or tokens, combine them with IP when it makes sense. That gives you a fairer limit and makes abuse harder.

A simple rule works well for many public APIs:

use API key when the client is authenticated
fall back to IP for anonymous traffic
mix API key and IP for sensitive routes like login or search
log the final identifier for every block event

That last point matters more than people think. When support gets a complaint, you need to see exactly what the limiter counted. Log the chosen identifier, the route, and the reason for the block. Otherwise false positives turn into guesswork.
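
The rule above, plus the log entry, can be sketched as two small functions. The `ApiRequest` shape, the route list, and the log field names are hypothetical. In production you would likely log a hashed or truncated key instead of the raw value.

```typescript
interface ApiRequest {
  apiKey?: string; // present when the client is authenticated
  ip: string;      // resolved client IP, after proxy handling
  route: string;
}

const SENSITIVE_ROUTES = new Set(["/login", "/search"]);

// Picks what the limiter counts for this request.
function limitKey(req: ApiRequest): string {
  if (req.apiKey && SENSITIVE_ROUTES.has(req.route)) {
    return `key+ip:${req.apiKey}:${req.ip}`;
  }
  if (req.apiKey) return `key:${req.apiKey}`;
  return `ip:${req.ip}`; // anonymous traffic falls back to IP
}

// Builds one structured log line for a block event.
function blockLogLine(req: ApiRequest, reason: string): string {
  return JSON.stringify({
    event: "rate_limit_block",
    identifier: limitKey(req),
    route: req.route,
    reason,
  });
}

const line = blockLogLine({ ip: "203.0.113.7", route: "/search" }, "anonymous search cap");
```

Prefixing the key with its type (`key:`, `ip:`, `key+ip:`) keeps the different identifier spaces from colliding and makes the log line readable at a glance.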

If you run more than one Node.js instance, keep counters outside the app. In-memory limits break fast because each node sees only part of the traffic. A shared store such as Redis gives every server the same count, so the limit stays consistent across replicas and regions.

A quick sanity check helps: send ten requests through the full proxy chain, then confirm that your logs show the same client identity on every node. If they do not, fix that before you tune the numbers.

How to choose a package step by step

Pick a package by matching it to how your API actually runs. When you compare Node.js rate limiting libraries, ignore feature lists for a minute and look at your traffic path. An Express app on one VM can use a simple in-memory limiter. An API spread across several Node.js instances cannot.

If requests can land on different servers, every server needs the same counters. That usually means Redis. This is where packages separate into two groups: simple middleware that works well on one instance, and tools built for shared counters across many instances.

The client identifier matters just as much as the package. IP address is easy, but it often breaks in offices, mobile networks, and any setup behind Cloudflare, nginx, or another proxy. If your API uses login or API keys, use those when you can. For mixed traffic, a blend often works better: API key for signed requests, IP for anonymous ones.

Do not give every route the same budget. A health check or small read endpoint can handle far more traffic than a route that runs a heavy search, creates a report, or calls an LLM. If one endpoint costs 50 times more, limit that endpoint on its own. That keeps cheap traffic moving while you protect the parts that burn CPU, database time, or money.

Then decide how hard the limiter should hit. Some packages only block. Others can slow a client first and block later. For public APIs, that is often the better choice. Small bursts can pass, repeated abuse gets slower, and obvious attacks stop.

Before rollout, test with real proxy headers, not just localhost. Send requests through the same load balancer, CDN, or reverse proxy you use in production and confirm the app sees the right client address. If that part is wrong, even good API abuse control will block innocent users or miss the noisy ones.

A simple rule works well: one server means memory may be enough, more than one server means Redis, and any public endpoint behind proxies needs careful identifier tests before you trust it.

A simple public API example

Imagine a public search API with two plans. Free users can run normal search requests. Paid users can search too, and they can also start export jobs that build CSV files in the background.

Most of the time, this works fine. Then one customer ships a bad retry loop after a timeout bug. Their app repeats the same search request over and over, sometimes 20 times for one click. Search latency climbs, workers stay busy, and other customers start to feel it.

A good limit setup treats search and exports differently. Search is frequent and usually light, so free traffic does not need an instant block. A soft delay works better. After a small burst limit, the API still answers, but it waits 200 to 500 ms before sending the response. Real users can keep searching. A scraper or broken client slows down fast.

Export jobs need a harder rule. Each job can hit the database, queue workers, and keep CPU busy for minutes. If free users can create unlimited exports, one script can eat a lot of shared capacity. Put a strict cap on job creation and reject extra requests with a clear 429 response.

A simple set of rules might look like this:

Free search: 60 requests per minute per API key, then add a short delay
Anonymous search: 30 requests per minute per IP, then add the same delay
Free export jobs: 3 per hour, then reject new jobs until the window resets
Paid export jobs: a much higher cap, but still a hard limit
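
Written as data, those rules might look like the table below. The type names and the lookup function are illustrative. The paid export cap of 100 per hour is a made-up placeholder, because the example only says "much higher", and combinations the example does not mention are left undefined instead of guessed.

```typescript
type Plan = "anonymous" | "free" | "paid";
type Route = "search" | "export";

interface Rule {
  limit: number;
  windowSec: number;
  onExceed: "delay" | "reject";
}

const RULES: Record<string, Rule> = {
  "free:search": { limit: 60, windowSec: 60, onExceed: "delay" },
  "anonymous:search": { limit: 30, windowSec: 60, onExceed: "delay" },
  "free:export": { limit: 3, windowSec: 3600, onExceed: "reject" },
  // Hypothetical value: the example only calls for "a much higher cap".
  "paid:export": { limit: 100, windowSec: 3600, onExceed: "reject" },
};

// Returns the rule for a plan and route, or undefined when none is defined.
function ruleFor(plan: Plan, route: Route): Rule | undefined {
  return RULES[`${plan}:${route}`];
}

const freeExport = ruleFor("free", "export"); // 3 per hour, reject
const anonSearch = ruleFor("anonymous", "search"); // 30 per minute, delay
```

Keeping the rules in one table makes it easy to review them against real traffic and to log the rule name when a limit fires.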

Count by API key first. That keeps unrelated users behind the same office or mobile carrier IP from getting mixed into one bucket. When traffic is anonymous, fall back to IP so the API still has a basic rule.

Store counters in Redis, not in local memory. If the API runs on three Node.js instances, each one needs to see the same request counts and cooldowns. Redis makes that possible, so a client blocked on one instance does not slip through another.

The packages covered earlier map onto this pattern: a slow-down step on search, a hard cap on export jobs, and a Redis-backed store behind both. Good limits keep the API usable for everyone, even when one client behaves badly.

Mistakes that cause false blocks or missed abuse

A rate limiter can fail in two ways. It can block normal users, or it can let abusive traffic slip through. Many teams do both by accident, even with decent Node.js rate limiting libraries.

One common mistake is trusting every forwarded header. If your app accepts X-Forwarded-For from the open internet, a client can spoof its IP and dodge limits with a fake address on each request. Only trust forwarded headers from your own proxy or load balancer. If that trust chain is wrong, your limiter is blind.

A single global limit also causes trouble. Login, search, and heavy report exports do not cost the same. If you give all routes one shared budget, a normal user can hit the cap just by running a few expensive reports. At the same time, an attacker may avoid real pressure points because your rules are too broad. Set stricter limits on expensive or sensitive routes, and softer ones on cheap reads.

In-memory counters look fine in local testing, then break in production. If you run several pods or servers, each instance only sees part of the traffic. One client can bounce across instances and get a much higher real limit than you intended. Shared storage, often Redis, fixes that because every instance reads and writes the same counters.

IP-only tracking can also punish the wrong people. A whole office, school, or mobile carrier may appear as one IP. If one person behaves badly, everyone behind that address can get blocked. That is a bad experience for public APIs. Track more than IP when you can, such as API key, user ID, session, or a combined fingerprint.

Error handling matters too. A plain 500 hides what happened and makes clients retry harder. Return 429 for limit hits, and tell the client how long to wait. Even a simple wait value or Retry-After header helps well-behaved clients back off instead of piling on.
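
A minimal sketch of such a response, independent of any framework, is shown below. The function name and the JSON body fields are assumptions. The Retry-After value is given in whole seconds, which is one of the forms the HTTP specification allows for that header.

```typescript
interface LimitResponse {
  status: number;
  headers: Record<string, string>;
  body: string;
}

// Builds a 429 response from the time left until the client may retry.
function tooManyRequests(msBeforeRetry: number): LimitResponse {
  // Round up so the client never retries slightly too early.
  const retryAfterSec = Math.max(1, Math.ceil(msBeforeRetry / 1000));
  return {
    status: 429,
    headers: {
      "Content-Type": "application/json",
      "Retry-After": String(retryAfterSec),
    },
    body: JSON.stringify({ error: "rate_limited", retryAfterSeconds: retryAfterSec }),
  };
}

const response = tooManyRequests(1500); // Retry-After: 2
```

Putting the same wait value in both the header and the body helps clients that only read one of the two.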

A safer setup usually follows a few rules:

trust proxy headers only from known infrastructure
use separate limits for login, search, writes, and heavy jobs
store counters in shared backend storage
identify clients with API keys or user IDs when possible
return clear 429 responses with retry timing

If a shared office gets blocked after one burst, or an attacker keeps rotating fake IPs without friction, the limiter is not strict. It is just pointed at the wrong thing.

Quick checks before rollout

A rate limit that looks fine in staging can still fail in production for simple reasons. One route gets a strict cap even though it runs a cheap read. Another route stays too open even though it triggers a heavy database query or sends email. Match each rule to the real cost of the route, not to a rough guess.

A public /search endpoint, for example, may need a tighter limit than /status, even if both are GET requests. Login, password reset, export, and webhook endpoints usually need their own rules too. One shared limit for the whole API is easy to ship, but it often blocks normal users and misses the expensive abuse.

Before you switch it on

Check the client identifier next. If your app sits behind Cloudflare, nginx, or a load balancer, make sure it reads the right IP or token and does not rate limit the proxy itself. In Express, that usually means setting proxy trust correctly and testing with real forwarded headers.

Then check where counters live. If limits reset on every deploy or differ across app instances, the protection is mostly fake. For public APIs, counters should usually live in Redis or another shared store so every server sees the same numbers.

A short rollout checklist helps:

Give each expensive route its own rule.
Verify the app records the real client address.
Store counters outside process memory.
Log the rule name, identifier, and every 429 result.
Return clear retry timing, usually with a Retry-After header.

Logs matter more than many teams expect. When a customer says, "your API blocked us," you need to see which rule fired, which identifier matched, and how often that happened. Without that data, tuning turns into guesswork.

Clear responses help good clients recover fast. A plain 429 body with the wait time, plus consistent headers, lets SDKs and scripts back off instead of hammering harder. That small detail saves support time and protects shared capacity at the same time.

Next steps for your team

Start small. Pick one or two routes that can burn the most shared capacity, such as login, search, or a public export endpoint. You will learn more from one noisy path than from blanket rules across every endpoint.

A simple rollout usually works best:

Add one limit for short bursts and one for steady traffic.
Store counters in Redis if you run more than one app instance.
Log the client identifier you block, whether that is an IP, API key, user ID, or a mix.
Return clear 429 responses with retry timing.
Keep the first thresholds a bit loose.
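
The first item in that list, one limit for bursts and one for steady traffic, can be sketched as two fixed-window counters checked together. This is a single-client model with hypothetical names and numbers. A real setup would keep one pair of counters per identifier in a shared store.

```typescript
class WindowCounter {
  private readonly limit: number;
  private readonly windowMs: number;
  private windowStartMs = -Infinity; // forces a fresh window on the first hit
  private count = 0;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Records one request and reports whether it fits in the current window.
  hit(nowMs: number): boolean {
    if (nowMs - this.windowStartMs >= this.windowMs) {
      this.windowStartMs = nowMs;
      this.count = 0;
    }
    this.count++;
    return this.count <= this.limit;
  }
}

// A request passes only if both windows have room. Both counters are
// updated every time, so rejected requests still count as traffic.
function allowRequest(burst: WindowCounter, steady: WindowCounter, nowMs: number): boolean {
  const fitsBurst = burst.hit(nowMs);
  const fitsSteady = steady.hit(nowMs);
  return fitsBurst && fitsSteady;
}

const burstLimit = new WindowCounter(5, 1_000);   // at most 5 per second
const steadyLimit = new WindowCounter(8, 60_000); // at most 8 per minute
const atZero = [1, 2, 3, 4, 5, 6].map(() => allowRequest(burstLimit, steadyLimit, 0));
const atOneSecond = [1, 2, 3].map(() => allowRequest(burstLimit, steadyLimit, 1_000));
```

In this run the sixth request at time zero is stopped by the burst limit, and the third request one second later is stopped by the steady limit even though the burst window has reset.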

Then watch the data for a full week. False positives show up fast when office networks, mobile carriers, and shared proxies all look like one client. Review every spike in 429s, check which identifiers got blocked, and compare that with normal traffic before you tighten anything.

Set alerts early. A sudden jump in 429s can mean your rules are too strict, but it can also mean someone found an expensive route and started hammering it. Track both the count of blocked requests and the count of blocked identifiers. Those two numbers tell a better story than raw request volume alone.
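
Both numbers can be derived from the same block log. The sketch below assumes each block event carries the identifier that was counted. The event shape and function name are hypothetical.

```typescript
interface BlockEvent {
  identifier: string;
  route: string;
}

// Summarizes a batch of block events for alerting.
function summarizeBlocks(events: BlockEvent[]): { blockedRequests: number; blockedIdentifiers: number } {
  const identifiers = new Set(events.map((event) => event.identifier));
  return { blockedRequests: events.length, blockedIdentifiers: identifiers.size };
}

const summary = summarizeBlocks([
  { identifier: "key:k1", route: "/search" },
  { identifier: "key:k1", route: "/search" },
  { identifier: "key:k1", route: "/export" },
  { identifier: "ip:203.0.113.7", route: "/search" },
]);
// summary is { blockedRequests: 4, blockedIdentifiers: 2 }
```

Many blocked requests from a handful of identifiers usually points at a noisy client, while many blocked identifiers at once suggests the rule itself is too strict or is counting the wrong thing.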

If you are still comparing Node.js rate limiting libraries, keep the rest of the stack in view. Limits, storage, proxies, and rollout order need to fit together. When that gets messy, a short consultation with Oleg Sotnikov can save a lot of trial and error. His work as a Fractional CTO includes helping teams choose lean infrastructure, proxy-safe limits, and rollout plans that protect public APIs without blocking normal users.