Postgres row locking mistakes that cause hidden stalls
Postgres row locking mistakes often hide inside long transactions and mixed write paths. Learn where stalls start, how to spot them, and what to fix.

What hidden stalls look like in real apps
Users rarely report a database lock. They report a spinner that hangs for 8 seconds, a payment that looks stuck, or a form that saves on the second try but not the first. A minute later, the same action feels fine again.
Developers usually see something different. CPU looks normal, the database host looks calm, and the query plan for the slow request does not look wrong. One request turns red in the dashboard, then the next one is green, and nobody can reproduce the problem on demand.
That is why row locks confuse so many teams. A slow query and a waiting query look almost the same from the outside, but they fail for different reasons. One query spends time doing work. The other spends most of its time blocked, waiting for another transaction to release a row.
The app feels slow either way, but the fix is different. An index will not help if the query already found the row quickly and then sat there waiting.
In a real app, the delay looks random because normal traffic mixes many request paths together. Most updates finish fast. Then one request opens a transaction, touches a row, and keeps that transaction open a little too long while the app does extra work. Every other request that needs the same row starts lining up behind it.
A small example shows the pattern. One user updates an order status. During that transaction, the app also writes an audit record, calls another service, and formats a response before commit. Three other users touch the same order in those few seconds. They all look slow, even though their own SQL is fine.
Logs often miss the real cause. Many systems record execution time only after the query finishes, so you get a slow query entry without any clue that 95 percent of that time was lock waiting. App logs can be even less helpful. They tell you the request took 9 seconds, not that the first 8.7 seconds were spent blocked behind another write.
That gap between what people feel and what logs show is why hidden stalls keep surprising teams.
How row locks turn into a queue
One write can make other writes wait. The pattern is simple: one request updates a row, Postgres locks that row, and every other request that wants to change the same row has to pause until the first transaction ends.
That pause is usually harmless. If the first transaction finishes in a few milliseconds, nobody notices. If it stays open while the app does extra work, the line behind it grows fast.
Teams often expect a lock problem to look dramatic, like a whole table freezing. Most of the time, that is not what happens. A row lock blocks one row, not the whole table, so the slowdown feels inconsistent. Some requests stay fast. Others pile up because they all touch the same record.
Hot rows show up in ordinary places: a single account balance, one inventory item with heavy traffic, one job record that workers keep retrying, or one tenant settings row updated on every action.
Now imagine Request A updates that row and then spends 800 ms calling other code before it commits. Request B arrives and waits. Then C waits behind B. Then D joins the same line. Users do not see "locked row." They see slow pages, timeouts, and odd spikes in latency.
The queue gets worse when each waiting request also holds other resources open. App workers stay busy. Connection pool slots stay occupied. New requests start waiting for a database connection before they even reach the locked row. That is how one slow transaction spreads into a wider stall.
Short locks rarely hurt. Long locks do, because waiting is contagious. The row lock itself is small. The real problem is how long the app holds it.
The fix often starts outside the database. Keep the transaction tight. Update the row as late as possible, commit right away, and move slow work like network calls, large loops, or extra reads out of the locked section.
Where teams stretch transaction scope by accident
A transaction often starts small and ends up carrying half the request. The code updates one row, then fetches related data, checks permissions again, calls another service, retries a failed step, and only then commits. Every extra step keeps locks alive longer than the team expects.
Small delays that add up
Trouble usually starts when code opens a transaction too early. A request handler may begin the transaction before input validation, before cache checks, or before it even knows whether it needs to write. From there, the code drifts into work that does not need database protection at all.
Common examples are easy to miss: calling a payment, email, or internal HTTP API before commit; retrying business logic inside the same transaction; reading extra tables just in case; or building audit records, notifications, and search updates before releasing the lock.
When one of those steps slows down by 300 ms, the lock also waits 300 ms. If ten requests hit the same account, order, or inventory row, that small delay turns into a queue.
When write paths get too wide
Teams also stretch transaction scope when web requests and background jobs touch the same rows for different reasons. A user clicks "cancel order" while a job recalculates totals or sends shipment updates on that same order. Each task looks harmless on its own. Together they wait on each other, and the slowdown feels random because it depends on timing.
A wide write path makes this worse. One user action may update the main row, insert an event, refresh counters, mark notifications, write audit history, and recalculate derived fields. None of that is unusual. The problem is doing all of it before commit in one long chain.
A better review starts with one plain question: what must happen inside the transaction, and what can move after commit? The answer is usually smaller than the code suggests.
How to review your write path step by step
Take one slow request and follow it from the moment your app receives it until Postgres commits. Do not start with the whole app. One request is enough to show where time slips away.
Write the path down as a timeline. Include controller code, service calls, ORM actions, raw SQL, and the final commit. If you skip this and trust memory, you usually miss the helper call or extra query that keeps the transaction open longer than expected.
Make a timeline of the transaction
For each step, note the exact moment the transaction starts. Then list every query that runs before the update or delete that matters. Read queries count too if they happen inside the same transaction, because they still keep the transaction alive while later writes wait behind a lock.
A simple review might look like this:
- request arrives
- transaction opens
- app loads user and account rows
- app calls another service for pricing or fraud checks
- app updates the order row and commits
That middle service call is where teams get surprised. The row lock may not look expensive in code, but the transaction stays open while the app waits on the network.
Mark anything that is not direct database work. That includes API calls, cache misses, retries, sleeps, template rendering, large JSON work, and loops over many items. Even 200 milliseconds matters when the same row gets updated often.
Move as much of that work as you can outside the transaction. Compute totals first. Fetch remote data first. Build audit messages after commit if they do not affect correctness. If you need row locks, keep the locked part short and boring.
Compare competing code paths
Then check whether two requests can touch the same rows in a different order. One path might update accounts and then invoices. Another might update invoices and then accounts. That mismatch creates waiting, and sometimes deadlocks.
Pick one shared rule for write order and keep it everywhere. If several services touch the same tables, write that order down in code comments or review notes so it does not drift.
This review is plain work, but it pays off fast. Teams usually find one transaction that does too much, one network call in the wrong place, or two write paths that fight each other for the same row.
A simple example of a stall chain
Picture an order service that marks an order as "paid" and reduces stock for one inventory row. The code opens a transaction, updates the order, updates the inventory item, and now holds row locks on both records.
That part is normal. The trouble starts when the request keeps doing extra work before it commits.
A common chain looks like this:
- Request A begins a transaction.
- It updates
orders.id = 8241andinventory.id = 17. - It then calculates shipping, writes an audit record, renders a PDF invoice, and waits for a slow API call.
- Only after all of that does it commit.
Those extra steps may take 300 ms on a good day and 4 seconds on a bad one. During that whole time, the row locks stay in place.
Now Request B arrives. It is small. It just needs to change the same order row because the customer refreshes the checkout page, or an admin tool retries payment status. Request B reaches the update and stops. It is not slow by itself. It is waiting for Request A to let go.
Then Request C shows up and wants the same inventory row because another buyer is checking out the last unit. It waits too. If your app retries, you can end up with Request D and E waiting behind them.
Soon the team sees random slowness. API latency jumps. A few workers look stuck. The database may still look healthy at first glance. CPU stays low. Disk looks fine. Most queries are fast. You only have a small set of blocked sessions, and they may come and go fast enough to miss in a dashboard.
That is why row locks feel sneaky. One request does a little too much inside the transaction, and every request that touches the same row joins the line.
The backlog grows outside the database too. App workers stay busy while they wait. Connection pools fill up. New requests start timing out before they even reach the update. What started as one slow write turns into a stall chain across the whole write path.
Mistakes that create lock trouble
Most lock trouble starts outside the database. A request opens a transaction, updates an order row, and then waits on something else: a payment API, a cache call, or a slow piece of app code. While that request waits, the row stays locked. Other requests pile up behind it, and the slowdown looks random because the database is idle part of the time.
A common version shows up in checkout or admin flows. The app starts BEGIN, changes a status, then spends 500 ms asking another service for a result. That half second is enough to create a queue if several users hit the same rows.
Another mistake is splitting one change into read first, update later. The code loads a row, checks a value in the app, then sends an update as a second statement. That pattern stretches the transaction and creates more chances for waits. If one SQL statement can check the condition and update the row, use it. Fewer round trips usually mean less lock time.
Shared counters and status rows also cause pain quickly. Teams often keep one row for things like daily_totals, job_count, or last_processed_id, then update it on every request. That row becomes a hotspot. Even tiny writes can block each other when everyone touches the same record.
Lock order matters too. One handler updates accounts and then invoices. Another updates invoices and then accounts. Under load, those two paths can trap each other in a wait or a deadlock. The fix is boring, which is probably why teams skip it: choose one order for related writes and keep it the same everywhere.
The last trap is hidden writes. Triggers, stored procedures, and helper functions can update extra tables that the calling code never shows. A simple status change may also write an audit row, bump a counter, or refresh a summary table. When you review lock waits, trace the full write path, not just the query you see in the handler.
If a request needs 20 ms of SQL, keep the transaction close to 20 ms. Anything extra usually turns into hidden stalls later.
How to confirm the problem in production
Random slow requests often start with one transaction that stayed open longer than anyone expected. If you only look at average query time, you can miss it. A 15 ms update can still hold a row lock for 20 seconds if the app does other work before commit.
Start with running transactions, not just slow queries. In row lock cases, transaction age tells a clearer story than query duration. Sort active sessions by xact_start, and pay close attention to sessions marked idle in transaction.
SELECT pid,
state,
now() - xact_start AS tx_age,
wait_event_type,
wait_event,
query
FROM pg_stat_activity
WHERE datname = current_database()
ORDER BY xact_start NULLS LAST;
Then split sessions into two groups: sessions doing work and sessions waiting. If wait_event_type shows Lock, the database is not busy on CPU. It is waiting for another session to finish.
Next, match blocked queries to the session that holds the lock. pg_blocking_pids(pid) helps you find the blocker quickly. Read both statements together. The blocked query often looks innocent. The blocker is usually an earlier write that touched the same row and then stayed open while the app did something else.
A few fields make the picture much easier to read: pid, transaction age, wait_event_type, blocking session pid, and the current query text.
After that, compare database timing with app traces. Line up the request start time, the transaction start time, and the moment the request slowed down. If the trace shows a call to another service, a retry loop, or extra business logic inside the same transaction, you likely found the queue starter.
Capture the statement that begins the chain. Teams often log the blocked update and stop there. That misses the real cause. The first statement that locks the row is the one that matters, even if it ran quickly.
Quick checks before you ship
A lot of lock trouble starts with code that looks harmless in review. One extra query, one API call, or one loop inside a transaction can turn a fast write into a queue when traffic picks up.
Before a release, check a few basics:
- Keep each transaction tight. Do the minimum read and write work, then commit.
- Touch shared rows in the same sequence every time.
- Pull outside calls out of the transaction body.
- Test busy endpoints with concurrent traffic, not just one request at a time.
That row-order rule sounds small, but it matters. If a billing job updates a customer row and then an invoice row, a webhook should not update the invoice first and the customer second. That is how deadlocks and hidden stalls show up in apps that looked stable in basic testing.
This is also where Postgres row locking stops feeling abstract. You do not need a huge system to hit the problem. A signup flow, coupon counter, seat reservation, or inventory update can do it if the transaction stays open too long.
Before you merge, ask two blunt questions: "What rows does this request lock?" and "What work happens before commit that does not need to?" If the answers are fuzzy, the code probably needs one more pass.
What to change next
Do not chase every slow query at once. Pick one write path that people hit all day, map every step inside the transaction, and write down where the app reads, writes, retries, or waits before commit.
That single map usually explains more than another week of guesswork. Teams fix lock trouble faster by shrinking transaction time than by throwing more CPU or memory at the database.
A good first pass is simple. Choose one hot path, such as checkout, invoice update, or job claim. List every SQL statement in order. Mark anything that happens inside the transaction but does not need to. Note which rows many requests touch at the same time.
Then make lock waits visible in your normal review flow. If the team already checks error rates, deploy health, and response times, add lock wait time to that same habit. When waits stay hidden, people blame random slowness, app servers, or traffic spikes. Once waits show up on a chart or in query logs, the pattern gets obvious fast.
Code review needs a clear rule too. Transactions should stay short, and reviewers should push back when code does extra work before commit. Common examples are calling another service, loading more rows than needed, doing heavy validation late, or holding a transaction open while the app builds a response.
A simple team rule works well: open the transaction as late as possible, touch the fewest rows you can, and commit as soon as the write is done. If a step can happen before or after the transaction, move it.
Do not turn this into a huge cleanup project. Fix one write path, measure the change, and then move to the next path with the same method. Two or three small fixes often remove the worst stalls.
If you want a second set of eyes, Oleg Sotnikov at oleg.is works as a fractional CTO and startup advisor. He helps startups and small teams review transaction scope, write paths, and production infrastructure so problems like this get fixed before they spread through the app.
Frequently Asked Questions
What is the difference between a slow query and a lock wait?
A slow query spends time doing work. A lock wait finds the row fast, then sits there until another transaction commits.
They look similar in dashboards because both show up as long request time. The fix differs, so check whether the session waits on Lock before you tune SQL or add indexes.
How do I confirm that row locks cause the slowdown?
Start with pg_stat_activity and sort by xact_start. Look for old transactions, idle in transaction sessions, and requests where wait_event_type is Lock.
Then match blocked sessions to the blocker with pg_blocking_pids(pid). Compare that with your app trace so you can see what the blocking request did before commit.
What is a hot row?
A hot row is one record that many requests update over and over. Common examples are one inventory item, one account balance, one order, or one shared counter.
Even tiny writes turn into a queue when many requests hit that same row and one transaction holds it too long.
Should I move API calls out of the transaction?
Yes. Open the transaction right before the write you need, then commit as soon as that write finishes.
If you call payment, email, fraud, or another internal API inside the transaction, you keep the row lock open while the network waits. That delay spreads fast under concurrent traffic.
Why is idle in transaction a problem?
Because it keeps the transaction open while the app does nothing useful. Any row locks from that transaction stay alive, and other requests pile up behind them.
Even a short pause hurts when many requests touch the same record. Fix the code path so it commits or rolls back right away.
Can one locked row slow down the whole app?
Yes. Waiting requests still tie up app workers and database connections. Once the pool fills, new requests wait before they even reach the locked row.
That is why one long transaction can show up as wider slowness across the app, even when CPU and disk look fine.
What should I change first to reduce lock time?
Shrink transaction scope first. Validate input, fetch remote data, and do heavy app work before you begin the transaction whenever you can.
Inside the transaction, touch the fewest rows possible and commit right after the write. That change often removes the worst stalls faster than infrastructure changes.
Does read first, update later make lock waits worse?
Yes. When code reads a row, checks a value in the app, and updates later, it adds round trips and keeps the transaction open longer.
If one SQL statement can check the condition and update the row, use that. You cut lock time and reduce the chance of two requests racing each other.
How do I avoid deadlocks when two paths touch the same tables?
Pick one write order for shared tables and keep it everywhere. If one path updates accounts then invoices, every other path should follow that same order.
Review jobs, webhooks, admin actions, and request handlers together. Mixed order creates waits and can turn into deadlocks under load.
What should I check before I ship a write-heavy change?
Ask two plain questions before merge: what rows does this request lock, and what work happens before commit that does not need to.
Then run concurrent traffic against busy write paths, not just one request at a time. Lock trouble often hides in code that looks fine in single-request tests.