awkward cancellation of parallel queries on standby.

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: awkward cancellation of parallel queries on standby.
Date: 2023-03-26 15:12:48
Message-ID: CAMkU=1zCi_8E3aqi2iKBa8HOA8MFJ=7GrQ5AS_uiw2NGagS4oA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

When a parallel query gets cancelled on a standby due to
max_standby_streaming_delay, it happens rather awkwardly. I get two errors
stacked up, a query cancellation followed by a connection termination.

I use `pgbench -R 1 -T3600 -P5` on the master to generate a light but
steady stream of HOT pruning records, and then run `select
sum(a.abalance*b.abalance) from pgbench_accounts a join pgbench_accounts b
using (bid);` on the standby not in a transaction block to be a
long-running parallel query (scale factor of 20)

I also set max_standby_streaming_delay = 0. That isn't necessary, but it
saves wear and tear on my patience.

ERROR: canceling statement due to conflict with recovery
DETAIL: User query might have needed to see row versions that must be
removed.
FATAL: terminating connection due to conflict with recovery
DETAIL: User query might have needed to see row versions that must be
removed.

This happens quite reliably. In psql, these sometimes both show up
immediately, and sometimes only the first one shows up immediately and then
the second one appears upon the next communication to the backend.

I don't know if this is actually a problem. It isn't for me as I don't do
this kind of thing outside of testing, but it seems untidy and I can see it
being frustrating from a catch-and-retry perspective and from a log-spam
perspective.

It looks like the backend gets signalled by the startup process, and then
it signals the postmaster to signal the parallel workers, and then they
ignore it for a quite long time (tens to hundreds of ms). By the time they
get around responding, someone has decided to escalate things. Which
doesn't seem to be useful, because no one can do anything until the workers
respond anyway.

This behavior seems to go back a long way, but the propensity for both
messages to show up at the same time vs. in different round-trips changes
from version to version.

Is this something we should do something about?

Cheers,

Jeff

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Hannu Krosing 2023-03-26 15:19:18 Re: Disable vacuuming to provide data history
Previous Message Justin Pryzby 2023-03-26 14:51:25 Re: CREATE INDEX CONCURRENTLY on partitioned index