max_standby_streaming_delay setting not cancelling query on replica

From: Ben Snaidero <bensnaidero(at)geotab(dot)com>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: max_standby_streaming_delay setting not cancelling query on replica
Date: 2023-11-01 16:58:17
Message-ID: CAEPE5bN8zBwGsG0NkdBB43DxXXruQGCvf4HgcFunB2KvN-tvVQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

We are running a postgres streaming replica with
max_standby_streaming_delay set to 900 seconds (15min). We encountered an
issue in our environment where we had a long running query that was running
against the replica for just over 4 hours causing 4 hours of replication
lag. Looking at pg_stat_activity for this query it was stuck in
Client:ClientWrite wait state for pretty much all of this time (it ran for
less than 1 minute before going into ClientWrite wait state. We capture
pg_stat_activity every minute and only first capture shows a DataFileRead
wait and there was only 1 other capture during the 4 hours where it was
active with no wait event). From what we could tell the client process
tried to send cancellation and disconnected (our client application uses
npgsql) so there was no process to consume these results and after manually
cancelling the query the replication lag came back down so this query was
definitely the cause of the lag.

Question: Why did the max_standby_streaming_delay setting not cancel this
query?

I looked at the code in standby.c and if there are conflicting locks it
should be cancelled. Unfortunately at the time this issue occurred we
weren't collecting pg_locks to see what locks are being held but given the
state the query was in would it have released the ACCESS SHARE lock it
acquired while executing the query given it just has to send data to the
client now? I would think if it still held this lock then the query would
be cancelled. If it didn't hold it anymore then maybe that is why
max_standby_streaming_delay setting didn't cause it to be cancelled. Any
other ideas?

Thanks,
Ben

Browse pgsql-hackers by date

  From Date Subject
Next Message Merlin Moncure 2023-11-01 17:19:37 Re: MERGE ... RETURNING
Previous Message Bruce Momjian 2023-11-01 15:36:01 Re: "box" type description