Confused about max_standby_streaming_delay

From: Robert Inder <robert(at)interactive(dot)co(dot)uk>
To: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Confused about max_standby_streaming_delay
Date: 2017-09-06 16:43:28
Message-ID: CAKqjJm9v5t2g=dDPwo_sZZSUdektrYY=de7W+tWBLMTtjtmOsw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

A server running PostgreSQL 9.4 on Centos 6.8, with a live server and a hot
standby, is supporting about 20 customer organisations, each with their own
linux user and its own installation/copy of the system, talking to its own
database.

The system has a backup script which is a wrapper round pg_dump.
For most installations, it takes "a couple of minutes" to run, though
others take longer.

On the standby server, we have a directory full of cron jobs -- one for
each installation -- to invoke this backup script
Each script is run as soon as the previous one is finished. So when all
goes to plan, we get a dump of
database_a at 5 minutes past the hour,
database_b at 7 minutes past,
database_c at 9 minutes past
and so on.

H
​owever, we o​
ccasionally
​ have​
a dump die with a message
​ like​

Error message from server: ERROR: canceling statement due to
conflict with recovery

I believe this means that an update from the live server wants to change a
table that is being dumped, and so the dump is aborted.

And I've read that the answer to this is to set max_standby_streaming_delay
in postgresql94.conf.
So I've set it to "600s" -- ten minutes.

I thought this would mean that when there was a conflict with an update
from the live server, Postgres would give the dump 10 minutes "grace" in
which to finish before killing it.

Ten minutes may or may not be enough. But in a case where it isn't enough,
and the dump is abandonned, I would expect to see something like

the dump of database_a finishing at 5 minutes past the hour,
the dump of database_b
starting after the dump of database_a,
having a conflict,
being given 10 minutes to complete, and then
being abandonned
the dump of database_c starting after the dump of database_b and
finishing (say) 3 minutes later

So the dump of database_c should finish at around 18 minutes past the hour.

BUT that is not what I am seeing.
On occasions where the dump of database_b is being abandonned, the
successful dump of
database_c is timestamped less than 10 minutes after the dump of database_a

Which does not fit with the dump of database_b being given 10 minutes in
which to finish

Have I misunderstood something? Or is Postgres not actually configured the
way I think it is?

Robert.

--
Robert Inder, 0131 229 1052 / 07808 492
213
Interactive Information Ltd, 3, Lauriston Gardens, Edinburgh EH3 9HH
Registered in Scotland, Company no. SC 150689
Interactions speak louder than
words

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Nico Williams 2017-09-06 16:54:34 Re: Schema/table replication
Previous Message Adrian Heath 2017-09-06 16:31:43 Hot standby failing with page # of relation # is uninitialized