Re: BUG #16440: pg_basebackup intermittently hangs waiting for input unless run with --checkpoint=fast option

From: Curt Kolovson <curt(at)kolovson(dot)org>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #16440: pg_basebackup intermittently hangs waiting for input unless run with --checkpoint=fast option
Date: 2020-05-15 18:47:32
Message-ID: CANhYJV7ESTcU0CAyNVHbWfLwu3YUHaZ_T6EbF3EGLPEhocyppw@mail.gmail.com
Lists: pgsql-bugs

I should be more precise when I say it is "repeatable" when it occurs. We
have seen this problem only occasionally. Let's say we have a primary and
we are trying to clone it to make 3 standbys. It may succeed on the 1st
standby, and then hang on pg_basebackup on the 2nd standby, and then
succeed on the 3rd standby. On the 2nd standby, it will repeatedly hang
when retried. The only workarounds we have found are either to run it with
the --checkpoint=fast option or to restart postgres on the primary.
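
For reference, the first workaround can be sketched as the invocation from the original report with --checkpoint=fast added. The host, port, user, and paths are the ones quoted in the report; the `PG_BASEBACKUP` variable and the `clone_standby` function name are illustrative assumptions, not part of any tool.

```shell
# Sketch only: the pg_basebackup invocation from the report plus --checkpoint=fast.
# PG_BASEBACKUP and clone_standby are illustrative names (assumptions).
PG_BASEBACKUP=${PG_BASEBACKUP:-/opt/vmware/vpostgres/current/bin/pg_basebackup}

clone_standby() {
    # --checkpoint=fast requests an immediate checkpoint on the primary
    # instead of waiting for a spread checkpoint to finish.
    "$PG_BASEBACKUP" \
        -l "repmgr base backup" \
        -D /var/vmware/vpostgres/current/pgdata \
        -h 172.18.50.48 -p 5432 -U repmgr \
        -X stream --checkpoint=fast \
        --verbose --progress
}
```

Calling `clone_standby` on the standby host runs the backup. A fast checkpoint can cause a burst of I/O on the primary, which is why the spread mode is the default.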

On Fri, May 15, 2020 at 11:39 AM Curt Kolovson <curt(at)kolovson(dot)org> wrote:

> We are using the default checkpoint settings: checkpoint_timeout (5 min) *
> checkpoint_completion_target (0.5) = 2.5 min. OK on your point that we
> should be looking at the primary CPU, not the client. You're right that
> "indefinitely" is not a good term to use. We waited over 10 min. It
> appeared to be hung. The database was small (81 MB) and there was very
> little activity. I tried doing a checkpoint on the primary while this was
> going on, and it had no effect. The strange thing about this problem is
> that it is intermittent. We only see it happen occasionally. But when it
> occurs, it is repeatable.
>
> Curt
>
>
> On Fri, May 15, 2020 at 10:51 AM Magnus Hagander <magnus(at)hagander(dot)net>
> wrote:
>
>> On Fri, May 15, 2020 at 7:49 PM PG Bug reporting form <
>> noreply(at)postgresql(dot)org> wrote:
>>
>>> The following bug has been logged on the website:
>>>
>>> Bug reference: 16440
>>> Logged by: Curt Kolovson
>>> Email address: ckolovson(at)gmail(dot)com
>>> PostgreSQL version: 10.9
>>> Operating system: Linux 4.9.219-1.ph2 x86_64 (VMware Photon OS 2.0)
>>> Description:
>>>
>>> We have noticed what appears to be an intermittent bug in pg_basebackup.
>>> Here is what we are occasionally seeing:
>>> $ /opt/vmware/vpostgres/current/bin/pg_basebackup -l "repmgr base backup" \
>>>     -D /var/vmware/vpostgres/current/pgdata -h 172.18.50.48 -p 5432 \
>>>     -U repmgr -X stream --verbose --progress
>>> pg_basebackup: initiating base backup, waiting for checkpoint to complete
>>>
>>> And then it hangs indefinitely at this point. It makes no progress (0 CPU),
>>> so it is hanging on some type of input. Here is ps output:
>>> postgres 3386 3370 0 15:01 ? 00:00:00
>>> /opt/vmware/vpostgres/current/bin/pg_basebackup -l repmgr base backup -D
>>> /var/vmware/vpostgres/current/pgdata -h 172.18.50.48 -p 5432 -U repmgr -X
>>> stream
>>>
>>> We notice that this behavior only occurs intermittently, but when it does,
>>> it happens repeatedly on that system. The only workarounds we have found are
>>> either to run it with the --checkpoint=fast option, or to restart postgres
>>> on the primary.
>>>
>>> We are using synchronous streaming replication.
>>>
>>
>> Define "indefinitely". How long did you wait, and what's the value for
>> your checkpoint_timeout?
>>
>> It's perfectly normal for it to be waiting quite some time, as it waits
>> for the slow speed checkpoint to complete on the server. (And you want
>> to look at the processes and status of the server, not the client, to see
>> what it's doing.)
>>
>> --
>> Magnus Hagander
>> Me: https://www.hagander.net/
>> Work: https://www.redpill-linpro.com/
>>
>
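
As a rough check of the timing arithmetic quoted above, the sketch below computes the duration a spread checkpoint aims for under the stated settings and compares it with the reported wait. The variable names are illustrative, and the result is only an approximate target, not a guarantee of when the checkpoint completes.

```shell
# Values quoted in the thread (the reporter describes them as the defaults).
checkpoint_timeout_s=300      # checkpoint_timeout = 5 min
completion_target_pct=50      # checkpoint_completion_target = 0.5
observed_wait_s=600           # "We waited over 10 min"

# Integer arithmetic: 300 * 50 / 100 = 150 seconds (2.5 min).
expected_spread_s=$(( checkpoint_timeout_s * completion_target_pct / 100 ))

echo "expected spread checkpoint duration: ${expected_spread_s}s"
if [ "$observed_wait_s" -gt "$expected_spread_s" ]; then
    echo "observed wait exceeds the expected spread duration"
fi
```

Under these assumptions the reported wait is at least four times the 2.5 min figure, which is consistent with the reporter's impression that the backup was stuck rather than slow.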
