From: | Nick B <nbedxp(at)gmail(dot)com> |
---|---|
To: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Cc: | Michael Paquier <michael(at)paquier(dot)xyz>, Magnus Hagander <magnus(at)hagander(dot)net>, Oleksii Kliukin <alexk(at)hintbits(dot)com>, sfrost(at)snowman(dot)net |
Subject: | Re: pg_basebackup, walreceiver and wal_sender_timeout |
Date: | 2019-01-29 16:11:30 |
Message-ID: | CAPHA_mkS-70+FWku4tQiMR+NVJe826Y6oCEG69YaJtWi2C2Ebw@mail.gmail.com |
Lists: | pgsql-hackers |
Greetings,
I would also like to thank everyone for looking into this.
On Sat, Jan 26, 2019 at 01:45:46PM +0100, Magnus Hagander wrote:
> One workaround you could perhaps look at here is to run pg_basebackup
> with --no-sync. That way there will be no fsyncs issued while running. You
> will then of course have to take care of syncing all the files to disk
> after it's done, but a network filesystem might be happier in dealing with
> a large "batch-sync" like that rather than piece-by-piece sync.
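As a rough illustration of the "batch-sync" step, here is a minimal Python sketch that fsyncs every regular file and directory under a backup target in one pass once `pg_basebackup --no-sync` has finished. The helper name `fsync_tree` is hypothetical. PostgreSQL's own `initdb --sync-only` (`-S`) performs a similar pass over a data directory and is likely the safer choice in practice. This sketch assumes a POSIX system where directories can be opened and fsynced.

```python
import os


def fsync_tree(root: str) -> int:
    """Hypothetical helper: fsync every regular file under root, then each directory.

    Returns the number of files synced. Directories are visited bottom-up
    (topdown=False), so each directory is synced after its contents and its
    new entries become durable as well. Assumes POSIX semantics.
    """
    synced = 0
    for dirpath, _dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks (e.g. tablespace links) and non-regular files.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            fd = os.open(path, os.O_RDONLY)
            try:
                os.fsync(fd)
            finally:
                os.close(fd)
            synced += 1
        # Sync the directory itself so created/renamed entries persist.
        dfd = os.open(dirpath, os.O_RDONLY)
        try:
            os.fsync(dfd)
        finally:
            os.close(dfd)
    return synced
```

After `pg_basebackup --no-sync -D /path/to/backup` completes, calling `fsync_tree("/path/to/backup")` issues all the fsyncs at the end instead of interleaving them with the transfer, which is the pattern a network filesystem may handle better.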
Thanks for the pointer. I actually was not aware that this flag existed.
I've run two rounds of tests with --no-sync, and the backup failed at a
much later point in time, which suggests that the bottleneck is in fact the
Ceph metadata server. We're now looking into ways of improving this.
(This is a 15TB cluster with a few hundred thousand tables that on
average generates 4 WAL segments per second, so throttling the transfer rate
is not a good option either.)
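To put the WAL rate in perspective, the sketch below converts 4 segments per second into volume, assuming the default 16 MiB WAL segment size. The segment size is configurable at initdb time since PostgreSQL 11 (`--wal-segsize`), so this cluster's actual figure may differ; the constant and function names are illustrative only.

```python
# Assumption: default 16 MiB WAL segments; the real cluster may use another size.
WAL_SEGMENT_MIB = 16
SEGMENTS_PER_SECOND = 4  # average rate reported for this cluster


def wal_volume_mib(seconds: float,
                   segments_per_second: float = SEGMENTS_PER_SECOND,
                   segment_mib: int = WAL_SEGMENT_MIB) -> float:
    """WAL generated over the given interval, in MiB."""
    return seconds * segments_per_second * segment_mib


rate_mib_s = wal_volume_mib(1)
per_hour_gib = wal_volume_mib(3600) / 1024
per_day_tib = wal_volume_mib(86400) / 1024 / 1024
print(f"{rate_mib_s:.0f} MiB/s, {per_hour_gib:.0f} GiB/hour, {per_day_tib:.2f} TiB/day")
```

Under these assumptions that is 64 MiB/s, or about 225 GiB of WAL per hour, so every extra hour a throttled base backup of a 15TB cluster runs adds roughly that much WAL that has to be streamed and kept alongside it.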
On Sat, Jan 26, 2019 at 4:23 AM Michael Paquier
<michael(at)paquier(dot)xyz> wrote:
> The docs could be improved to describe that better.
I had an off-list discussion about a possible documentation update with
Stephen Frost, and he voiced the opinion that the behaviour I was trying to
describe sounds a lot like a bug, and that documenting a bug is not good
practice.
Upon further examination of WalSndKeepaliveIfNecessary, I found that
"requesting an immediate reply" is implemented by setting the socket into
non-blocking mode and issuing a flush. I find it hard to believe there is a
scenario where the client can react to that keep-alive in time (unless of
course I misunderstood something). So the question is: will we ever actually
wait the full wal_sender_timeout before terminating the connection?
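To make the timing concrete, here is a simplified Python model of the bookkeeping in WalSndKeepaliveIfNecessary and its companion check WalSndCheckTimeOut in walsender.c, assuming the half-timeout rule described in that code's comments: a keepalive requesting a reply is sent once wal_sender_timeout / 2 has elapsed since the last reply, and the connection is dropped once the full wal_sender_timeout has elapsed. The class and function names are hypothetical, and the model deliberately ignores the non-blocking flush; it shows when the walsender asks for a reply, not whether that request reaches a client stuck in fsync.

```python
from dataclasses import dataclass


@dataclass
class WalSenderTimers:
    """Hypothetical, simplified model of walsender timeout bookkeeping (ms)."""
    wal_sender_timeout: int
    last_reply: int
    waiting_for_ping_response: bool = False

    def ping_due(self, now: int) -> bool:
        # Request a reply once half the timeout has passed, at most once
        # until the client answers.
        if self.wal_sender_timeout <= 0 or self.waiting_for_ping_response:
            return False
        return now >= self.last_reply + self.wal_sender_timeout // 2

    def timed_out(self, now: int) -> bool:
        # Terminate once the full timeout has passed since the last reply,
        # regardless of any keepalive sent in between.
        if self.wal_sender_timeout <= 0:
            return False
        return now >= self.last_reply + self.wal_sender_timeout

    def on_reply(self, now: int) -> None:
        self.last_reply = now
        self.waiting_for_ping_response = False


def simulate_stall(stall_ms: int, timeout_ms: int = 60_000, step_ms: int = 1_000):
    """Return (ping_at, drop_at) for a client that cannot reply for stall_ms.

    drop_at is None if the stall ends (and the client replies) first.
    """
    timers = WalSenderTimers(wal_sender_timeout=timeout_ms, last_reply=0)
    ping_at = None
    for now in range(0, stall_ms, step_ms):
        if timers.ping_due(now):
            timers.waiting_for_ping_response = True
            ping_at = now
        if timers.timed_out(now):
            return ping_at, now
    timers.on_reply(stall_ms)
    return ping_at, None
```

In this model, with a 60 s timeout a client that stalls for 70 s is asked for a reply at 30 s and dropped at 60 s, while a 45 s stall survives. Whether a real client can answer during that second half of the window depends on it reading the socket at all, which is the crux of the question.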
Regards,
Nick.