Re: pg_basebackup may fail to send feedbacks.

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_basebackup may fail to send feedbacks.
Date: 2015-03-02 11:21:36
Message-ID: CAHGQGwG1tJHpG03oZgwoKxt5wYD5v4S3HuTgSx7RotBhHnjwJw@mail.gmail.com
Lists: pgsql-hackers

On Tue, Feb 24, 2015 at 6:44 PM, Kyotaro HORIGUCHI
<horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> Hello, the attached is the v4 patch that checks feedback timing
> at every WAL segment boundary.
>
> At Fri, 20 Feb 2015 17:29:14 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote in <20150220(dot)172914(dot)241732690(dot)horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
>> > Some users may complain about the performance impact by such
>> > frequent calls and we may want to get rid of them from
>> > walreceiver loop in the future. If we adopt your idea now,
>> > I'm afraid that it would tie our hands in that case.
>> >
>> > How much impact can such frequent calls of gettimeofday()
>> > have on replication performance? If it's not negligible,
>> > probably we should remove them at first and find out another
>> > idea to fix the problem you pointed. ISTM that it's not so
>> > difficult to remove them. Thought? Do you have any numbers
>> > which can prove that such frequent gettimeofday() has only
>> > ignorable impact on the performance?
>>
>> The attached patch is 'the more sober' version of the SIGALRM patch.
>
> I said that checking whether to send feedback only at every boundary
> between WAL segments seemed too infrequent, but after some rethinking, I
> changed my mind.
>
> - The largest possible source of delay in the busy-receive loop
> is the fsync when closing the WAL segment file just written; this can
> take several seconds. A freeze longer than the timeout
> interval cannot be helped and is not worth addressing anyway.
>
> - Intervals of 16MB of disk writes between gettimeofday() calls seem
> to be gentle enough even on platforms where gettimeofday() is
> rather heavy.

Sounds reasonable to me.

So we don't need to address the problem on the walreceiver side so proactively,
because walreceiver tries to send feedback every time it flushes WAL records.
IOW, the problem you observed is less likely to happen there. Right?

+ now = feGetCurrentTimestamp();
+ if (standby_message_timeout > 0 &&

Minor comment: should feGetCurrentTimestamp() be called only after checking
standby_message_timeout > 0, to avoid unnecessary calls to it?
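To illustrate the ordering suggested above, here is a minimal standalone
sketch, not the patch itself. It assumes standby_message_timeout is in
milliseconds and last_status is in microseconds; current_timestamp_usec()
and feedback_due() are hypothetical names standing in for
feGetCurrentTimestamp() and the check in the patch. The call counter exists
only to show that the clock is not read when feedback is disabled.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/time.h>

/* Counts clock reads, purely for demonstration. */
static int timestamp_calls = 0;

/* Hypothetical stand-in for feGetCurrentTimestamp(): microseconds since epoch. */
static int64_t
current_timestamp_usec(void)
{
    struct timeval tv;

    timestamp_calls++;
    gettimeofday(&tv, NULL);
    return (int64_t) tv.tv_sec * 1000000 + tv.tv_usec;
}

/*
 * Returns true if a status message is due.  The cheap check on
 * standby_message_timeout comes first, so the clock is never read
 * when feedback is disabled (timeout <= 0).
 */
static bool
feedback_due(int standby_message_timeout, int64_t last_status)
{
    int64_t now;

    if (standby_message_timeout <= 0)
        return false;

    now = current_timestamp_usec();
    return now - last_status >= (int64_t) standby_message_timeout * 1000;
}
```

With this ordering, feedback_due(0, last_status) returns without touching
gettimeofday() at all.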

ProcessXLogDataMsg(PGconn *conn, char *copybuf, int len,
XLogRecPtr *blockpos, uint32 timeline,
char *basedir, stream_stop_callback stream_stop,
- char *partial_suffix, bool mark_done)
+ char *partial_suffix, bool mark_done,
+ int standby_message_timeout, int64 *last_status)

Maybe it's time to refactor this ugly code (i.e., currently many arguments
need to be passed to each function)...
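One possible shape for such a refactoring, as a rough sketch only: bundle the
per-stream settings and state into a single struct and pass a pointer to it.
StreamContext, feedback_due_at() and note_feedback_sent() are hypothetical
names, not existing code; the field names are taken from the argument list
above, and the units (timeout in milliseconds, timestamps in microseconds)
are an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bundle of the values now passed as separate arguments. */
typedef struct StreamContext
{
    uint32_t    timeline;
    const char *basedir;
    const char *partial_suffix;
    bool        mark_done;
    int         standby_message_timeout;    /* assumed ms; <= 0 disables */
    int64_t     last_status;                /* assumed usec timestamp */
} StreamContext;

/* Is a status message due at time "now"?  Reads only the context. */
static bool
feedback_due_at(const StreamContext *ctx, int64_t now)
{
    if (ctx->standby_message_timeout <= 0)
        return false;
    return now - ctx->last_status >=
        (int64_t) ctx->standby_message_timeout * 1000;
}

/* Record that feedback was sent; replaces the int64 *last_status out-parameter. */
static void
note_feedback_sent(StreamContext *ctx, int64_t now)
{
    ctx->last_status = now;
}
```

A function like ProcessXLogDataMsg() could then take the connection, the
buffer, and one context pointer, so adding a new setting would not change
every signature along the call chain.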

Regards,

--
Fujii Masao
