Re: Streaming Replication patch for CommitFest 2009-09

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming Replication patch for CommitFest 2009-09
Date: 2009-09-18 05:50:06
Message-ID: 3f0b79eb0909172250m71c942f8n820c94bc8a264176@mail.gmail.com
Lists: pgsql-hackers

Hi,

On Thu, Sep 17, 2009 at 8:32 PM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> Some random comments:

Thanks for the comments.

> I don't think we need the new PM_SHUTDOWN_3 postmaster state. We can
> treat walsenders the same as the archive process, and kill and wait for
> both of them to die in PM_SHUTDOWN_2 state.

OK, I'll use PM_SHUTDOWN_2 for walsender instead of PM_SHUTDOWN_3.
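
Just to illustrate the idea (this is not the patch's actual code, and
CountLiveWalSenders() is a made-up helper for this sketch), the
PM_SHUTDOWN_2 handling in PostmasterStateMachine() would then wait for
both kinds of children to be gone:

    /*
     * Sketch only: in PM_SHUTDOWN_2, wait until the archiver and every
     * walsender have exited before moving to the next state.
     */
    if (pmState == PM_SHUTDOWN_2 &&
        PgArchPID == 0 && CountLiveWalSenders() == 0)
    {
        pmState = PM_WAIT_DEAD_END;
    }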

> I think there's something wrong with the napping in walsender. When I
> perform pg_switch_xlog(), it takes surprisingly long for it to trickle
> to the standby. When I put a little proxy program in between the master
> and slave that delays all messages from the slave to the master by one
> second, it got worse, even though I would expect the master to still
> keep sending WAL at full speed. I get logs like this:

This is probably because the XLOG records following XLOG_SWITCH are
sent to the standby, too. Those records are obviously never used for
recovery, but they are sent anyway because walsender doesn't know where
the XLOG_SWITCH record is.

The difficulty is that there might be many XLOG_SWITCH records in the
XLOG files which walsender is about to send. How should walsender learn
those locations? One possible solution is to make walsender parse the
XLOG files and search for XLOG_SWITCH records, but I think that is
overkill.

I don't think an XLOG switch is requested very often, and in most cases
it isn't sensitive to response time. So I don't think it's worth changing
walsender to skip the XLOG following XLOG_SWITCH. Thoughts?

> 2009-09-17 14:14:09.932 EEST LOG: xlog send request 0/38000428; send
> 0/38000000; write 0/38000000
> 2009-09-17 14:14:09.932 EEST LOG: xlog read request 0/38000428; send
> 0/38000428; write 0/38000000
>
> It looks like it's having 100 or 200 ms naps in between. Also, I
> wouldn't expect to see so many "read request" acknowledgments from the
> slave. The master doesn't really need to know how far the slave is,
> except in synchronous replication when it has requested a flush to
> slave. Another reason why master needs to know is so that the master can
> recycle old log files, but for that we'd really only need an
> acknowledgment once per WAL file or even less.

Do you mean that a new protocol message is needed for asking the standby
about the location up to which replication has completed? In the
synchronous case, the backend should not have to wait for an
acknowledgement that arrives only once per XLOG file; that would hurt
performance.
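
To be concrete, the acknowledgement I have in mind would only need to
report the standby's progress. The message layout below is purely
hypothetical, just to illustrate the idea; none of these names exist in
the patch:

    #include "access/xlogdefs.h"    /* XLogRecPtr */

    /*
     * Hypothetical acknowledgement payload from the standby.  Synchronous
     * replication would wait on "flushed", while recycling old XLOG files
     * on the master only needs an occasional report of "received".
     */
    typedef struct StandbyAck
    {
        XLogRecPtr  received;   /* WAL written to the standby's pg_xlog */
        XLogRecPtr  flushed;    /* WAL fsynced to disk on the standby */
    } StandbyAck;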

> Why does XLogSend() care about page boundaries? Perhaps it's a leftover
> from the old approach that read from wal_buffers?

That is to avoid sending a partially-filled XLOG *record*, which
simplifies the logic by which the startup process waits for the next
available XLOG record, i.e., the startup process doesn't need to cope
with a partially-sent record.
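
As a rough illustration of the boundary handling (a simplification, not
the actual XLogSend() code): the end pointer is rounded down to the start
of its XLOG page, so a record still being filled in on the current page
is never streamed.

    /*
     * Sketch only: round the send end pointer down to an XLOG page
     * boundary.  XLOG_BLCKSZ is the WAL page size.
     */
    static void
    round_down_to_page(XLogRecPtr *endptr)
    {
        uint32      off = endptr->xrecoff % XLOG_BLCKSZ;

        if (off != 0)
            endptr->xrecoff -= off;
    }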

> Do we really need the support for asynchronous backend libpq commands?
> Could walsender just keep blasting WAL to the slave, and only try to
> read an acknowledgment after it has requested one, by setting
> XLOGSTREAM_FLUSH flag. Or maybe we should be putting the socket into
> non-blocking mode.

Yes, that is required, especially for synchronous replication. Receiving
the acknowledgement must not block the sending of subsequent XLOG.
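
If we took the non-blocking route as well, the socket setup itself is
simple enough; something along these lines with plain POSIX fcntl(), not
tied to any libpq internals:

    #include <fcntl.h>

    /*
     * Sketch: put the replication socket into non-blocking mode so that
     * walsender can poll for an acknowledgement between sends without
     * stalling the XLOG stream.  Returns true on success.
     */
    static bool
    set_socket_nonblocking(int sock)
    {
        int         flags = fcntl(sock, F_GETFL, 0);

        if (flags < 0)
            return false;
        return fcntl(sock, F_SETFL, flags | O_NONBLOCK) == 0;
    }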

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
