Re: Streaming Replication patch for CommitFest 2009-09

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming Replication patch for CommitFest 2009-09
Date: 2009-09-24 08:20:56
Message-ID: 3f0b79eb0909240120t6a7d56b3gfc7119af8bcc287b@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

Sorry for the delay.

On Mon, Sep 21, 2009 at 4:51 PM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> Having gone through the patch now in more detail, I think it's in pretty
> good shape. I'm happy with the overall design, except that I haven't
> been able to make up my mind if walreceiver should indeed be a
> stand-alone program as discussed, or a postmaster child process as in
> the patch you submitted. Putting that question aside for a moment,
> here's some minor things, in no particular order:

Thanks for the comments.

> - The async API in PQgetXLogData is quite different from the other
> commands. It's close to the API from PQgetCopyData(), but doesn't return
> a malloc'd buffer like PQgetCopyData does. I presume that's to optimize
> away the extra memcpy step?

Yes. This is for preventing extra memcpy.

> I don't think that's really necessary, I
> don't recall any complaints about that in PQgetCopyData(), and if it
> does become an issue, it could be optimized away by mallocing the buffer
> first and reading directly to that.

OK. I'll change PQgetXLogData() to return a malloc'd buffer, and will
remove PQmarkConsumed().

> - Can we avoid sprinkling XLogStreamingAllowed() calls to places where
> we check if WAL-logging is required (nbtsort.c, copy.c etc.). I think we
> need a new macro to encapsulate (XLogArchivingActive() ||
> XLogStreamingAllowed()).

Yes. I'll introduce a new macro XLogIsNeeded() which encapsulates
(XLogArchivingActive() || XLogStreamingAllowed()).

> - Is O_DIRECT ever a good idea in walreceiver? If it's really direct and
> doesn't get cached, the startup process will need to read from disk.

Good point. I agree that O_DIRECT is useless if walreceiver works
with the startup process. It might be useful if only stand-alone walreceiver
program is executed in the standby.

> - Can we replace read/write_conninfo with just a long-enough field in
> shared mem? Would be simpler. (this is moot if we go with the
> stand-alone walreceiver program and pass it as a command-line argument)

Yes, if we can decide the length of conninfo. Since I could not decide
that, I used read/write_conninfo to tell walreceiver the conninfo. Is the
fixed size 1024B enough for conninfo?

> - walreceiver shouldn't die on connection error, just to be restarted by
> startup process. Can we add error handling a la bgwriter and have a
> retry loop within walreceiver? (again, if we go with a stand-alone
> walreceiver program, it's probably better to have startup process
> responsible to restart walreceiver, as it is now)

Error handling a la bgwriter? You mean that PG_exception_stack
should be set up to handle an ERROR exception?

Anyway, I'll change walreceiver to retry connecting to the primary
after an error occurs in PQstartXLogStreaming()/PQgetXLogData()/
PQputXLogRecPtr(). Should we set an upper limit of the number of
the retries?

> - pq_wait in backend waits until you can read or write at least 1 byte.
> There is no guarantee that you can send or read the whole message
> without blocking. We'd have to put the socket in non-blocking mode for
> that. I'm not sure what the implications of this are.

Umm... AFAIK, poll and select guarantee that at least the subsequent
recv will not be blocked. If there is only 1 byte available in the buffer,
recv would read that 1 byte and return immediately. I'm not sure if send
will get stuck even after poll is passed. In my environment (RHEL5),
send seems not to be blocked.

> - we should include system_identifier somewhere in the replication
> startup handshake. Otherwise you can connect to server from a different
> system and have logs shipped, if they happen to be roughly at the same
> point in WAL. Replay will almost certainly fail, but we should error
> earlier.

Agreed. I'll do that.

> - I know I said we should have just asynchronous replication at first,
> but looking ahead, how would you do synchronous?

As the previous patch did, I'm going to make walsender read the latest
XLOG from wal_buffers, introduce the signaling between a backend
and walsender, and keep a backend waiting until the specified XLOG
has been written or fsynced in the standby.

> What kind of signaling
> is needed between walreceiver and startup process for that?

I was thinking that the synchronization mode which a client waits
until XLOG has been applied is not necessary right now, so no
signaling is also not required between those processes yet. But,
HS requires this capability?

> - 'replication' shouldn't be a real database.

Agreed. I'll remove that.

> I found the paging logic in walsender confusing, and didn't like the
> idea that walsender needs to set the XLOGSTREAM_END_SEG flag. Surely
> walreceiver knows how to split the WAL into files without such a flag. I
> reworked that logic, I think it's easier to understand now. I kept the
> support for the flag in libpq and the protocol for now, but it should be
> removed too, or repurposed to indicate that pg_switch_xlog() was done in
> the master. I've pushed that to 'replication-orig' branch in my git
> repository, attached is the same as a diff against your SR_0914.patch.
>
> I need a break from this patch, so I'll take a closer look at Simon's
> hot standby now. Meanwhile, can you work on the above items and submit a
> new version, please?

Yeah, sure.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message ning 2009-09-24 08:51:02 "BEGIN TRANSACTION" and "START TRANSACTION": different error handling
Previous Message KaiGai Kohei 2009-09-24 06:26:00 Re: [PATCH] Largeobject access controls