Sorry for the delay.
On Mon, Sep 21, 2009 at 4:51 PM, Heikki Linnakangas
> Having gone through the patch now in more detail, I think it's in pretty
> good shape. I'm happy with the overall design, except that I haven't
> been able to make up my mind if walreceiver should indeed be a
> stand-alone program as discussed, or a postmaster child process as in
> the patch you submitted. Putting that question aside for a moment,
> here's some minor things, in no particular order:
Thanks for the comments.
> - The async API in PQgetXLogData is quite different from the other
> commands. It's close to the API from PQgetCopyData(), but doesn't return
> a malloc'd buffer like PQgetCopyData does. I presume that's to optimize
> away the extra memcpy step?
Yes. This is for preventing extra memcpy.
> I don't think that's really necessary, I
> don't recall any complaints about that in PQgetCopyData(), and if it
> does become an issue, it could be optimized away by mallocing the buffer
> first and reading directly to that.
OK. I'll change PQgetXLogData() to return a malloc'd buffer, and will
> - Can we avoid sprinkling XLogStreamingAllowed() calls to places where
> we check if WAL-logging is required (nbtsort.c, copy.c etc.). I think we
> need a new macro to encapsulate (XLogArchivingActive() ||
Yes. I'll introduce a new macro XLogIsNeeded() which encapsulates
(XLogArchivingActive() || XLogStreamingAllowed()).
> - Is O_DIRECT ever a good idea in walreceiver? If it's really direct and
> doesn't get cached, the startup process will need to read from disk.
Good point. I agree that O_DIRECT is useless if walreceiver works
with the startup process. It might be useful if only stand-alone walreceiver
program is executed in the standby.
> - Can we replace read/write_conninfo with just a long-enough field in
> shared mem? Would be simpler. (this is moot if we go with the
> stand-alone walreceiver program and pass it as a command-line argument)
Yes, if we can decide the length of conninfo. Since I could not decide
that, I used read/write_conninfo to tell walreceiver the conninfo. Is the
fixed size 1024B enough for conninfo?
> - walreceiver shouldn't die on connection error, just to be restarted by
> startup process. Can we add error handling a la bgwriter and have a
> retry loop within walreceiver? (again, if we go with a stand-alone
> walreceiver program, it's probably better to have startup process
> responsible to restart walreceiver, as it is now)
Error handling a la bgwriter? You mean that PG_exception_stack
should be set up to handle an ERROR exception?
Anyway, I'll change walreceiver to retry connecting to the primary
after an error occurs in PQstartXLogStreaming()/PQgetXLogData()/
PQputXLogRecPtr(). Should we set an upper limit of the number of
> - pq_wait in backend waits until you can read or write at least 1 byte.
> There is no guarantee that you can send or read the whole message
> without blocking. We'd have to put the socket in non-blocking mode for
> that. I'm not sure what the implications of this are.
Umm... AFAIK, poll and select guarantee that at least the subsequent
recv will not be blocked. If there is only 1 byte available in the buffer,
recv would read that 1 byte and return immediately. I'm not sure if send
will get stuck even after poll is passed. In my environment (RHEL5),
send seems not to be blocked.
> - we should include system_identifier somewhere in the replication
> startup handshake. Otherwise you can connect to server from a different
> system and have logs shipped, if they happen to be roughly at the same
> point in WAL. Replay will almost certainly fail, but we should error
Agreed. I'll do that.
> - I know I said we should have just asynchronous replication at first,
> but looking ahead, how would you do synchronous?
As the previous patch did, I'm going to make walsender read the latest
XLOG from wal_buffers, introduce the signaling between a backend
and walsender, and keep a backend waiting until the specified XLOG
has been written or fsynced in the standby.
> What kind of signaling
> is needed between walreceiver and startup process for that?
I was thinking that the synchronization mode which a client waits
until XLOG has been applied is not necessary right now, so no
signaling is also not required between those processes yet. But,
HS requires this capability?
> - 'replication' shouldn't be a real database.
Agreed. I'll remove that.
> I found the paging logic in walsender confusing, and didn't like the
> idea that walsender needs to set the XLOGSTREAM_END_SEG flag. Surely
> walreceiver knows how to split the WAL into files without such a flag. I
> reworked that logic, I think it's easier to understand now. I kept the
> support for the flag in libpq and the protocol for now, but it should be
> removed too, or repurposed to indicate that pg_switch_xlog() was done in
> the master. I've pushed that to 'replication-orig' branch in my git
> repository, attached is the same as a diff against your SR_0914.patch.
> I need a break from this patch, so I'll take a closer look at Simon's
> hot standby now. Meanwhile, can you work on the above items and submit a
> new version, please?
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
In response to
pgsql-hackers by date
|Next:||From: ning||Date: 2009-09-24 08:51:02|
|Subject: "BEGIN TRANSACTION" and "START TRANSACTION": different error handling|
|Previous:||From: KaiGai Kohei||Date: 2009-09-24 06:26:00|
|Subject: Re: [PATCH] Largeobject access controls|