Re: Streaming Replication patch for CommitFest 2009-09

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming Replication patch for CommitFest 2009-09
Date: 2009-09-21 07:51:53
Message-ID: 4AB73099.1060202@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Having gone through the patch now in more detail, I think it's in pretty
good shape. I'm happy with the overall design, except that I haven't
been able to make up my mind if walreceiver should indeed be a
stand-alone program as discussed, or a postmaster child process as in
the patch you submitted. Putting that question aside for a moment,
here's some minor things, in no particular order:

- The async API in PQgetXLogData is quite different from the other
commands. It's close to the API from PQgetCopyData(), but doesn't return
a malloc'd buffer like PQgetCopyData does. I presume that's to optimize
away the extra memcpy step? I don't think that's really necessary, I
don't recall any complaints about that in PQgetCopyData(), and if it
does become an issue, it could be optimized away by mallocing the buffer
first and reading directly to that.

- Can we avoid sprinkling XLogStreamingAllowed() calls to places where
we check if WAL-logging is required (nbtsort.c, copy.c etc.). I think we
need a new macro to encapsulate (XLogArchivingActive() ||
XLogStreamingAllowed()).

- Is O_DIRECT ever a good idea in walreceiver? If it's really direct and
doesn't get cached, the startup process will need to read from disk.

- Can we replace read/write_conninfo with just a long-enough field in
shared mem? Would be simpler. (this is moot if we go with the
stand-alone walreceiver program and pass it as a command-line argument)

- walreceiver shouldn't die on connection error, just to be restarted by
startup process. Can we add error handling a la bgwriter and have a
retry loop within walreceiver? (again, if we go with a stand-alone
walreceiver program, it's probably better to have startup process
responsible to restart walreceiver, as it is now)

- pq_wait in backend waits until you can read or write at least 1 byte.
There is no guarantee that you can send or read the whole message
without blocking. We'd have to put the socket in non-blocking mode for
that. I'm not sure what the implications of this are.

- we should include system_identifier somewhere in the replication
startup handshake. Otherwise you can connect to server from a different
system and have logs shipped, if they happen to be roughly at the same
point in WAL. Replay will almost certainly fail, but we should error
earlier.

- I know I said we should have just asynchronous replication at first,
but looking ahead, how would you do synchronous? What kind of signaling
is needed between walreceiver and startup process for that?

- 'replication' shouldn't be a real database.

I found the paging logic in walsender confusing, and didn't like the
idea that walsender needs to set the XLOGSTREAM_END_SEG flag. Surely
walreceiver knows how to split the WAL into files without such a flag. I
reworked that logic, I think it's easier to understand now. I kept the
support for the flag in libpq and the protocol for now, but it should be
removed too, or repurposed to indicate that pg_switch_xlog() was done in
the master. I've pushed that to 'replication-orig' branch in my git
repository, attached is the same as a diff against your SR_0914.patch.

I need a break from this patch, so I'll take a closer look at Simon's
hot standby now. Meanwhile, can you work on the above items and submit a
new version, please?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachment Content-Type Size
sr-paging-rework.patch text/x-diff 32.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Boszormenyi Zoltan 2009-09-21 09:40:04 Re: SELECT ... FOR UPDATE [WAIT integer | NOWAIT] for 8.5
Previous Message Peter Eisentraut 2009-09-21 07:20:43 Re: Linux LSB init script