Breaking down of patch into sections works very well for review. Should
allow us to get different reviewers on different parts of the code -
review wranglers please take note: Dave, Josh.
Can you confirm that all the docs on the Wiki page are up to date? There
are a few minor discrepancies that make me think it isn't.
Examples: "For example, to make a single multi-statement transaction
replication asynchronously when the default is the opposite, issue SET
LOCAL synchronous_commit TO OFF within the transaction."
Do we mean synchronous_replication in this sentence? I think you've
copied the text and not changed all of the necessary parts - please
re-read the whole section (probably the whole Wiki, actually).
"wal_writer_delay" - do we mean wal_sender_delay? Is there some ability
to measure the amount of data to be sent and avoid the delay altogether,
when the server is sufficiently busy?
The reaction to replication_timeout may need to be configurable. I might
not want to keep on processing if the information didn't reach the
standby. I would prefer in many cases that the transactions that were
waiting for walsender would abort, but the walsender kept processing.
How can we restart the walsender if it shuts down? Do we want a maximum
wait for a transaction and a maximum wait for the server? Do we report
stats on how long the replication has been taking? If the average rep
time is close to rep timeout then we will be fragile, so we need some
way to notice this and produce warnings. Or at least provide info to an
external monitoring system.
How do we specify the user we use to connect to primary?
Definitely need more explanatory comments/README-style docs.
For example, 03_libpq seems simple and self-contained. I'm not sure why
we have a state called PGASYNC_REPLICATION; I was hoping that would be
dynamic, but I'm not sure where to look for that. It would be useful to
have a very long comment within the code to explain how the replication
messages work, and note on each function who the intended client and
02_pqcomm: What does HAVE_POLL mean? Do we need to worry about periodic
renegotiation of keys in be-secure.c? Not sure I understand why so many
new functions in there.
04_recovery_conf is a change I agree with, though I think it may not
work with EXEC_BACKEND for Windows.
05... I need dome commentary to explain this better.
06 and 07 are large and will take substantial review time. So we must
get the overall architecture done first and then check the code that
08 - I think I get this, but some docs will help to confirm.
09 pg_standby changes: so more changes are coming there? OK. Can we
refer to those two options as failover and switchover? There's no need
to change definitions that many Postgres people already use. This change
can be done without making any change to server behaviour, so this
change can have benefit to 8.2 and 8,3 people also.
01_signal_handling: I've looked at the LWlock acquires and releases in
the patch and am fairly happy, except for the ProcArrayLock acquire
during this sub-patch. Do we really need to do things this way? Is the
actual state important? Could we just do this with a counter which
cycles? So callers increment counter atomically and the reader just
polls to see if anybody has incremented? Or could we protect that part
of the proc with a different lock? Touching ProcArrayLock is bad news.
Anyway, feeling very positive about this. Hope we can get this reviewed
and committed in next 3-4 weeks.
I have many clues as to how to structure my own work also. Thanks.
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
pgsql-hackers by date
|Next:||From: Heikki Linnakangas||Date: 2008-12-01 12:08:07|
|Subject: New to_timestamp implementation is pretty strict|
|Previous:||From: Michael Meskes||Date: 2008-12-01 11:42:11|
|Subject: Looking for someone with MinGW|