Re: Sync Rep: First Thoughts on Code

From: "Fujii Masao" <masao(dot)fujii(at)gmail(dot)com>
To: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Sync Rep: First Thoughts on Code
Date: 2008-12-02 12:37:12
Message-ID: 3f0b79eb0812020437nd36d38dw1a89b0a77c7353d4@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi, Simon.

Thanks for taking many hours to review the code!!

On Mon, Dec 1, 2008 at 8:42 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> Can you confirm that all the docs on the Wiki page are up to date? There
> are a few minor discrepancies that make me think it isn't.

Documentation is ongoing. Sorry for my slow progress.

BTW, I'm going to add and change the sgml files listed on wiki.
http://wiki.postgresql.org/wiki/NTT%27s_Development_Projects#Documentation_Plan

>
> Examples: "For example, to make a single multi-statement transaction
> replication asynchronously when the default is the opposite, issue SET
> LOCAL synchronous_commit TO OFF within the transaction."
> Do we mean synchronous_replication in this sentence? I think you've
> copied the text and not changed all of the necessary parts - please
> re-read the whole section (probably the whole Wiki, actually).

Oops! It's just typo. Sorry for the confusion.
I will revise this section.

>
> "wal_writer_delay" - do we mean wal_sender_delay?

Yes. I will fix it.

> Is there some ability
> to measure the amount of data to be sent and avoid the delay altogether,
> when the server is sufficiently busy?

Why is the former ability required?

The latter is possible, I think. We can guarantee that the WAL is sent (in
more detail, called send(2)) once at least per wal_sender_delay. Of course,
it's dependent on the scheduler of a kernel.

>
> The reaction to replication_timeout may need to be configurable. I might
> not want to keep on processing if the information didn't reach the
> standby.

OK. I will add new GUC variable (PGC_SIGHUP) to specify the reaction for
the timeout.

> I would prefer in many cases that the transactions that were
> waiting for walsender would abort, but the walsender kept processing.

Is it dangerous to abort the transaction with replication continued when
the timeout occurs? I think that the WAL consistency between two servers
might be broken. Because the WAL writing and sending are done concurrently,
and the backend might already write the WAL to disk on the primary when
waiting for walsender.

> How can we restart the walsender if it shuts down?

Only restart the standby (with walreceiver). The standby connects to
the postmaster on the primary, then the postmaster forks new walsender.

> Do we want a maximum
> wait for a transaction and a maximum wait for the server?

ISTM that these feature are too much.

> Do we report
> stats on how long the replication has been taking? If the average rep
> time is close to rep timeout then we will be fragile, so we need some
> way to notice this and produce warnings. Or at least provide info to an
> external monitoring system.

Sounds good. How about log_min_duration_replication? If the rep time
is greater than it, we produce warning (or log) like log_min_duration_xx.

>
> How do we specify the user we use to connect to primary?

Yes, I need to add new option to specify the user name into
recovery.conf. Thanks for reminding me!

>
> Definitely need more explanatory comments/README-style docs.

Completely agreed ;-)
I will write README together with other documents.

>
> For example, 03_libpq seems simple and self-contained. I'm not sure why
> we have a state called PGASYNC_REPLICATION; I was hoping that would be
> dynamic, but I'm not sure where to look for that. It would be useful to
> have a very long comment within the code to explain how the replication
> messages work, and note on each function who the intended client and
> server is.
>

OK. I will re-consider whether PGASYNC_REPLICATION is removable, and
write the comment about it.

> 02_pqcomm: What does HAVE_POLL mean?

It identifies whether poll(2) is available or not on the platform. We
use poll(2)
if it's defined, otherwise select(2). There is similar code at pqSocketPoll() in
fe-misc.c.

> Do we need to worry about periodic
> renegotiation of keys in be-secure.c?

What is "keys" you mean?

> Not sure I understand why so many
> new functions in there.

It's because walsender waits for the reply from the standby and the
request from the backend concurrently. So, we need poll(2) or select(2)
to make walsender wait for them, and some functions for non-blocking
receiving.

>
> 04_recovery_conf is a change I agree with, though I think it may not
> work with EXEC_BACKEND for Windows.

OK. I will examine and fix it.

>
> 05... I need dome commentary to explain this better.
>
> 06 and 07 are large and will take substantial review time. So we must
> get the overall architecture done first and then check the code that
> implements that.
>
> 08 - I think I get this, but some docs will help to confirm.

Yes. I need more documentation.

>
> 09 pg_standby changes: so more changes are coming there? OK. Can we
> refer to those two options as failover and switchover?

You mean failover trigger and switchover one? ISTM that those names
and features might not suit.

Naming always bother me, and the current name "commit/abort trigger"
might tend to cause confusion. Is there any other suitable name?

> There's no need
> to change definitions that many Postgres people already use. This change
> can be done without making any change to server behaviour, so this
> change can have benefit to 8.2 and 8,3 people also.

Agreed.

>
> 01_signal_handling: I've looked at the LWlock acquires and releases in
> the patch and am fairly happy, except for the ProcArrayLock acquire
> during this sub-patch. Do we really need to do things this way? Is the
> actual state important? Could we just do this with a counter which
> cycles? So callers increment counter atomically and the reader just
> polls to see if anybody has incremented? Or could we protect that part
> of the proc with a different lock? Touching ProcArrayLock is bad news.

Agreed. I will add new lock for proc.signalFlags.

>
> Anyway, feeling very positive about this. Hope we can get this reviewed
> and committed in next 3-4 weeks.
>
> I have many clues as to how to structure my own work also. Thanks.

Thanks again!

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Teodor Sigaev 2008-12-02 12:37:44 Re: GIN index build speed
Previous Message Philip Warner 2008-12-02 12:02:12 PiTR and other architectures....