Re: Sync Rep: First Thoughts on Code

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Sync Rep: First Thoughts on Code
Date: 2008-12-02 19:08:06
Message-ID: 1228244886.14591.45.camel@dell.linuxdev.us.dell.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, 2008-12-02 at 13:09 +0000, Simon Riggs wrote:
> > Is it dangerous to abort the transaction with replication continued when
> > the timeout occurs? I think that the WAL consistency between two servers
> > might be broken. Because the WAL writing and sending are done concurrently,
> > and the backend might already write the WAL to disk on the primary when
> > waiting for walsender.
>
> The issue I see is that we might want to keep wal_sender_delay small so
> that transaction times are not increased. But we also want
> wal_sender_delay high so that replication never breaks. It seems better
> to have the action on wal_sender_delay configurable if we have an
> unsteady network (like the internet). Marcus made some comments on line
> dropping that seem relevant here; we should listen to his experience.
>
> Hmmm, dangerous? Well assuming we're linking commits with replication
> sends then it sounds it. We might end up committing to disk and then
> deciding to abort instead. But remember we don't remove the xid from
> procarray or mark the result in clog until the flush is over, so it is
> possible. But I think we should discuss this in more detail when the
> main patch is committed.
>

What is the "it" in "it is possible"? It seems like there's still a
problem window in there.

Even if that could be made safe, in the event of a real network failure,
you'd just wait the full timeout every transaction, because it still
thinks it's replicating.

If the timeout is exceeded, it seems more reasonable to abandon the
slave until you could re-sync it and continue processing as normal. As
you pointed out, that's not necessarily an expensive operation because
you can use something like rsync. The process of re-syncing might be
made easier (or perhaps less costly), of course.

If we want to still allow processing to happen after a timeout, it seems
reasonable to have a configurable option to allow/disallow non-read-only
transactions when out of sync.

Regards,
Jeff Davis

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Simon Riggs 2008-12-02 19:15:19 Re: PiTR and other architectures....
Previous Message Heikki Linnakangas 2008-12-02 18:47:19 Re: cvs head initdb hangs on unixware