Re: Sync Rep: First Thoughts on Code

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Sync Rep: First Thoughts on Code
Date: 2008-12-02 19:21:18
Message-ID: 1228245678.20796.410.camel@hp_dx2400_1
Lists: pgsql-hackers


On Tue, 2008-12-02 at 11:08 -0800, Jeff Davis wrote:
> On Tue, 2008-12-02 at 13:09 +0000, Simon Riggs wrote:
> > > Is it dangerous to abort the transaction with replication continued when
> > > the timeout occurs? I think that the WAL consistency between two servers
> > > might be broken. Because the WAL writing and sending are done concurrently,
> > > and the backend might already write the WAL to disk on the primary when
> > > waiting for walsender.
> >
> > The issue I see is that we might want to keep wal_sender_delay small so
> > that transaction times are not increased. But we also want
> > wal_sender_delay high so that replication never breaks. It seems better
> > to have the action on wal_sender_delay configurable if we have an
> > unsteady network (like the internet). Marcus made some comments on line
> > dropping that seem relevant here; we should listen to his experience.
> >
> > Hmmm, dangerous? Well, assuming we're linking commits with replication
> > sends, then it sounds like it. We might end up committing to disk and then
> > deciding to abort instead. But remember we don't remove the xid from
> > procarray or mark the result in clog until the flush is over, so it is
> > possible. But I think we should discuss this in more detail when the
> > main patch is committed.
> >
>
> What is the "it" in "it is possible"? It seems like there's still a
> problem window in there.

"It" is marking a transaction aborted after we have written its commit
record, but before we have removed it from procarray and marked the
result in clog. We'd need a special kind of WAL record to do that.

> Even if that could be made safe, in the event of a real network failure,
> you'd just wait the full timeout every transaction, because it still
> thinks it's replicating.

True, but I did suggest having two timeouts.

There is considerable reason to reduce the timeout and, at the same
time, reason to increase it.
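
As a rough illustration only, here is one possible reading of the
two-timeout idea: a short timeout that bounds how long a single commit
waits, and a longer one after which the link is declared broken so that
later commits stop waiting at all. This is a toy model, not code from the
patch. The class, the field names (commit_wait_timeout, link_dead_timeout)
and the semantics are all assumptions made for the sketch, and it
deliberately does not model whether a timed-out transaction commits or
aborts.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReplicationLink:
    """Toy model of the primary's view of the standby (hypothetical names)."""
    commit_wait_timeout: float  # short: maximum wait per commit, in seconds
    link_dead_timeout: float    # long: silence after which the link is declared broken
    last_ack_time: float = 0.0  # time of the last acknowledgement from the standby
    replicating: bool = True

    def record_ack(self, now: float) -> None:
        """Note an acknowledgement from the standby at time `now`."""
        self.last_ack_time = now
        self.replicating = True

    def commit_wait(self, now: float, ack_delay: Optional[float]) -> float:
        """Return how long a commit issued at `now` waits for the standby.

        `ack_delay` is how long the standby takes to acknowledge this
        commit, or None if no acknowledgement ever arrives.
        """
        if not self.replicating:
            # Link already declared broken: do not wait at all.
            return 0.0
        if ack_delay is not None and ack_delay <= self.commit_wait_timeout:
            self.record_ack(now + ack_delay)
            return ack_delay
        # The short timeout expired for this commit.
        waited = self.commit_wait_timeout
        if (now + waited) - self.last_ack_time >= self.link_dead_timeout:
            # The standby has been silent for the long timeout as well.
            self.replicating = False
        return waited


if __name__ == "__main__":
    link = ReplicationLink(commit_wait_timeout=1.0, link_dead_timeout=3.0)
    print(link.commit_wait(0.0, 0.2))   # healthy network: short wait
    for t in (1.0, 2.0, 3.0):
        print(link.commit_wait(t, None))  # network down: full short timeout
    print(link.commit_wait(4.0, None), link.replicating)
```

With this arrangement the short timeout can stay small to protect
transaction times, while the long one decides when replication is
considered broken, so a real network failure costs only a bounded number
of full waits instead of one per transaction.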

Anyway, let's wait for some user experience following commit.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
