Re: Hot standby, conflict resolution

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hot standby, conflict resolution
Date: 2009-01-26 15:02:09
Message-ID: 1232982129.2327.1698.camel@ebony.2ndQuadrant
Lists: pgsql-hackers


On Sun, 2009-01-25 at 16:19 +0000, Simon Riggs wrote:
> On Fri, 2009-01-23 at 21:30 +0200, Heikki Linnakangas wrote:
>
> > Ok, then I think we have a little race condition. The startup process
> > doesn't get any reply indicating that the target backend has processed
> > the SIGINT and set the cached conflict LSN. The target backend might
> > move ahead using the old LSN for a little while, even though the
> > startup process has already gone ahead and replayed a vacuum record.
> >
> > Another tiny issue is that it looks like a new conflict LSN always
> > overwrites the old one. But you should always use the oldest conflicted
> > LSN in the checks, not the newest.
>
> That makes it easier, because it is either not set, or it is set and
> does not need to be reset as new conflict LSNs appear.
>
> I can see a simple scheme emerging, which I will detail tomorrow
> morning.

Rather than signalling, we could use a hasconflict boolean for each proc
in a shared data structure. It can be read without a spinlock, but should
only be written while holding the spinlock.

Each time we read a block we check if hasconflict is set. If it is, we
grab the spinlock and recheck the flag; if it is still set, we read the
conflict details and clear the flag. Then we drop the spinlock.

The aim of this type of conflict resolution was to reduce the footprint
of users that would be affected and defer it as much as possible. We've
spent time getting the latestCompletedXid, but we know deriving that
value is very difficult in the btree case at least. So what I would like
to do is pass the relid of a conflict across as well and use that to
reduce the footprint, now that we are performing the test inside the
buffer manager.

We would keep a relid cache with a very small number of relids, perhaps
just one, maybe as many as 4 or 8, so that we can fit relids and
associated LSNs in a single cache line. We can match the relid using a
simple for loop, which we know is well optimised when there is no
dependency between the elements of the loop and the loop has a
compile-time fixed number of iterations.
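
As an illustration only, the lookup could look something like the
sketch below. The names (RelidConflictCache, relid_cache_lookup), the
size of 4, the 64-byte cache line and the use of 0 as the "unused"
relid are all assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RELID_CACHE_SIZE 4      /* assumed size for this sketch */
#define CACHE_LINE_BYTES 64     /* assumed cache line size */

typedef uint32_t RelId;         /* stand-in for a relation OID */
typedef uint64_t ConflictLSN;   /* stand-in for an LSN */

/* Hypothetical layout: relids and their LSNs side by side. */
typedef struct RelidConflictCache
{
    RelId       relids[RELID_CACHE_SIZE];   /* 0 marks an unused entry */
    ConflictLSN lsns[RELID_CACHE_SIZE];
} RelidConflictCache;

/* With 4 entries this is 48 bytes, so it fits the assumed cache line. */
_Static_assert(sizeof(RelidConflictCache) <= CACHE_LINE_BYTES,
               "relid cache must fit in one cache line");

/* Returns true and fills *lsn if relid has a cached conflict. */
static bool relid_cache_lookup(const RelidConflictCache *c, RelId relid,
                               ConflictLSN *lsn)
{
    bool found = false;

    assert(relid != 0);         /* 0 is reserved for unused entries */

    /* Fixed trip count and no early exit: each iteration is independent
     * of the others, which leaves the compiler free to unroll the loop. */
    for (int i = 0; i < RELID_CACHE_SIZE; i++)
    {
        if (c->relids[i] == relid)
        {
            *lsn = c->lsns[i];
            found = true;
        }
    }
    return found;
}
```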

I would be inclined to make this a separate shared memory area rather
than try to weld that onto PGPROC. We could index that using backendid.

If the relid cache overflows, we just apply a general LSN value.
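
A sketch of the overflow rule, again with invented names and with
locking left out for brevity. Which LSN becomes the "general" value is
not spelled out above; this example assumes the oldest one seen so
far. The static array stands in for the separate shared memory area
indexed by backend id, and MAX_BACKENDS is kept tiny on purpose.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_BACKENDS     8      /* assumed, tiny for illustration */
#define RELID_CACHE_SIZE 4      /* assumed size for this sketch */

typedef uint32_t RelId;         /* stand-in for a relation OID */
typedef uint64_t ConflictLSN;   /* stand-in for an LSN */

/* Hypothetical per-backend entry. */
typedef struct BackendConflicts
{
    RelId       relids[RELID_CACHE_SIZE];
    ConflictLSN lsns[RELID_CACHE_SIZE];
    int         nrelids;
    bool        overflowed;     /* true once general_lsn applies to all rels */
    ConflictLSN general_lsn;
} BackendConflicts;

/* Stand-in for a separate shared memory area, indexed by backend id. */
static BackendConflicts conflict_area[MAX_BACKENDS];

static void add_conflict(int backend_id, RelId relid, ConflictLSN lsn)
{
    assert(backend_id >= 0 && backend_id < MAX_BACKENDS);
    BackendConflicts *bc = &conflict_area[backend_id];

    if (bc->overflowed)
    {
        /* Already in general mode: just keep the oldest LSN. */
        if (lsn < bc->general_lsn)
            bc->general_lsn = lsn;
        return;
    }
    for (int i = 0; i < bc->nrelids; i++)
        if (bc->relids[i] == relid)
            return;             /* already recorded; keep the older LSN */

    if (bc->nrelids < RELID_CACHE_SIZE)
    {
        bc->relids[bc->nrelids] = relid;
        bc->lsns[bc->nrelids] = lsn;
        bc->nrelids++;
        return;
    }
    /* Cache full: fall back to one LSN covering every relation.
     * Assumption: use the oldest LSN among the cached ones and the new one. */
    bc->general_lsn = lsn;
    for (int i = 0; i < RELID_CACHE_SIZE; i++)
        if (bc->lsns[i] < bc->general_lsn)
            bc->general_lsn = bc->lsns[i];
    bc->overflowed = true;
}

/* Returns true and fills *lsn if a conflict applies to relid. */
static bool conflict_for_relation(int backend_id, RelId relid,
                                  ConflictLSN *lsn)
{
    assert(backend_id >= 0 && backend_id < MAX_BACKENDS);
    const BackendConflicts *bc = &conflict_area[backend_id];

    if (bc->overflowed)
    {
        *lsn = bc->general_lsn;
        return true;
    }
    for (int i = 0; i < bc->nrelids; i++)
    {
        if (bc->relids[i] == relid)
        {
            *lsn = bc->lsns[i];
            return true;
        }
    }
    return false;
}
```

After an overflow every relation is treated as conflicting, so the
backend loses the reduced footprint but stays correct.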

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
