
Re: Hot Standby: Relation-specific deferred conflict resolution

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hot Standby: Relation-specific deferred conflict resolution
Date: 2010-01-30 11:38:08
Message-ID: 1264851488.13782.1858.camel@ebony
Lists: pgsql-hackers
On Fri, 2010-01-29 at 14:52 +0000, Greg Stark wrote:

> Can you explain what it does in
> more detail so we can understand why it's necessary for a sensible set
> of features?

I've slimmed down the patch to make it clearer what it does, having
committed some refactoring.

Problem: Currently, when we perform conflict resolution we do not use the
relid from the WAL record; we target all users regardless of which
relations they have accessed or intend to access. So changes to table X
can cause cancelation of someone accessing table Y because **they might
later in the transaction access table X**. That is too heavy-handed and
is most often overkill. This is the same problem you and I have
discussed many times over the last 14 months, though the problem itself
has been discussed on hackers many times over the last 20 months, and I
have offered many potential solutions.

An example of current behaviour, using tables A, B and C:
T0: An AccessExclusiveLock is applied to B
T1: Q1 takes snapshot, takes lock on A and begins query 
T2: Q2 takes snapshot, queues for lock on B behind AccessExclusiveLock
T3: Cleanup on table C is handled that will conflict with both snapshots
T4: Q3 takes snapshot, takes lock on C and begins query (if possible)
T5: Cleanup on table C is handled that will conflict with Q3

Current: At T3, current conflict resolution will wait for
max_standby_delay and then cancel Q1 and Q2. Q3 can begin processing
immediately because the snapshot it takes will always be the same as or
later than the xmin that generated the cleanup at T3. At T5, Q3 will be
quickly cancelled because all the standby delay was used up at T3 and
there is none left to spend on delaying for Q3.

Proposed Resolution: 
as presented to hackers in 12/2009

Let's look at the effect first, then return to the detail.

In this proposal, the above sequence of actions will look like this:
Conflict resolution will wait at T3 until we hit max_standby_delay, at
which point we learn that Q1 and Q2 do not conflict and we let them
continue on their way. At T5, Q3 will be cancelled without much delay, 
because we have now used up most of max_standby_delay. 

So in both approaches, Q3 that accessed table C will be canceled fairly
quickly. The key to this is that in the new proposal, Q1 and Q2 will not
be canceled: they will continue to completion.
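
To make the difference concrete, here is a minimal sketch of the two
cancelation rules applied to the example above. It is an illustration
only, not code from the patch: the Query class, the function names and
the xid values are all made up, the test "snapshot xmin <=
latestRemovedXid" is a simplification that ignores xid wraparound, and
the max_standby_delay wait itself is not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    name: str
    snapshot_xmin: int  # oldest xid the snapshot still needs (illustrative)
    relation: str       # relation the query has locked or is queued for


def snapshot_conflicts(query, latest_removed_xid):
    # Simplified conflict test; real xid comparison must handle wraparound.
    return query.snapshot_xmin <= latest_removed_xid


def cancelled_current(queries, latest_removed_xid):
    # Current behaviour: every query with a conflicting snapshot is
    # cancelled, whichever relation it is using.
    return [q.name for q in queries
            if snapshot_conflicts(q, latest_removed_xid)]


def cancelled_proposed(queries, cleanup_relation, latest_removed_xid):
    # Proposed behaviour: the snapshot must conflict AND the query must be
    # using the relation named in the cleanup record.
    return [q.name for q in queries
            if snapshot_conflicts(q, latest_removed_xid)
            and q.relation == cleanup_relation]


q1 = Query("Q1", 100, "A")
q2 = Query("Q2", 100, "B")
q3 = Query("Q3", 110, "C")

# T3: cleanup on table C, latestRemovedXid = 105 (hypothetical value)
t3_current = cancelled_current([q1, q2], 105)
t3_proposed = cancelled_proposed([q1, q2], "C", 105)

# T5: cleanup on table C, latestRemovedXid = 115 (hypothetical value).
# Under the proposal Q1 and Q2 are still running at this point.
t5_current = cancelled_current([q3], 115)
t5_proposed = cancelled_proposed([q1, q2, q3], "C", 115)
```

In this sketch t3_current names Q1 and Q2 while t3_proposed is empty,
and at T5 both rules cancel only Q3.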

How it works: When we process a snapshot conflict we check which queries
have snapshots that conflict. We then wait for max_standby_delay and
then check for lock conflicts. (We do it this way because of a timing
issue described on the link above, pointed out by Greg). When we check
for lock conflicts we also set a latestRemovedXid on "that relation", so
that we capture all current lockers *and* allow all future lockers to
check the latestRemovedXid against their snapshot. In either case, if a
lock conflict occurs then we will cancel the query.
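
A rough sketch of that two-sided check, covering current lockers at
cleanup time and future lockers at lock time, might look like the
following. Everything here is hypothetical naming for illustration
(RelationConflictTracker, acquire_lock, apply_cleanup); the real patch
works inside the lock manager, and the simple "xmin <= latestRemovedXid"
comparison ignores xid wraparound.

```python
class RelationConflictTracker:
    """Toy model: one latestRemovedXid per relation plus its lockers."""

    def __init__(self):
        self.latest_removed_xid = {}  # relation -> highest removed xid
        self.lockers = {}             # relation -> {query name: snapshot xmin}

    def acquire_lock(self, relation, query, snapshot_xmin):
        # Future lockers: compare the snapshot with the limit already
        # recorded on the relation. False means "cancel the query".
        limit = self.latest_removed_xid.get(relation)
        if limit is not None and snapshot_xmin <= limit:
            return False
        self.lockers.setdefault(relation, {})[query] = snapshot_xmin
        return True

    def apply_cleanup(self, relation, latest_removed_xid):
        # Assumed to run once the max_standby_delay wait has expired.
        # Record the limit for future lockers, then cancel the current
        # lockers of this relation whose snapshots conflict.
        previous = self.latest_removed_xid.get(relation, 0)
        self.latest_removed_xid[relation] = max(previous, latest_removed_xid)
        holders = self.lockers.get(relation, {})
        cancelled = sorted(name for name, xmin in holders.items()
                           if xmin <= latest_removed_xid)
        for name in cancelled:
            del holders[name]
        return cancelled


tracker = RelationConflictTracker()
q1_locked = tracker.acquire_lock("A", "Q1", 100)   # Q1 reads table A
first_cleanup = tracker.apply_cleanup("C", 105)    # nobody holds C yet
late_locker = tracker.acquire_lock("C", "Q2", 100) # old snapshot, refused
q3_locked = tracker.acquire_lock("C", "Q3", 110)   # newer snapshot, allowed
second_cleanup = tracker.apply_cleanup("C", 115)   # now conflicts with Q3
```

The point of the sketch is that Q1, which only touches table A, is never
examined by either cleanup on table C.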

I mention "that relation" because *where* we record the xid limit for
each relation is an important aspect of the design. In the current patch
we take a simple approach; others are possible. If there is already a
lock in the shared lock table, then we add the latestRemovedXid to that.
If not, we keep track of the latestRemovedXid for the whole lock
partition. So we aren't tracking each relation separately in most cases,
except when a table is being frequently accessed, or accessed for a
long period.
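
As an illustration of that storage choice, here is a small sketch with
a per-relation limit when a lock entry exists and a per-partition limit
otherwise. The names (LockTable, record_cleanup, lock), the partition
count, the use of "relid % NUM_PARTITIONS" as the partition hash, and
the rule that a newly created lock entry inherits its partition's limit
are all assumptions made for the example, not details taken from the
patch.

```python
NUM_PARTITIONS = 16  # illustrative partition count


class LockTable:
    def __init__(self):
        self.entries = {}  # relid -> {"limit": xid, "holders": count}
        self.partition_limit = [0] * NUM_PARTITIONS

    def partition_of(self, relid):
        # Stand-in for the lock manager's hash of the lock tag.
        return relid % NUM_PARTITIONS

    def record_cleanup(self, relid, latest_removed_xid):
        entry = self.entries.get(relid)
        if entry is not None:
            # A lock entry exists: keep the limit on the relation itself.
            entry["limit"] = max(entry["limit"], latest_removed_xid)
            return "relation"
        # No entry: fall back to the coarser per-partition limit.
        p = self.partition_of(relid)
        self.partition_limit[p] = max(self.partition_limit[p],
                                      latest_removed_xid)
        return "partition"

    def lock(self, relid, snapshot_xmin):
        entry = self.entries.get(relid)
        if entry is not None:
            limit = entry["limit"]
        else:
            limit = self.partition_limit[self.partition_of(relid)]
        if snapshot_xmin <= limit:
            return False  # conflict: the caller would cancel the query
        # Assumption: a new entry starts from the partition's limit.
        entry = self.entries.setdefault(relid, {"limit": limit, "holders": 0})
        entry["holders"] += 1
        return True


table = LockTable()
held = table.lock(16, 100)                    # entry created for relid 16
where_locked = table.record_cleanup(16, 105)  # stored on the lock entry
where_idle = table.record_cleanup(32, 105)    # no entry, stored on partition 0
same_partition = table.lock(48, 100)          # relid 48 shares partition 0
other_partition = table.lock(17, 100)         # partition 1 is unaffected
```

Here relid 48 is refused even though no cleanup touched it, which is the
cost of tracking per partition rather than per relation; relid 17 hashes
to another partition and is not affected.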

There is also an optimization added here. When we defer cancelation of
queries, the same query keeps re-appearing in the conflict list for later
WAL records, so there is a mechanism to avoid constantly re-listing a
conflict.
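
One way such a mechanism could work, purely as a guess for illustration
and not a description of what the patch does, is to remember which
queries have already been listed and deferred, and to skip them when
building the conflict list for later WAL records. The function name, the
"already_deferred" set and the xid values below are hypothetical.

```python
def build_conflict_list(snapshots, latest_removed_xid, already_deferred):
    """Return queries whose snapshots conflict and were not listed before.

    snapshots: dict of query name -> snapshot xmin (illustrative values).
    already_deferred: set of query names deferred by an earlier record.
    """
    fresh = []
    for name, xmin in snapshots.items():
        if xmin > latest_removed_xid:
            continue  # snapshot does not conflict with this cleanup
        if name in already_deferred:
            continue  # listed and deferred earlier; do not list it again
        fresh.append(name)
    return sorted(fresh)


snapshots = {"Q1": 100, "Q2": 100, "Q3": 110}
deferred = set()

first_list = build_conflict_list(snapshots, 105, deferred)
deferred.update(first_list)  # Q1 and Q2 were deferred, not cancelled

second_list = build_conflict_list(snapshots, 115, deferred)
```

In this sketch the second conflict list contains only Q3, because Q1 and
Q2 were already deferred when the first record was processed.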

The attached patch is for review and discussion only at this stage.

I'm working on other areas now while discussion takes place, or not.

 Simon Riggs 

Attachment: relation_specific_conflict_res.v3.patch
Description: text/x-patch (23.1 KB)
