Re: Block level concurrency during recovery

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Block level concurrency during recovery
Date: 2008-11-03 12:58:26
Message-ID: 1225717107.3971.832.camel@ebony.2ndQuadrant
Lists: pgsql-hackers


On Thu, 2008-10-23 at 09:57 +0100, Simon Riggs wrote:
> On Thu, 2008-10-23 at 09:09 +0300, Heikki Linnakangas wrote:
>
> > However, we require that in b-tree vacuum, you take a cleanup lock on
> > *every* leaf page of the index, not only those that you modify. That's a
> > problem, because there's no trace of such pages in the WAL.
>
> OK, good. Thanks for the second opinion. I'm glad you said that, cos I
> felt sure anybody reading the patch would say "what the hell does this
> bit do?". Now I can add it.

Heikki,

When we discussed this before, I was glad that you'd mentioned that
aspect since it allowed me to say "if two of us think that then it must
be true".

I didn't include that in the final patch because it felt wrong. I didn't
have a rational explanation for that then, just a bad feeling. So, after
lots of sleep, here's my rational explanation of why we do *not* need
that during hot standby queries:

VACUUM with a btree index proceeds like this:
1. Scan table
2. Remove rows from btree identified in (1)
3. Remove rows from heap identified in (1)
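
As a rough illustration of those three phases, here is a toy model in
Python. It is only a sketch: the dict-based heap, the list-based index and
the function name vacuum are made up for this example and do not
correspond to the real code.

```python
def vacuum(heap, index):
    """Toy VACUUM. heap maps htid -> is_dead; index is a list of (key, htid)."""
    # (1) Scan table: collect the heap tids of the dead rows.
    dead = {htid for htid, is_dead in heap.items() if is_dead}
    # (2) Remove the index entries that point at those rows.
    index[:] = [(key, htid) for key, htid in index if htid not in dead]
    # (3) Remove the rows themselves from the heap.
    for htid in dead:
        del heap[htid]
    return dead


# Hypothetical data: htids are (block, offset) pairs.
heap = {(0, 1): False, (0, 2): True, (1, 1): True}
index = [("a", (0, 1)), ("b", (0, 2)), ("c", (1, 1))]
removed = vacuum(heap, index)
```

In this model step (2) runs before step (3), so an index entry never
points at a heap slot that has already been freed.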

The purpose of the additional locking requirements during (2) for btrees
is to ensure that we do not fail to find the rows identified in (1):
block splits can move those rows after (1) and during (2).

Requoting verbatim from the nbtree README: "The tricky part of this is
to avoid missing any deletable tuples in the presence of concurrent page
splits: a page split could easily move some tuples from a page not yet
passed over by the sequential scan to a lower-numbered page already
passed over." In recovery there are no concurrent page splits and the
WAL records represent already successfully identified deletable tuples.
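
To make the hazard in that quote concrete, here is a small simulation. It
is a sketch under assumptions: pages are plain lists of htids, and the
split_after argument is a hypothetical hook that moves the upper half of
one page to a lower-numbered page while the scan is in progress. It shows
only the problem, not the locking that prevents it.

```python
def scan_and_delete(pages, dead, split_after=None):
    """Scan pages in block order, deleting entries whose htid is in dead.

    split_after is an optional (page_no, src, dst) triple: right after
    page_no has been visited, the upper half of page src moves to page dst,
    standing in for a concurrent page split.
    Returns the dead htids the scan failed to remove.
    """
    pages = [list(p) for p in pages]  # work on a copy
    for page_no in range(len(pages)):
        pages[page_no] = [h for h in pages[page_no] if h not in dead]
        if split_after is not None and split_after[0] == page_no:
            _, src, dst = split_after
            half = len(pages[src]) // 2
            pages[dst].extend(pages[src][half:])
            del pages[src][half:]
    return [h for page in pages for h in page if h in dead]


pages = [[1, 2], [], [3, 4], [5, 6, 7, 8]]
dead = {7}

# No concurrent split, as in recovery: nothing is missed.
missed_quiet = scan_and_delete(pages, dead)

# Split after page 1 is visited moves [7, 8] from page 3 to page 1,
# which the scan has already passed over: htid 7 is missed.
missed_split = scan_and_delete(pages, dead, split_after=(1, 3, 1))
```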

On a standby server the rows will not move other than via WAL records.
So there is no possibility that a WAL record will fail to find the row
it was looking for. On the master we were looking for a tuple that
pointed to a htid, whereas in WAL replay we look directly at the index
tuple via its tid, not via the htid it points to. Therefore we do not
need the additional locking.
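
The difference between the two lookups can be sketched like this. The
record layout (a block number plus a list of offsets) and the names
master_delete and standby_replay are assumptions made for the example,
not the actual WAL format.

```python
import copy


def master_delete(pages, dead_htids):
    """Master side: find index entries by the htid they point to.

    Emits one toy WAL record per page touched, naming the index tuples
    by position (block, offsets) rather than by htid.
    """
    wal = []
    for block, entries in enumerate(pages):
        offsets = [i for i, htid in enumerate(entries) if htid in dead_htids]
        if offsets:
            wal.append({"block": block, "offsets": offsets})
            pages[block] = [h for i, h in enumerate(entries) if i not in offsets]
    return wal


def standby_replay(pages, wal):
    """Standby side: go straight to each index tuple by its tid; no search."""
    for rec in wal:
        entries = pages[rec["block"]]
        # Delete from the highest offset down so earlier offsets stay valid.
        for off in sorted(rec["offsets"], reverse=True):
            del entries[off]


master = [[10, 11, 12], [13, 14], [15]]
standby = copy.deepcopy(master)  # standby starts as an exact copy

wal = master_delete(master, dead_htids={11, 14, 15})
standby_replay(standby, wal)
```

Because the standby's pages change only through replay, the positions
named in each record still hold the intended tuples when the record is
applied.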

That seems logical to me, so I will leave that out.

Any alternative views?

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
