
Re: B-tree parent pointer and checkpoints

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: B-tree parent pointer and checkpoints
Date: 2010-11-08 13:40:10
Message-ID: 4CD7FDBA.1020506@enterprisedb.com
Lists: pgsql-hackers
On 02.11.2010 16:40, Heikki Linnakangas wrote:
> On 02.11.2010 16:30, Tom Lane wrote:
>> Heikki Linnakangas<heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
>>> I think we can fix this by requiring that any multi-WAL-record actions
>>> that are in-progress when a checkpoint starts (at the REDO-pointer) must
>>> finish before the checkpoint record is written.
>>
>> What happens if someone wants to start a new split while the checkpoint
>> is hanging fire?
>
> You mean after CreateCheckPoint has determined the redo pointer, but
> before it has written the checkpoint record? The new split can go ahead,
> and the checkpoint doesn't need to care about it. Recovery will start at
> the redo pointer, so it will see the split record, and will know to
> finish the incomplete split if necessary.
>
> The logic is the same as with inCommit. Checkpoint will fetch the list
> of in-progress splits some time after determining the redo-pointer. It
> will then wait until all of those splits have finished. Any new splits
> that begin after fetching the list don't affect the checkpoint.
>
> inCommit can't be used as is, because it's tied to the Xid, but
> something similar should work.

Here's a first draft of this, using the inCommit flag as is. It works,
but suffers from starvation if you have a lot of concurrent
multi-WAL-record actions. I tested that by running INSERTs into a table
with a tsvector column that has a GiST index on it, from five concurrent
sessions, and saw checkpoints regularly busy-waiting for over a minute.
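
To make the starvation concrete, here is a toy, single-threaded model of
waiting on a plain boolean flag. It is only a sketch and not the code in
the attached patch: the array, the function names and NBACKENDS are all
made up for illustration, and real code would need proper locking around
the shared state.

```c
#include <assert.h>
#include <stdbool.h>

#define NBACKENDS 5   /* hypothetical: one slot per concurrent session */

/* Toy stand-in for a per-backend "multi-WAL-record action in progress" flag. */
static bool in_action[NBACKENDS];

static void begin_action(int b)
{
    assert(b >= 0 && b < NBACKENDS);
    in_action[b] = true;
}

static void end_action(int b)
{
    assert(b >= 0 && b < NBACKENDS);
    in_action[b] = false;
}

/*
 * Checkpointer side: after the REDO pointer is chosen, remember which
 * backends had the flag set. Returns the number of entries written to busy[].
 */
static int snapshot_busy(int *busy)
{
    int n = 0;

    for (int b = 0; b < NBACKENDS; b++)
        if (in_action[b])
            busy[n++] = b;
    return n;
}

/*
 * One poll of the busy-wait loop: true if any remembered backend still has
 * its flag set. A flag alone cannot say whether that is the old action or a
 * new one that began after the snapshot.
 */
static bool must_keep_waiting(const int *busy, int n)
{
    for (int i = 0; i < n; i++)
        if (in_action[busy[i]])
            return true;
    return false;
}
```

If a backend ends one action and begins the next between two polls, the
checkpointer sees the flag set both times and keeps waiting, even though
the action it actually needed to wait for is long gone.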

To avoid that, we need something a little bit more complicated than a 
boolean flag. I'm thinking of adding a counter beside the inCommit flag 
that's incremented every time a new multi-WAL-record action begins, so 
that the checkpoint process can distinguish between a new action that 
was started after deciding the REDO pointer and an old one that's still 
running.
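
A minimal sketch of what I mean by the counter, again as a toy
single-threaded model rather than actual backend code. SlotState, Waited
and the function names are hypothetical, counter wraparound is ignored,
and in a real implementation the flag and counter would have to be read
and written under a suitable lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NSLOTS 5   /* hypothetical number of backend slots */

typedef struct
{
    bool     in_action;     /* a multi-WAL-record action is in progress */
    uint32_t action_count;  /* incremented every time a new action begins */
} SlotState;

static SlotState slots[NSLOTS];

static void action_begin(int s)
{
    assert(s >= 0 && s < NSLOTS);
    slots[s].action_count++;
    slots[s].in_action = true;
}

static void action_end(int s)
{
    assert(s >= 0 && s < NSLOTS);
    slots[s].in_action = false;
}

/* What the checkpointer remembers about each action it must wait for. */
typedef struct
{
    int      slot;
    uint32_t count;   /* value of action_count when the snapshot was taken */
} Waited;

/* Taken some time after the REDO pointer has been determined. */
static int collect_waits(Waited *w)
{
    int n = 0;

    for (int s = 0; s < NSLOTS; s++)
    {
        if (slots[s].in_action)
        {
            w[n].slot = s;
            w[n].count = slots[s].action_count;
            n++;
        }
    }
    return n;
}

/*
 * Keep waiting only while a remembered slot is still running the same
 * action. A changed counter means the old action finished and whatever is
 * running now started after the snapshot, so it can be ignored.
 */
static bool still_waiting(const Waited *w, int n)
{
    for (int i = 0; i < n; i++)
    {
        const SlotState *st = &slots[w[i].slot];

        if (st->in_action && st->action_count == w[i].count)
            return true;
    }
    return false;
}
```

With this, a backend that finishes its action and immediately starts
another no longer holds up the checkpoint: the flag is still set, but the
counter no longer matches the snapshot.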

(inCommit is a misnomer now, of course. Will need to find a better name.)

-- 
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Attachment: split-delay-checkpoint-1.patch
Description: text/x-diff (14.3 KB)
