Re: Hot standby and GiST page splits (was Re: WIP: Fast GiST index build)

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Alexander Korotkov <aekorotkov(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hot standby and GiST page splits (was Re: WIP: Fast GiST index build)
Date: 2011-08-01 10:13:54
Message-ID: CA+U5nMKgR2mQpvXd+3AvHyVBvyUh7FEMHQUiZWDuip+LYVqZzg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Aug 1, 2011 at 10:38 AM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> On 27.07.2011 17:43, Alexander Korotkov wrote:
>>
>> OK, thanks. I also found behaviour of GiST(without patch) with streaming
>> replication that seems strange for me. On master there are only few
>> rightlinks are InvalidBlockNumber while on slave there are a lot of them.
>> I
>> hack gevel for getting index structure on slave (accessing tree without
>> AccessExclusiveLock).
>>
>> On master:
>> # create table test as (select point(random(),random()) from
>> generate_series(1,100000));
>> # create index test_idx on test using gist(point);
>> # \copy (select gist_tree('test_idx')) to 'tree1r.txt';
>>
>> On slave:
>> # \copy (select gist_tree('test_idx')) to 'tree2r.txt';
>>
>> In bash:
>> # cat tree1r.txt | sed 's/\\n/\n/g'>  tree1.txt
>> # cat tree2r.txt | sed 's/\\n/\n/g'>  tree2.txt
>> # diff tree1.txt tree2.txt
>>
>> 2,89c2,89
>> <      1(l:1) blk: 324 numTuple: 129 free: 2472b(69.71%) rightlink:637
>> (OK)
>> <          1(l:2) blk: 242 numTuple: 164 free: 932b(88.58%) rightlink:319
>> (OK)
>> <          2(l:2) blk: 525 numTuple: 121 free: 2824b(65.39%) rightlink:153
>> (OK)
>> <          3(l:2) blk: 70 numTuple: 104 free: 3572b(56.23%) rightlink:551
>> (OK)
>> <          4(l:2) blk: 384 numTuple: 106 free: 3484b(57.30%) rightlink:555
>> (OK)
>> <          5(l:2) blk: 555 numTuple: 121 free: 2824b(65.39%) rightlink:74
>> (OK)
>> <          6(l:2) blk: 564 numTuple: 109 free: 3352b(58.92%) rightlink:294
>> (OK)
>> <          7(l:2) blk: 165 numTuple: 108 free: 3396b(58.38%) rightlink:567
>> (OK)
>> .....
>> ---
>>>
>>>     1(l:1) blk: 324 numTuple: 129 free: 2472b(69.71%)
>>> rightlink:4294967295
>>
>> (InvalidBlockNumber)
>>>
>>>         1(l:2) blk: 242 numTuple: 164 free: 932b(88.58%)
>>
>> rightlink:4294967295 (InvalidBlockNumber)
>>>
>>>         2(l:2) blk: 525 numTuple: 121 free: 2824b(65.39%)
>>
>> rightlink:4294967295 (InvalidBlockNumber)
>>>
>>>         3(l:2) blk: 70 numTuple: 104 free: 3572b(56.23%)
>>
>> rightlink:4294967295 (InvalidBlockNumber)
>>>
>>>         4(l:2) blk: 384 numTuple: 106 free: 3484b(57.30%)
>>
>> rightlink:4294967295 (InvalidBlockNumber)
>>>
>>>         5(l:2) blk: 555 numTuple: 121 free: 2824b(65.39%)
>>
>> rightlink:4294967295 (InvalidBlockNumber)
>>>
>>>         6(l:2) blk: 564 numTuple: 109 free: 3352b(58.92%)
>>
>> rightlink:4294967295 (InvalidBlockNumber)
>>>
>>>         7(l:2) blk: 165 numTuple: 108 free: 3396b(58.38%)
>>
>> rightlink:4294967295 (InvalidBlockNumber)
>> .....
>>
>> Isn't it a bug?
>
> Yeah, it sure looks like a bug. I was certain that I had broken this in the
> recent changes to GiST handling of page splits, but in fact it has been like
> that forever.
>
> The rightlinks are not needed after crash recovery, because all the
> downlinks should be there. A scan will find all pages through the downlinks,
> and doesn't need to follow any rightlinks. I'm not sure why we explicitly
> clear them, it's not like the rightlinks would do any harm either, but for
> crash recovery that's harmless.
>
> But a scan during hot standby can see those intermediate states, just like
> concurrent scans can on the master. The locking on replay of page split
> needs to be fixed, too. At the moment, it locks and writes out each page
> separately, so a concurrent scan could "overtake" the WAL replay while
> following rightlinks, and miss tuples on the right half.
>
> Attached is a patch for that for 9.1/master. The 9.0 GiST replay code was
> quite different, it will require a separate patch.

Hmm, I was assured no changes would be required for Hot Standby for
GIST and GIN. Perhaps we should check GIN code also.

Does the order of locking of the buffers matter? I'm sure it does.

Did you want me to write the patch for 9.0?

And what does NSN stand for? :-)

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2011-08-01 10:44:23 Re: Hot standby and GiST page splits (was Re: WIP: Fast GiST index build)
Previous Message Heikki Linnakangas 2011-08-01 09:38:13 Hot standby and GiST page splits (was Re: WIP: Fast GiST index build)