Re: gist vacuum gist access

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Костя Кузнецов <chapaev28(at)ya(dot)ru>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: gist vacuum gist access
Date: 2014-09-08 08:19:51
Message-ID: CAPpHfdtxftKcM43EBbj=KteW-3LTTRTkm_ABEt0pMvkfVwQwSA@mail.gmail.com
Lists: pgsql-hackers

On Mon, Sep 8, 2014 at 12:08 PM, Alexander Korotkov <aekorotkov(at)gmail(dot)com>
wrote:

> On Mon, Sep 8, 2014 at 11:13 AM, Heikki Linnakangas <
> hlinnakangas(at)vmware(dot)com> wrote:
>
>> On 09/07/2014 05:11 PM, Костя Кузнецов wrote:
>>
>>> Hello.
>>> I rewrote vacuum for the GiST index.
>>> All tests pass.
>>> I also tested vacuum on a table with 2 million rows; everything works.
>>> On my machine the old vacuum takes about 9 seconds; this version takes
>>> about 6-7 seconds.
>>> Please review.
>>>
>>
>> If I'm reading this correctly, the patch changes gistbulkdelete to scan
>> the index in physical order, while the old code starts from the root and
>> scans the index from left to right, in logical order.
>>
>> Scanning the index in physical order is wrong, if any index pages are
>> split while vacuum runs. A page split could move some tuples to a
>> lower-numbered page, so that the vacuum will not scan those tuples.
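A minimal toy simulation in C can make the failure concrete. It assumes a hypothetical index model (ToyIndex, toy_split and physical_scan are illustrative names, not PostgreSQL code) in which a page only records how many tuples it holds. The scan visits blocks in physical order. Right after it passes block 2, a concurrent split moves half of block 3's tuples into a new page. If the new page is a free, lower-numbered block, the scan never sees those tuples. If the split extends the relation instead, the new block lies ahead of the scan and is still visited.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_PAGES 16

/* Hypothetical toy index (not PostgreSQL code): a block only records
 * how many live tuples it holds. */
typedef struct {
    int ntuples;
    bool in_use;    /* false = deleted page available for reuse */
} ToyPage;

typedef struct {
    ToyPage pages[MAX_PAGES];
    int nblocks;    /* current relation length in blocks */
} ToyIndex;

/* Move half of block `victim`'s tuples to a new page. With
 * reuse_free_page, the new page is the lowest free block; otherwise
 * the relation is extended by one block. */
static void toy_split(ToyIndex *idx, int victim, bool reuse_free_page)
{
    int target = -1;
    if (reuse_free_page)
        for (int b = 0; b < idx->nblocks && target < 0; b++)
            if (!idx->pages[b].in_use)
                target = b;
    if (target < 0) {
        assert(idx->nblocks < MAX_PAGES);
        target = idx->nblocks++;
    }
    int moved = idx->pages[victim].ntuples / 2;
    idx->pages[victim].ntuples -= moved;
    idx->pages[target].ntuples = moved;
    idx->pages[target].in_use = true;
}

/* Visit blocks in physical order, re-reading nblocks on every iteration.
 * Right after block `split_at` is visited, a concurrent backend splits
 * block `victim`. Returns the number of tuples the scan saw. */
static int physical_scan(ToyIndex *idx, int split_at, int victim,
                         bool reuse_free_page)
{
    int seen = 0;
    for (int b = 0; b < idx->nblocks; b++) {
        if (idx->pages[b].in_use)
            seen += idx->pages[b].ntuples;
        if (b == split_at)
            toy_split(idx, victim, reuse_free_page);
    }
    return seen;
}

/* Block 0 is a free page; blocks 1..3 hold 10 tuples each (30 total).
 * Block 3 is split after the scan has passed block 2. */
static int demo_physical_scan(bool reuse_free_page)
{
    ToyIndex idx = {0};
    idx.nblocks = 4;
    for (int b = 1; b <= 3; b++) {
        idx.pages[b].ntuples = 10;
        idx.pages[b].in_use = true;
    }
    return physical_scan(&idx, 2, 3, reuse_free_page);
}
```

With this layout, demo_physical_scan(true) counts 25 of the 30 tuples, while demo_physical_scan(false) counts all 30.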
>>
>> In the b-tree code, we solved that problem back in 2006, so it can be
>> done but requires a bit more code. In b-tree, we solved it with a "vacuum
>> cycle ID" number that's set on the page halves when a page is split. That
>> allows VACUUM to identify pages that have been split concurrently with
>> the scan, and "jump back" to vacuum them too. See commit
>> http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=5749f6ef0cc1c67ef9c9ad2108b3d97b82555c80.
>> It should be possible to do
>> something similar in GiST, and in fact you might be able to reuse the NSN
>> field that's already set on the page halves on split, instead of adding a
>> new "vacuum cycle ID".
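The jump-back idea can be sketched on a similar toy model. The names are hypothetical and the assumptions simplified; this is not the actual nbtree code. Each page has a right-link and a cycle ID. A split stamps both halves with the running vacuum's cycle ID, and the new right half gets a "split end" marker, loosely modeled on b-tree's BTP_SPLIT_END flag, so the scan does not run past the pages produced by the split. When the scan reaches a page stamped with its own cycle ID whose right-link points below the block it started from, it follows the right-link back. It then vacuums the page it would otherwise have missed.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_PAGES 16
#define NO_LINK (-1)

/* Hypothetical toy page (not PostgreSQL code). */
typedef struct {
    int ntuples;
    bool in_use;      /* false = deleted page available for reuse */
    int rightlink;    /* block of right sibling, or NO_LINK */
    unsigned cycleid; /* cycle ID of the vacuum that last split it, 0 = none */
    bool split_end;   /* rightmost page of a group split in that cycle */
} ToyPage;

typedef struct {
    ToyPage pages[MAX_PAGES];
    int nblocks;
    unsigned vacuum_cycle;  /* cycle ID of the running vacuum */
} ToyIndex;

/* Split `left` into the lowest free block, stamping both halves. */
static void toy_split(ToyIndex *idx, int left)
{
    int right = -1;
    for (int b = 0; b < idx->nblocks && right < 0; b++)
        if (!idx->pages[b].in_use)
            right = b;
    assert(right >= 0);
    ToyPage *l = &idx->pages[left], *r = &idx->pages[right];
    int moved = l->ntuples / 2;
    l->ntuples -= moved;
    r->ntuples = moved;
    r->in_use = true;
    r->rightlink = l->rightlink;   /* new page inherits the old right-link */
    l->rightlink = right;
    /* The new right half ends the split group if `left` was not yet split
     * in this cycle, or if `left` was itself the end of the group. */
    r->split_end = l->split_end || l->cycleid != idx->vacuum_cycle;
    l->split_end = false;
    l->cycleid = r->cycleid = idx->vacuum_cycle;
}

/* Physical-order vacuum; block `victim` is split right after block
 * `split_at` is visited. Returns the number of tuples vacuumed. */
static int cycle_scan(ToyIndex *idx, unsigned cycle, int split_at,
                      int victim, bool jump_back)
{
    int seen = 0;
    idx->vacuum_cycle = cycle;
    for (int b = 0; b < idx->nblocks; b++) {
        int cur = b;
        while (cur != NO_LINK) {
            ToyPage *p = &idx->pages[cur];
            if (p->in_use)
                seen += p->ntuples;
            /* Jump back only from pages split by this vacuum, only to
             * blocks below the one we started from, and never past the
             * end of the split group. */
            if (jump_back && p->cycleid == cycle && !p->split_end &&
                p->rightlink != NO_LINK && p->rightlink < b)
                cur = p->rightlink;
            else
                cur = NO_LINK;
        }
        if (b == split_at)
            toy_split(idx, victim);
    }
    return seen;
}

/* Block 0 is free; blocks 1..3 hold 10 tuples each and are chained by
 * right-links. Block 3 is split into block 0 after block 2 is visited. */
static int demo_cycle_scan(bool jump_back)
{
    ToyIndex idx = {0};
    for (int b = 0; b < MAX_PAGES; b++)
        idx.pages[b].rightlink = NO_LINK;
    idx.nblocks = 4;
    for (int b = 1; b <= 3; b++) {
        idx.pages[b].ntuples = 10;
        idx.pages[b].in_use = true;
    }
    idx.pages[1].rightlink = 2;
    idx.pages[2].rightlink = 3;
    return cycle_scan(&idx, 1, 2, 3, jump_back);
}
```

Here block 0 is free, blocks 1 to 3 hold 10 tuples each, and block 3 is split into block 0 after block 2 has been visited. demo_cycle_scan(true) vacuums all 30 tuples, while demo_cycle_scan(false) vacuums only 25.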
>
>
> The idea is right. But in fact, does GiST ever recycle any pages? It has
> the F_DELETED flag, but ISTM this flag is never set. So I think it's
> possible that this patch works correctly, since without recycling a split
> can only place its new page at the end of the index. However, GiST probably
> sometimes leaves a new page unused after a server crash.
> Anyway, I'm not a fan of committing the patch in this shape. We need to let
> GiST recycle pages first, then implement VACUUM similar to b-tree.
>

Another note: assuming the NSN can play the role of the "vacuum cycle ID",
can we implement a sequential (with possible "jump back") index scan for
GiST?
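Here is a sketch of the NSN-based variant. It assumes that each WAL record advances a hypothetical insert_lsn counter by one. It also assumes that a split sets the left page's NSN to the split's LSN while the new right page keeps the old NSN. Real GiST page splits involve more bookkeeping, which this ignores. The scan remembers the LSN at its start. Only a page whose NSN is newer has been split since then, so only that page's right-link is followed back. The same comparison keeps the scan from re-following right-links left by splits that happened before it started, which would otherwise visit the same pages twice.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_PAGES 16
#define NO_LINK (-1)

typedef uint64_t ToyLSN;   /* hypothetical stand-in for a WAL position */

/* Hypothetical toy page (not PostgreSQL code). */
typedef struct {
    int ntuples;
    bool in_use;    /* false = deleted page available for reuse */
    int rightlink;  /* block of right sibling, or NO_LINK */
    ToyLSN nsn;     /* LSN of the last split of this page */
} ToyPage;

typedef struct {
    ToyPage pages[MAX_PAGES];
    int nblocks;
    ToyLSN insert_lsn;  /* advanced by one per (toy) WAL record */
} ToyIndex;

/* Split `left` into free block `right`. Only the left half gets the new
 * NSN; the right half inherits the old one. */
static void toy_split(ToyIndex *idx, int left, int right)
{
    ToyPage *l = &idx->pages[left], *r = &idx->pages[right];
    assert(!r->in_use);
    ToyLSN split_lsn = ++idx->insert_lsn;   /* the split's WAL record */
    int moved = l->ntuples / 2;
    l->ntuples -= moved;
    r->ntuples = moved;
    r->in_use = true;
    r->rightlink = l->rightlink;
    r->nsn = l->nsn;
    l->rightlink = right;
    l->nsn = split_lsn;
}

/* Physical-order scan; `victim` is split into `new_block` right after
 * block `split_at` is visited. With use_nsn, a backward right-link is
 * followed only if the page was split after the scan started; without
 * it, every backward right-link is followed. */
static int nsn_scan(ToyIndex *idx, int split_at, int victim, int new_block,
                    bool use_nsn)
{
    ToyLSN start_lsn = idx->insert_lsn;   /* remembered once, at scan start */
    int seen = 0;
    for (int b = 0; b < idx->nblocks; b++) {
        int cur = b;
        while (cur != NO_LINK) {
            ToyPage *p = &idx->pages[cur];
            if (p->in_use)
                seen += p->ntuples;
            bool split_during_scan = !use_nsn || p->nsn > start_lsn;
            if (split_during_scan && p->rightlink != NO_LINK &&
                p->rightlink < b)
                cur = p->rightlink;
            else
                cur = NO_LINK;
        }
        if (b == split_at)
            toy_split(idx, victim, new_block);
    }
    return seen;
}

/* Before the scan (LSN 10), block 3 was split into block 1 at LSN 5.
 * Block 0 is free; blocks 1..3 hold 10 tuples each. During the scan,
 * block 3 is split again, into block 0, after block 2 is visited. */
static int demo_nsn_scan(bool use_nsn)
{
    ToyIndex idx = {0};
    for (int b = 0; b < MAX_PAGES; b++)
        idx.pages[b].rightlink = NO_LINK;
    idx.nblocks = 4;
    idx.insert_lsn = 10;
    for (int b = 1; b <= 3; b++) {
        idx.pages[b].ntuples = 10;
        idx.pages[b].in_use = true;
    }
    idx.pages[2].rightlink = 3;
    idx.pages[3].rightlink = 1;
    idx.pages[3].nsn = 5;
    return nsn_scan(&idx, 2, 3, 0, use_nsn);
}
```

In the demo, demo_nsn_scan(true) counts each of the 30 tuples once. demo_nsn_scan(false), which follows every right-link pointing backwards, counts 40 because block 1 is visited twice.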

------
With best regards,
Alexander Korotkov.
