Re: Fillfactor for GIN indexes

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Fillfactor for GIN indexes
Date: 2015-07-09 13:33:41
Message-ID: CAPpHfdux70vzCDoyeaf=QahsJ8stzU7xFEdFtkiLbjY7OaFLWA@mail.gmail.com
Lists: pgsql-hackers

Hi!

On Wed, Jul 8, 2015 at 10:27 PM, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:

> In dataPlaceToPageLeaf-function:
>
>> if (append)
>> {
>> 	/*
>> 	 * Even when appending, trying to append more items than will fit is
>> 	 * not completely free, because we will merge the new items and old
>> 	 * items into an array below. In the best case, every new item fits in
>> 	 * a single byte, and we can use all the free space on the old page as
>> 	 * well as the new page. For simplicity, ignore segment overhead etc.
>> 	 */
>> 	maxitems = Min(maxitems, freespace + GinDataPageMaxDataSize);
>> }
>>
>
> Hmm. So after splitting the page, there is freespace +
> GinDataPageMaxDataSize bytes available on both pages together. freespace
> has been adjusted with the fillfactor, but GinDataPageMaxDataSize is not.
> So this overshoots, because when leafRepackItems actually distributes the
> segments on the pages, it will fill both pages only up to the fillfactor.
> This is an upper bound, so that's harmless, it only leads to some
> unnecessary work in dealing with the item lists. But I think that should be:
>
> maxitems = Min(maxitems, freespace + leaf->maxdatasize);
>

Good catch! Fixed.

>
>> else
>> {
>> 	/*
>> 	 * Calculate a conservative estimate of how many new items we can fit
>> 	 * on the two pages after splitting.
>> 	 *
>> 	 * We can use any remaining free space on the old page to store full
>> 	 * segments, as well as the new page. Each full-sized segment can hold
>> 	 * at least MinTuplesPerSegment items
>> 	 */
>> 	int		nnewsegments;
>>
>> 	nnewsegments = freespace / GinPostingListSegmentMaxSize;
>> 	nnewsegments += GinDataPageMaxDataSize / GinPostingListSegmentMaxSize;
>> 	maxitems = Min(maxitems, nnewsegments * MinTuplesPerSegment);
>> }
>>
>
> This branch makes the same mistake, but this is calculating a lower bound.
> It's important that maxitems is not set to higher value than what actually
> fits on the page, otherwise you can get an ERROR later in the function,
> when it turns out that not all the items actually fit on the page. The
> saving grace here is that this branch is never taken when building a new
> index, because index build inserts all the TIDs in order, but that seems
> pretty fragile. Should use leaf->maxdatasize instead of
> GinDataPageMaxDataSize here too.
>

Fixed.

> But that can lead to funny things if fillfactor is small and BLCKSZ is
> small too. With the minimums, BLCKSZ=1024 and fillfactor=0.2, the above
> formula will set nnewsegments to zero. That's not going to end up well:
> maxdatasize becomes smaller than GinPostingListSegmentMaxSize. I think
> GinGetMaxDataSize() needs to make sure that the returned value is always >=
> GinPostingListSegmentMaxSize.
>

Fixed.

> Now that we have a fillfactor, shouldn't we replace the 75% heuristic
> later in that function, when inserting to the rightmost page rather than
> building it from scratch? In B-tree, the fillfactor is applied to that case
> too.

Sounds reasonable. Now it works like B-tree.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
gin_fillfactor_6.patch application/octet-stream 12.1 KB
