Re: Corner cases with GiST n-way splits

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Corner cases with GiST n-way splits
Date: 2012-05-10 18:04:22
Message-ID: CAPpHfdu5w3UBQ0ALmFWGg0EqJVOPHMz9QqdXBJeBdgwQW8k_3g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, May 10, 2012 at 9:14 PM, Heikki Linnakangas <
heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:

> GiST page splitting has the peculiarity that it sometimes needs to split a
> single page into more than two pages. It happens rarely in practice, but it
> possible (*). With a bad picksplit function, it happens more often.
>
> While testing with a custom gist opclass with truly evil helper functions,
> I found two corner cases with the current implementation when a page is
> split into many halves:
>
> 1. If a page is split into more than 100 pages, you run into the same
> limit of 100 simultaneous lwlocks that Tom Forbes reported with a
> pathological intarray index. This time it's not because we hold locks on
> many different levels, but because of a single split.
>
> 2. When the root page is split, there is no parent page to update, so we
> just create a new root page with the downlinks. However, when you split a
> page into a lot of siblings, it's possible that all the downlinks don't fit
> on a single page. The code is prepared for that situation. You get an
> error, when it tries to add more downlinks on a single page than fit there.
>
> I'm not sure what to do about these. Neither issue is something you'd
> actually bump into in an index that's doing something useful; there's been
> no user complaints about these.
>

If such cases are very rare, we could call genericPickSplit if decide user
picksplit function result to be bad. We're already doing this if user
picksplit puts all the tuples into one page.

GiST can split page into many pages because of nulls and multicolumn
indexes independently on user picksplit function.
Imagine we've following tuples:

tuple key1 key2 key3 key4 key5 ...
1 value value value value value ...
2 NULL value value value value ...
3 NULL NULL NULL value value ...
4 NULL NULL NULL NULL value ...
5 NULL NULL NULL NULL NULL ...
......

In this case splitByKey will find only non-null value in the first key and
splits it into separate page. Then it will do same thing for the second key
etc. However, this process is limited by INDEX_MAX_KEYS.

BTW, I was thinking that it's not very right to let user picksplit decide
what split ratio (ratio of tuples in smaller and greater pages) is
acceptable. We could pass minimal acceptable ratio to use picksplit
function and call genericPickSplit if ratio is not followed. However, the
question arises if we're going to measure ratio in tuples count or in
tuples size. Ratio in tuples size seems to be more desirable while ratio in
tuples count seems to be easier for user picksplit to follow.

------
With best regards,
Alexander Korotkov.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kevin Grittner 2012-05-10 18:07:06 Re: Gsoc2012 idea, tablesample
Previous Message Robert Haas 2012-05-10 17:51:28 Re: Draft release notes complete