Re: GIN improvements part2: fast scan

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Tomas Vondra <tv(at)fuzzy(dot)cz>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: GIN improvements part2: fast scan
Date: 2014-03-12 17:42:32
Message-ID: CAPpHfduJZiJAiR7qfOt2mH6kNbuBoQMbYJb9+7LgEb5pv53eHA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Mar 12, 2014 at 8:29 PM, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com
> wrote:

> On 03/12/2014 12:09 AM, Tomas Vondra wrote:
>
>> Hi all,
>>
>> a quick question that just occured to me - do you plan to tweak the cost
>> estimation fot GIN indexes, in this patch?
>>
>> IMHO it would be appropriate, given the improvements and gains, but it
>> seems to me gincostestimate() was not touched by this patch.
>>
>
> Good point. We have done two major changes to GIN in this release cycle:
> changed the data page format and made it possible to skip items without
> fetching all the keys ("fast scan"). gincostestimate doesn't know about
> either change.
>
> Adjusting gincostestimate for the more compact data page format seems
> easy. When I hacked on that, I assumed all along that gincostestimate
> doesn't need to be changed as the index will just be smaller, which will be
> taken into account automatically. But now that I look at gincostestimate,
> it assumes that the size of one item on a posting tree page is a constant 6
> bytes (SizeOfIptrData), which is no longer true. I'll go fix that.
>
> Adjusting for the effects of skipping is harder. gincostestimate needs to
> do the same preparation steps as startScanKey: sort the query keys by
> frequency, and call consistent function to split the keys intao "required"
> and "additional" sets. And then model that the "additional" entries only
> need to be fetched when the other keys match. That's doable in principle,
> but requires a bunch of extra code.
>
> Alexander, any thoughts on that? It's getting awfully late to add new code
> for that, but it sure would be nice somehow take fast scan into account.

Preparation we do in startScanKey requires knowledge of estimate size of
posting lists/trees. We do this estimate by traversal to leaf pages. I
think gincostestimate is expected to be way more cheap. So, we probably
need so more rough estimate there, don't we?

------
With best regards,
Alexander Korotkov.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2014-03-12 17:44:29 Re: COPY table FROM STDIN doesn't show count tag
Previous Message Robert Haas 2014-03-12 17:41:05 Re: The case against multixact GUCs