Re: GIN improvements part2: fast scan

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: GIN improvements part2: fast scan
Date: 2013-11-15 07:19:09
Message-ID: CAPpHfduOvUFT+GgCrnBOzRR19eTb9mDsDJwU=S0h477fdgr=RA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Nov 15, 2013 at 12:34 AM, Heikki Linnakangas <
hlinnakangas(at)vmware(dot)com> wrote:

> On 14.11.2013 19:26, Alexander Korotkov wrote:
>
>> On Sun, Jun 30, 2013 at 3:00 PM, Heikki Linnakangas <
>> hlinnakangas(at)vmware(dot)com
>>
>>> wrote:
>>>
>>
>> On 28.06.2013 22:31, Alexander Korotkov wrote:
>>>
>>> Now, I got the point of three state consistent: we can keep only one
>>>> consistent in opclasses that support new interface. exact true and exact
>>>> false values will be passed in the case of current patch consistent;
>>>> exact
>>>> false and unknown will be passed in the case of current patch
>>>> preConsistent. That's reasonable.
>>>>
>>>>
>>> I'm going to mark this as "returned with feedback". For the next version,
>>> I'd like to see the API changed per above. Also, I'd like us to do
>>> something about the tidbitmap overhead, as a separate patch before this,
>>> so
>>> that we can assess the actual benefit of this patch. And a new test case
>>> that demonstrates the I/O benefits.
>>>
>>
>>
>> Revised version of patch is attached.
>> Changes are so:
>> 1) Patch rebased against packed posting lists, not depends on additional
>> information now.
>> 2) New API with tri-state logic is introduced.
>>
>
> Thanks! A couple of thoughts after a 5-minute glance:
>
> * documentation
>

Will provide documented version this week.

> * How about defining the tri-state consistent function to also return a
> tri-state? True would mean that the tuple definitely matches, false means
> the tuple definitely does not match, and Unknown means it might match. Or
> does return value true with recheck==true have the same effect? If I
> understood the patch, right, returning Unknown or True wouldn't actually
> make any difference, but it's conceivable that we might come up with more
> optimizations in the future that could take advantage of that. For example,
> for a query like "foo OR (bar AND baz)", you could immediately return any
> tuples that match foo, and not bother scanning for bar and baz at all.

The meaning of recheck flag when input contains unknown is undefined now. :)
For instance, we could define it in following ways:
1) Like returning Unknown meaning that consistent with true of false
instead of input Unknown could return either true or false.
2) Consistent with true of false instead of input Unknown could return
recheck. This meaning is probably logical, but I don't see any usage of it.

I'm not against idea of tri-state returning value for consisted, because
it's logical continuation of its tri-state input. However, I don't see
usage of distinguish True and Unknown in returning value for now :)

In example you give we can return foo immediately, but we have to create
full bitmap. So we anyway will have to scan (bar AND baz). We could skip
part of trees for bar and baz. But it's possible only when foo contains
large amount of sequential TIDS so we can be sure that we didn't miss any
TIDs. This seems to be very narrow use-case for me.

Another point is that one day we probably could immediately return tuples
in gingettuple. And with LIMIT clause and no sorting we can don't search
for other tuples. However, gingettuple was removed because of reasons of
concurrency. And my patches for index-based ordering didn't return it in
previous manner: it collects all the results and then returns them
one-by-one.

------
With best regards,
Alexander Korotkov.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2013-11-15 08:40:20 Re: [PATCH] pg_upgrade: support for btrfs copy-on-write clones
Previous Message Kyotaro HORIGUCHI 2013-11-15 06:58:47 Re: Using indices for UNION.