Re: GIN improvements part 3: ordering in index

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: GIN improvements part 3: ordering in index
Date: 2013-06-24 22:24:37
Message-ID: CAPpHfdtthB9Pooks7m4XTi=mBmTrfhBBYBOBfe4F8Uv1MiHgnA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jun 19, 2013 at 1:21 AM, Alexander Korotkov <aekorotkov(at)gmail(dot)com>wrote:

> On Mon, Jun 17, 2013 at 10:27 PM, Heikki Linnakangas <
> hlinnakangas(at)vmware(dot)com> wrote:
>
>> On 17.06.2013 15:56, Alexander Korotkov wrote:
>>
>>> On Sat, Jun 15, 2013 at 3:02 AM, Alexander Korotkov<aekorotkov(at)gmail(dot)com
>>> >**wrote:
>>>
>>> This patch introduces new interface method of GIN which takes same
>>>> arguments as consistent but returns float8.
>>>> float8 gin_ordering(bool check[], StrategyNumber n, Datum query, int32
>>>> nkeys, Pointer extra_data[], bool *recheck, Datum queryKeys[], bool
>>>> nullFlags[], Datum addInfo[], bool addInfoIsNull[])
>>>> This patch implements gingettuple method which can return ordering data
>>>> using KNN infrastructure. Also it introduces>< operator for fts which
>>>> support ordering in GIN index. Some example:
>>>>
>>>> postgres=# explain analyze select * from dblp_titles2 where tsvector @@
>>>> to_tsquery('english', 'statistics') order by tsvector><
>>>> to_tsquery('english', 'statistics') limit 10;
>>>>
>>>> QUERY
>>>> PLAN
>>>>
>>>> ------------------------------**------------------------------**
>>>> ------------------------------**------------------------------**
>>>> -------------------------
>>>> Limit (cost=12.00..48.22 rows=10 width=136) (actual time=6.999..7.120
>>>> rows=10 loops=1)
>>>> -> Index Scan using dblp_titles2_idx on dblp_titles2
>>>> (cost=12.00..43003.03 rows=11868 width=136) (actual time=6.996..7.115
>>>> rows=10 loops=1)
>>>> Index Cond: (tsvector @@ '''statist'''::tsquery)
>>>> Order By: (tsvector>< '''statist'''::tsquery)
>>>> Total runtime: 7.556 ms
>>>> (5 rows)
>>>>
>>>>
>>> Attached version of patch has some refactoring and bug fixes.
>>>
>>
>> Thanks. There are no docs changes and not many comments, that needs to be
>> fixed, but I think I understand how it works:
>>
>> On the first call to gingettuple, the index is first scanned for all the
>> matches, which are collected in an array in memory. Then, the array is
>> sorted with qsort(), and the matches are returned one by one from the
>> sorted array.
>>
>
> Right.
>
> That has some obvious limitations. First of all, you can run out of
>> memory.
>
>
> Yes, it is so. qsort should be replaced with tuplesort.
>

In attached patch qsort is replaced with tuplesort. As expected, it leads
to some performance drawback, but it's not dramatic.
Also, some doc is added for new distance method of GIN.

------
With best regards,
Alexander Korotkov.

Attachment Content-Type Size
gin_ordering.4.patch.gz application/x-gzip 14.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2013-06-24 22:39:12 Re: Patch for removng unused targets
Previous Message Alexander Korotkov 2013-06-24 22:20:59 Re: GIN improvements part2: fast scan