From: | Oleg Bartunov <obartunov(at)gmail(dot)com> |
---|---|
To: | Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> |
Cc: | Alexander Korotkov <aekorotkov(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Collation-aware comparisons in GIN opclasses |
Date: | 2014-09-29 08:08:35 |
Message-ID: | CAF4Au4zYsJUmqH=zSiBrUuN2HTBr8Yhq8a8T+T707a0E0CPCmw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Mon, Sep 29, 2014 at 11:48 AM, Heikki Linnakangas <
hlinnakangas(at)vmware(dot)com> wrote:
> On 09/15/2014 06:28 PM, Alexander Korotkov wrote:
>
>> Hackers,
>>
>> some GIN opclasses uses collation-aware comparisons while they don't need
>> to do especially collation-aware comparison. Examples are text[] and
>> hstore
>> opclasses.
>>
>
> Hmm. It would be nice to use the index for inequality searches, at least
> on text[]. We don't support that currently, but it would require
> collation-awareness.
>
> Depending on collation this may make them a much slower.
>>
>> See example.
>>
>> # show lc_collate ;
>> lc_collate
>> ─────────────
>> ru_RU.UTF-8
>> (1 row)
>>
>> # create table test as (select array_agg(i::text) from
>> generate_series(1,1000000) i group by (i-1)/10);
>> SELECT 100000
>>
>> # create index test_idx on test using gin(array_agg);
>> CREATE INDEX
>> Time: *26930,423 ms*
>>
>> # create index test_idx2 on test using gin(array_agg collate "C");
>> CREATE INDEX
>> Time: *5143,682 ms*
>>
>> Index creation with collation "ru_RU.UTF-8" is 5 times slower while
>> collation has absolutely no effect on index functionality.
>>
>
> It occurs to me that practically all of those comparisons happen when we
> populate the red-black Tree, during the index build. The purpose of the
> red-black tree is to collect identical keys together, but there is actually
> no requirement that the order of the red-black tree matches the order of
> the index. It also isn't strictly required that it recognizes equal keys as
> equal. The only requirement is that it doesn't incorrectly put two keys
> that are equal according to the compare-function, into two different nodes.
>
>
Good point, Heikki. I experienced several times this problem, fixed it with
C-locale and forgot again. Now, it's time to fix !
> We could therefore use plain memcmp() to compare the Datums while building
> the red-black tree. Keys that are bit-wise equal are surely considered as
> equal by the compare-function. That makes the index build a lot faster.
> With the attached quick patch:
>
> postgres=# create index test_idx on test using gin(array_agg );
> CREATE INDEX
> Time: 880.620 ms
>
> This is on my laptop. Without the patch, that takes about 4.7 seconds with
> the C locale, so this is much faster than even using the C locale.
>
Hmm, on my MBA I got
17277.734 (patch) vs 39151.562 for ru_RU.UTF-8 and
6131.929 (patch) vs 6131.929 for C
Not much :(
>
> - Heikki
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
>
From | Date | Subject | |
---|---|---|---|
Next Message | Simon Riggs | 2014-09-29 08:10:26 | Re: INSERT ... ON CONFLICT {UPDATE | IGNORE} |
Previous Message | Heikki Linnakangas | 2014-09-29 07:48:43 | Re: Collation-aware comparisons in GIN opclasses |