Re: Avoid full GIN index scan when possible

From: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Nikita Glukhov <n(dot)gluhov(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Marc Cousin <cousinmarc(at)gmail(dot)com>
Subject: Re: Avoid full GIN index scan when possible
Date: 2020-01-08 16:31:52
Message-ID: CAPpHfdthhf8S-UjE46hjy+g7pgGhy=hav1-BUmQq6+hKmY13nA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi, Tomas!

Thank you for your feedback!

On Mon, Jan 6, 2020 at 6:22 PM Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
wrote:

> Yeah, I can confirm those results, although on my system the timings are
> a bit different (I haven't tested v8):
>
> | Query time, ms
> WHERE condition | master | v9
> ---------------------------------------+--------+---------
> a @> '{}' | 610 | 589
> a @> '{}' and b @> '{}' | 185 | 665
> a @> '{}' and b @> '{}' and c @> '{}' | 185 | 741
>
> So that's something we probably need to address, perhaps by using the
> GIN_CAT_EMPTY_QUERY entries as proposed.
>

Yeah, handling nulls better without regression in some cases is hard.
For now I see at least 3 different ways of nulls handling, assuming there
is another non-excluding scan key:

1) Collect non-null matches by full scan of all non-null entries.
2) Exclude null marches using scan of null entry.
3) Force recheck.

Each method have its own advantages and disadvantages. We probably
would need some cost-based decision making algorithm based on statistics.
I'm not entirely sure it's OK to do this execution time. However, it
probably
could be classified as "adaptive query processing", which is considered as
cool trend in DBMS.

Attached version 10 of patch doesn't change null handling in comparison
with master. It eliminates full index scan only if there is another scan
on the
same column. So, it never adds null item to the scan key. I've rerun tests
from Nikita [1].

| | Query time, ms |
# | WHERE condition | master | v10 |
---+----------------------------------------+--------+-------+
1 | a @> '{}' | 223 | 218 |
2 | a @> '{}' and b @> '{}' | 302 | 308 |
3 | a @> '{}' and b @> '{}' and c @> '{}' | 405 | 404 |
4 | a @> '{}' and a @@ '1' | 59 | 0.3 |
5 | a @> '{}' and a @@ '-1' | 64 | 2.2 |
6 | a @@ '!-1' and a @@ '1' | 63 | 0.3 |
7 | a @@ '!1' and a @@ '-1' | 62 | 3.0 |

It appears that absolute numbers for master are higher than they were
previous time [2]. I've rechecked multiple times that current numbers are
correct. So, it might be I didn't turn off sequential scan previous time.

We can see that cases #1, #2, #3, which have quals over multiple attributes
have the same execution time as in master. That's expected since scanning
strategy is the same. Cases #4, #5, #6, #7 have about the same improvement
as in v9.

I've also rerun many nulls test from Nikita [3].

| Query time, ms |
WHERE condition | master | v10 |
---------------------------------------+--------+-------+
a @> '{}' | 190 | 192 |
a @> '{}' and b @> '{}' | 55 | 57 |
a @> '{}' and b @> '{}' and c @> '{}' | 60 | 58 |

The results are the same as in master again.

I've also tested this on a database storing mailing lists archives with
> a trigram index, and in that case the performance with short values gets
> much better. The "messages" table has two text fields with a GIN trigram
> index - subject and body, and querying them with short/long values works
> like this:
>
> WHERE | master | v9
> --------------------------------------------------------------
> subject LIKE '%aa%' AND body LIKE '%xx%' | 4943 | 4052
> subject LIKE '%aaa%' AND body LIKE '%xx%' | 10 | 10
> subject LIKE '%aa%' AND body LIKE '%xxx%' | 380 | 13
> subject LIKE '%aaa%' AND BODY LIKE '%xxx%' | 2 | 2
>
> which seems fairly nice. I've done tests with individual columns, and
> that seems to be working fine too.
>

Cool, thanks!

So, I think v10 is a version of patch, which can be committed after
some cleanup. And we can try doing better nulls handling in a separate
patch.

Links
1.
https://www.postgresql.org/message-id/f2889144-db1d-e3b2-db97-cfc8794cda43%40postgrespro.ru
2.
https://www.postgresql.org/message-id/CAPpHfdvT_t6ShG2pvptEWceDxEnyNRsm2MxmCWWvxBzQ-pbMuw%40mail.gmail.com
3.
https://www.postgresql.org/message-id/b53614eb-6f9f-8c5c-9df8-f703b0b102b6%40postgrespro.ru

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
0001-Avoid-GIN-full-scan-for-empty-ALL-keys-v10.patch application/octet-stream 27.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2020-01-08 18:12:05 Re: xact_start for walsender & logical decoding not updated
Previous Message Stephen Frost 2020-01-08 15:58:09 Re: weird libpq GSSAPI comment