Re: [PATCH] fix GIN index search sometimes losing results

From: Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: [PATCH] fix GIN index search sometimes losing results
Date: 2020-05-21 14:25:46
Message-ID: CALT9ZEFyTJb8d3hH7itVLUzdU6yXr40UhzJKYkd-vAnM8N2RKA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi All!
1. Generally the difference of my patch in comparison to Tom's patch 0001
is that I tried to move previous logic of GIN's own TS_execute_ternary() to
the general logic of TS_execute_recurse and in case we have index without
positions to avoid diving into phrase operator replacing (only in this
case) in by an AND operator. The reason for this I suppose is speed and
I've done testing of some corner cases like phrase operator with big number
of OR comparisons inside it.

-----------------------------
BEFORE ANY PATCH:
Bitmap Heap Scan on pglist (cost=1715.72..160233.31 rows=114545
width=1234) (actual time=652.294..2719.961 rows=4904 loops=1)
Recheck Cond: (fts @@ '( ''worth'' | ''good'' | ''result'' | ''index'' |
''anoth'' | ''know'' | ''like'' | ''tool'' | ''job'' | ''think'' | ''slow''
| ''articl'' | ''knowledg'' | ''join'' | ''need'' | ''experi'' |
''understand'' | ''free'' | ''say'' | ''comment'' | ''littl'' | ''move'' |
''function'' | ''new'' | ''never'' | ''general'' | ''get'' | ''java'' |
''postgresql'' | ''notic'' | ''recent'' | ''serious'' ) <->
''start'''::tsquery)
Rows Removed by Index Recheck: 108191
Heap Blocks: exact=73789
-> Bitmap Index Scan on pglist_fts_idx (cost=0.00..1687.09 rows=114545
width=0) (actual time=636.883..636.883 rows=113095 loops=1)
Index Cond: (fts @@ '( ''worth'' | ''good'' | ''result'' |
''index'' | ''anoth'' | ''know'' | ''like'' | ''tool'' | ''job'' |
''think'' | ''slow'' | ''articl'' | ''knowledg'' | ''join'' | ''need'' |
''experi'' | ''understand'' | ''free'' | ''say'' | ''comment'' | ''littl''
| ''move'' | ''function'' | ''new'' | ''never'' | ''general'' | ''get'' |
''java'' | ''postgresql'' | ''notic'' | ''recent'' | ''serious'' ) <->
''start'''::tsquery)
Planning Time: 3.016 ms
Execution Time: *2721.002 ms*
-------------------------------
AFTER TOM's PATCH (0001)
Bitmap Heap Scan on pglist (cost=1715.72..160233.31 rows=114545
width=1234) (actual time=916.640..2960.571 rows=4904 loops=1)
Recheck Cond: (fts @@ '( ''worth'' | ''good'' | ''result'' | ''index'' |
''anoth'' | ''know'' | ''like'' | ''tool'' | ''job'' | ''think'' | ''slow''
| ''articl'' | ''knowledg'' | ''join'' | ''need'' | ''experi'' |
''understand'' | ''free'' | ''say'' | ''comment'' | ''littl'' | ''move'' |
''function'' | ''new'' | ''never'' | ''general'' | ''get'' | ''java'' |
''postgresql'' | ''notic'' | ''recent'' | ''serious'' ) <->
''start'''::tsquery)
Rows Removed by Index Recheck: 108191
Heap Blocks: exact=73789
-> Bitmap Index Scan on pglist_fts_idx (cost=0.00..1687.09 rows=114545
width=0) (actual time=900.472..900.472 rows=113095 loops=1)
Index Cond: (fts @@ '( ''worth'' | ''good'' | ''result'' |
''index'' | ''anoth'' | ''know'' | ''like'' | ''tool'' | ''job'' |
''think'' | ''slow'' | ''articl'' | ''knowledg'' | ''join'' | ''need'' |
''experi'' | ''understand'' | ''free'' | ''say'' | ''comment'' | ''littl''
| ''move'' | ''function'' | ''new'' | ''never'' | ''general'' | ''get'' |
''java'' | ''postgresql'' | ''notic'' | ''recent'' | ''serious'' ) <->
''start'''::tsquery)
Planning Time: 2.688 ms
Execution Time: *2961.704 ms*
----------------------------
AFTER MY PATCH (gin-gist-weight-patch-v3)
Bitmap Heap Scan on pglist (cost=1715.72..160233.31 rows=114545
width=1234) (actual time=616.982..2710.571 rows=4904 loops=1)
Recheck Cond: (fts @@ '( ''worth'' | ''good'' | ''result'' | ''index'' |
''anoth'' | ''know'' | ''like'' | ''tool'' | ''job'' | ''think'' | ''slow''
| ''articl'' | ''knowledg'' | ''join'' | ''need'' | ''experi'' |
''understand'' | ''free'' | ''say'' | ''comment'' | ''littl'' | ''move'' |
''function'' | ''new'' | ''never'' | ''general'' | ''get'' | ''java'' |
''postgresql'' | ''notic'' | ''recent'' | ''serious'' ) <->
''start'''::tsquery)
Rows Removed by Index Recheck: 108191
Heap Blocks: exact=73789
-> Bitmap Index Scan on pglist_fts_idx (cost=0.00..1687.09 rows=114545
width=0) (actual time=601.586..601.586 rows=113095 loops=1)
Index Cond: (fts @@ '( ''worth'' | ''good'' | ''result'' |
''index'' | ''anoth'' | ''know'' | ''like'' | ''tool'' | ''job'' |
''think'' | ''slow'' | ''articl'' | ''knowledg'' | ''join'' | ''need'' |
''experi'' | ''understand'' | ''free'' | ''say'' | ''comment'' | ''littl''
| ''move'' | ''function'' | ''new'' | ''never'' | ''general'' | ''get'' |
''java'' | ''postgresql'' | ''notic'' | ''recent'' | ''serious'' ) <->
''start'''::tsquery)
Planning Time: 3.115 ms
Execution Time: *2711.533 ms*

I've done the test several times and seems that difference is real effect,
though not very big (around 7%). So maybe there is some reason to save
PHRASE_AS_AND behavior for GIN-GIST indexes despite migration from GIN's
own TS_execute_ternary() to general TS_execute_recurse.

2. As for shifting from bool to ternary callback I am not quite sure
whether it can be useful to save bool callbacks via bool-ternary wrapper.
We can include this for compatibility with old callers and can drop. Any
ideas?

3. As for patch 0002 which removes TS_EXEC_CALC_NOT flag I'd like to note
that indexes which are written as extensions like RUM index (
https://github.com/postgrespro/rum) use this flag as default behavior of
TS_execute was NOT doing TS_EXEC_CALC_NOT. If we's like to change this
default it can break the callers. At least I propose to
leave TS_EXEC_CALC_NOT definition in ts_utils.h but in general I'd like to
save default behaviour of TS_execute and not apply patch 0002. Maybe it is
only worth to leave notice in a comments in code that TS_EXEC_CALC_NOT left
for compatibilty reasons etc.

I'd appreciate any ideas and review of all aforementioned patches.

Best regards,
Pavel Borisov.

ср, 20 мая 2020 г. в 18:04, Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com>:

> 1. Really if it's possible to avoid bool callbacks at all and shift
> everywhere to ternary it makes code quite beautiful and even. But I also
> think we are still not obliged to drop support for (legacy or otherwise)
> bool callbacks and also consistent functions form some old extensions (I
> don't know for sur, whether they exist) which expect old style bool result
> from TS_execute.
>
> In my patch I used ternary logic from TS_execute_recurse on, which can be
> called by "new" ternary consistent callers and leave bool TS_execute, which
> works as earlier. It also makes callback function wrapping to allow some
> hypothetical old extension enjoy binary behavior. I am not sure it is very
> much necessary but as it is not hard I'd propose somewhat leave this
> feature by combining patches.
>
> 2. Overall I see two reasons to consider when choosing ternary/boolean
> calls in TS_execute: speed and compatibility. I'd like to make some
> performance tests for different types of queries (plain without weights,
> and containing weights in some or all operands) to evaluate first of these
> effects in both cases.
>
> Then we'll have reasons to commit a certain type of patch or maybe some
> combination of them.
>
> Best regards,
> Pavel Borisov.
>
> вс, 17 мая 2020 г. в 23:53, Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com>:
>
>> Hi, all!
>> Below is my variant how to patch Gin-Gist weights issue:
>> 1. First of all I propose to shift from previously Gin's own TS_execute
>> variant and leave only two: TS_execute with bool result and bool type
>> callback and ternary TS_execute_recurse with ternary callback. I suppose
>> all legacy consistent callers can still use bool via provided wrapper.
>> 2. I integrated logic for indexes which do not support weights and
>> positions inside (which gives MAYBE in certain cases on negation) inside
>> previous TS_execute_recurse function called with additional flag for this
>> class of indexes.
>> 3. Check function for GIST and GIN now gives ternary result and is called
>> with ternary type callback. I think in future nothing prevents smoothly
>> shifting callback functions, check functions and even TS_execute result to
>> ternary.
>>
>> So I also send my variant patch for review and discussion.
>>
>> Regards,
>> Pavel Borisov
>>
>> вс, 17 мая 2020 г. в 03:14, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:
>>
>>> I wrote:
>>> > I think the root of the problem is that if we have a query using
>>> > weights, and we are testing tsvector data that lacks positions/weights,
>>> > we can never say there's definitely a match. I don't see any decently
>>> > clean way to fix this without redefining the TSExecuteCallback API
>>> > to return a tri-state YES/NO/MAYBE result, because really we need to
>>> > decide that it's MAYBE at the level of processing the QI_VAL node,
>>> > not later on. I'd tried to avoid that in e81e5741a, but maybe we
>>> > should just bite that bullet, and not worry about whether there's
>>> > any third-party code providing its own TSExecuteCallback routine.
>>> > codesearch.debian.net suggests that there are no external callers
>>> > of TS_execute, so maybe we can get away with that.
>>>
>>> 0001 attached is a proposed patch that does it that way. Given the
>>> API break involved, it's not quite clear what to do with this.
>>> ISTM we have three options:
>>>
>>> 1. Ignore the API issue and back-patch. Given the apparent lack of
>>> external callers of TS_execute, maybe we can get away with that;
>>> but I wonder if we'd get pushback from distros that have automatic
>>> ABI-break detectors in place.
>>>
>>> 2. Assume we can't backpatch, but it's still OK to slip this into
>>> v13. (This option clearly has a limited shelf life, but I think
>>> we could get away with it until late beta.)
>>>
>>> 3. Assume we'd better hold this till v14.
>>>
>>> I find #3 unduly conservative, seeing that this is clearly a bug
>>> fix, but on the other hand #1 is a bit scary. Aside from the API
>>> issue, it's not impossible that this has introduced some corner
>>> case behavioral changes that we'd consider to be new bugs rather
>>> than bug fixes.
>>>
>>> Anyway, some notes for reviewers:
>>>
>>> * The core idea of the patch is to make the TS_execute callbacks
>>> have ternary results and to insist they return TS_MAYBE in any
>>> case where the correct result is uncertain.
>>>
>>> * That fixes the bug at hand, and it also allows getting rid of
>>> some kluges at higher levels. The GIN code no longer needs its
>>> own TS_execute_ternary implementation, and the GIST code no longer
>>> needs to suppose that it can't trust NOT results.
>>>
>>> * I put some effort into not leaking memory within tsvector_op.c's
>>> checkclass_str and checkcondition_str. (The final output array
>>> can still get leaked, I believe. Fixing that seems like material
>>> for a different patch, and it might not be worth any trouble.)
>>>
>>> * The new test cases in tstypes.sql are to verify that we didn't
>>> change behavior of the basic tsvector @@ tsquery code. There wasn't
>>> any coverage of these cases before, and the logic for checkclass_str
>>> without position info had to be tweaked to preserve this behavior.
>>>
>>> * The new cases in tsearch verify that the GIN and GIST code gives
>>> the same results as the basic operator.
>>>
>>> Now, as for the 0002 patch attached: after 0001, the only TS_execute()
>>> callers that are not specifying TS_EXEC_CALC_NOT are hlCover(),
>>> which I'd already complained is probably a bug, and the first of
>>> the two calls in tsrank.c's Cover(). It seems difficult to me to
>>> argue that it's not a bug for Cover() to process NOT in one call
>>> but not the other --- moreover, if there was any argument for that
>>> once upon a time, it probably falls to the ground now that (a) we
>>> have a less buggy implementation of NOT and (b) the presence of
>>> phrase queries significantly raises the importance of not taking
>>> short-cuts. Therefore, 0002 attached rips out the TS_EXEC_CALC_NOT
>>> flag and has TS_execute compute NOT expressions accurately all the
>>> time.
>>>
>>> As it stands, 0002 changes no regression test results, which I'm
>>> afraid speaks more to our crummy test coverage than anything else;
>>> tests that exercise those two functions with NOT-using queries
>>> would easily show that there is a difference.
>>>
>>> Even if we decide to back-patch 0001, I would not suggest
>>> back-patching 0002, as it's more nearly a definitional change
>>> than a bug fix. But I think it's a good idea anyway.
>>>
>>> I'll stick this in the queue for the July commitfest, in case
>>> anybody wants to review it.
>>>
>>> regards, tom lane
>>>
>>>

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2020-05-21 14:30:40 Re: Trouble with hashagg spill I/O pattern and costing
Previous Message Pantelis Theodosiou 2020-05-21 14:20:40 Re: PostgreSQL 13 Beta 1 Release Announcement Draft