Re: POC, WIP: OR-clause support for indexes

From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: POC, WIP: OR-clause support for indexes
Date: 2016-03-17 17:19:55
Message-ID: 56EAE73B.3010603@sigaev.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> I also wonder whether the patch should add explanation of OR-clauses
> handling into the READMEs in src/backend/access/*

Oops, will add shortly.
>
> The patch would probably benefit from transforming it into a patch
> series - one patch for the infrastructure shared by all the indexes,
> then one patch per index type. That should make it easier to review, and
> I seriously doubt we'd want to commit this in one huge chunk anyway.
Ok, will do it.

> 1) fields in BrinOpaque are not following the naming convention (all the
> existing fields start with bo_)
fixed

>
> 2) there's plenty of places violating the usual code style (e.g. for
> single-command if branches) - not a big deal for WIP patch, but needs to
> get fixed eventually
hope, fixed

>
> 3) I wonder whether we really need both SK_OR and SK_AND, considering
> they are mutually exclusive. Why not to assume SK_AND by default, and
> only use SK_OR? If we really need them, perhaps an assert making sure
> they are not set at the same time would be appropriate.
In short: possible ambiguity and increasing stack machine complexity.
Let we have follow expression in reversed polish notation (letters represent a
condtion, | - OR, & - AND logical operation, ANDs are omitted):
a b c |

Is it ((a & b)| c) or (a & (b | c)) ?

Also, using both SK_ makes code more readable.

> 4) scanGetItem is a prime example of the "badly needs comments" issue,
> particularly because the previous version of the function actually had
> quite a lot of them while the new function has none.
Will add soon

>
> 5) scanGetItem() may end up using uninitialized 'cmp' - it only gets
> initialized when (!leftFinished && !rightFinished), but then gets used
> when either part of the condition evaluates to true. Probably should be
>
> if (!leftFinished || !rightFinished)
> cmp = ...
fixed

>
> 6) the code in nodeIndexscan.c should not include call to abort()
>
> {
> abort();
> elog(ERROR, "unsupported indexqual type: %d",
> (int) nodeTag(clause));
> }
fixed, just forgot to remove

>
> 7) I find it rather ugly that the paths are built by converting BitmapOr
> paths. Firstly, it means indexes without amgetbitmap can't benefit from
> this change. Maybe that's reasonable limitation, though?
I based on following thoughts:
1 code which tries to find OR-index path will be very similar to existing
generate_or_bitmap code. Obviously, it should not be duplicated.
2 all existsing indexes have amgetbitmap method, only a few don't. amgetbitmap
interface is simpler. Anyway, I can add an option for generate_or_bitmap
to use any index, but, in current state it will just repeat all work.

>
> But more importantly, this design already has a bunch of unintended
> consequences. For example, the current code completely ignores
> enable_indexscan setting, because it merely copies the costs from the
> bitmap path.
I'd like to add separate enable_indexorscan

> That's pretty dubious, I guess. So this code probably needs to be made
> aware of enable_indexscan - right now it entirely ignores startup_cost
> in convert_bitmap_path_to_index_clause(). But of course if there are
> multiple IndexPaths, the enable_indexscan=off will be included multiple
> times.
>
> 9) This already breaks estimation for some reason. Consider this
...
> So the OR-clause is estimated to match 0 rows, less than each of the
> clauses independently. Needless to say that without the patch this works
> just fine.
fixed

>
> 10) Also, this already breaks some regression tests, apparently because
> it changes how 'width' is computed.
fixed too

> So I think this way of building the index path from a BitmapOr path is
> pretty much a dead-end.
I don't think so because separate code path to support OR-clause in index will
significanlty duplicate BitmapOr generator.

Will send next version as soon as possible. Thank you for your attention!

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/

Attachment Content-Type Size
index_or-3.patch.gz application/x-gzip 20.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Steele 2016-03-17 17:22:25 Re: Re: Add generate_series(date,date) and generate_series(date,date,integer)
Previous Message Robert Haas 2016-03-17 17:17:44 Re: Re: Add generate_series(date,date) and generate_series(date,date,integer)