Re: POC, WIP: OR-clause support for indexes

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: POC, WIP: OR-clause support for indexes
Date: 2016-03-10 13:04:37
Message-ID: 1457615077.31876.51.camel@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Teodor,

I've looked into v2 of the patch you sent a few days ago. Firstly, I
definitely agree that being able to use OR conditions with an index is
definitely a cool idea.

I do however agree with David that the patch would definitely benefit
from comments documenting various bits that are less obvious to mere
mortals like me, with limited knowledge of the index internals.

I also wonder whether the patch should add explanation of OR-clauses
handling into the READMEs in src/backend/access/*

The patch would probably benefit from transforming it into a patch
series - one patch for the infrastructure shared by all the indexes,
then one patch per index type. That should make it easier to review, and
I seriously doubt we'd want to commit this in one huge chunk anyway.

Now, some review comments from eyeballing the patch. Some of those are
nitpicking, but well ...

1) fields in BrinOpaque are not following the naming convention (all the
existing fields start with bo_)

2) there's plenty of places violating the usual code style (e.g. for
single-command if branches) - not a big deal for WIP patch, but needs to
get fixed eventually

3) I wonder whether we really need both SK_OR and SK_AND, considering
they are mutually exclusive. Why not to assume SK_AND by default, and
only use SK_OR? If we really need them, perhaps an assert making sure
they are not set at the same time would be appropriate.

4) scanGetItem is a prime example of the "badly needs comments" issue,
particularly because the previous version of the function actually had
quite a lot of them while the new function has none.

5) scanGetItem() may end up using uninitialized 'cmp' - it only gets
initialized when (!leftFinished && !rightFinished), but then gets used
when either part of the condition evaluates to true. Probably should be

if (!leftFinished || !rightFinished)
cmp = ...

6) the code in nodeIndexscan.c should not include call to abort()

{
abort();
elog(ERROR, "unsupported indexqual type: %d",
(int) nodeTag(clause));
}

7) I find it rather ugly that the paths are built by converting BitmapOr
paths. Firstly, it means indexes without amgetbitmap can't benefit from
this change. Maybe that's reasonable limitation, though?

But more importantly, this design already has a bunch of unintended
consequences. For example, the current code completely ignores
enable_indexscan setting, because it merely copies the costs from the
bitmap path.

SET enable_indexscan = off;
EXPLAIN SELECT * FROM t WHERE (c && ARRAY[1] OR c && ARRAY[2]);

QUERY PLAN
-------------------------------------------------------------------
Index Scan using t_c_idx on t (cost=0.00..4.29 rows=0 width=33)
Index Cond: ((c && '{1}'::integer[]) OR (c && '{2}'::integer[]))
(2 rows)

That's pretty dubious, I guess. So this code probably needs to be made
aware of enable_indexscan - right now it entirely ignores startup_cost
in convert_bitmap_path_to_index_clause(). But of course if there are
multiple IndexPaths, the enable_indexscan=off will be included multiple
times.

9) This already breaks estimation for some reason. Consider this
example, using a table with int[] column, with gist index built using
intarray:

EXPLAIN SELECT * FROM t WHERE (c && ARRAY[1,2,3,4,5,6,7]);

QUERY PLAN
--------------------------------------------------------------------
Index Scan using t_c_idx on t (cost=0.28..52.48 rows=12 width=33)
Index Cond: (c && '{1,2,3,4,5,6,7}'::integer[])
(2 rows)

EXPLAIN SELECT * FROM t WHERE (c && ARRAY[8,9,10,11,12,13,14]);

QUERY PLAN
--------------------------------------------------------------------
Index Scan using t_c_idx on t (cost=0.28..44.45 rows=10 width=33)
Index Cond: (c && '{8,9,10,11,12,13,14}'::integer[])
(2 rows)

EXPLAIN SELECT * FROM t WHERE (c && ARRAY[1,2,3,4,5,6,7])
OR (c && ARRAY[8,9,10,11,12,13,14]);

QUERY PLAN
--------------------------------------------------------------------
Index Scan using t_c_idx on t (cost=0.00..4.37 rows=0 width=33)
Index Cond: ((c && '{1,2,3,4,5,6,7}'::integer[])
OR (c && '{8,9,10,11,12,13,14}'::integer[]))
(2 rows)

So the OR-clause is estimated to match 0 rows, less than each of the
clauses independently. Needless to say that without the patch this works
just fine.

10) Also, this already breaks some regression tests, apparently because
it changes how 'width' is computed.

So I think this way of building the index path from a BitmapOr path is
pretty much a dead-end.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Grzegorz Sampolski 2016-03-10 13:11:12 Re: pam auth - add rhost item
Previous Message Amit Kapila 2016-03-10 13:00:17 Re: [HACKERS] Re: [HACKERS] Re: [HACKERS] Windows service is not starting so there’s message in log: FATAL: "could not create shared memory segment “Global/PostgreSQL.851401618”: Permission denied”