Re: Selectivity estimation for intarray with @@

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Uriy Zhuravlev <u(dot)zhuravlev(at)postgrespro(dot)ru>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Selectivity estimation for intarray with @@
Date: 2015-07-14 17:55:06
Message-ID: CAMkU=1y=NQqtujE11qrkERvLOBUgv4_Yi46aJDSTb_pcGg4k2g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, May 26, 2015 at 4:58 AM, Uriy Zhuravlev <u(dot)zhuravlev(at)postgrespro(dot)ru>
wrote:

> Hello.
>
> Attached patch based on:
>
> http://www.postgresql.org/message-id/CAPpHfdssY+qEPDCOvxx-b4LP3ybR+qS04M6-ARgGKNFk3FrFow@mail.gmail.com
>
> and adds selectivity estimation functions to @@ (port from tsquery). Now we
> support &&, @>, <@ and @@.
> In addition it was written migration to version 1.1 intarray. Because of
> what
> this patch requires my other patch:
> http://www.postgresql.org/message-id/14346041.DNcb5Y1inS@dinodell
>
> Alexander Korotkov know about this patch.
>

Hi Uriy,

This patch looks pretty good.

The first line of intarray--1.1.sql mis-identifies itself as "/*
contrib/intarray/intarray--1.0.sql */"

The real file intarray--1.0.sql file probably should not be included in the
final patch, but I like having it for testing.

It applies and builds cleanly over the alter operator patch (and now the
commit as well), passes make check of the contrib module both with and
without cassert.

I could succesfully upgrade from version 1.0 to 1.1 without having to drop
the gin__int_ops indexes in the process.

I could do pg_upgrade from 9.2 and 9.4 to 9.6devel with large indexes in
place, and then upgrade the extension to 1.1, and it worked without having
to rebuild the index.

It does what it says, and I think we want this.

There were some cases where the estimates were not very good, but that
seems to be limitation of what pg_stats makes available, not of this
patch. Specifically if the elements listed in the query text are not part
of most_common_elems (or worse yet, most_common_elems is null) then it is
left to guess with no real data, and those guesses can be pretty bad. It
is not this patches job to fix that, however.

It assumes all the stats are independent and so doesn't account for
correlation between members. This is also how the core selectivity
estimates work between columns, so I can't really complain about that. It
is easy to trick it with things like @@ '(!300 & 300)'::query_int, but I
don't think that is necessary to fix that.

I have not been creative enough to come up with queries for which this
improvement in selectivity estimate causes the execution plan to change in
important ways. I'm sure the serious users of this module would have such
cases, however.

I haven't tested gist__int_ops as I can't get those indexes to build in a
feasible amount of time. But the selectivity estimates are independent of
any actual index, so testing those doesn't seem to be critical.

There is no documentation change, which makes sense as this internal stuff
which isn't documented to start with.

There are no regression test changes. Not sure about that, it does seem
like regression tests would be desirable.

I haven't gone through the C code.

Cheers,

Jeff

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ted Toth 2015-07-14 17:59:19 Re: security labels on databases are bad for dump & restore
Previous Message Kohei KaiGai 2015-07-14 17:53:20 Re: security labels on databases are bad for dump & restore