Re: Selectivity estimation for intarray with @@

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Uriy Zhuravlev <u(dot)zhuravlev(at)postgrespro(dot)ru>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Selectivity estimation for intarray with @@
Date: 2015-07-21 15:49:23
Message-ID: 55AE6A03.9060400@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 07/21/2015 03:44 PM, Alexander Korotkov wrote:
> While Uriy is on vacation, I've revised this patch a bit.

I whacked this around quite a bit, and I think it's in a committable
state now. But if you could run whatever tests you were using before on
this, to make sure it still produces the same estimates, that would be
great. I didn't change the estimates it should produce, only the code
structure.

One thing that bothers me slightly with this patch is the way it peeks
into the Most-Common-Elements arrays, which is produced by the built-in
type analyze function. If we ever change what statistics are collected
for arrays, or the way they are stored, this might break. In matchsel,
why don't we just call the built-in estimator function for each element
that we need to probe, and not look into the statistics ourselves at
all? I actually experimented with that, and it did slash much of the
code, and it would be more future-proof. However, it was also a lot
slower for queries that contain multiple values. That's understandable:
the built-in estimator will fetch the statistics tuple, parse the
arrays, etc. separately for each value in the query_int, while this
patch will do it only once for the whole query, and perform a simple
binary search for each value. So overall, I think this is OK as it is.
But if we find that we need to use the MCE list in this fashion in more
places in the future, it might be worthwhile to add some support code
for this in the backend to allow extracting the stats once, and doing
multiple "lightweight estimations" using the extracted stats.

Some things I fixed/changed:

* I didn't like that transformOperator() function, which looked up the
function's name. I replaced it with separate wrapper functions for each
operator, so that the built-in operator's OID can be hardcoded into each.

* I refactored the matchsel function heavily. I think it's more readable
now.

* I got rid of the Int4Freq array. It didn't seem significantly easier
to work with than the separate values/numbers arrays, so I just used
those directly.

* Also use the matchsel estimator for ~~ (the commutator of @@)

* Also use the estimators for the obsolete @ and ~ operators. Not that I
care much about those as they are obsolete, but seems strange not to, as
it's a trivial matter of setting the right estimator function.

* I added an ANALYZE in the regression test. It still won't
systematically test all the cost estimation code, and there's nothing to
check that the estimates make sense, but at least more of the code will
now run.

- Heikki

Attachment Content-Type Size
intarray-sel-3-heikki.patch application/x-patch 33.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2015-07-21 16:12:32 Re: TABLESAMPLE patch is really in pretty sad shape
Previous Message Alexander Korotkov 2015-07-21 15:38:43 Re: Selectivity estimation for intarray with @@