Re: PoC/WIP: Extended statistics on expressions

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PoC/WIP: Extended statistics on expressions
Date: 2021-02-18 01:31:44
Message-ID: d571f2de-6246-4f6e-ca1d-32cc53a2016a@enterprisedb.com
Lists: pgsql-hackers

Hi,

Attached is a rebased patch series, merging the changes from the last
review into the 0003 patch, and with a WIP patch 0004 reworking the
tracking of expressions (to address the inefficiency due to relying on
MaxHeapAttributeNumber).

The 0004 patch is very much experimental, with a lot of ad hoc changes.
It passes make check, but it definitely needs much more work, cleanup
and testing. At this point it's more a demonstration of what would be
needed to rework it like this.

The main change is that instead of handling expressions by assigning
them attnums above MaxHeapAttributeNumber, we assign them system-like
attnums, i.e. negative ones. So the first one gets -1, the second one
-2, etc. And then we shift all attnums above 0, to allow using the
bitmapset as before.

Overall, this works, but the shifting is kinda pointless - it allows us
to build a bitmapset, but the result is mostly useless because the
offset depends on how many expressions are in the statistics definition.
So we can't compare or combine bitmapsets for different statistics, and
(more importantly) we can't easily compare them with bitmapsets of
attnums from clauses.
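
To make the problem concrete, here is a standalone sketch (not code from
the patch). It assumes the offset is simply the number of expressions in
the statistics object, and uses a plain 64-bit mask as a stand-in for a
Bitmapset. The names shifted_member and build_bitmap are made up for
this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for PostgreSQL's AttrNumber (a 16-bit signed integer). */
typedef int16_t AttrNumber;

/*
 * Hypothetical helper: map an attnum to a non-negative bitmap member.
 * Expressions use -1, -2, ..., -nexprs; regular attributes use 1, 2, ...
 * Assumption: the offset equals the number of expressions, so the most
 * negative expression attnum lands on member 0.
 */
static int
shifted_member(AttrNumber attnum, int nexprs)
{
    assert(attnum != 0);
    assert(attnum >= -nexprs);
    return attnum + nexprs;
}

/* Build a toy 64-bit bitmap (stand-in for a Bitmapset) from attnums. */
static uint64_t
build_bitmap(const AttrNumber *attnums, int nattnums, int nexprs)
{
    uint64_t    bits = 0;

    for (int i = 0; i < nattnums; i++)
    {
        int         member = shifted_member(attnums[i], nexprs);

        assert(member < 64);
        bits |= UINT64_C(1) << member;
    }
    return bits;
}
```

With one expression, attribute 1 ends up as member 2. With two
expressions, the same attribute ends up as member 3. So two bitmaps
built for different statistics objects do not share a coordinate
system, and neither matches a bitmap of unshifted attnums from clauses.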

Using MaxHeapAttributeNumber allowed using the bitmapsets at least for
regular attributes. Not sure if that's a major advantage, outweighing
wasting some space.

I wonder if we should just ditch the bitmapsets, and just use simple
arrays of attnums. I don't think we expect too many elements here,
especially when dealing with individual statistics. So now we're just
building and rebuilding the bitmapsets ... seems pointless.
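
A rough sketch of what the array-based alternative might look like, as
standalone C rather than patch code. The helper names attnums_contains
and attnums_is_subset are hypothetical. The point is that negative
(expression) and positive (attribute) attnums can sit in one array
without any shifting, and a linear scan is cheap for the handful of
elements expected here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for PostgreSQL's AttrNumber (a 16-bit signed integer). */
typedef int16_t AttrNumber;

/* Hypothetical helper: linear search, fine for very short arrays. */
static bool
attnums_contains(const AttrNumber *attnums, int n, AttrNumber attnum)
{
    for (int i = 0; i < n; i++)
    {
        if (attnums[i] == attnum)
            return true;
    }
    return false;
}

/* Hypothetical helper: true if every element of sub appears in super. */
static bool
attnums_is_subset(const AttrNumber *sub, int nsub,
                  const AttrNumber *super, int nsuper)
{
    assert(nsub >= 0 && nsuper >= 0);

    for (int i = 0; i < nsub; i++)
    {
        if (!attnums_contains(super, nsuper, sub[i]))
            return false;
    }
    return true;
}
```

Checking whether the attnums referenced by clauses are covered by a
statistics object then becomes a subset test on the raw attnums, with
no dependence on how many expressions the statistics object has.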

One thing I'd like to improve (independently of what we do with the
bitmapsets) is getting rid of the distinction between attributes and
expressions when building the statistics - currently all the various
places have to care about whether the item is an attribute or an
expression, look either into the tuple or into the array of
pre-calculated values, do various shifts to get the indexes, etc.
That's quite tedious, and I've made a lot of errors in that (and I'm
sure there are more). So IMO we should simplify this by replacing it
with something containing values for both attributes and expressions,
handling them in a unified way.
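
One possible shape for such a unified structure, purely as an
illustration and not what the patch does: a single table of values with
one column per item, where the item is identified by its attnum
(positive for attributes, negative for expressions). The struct and
function names below are hypothetical, and Value is a stand-in for
Datum.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int16_t AttrNumber;     /* stand-in for PostgreSQL's AttrNumber */
typedef intptr_t Value;         /* stand-in for Datum */

/* Hypothetical container holding sampled values for all items. */
typedef struct UnifiedStatsData
{
    int         nattnums;       /* columns: attributes plus expressions */
    int         numrows;        /* number of sampled rows */
    const AttrNumber *attnums;  /* > 0 attribute, < 0 expression */
    const Value *values;        /* column-major: values[col * numrows + row] */
} UnifiedStatsData;

/* Return the column index for an attnum, or -1 if it is not present. */
static int
unified_column_index(const UnifiedStatsData *data, AttrNumber attnum)
{
    for (int i = 0; i < data->nattnums; i++)
    {
        if (data->attnums[i] == attnum)
            return i;
    }
    return -1;
}

/* Fetch a value the same way for attributes and expressions. */
static Value
unified_get_value(const UnifiedStatsData *data, AttrNumber attnum, int row)
{
    int         col = unified_column_index(data, attnum);

    assert(col >= 0);
    assert(row >= 0 && row < data->numrows);
    return data->values[(size_t) col * data->numrows + row];
}
```

Code building the statistics would then call one accessor regardless of
whether the item is an attribute or an expression, so the branching and
index shifting would be confined to the place that fills the table.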

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment                                                   Content-Type  Size
0001-bootstrap-convert-Typ-to-a-List-20210218.patch          text/x-patch  3.7 KB
0002-Allow-composite-types-in-bootstrap-20210218.patch       text/x-patch  1.4 KB
0003-Extended-statistics-on-expressions-20210218.patch       text/x-patch  245.9 KB
0004-WIP-rework-tracking-of-expressions-20210218.patch       text/x-patch  28.3 KB
