Re: PoC/WIP: Extended statistics on expressions

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PoC/WIP: Extended statistics on expressions
Date: 2021-03-04 22:16:04
Message-ID: f3820f57-dd45-8ea3-0c46-86629b0c0d41@enterprisedb.com
Lists: pgsql-hackers

Hi,

Attached is a slightly improved version of the patch series, addressing
most of the issues raised in the previous message.

0001-bootstrap-convert-Typ-to-a-List-20210304.patch
0002-Allow-composite-types-in-bootstrap-20210304.patch

These two parts are without any changes.

0003-Extended-statistics-on-expressions-20210304.patch

Mostly unchanged. The one improvement is removing some duplicate code in
mcv.c. When building the match bitmap for clauses, some of the clause
types had one block for plain attributes, then a nearly identical block
for expressions. I got rid of that - the only thing that is really
different is determining the statistics dimension.

0004-WIP-rework-tracking-of-expressions-20210304.patch

This is a mostly unchanged version of the patch reworking how we assign artificial
attnums to expressions (negative instead of (MaxHeapAttributeNumber+i)).
I said I want to do some cleanup, but I ended up doing most of that in
the 0005 patch - and I plan to squash both parts into 0003 in the end. I
left them separate to make 0005 easier to review for now.
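
To illustrate the numbering scheme described for 0004, here is a minimal sketch of mapping expressions to negative attnums. It is not taken from the patch: the helper names are hypothetical, the AttrNumber typedef is a local stand-in, and the exact mapping (expression i gets attnum -(i + 1)) is an assumption. It also assumes system columns, which normally use negative attnums, never appear in a statistics object.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in for PostgreSQL's AttrNumber (a 16-bit signed integer). */
typedef int16_t AttrNumber;

/*
 * Hypothetical helper: map a 0-based expression index to an artificial
 * attnum. Assumed mapping: expression 0 -> -1, expression 1 -> -2, ...
 */
static AttrNumber
expr_index_to_attnum(int expr_index)
{
	assert(expr_index >= 0);
	return (AttrNumber) (-(expr_index + 1));
}

/* Hypothetical helper: with this scheme, any negative attnum is an expression. */
static bool
attnum_is_expression(AttrNumber attnum)
{
	return attnum < 0;
}

/* Hypothetical helper: inverse of expr_index_to_attnum(). */
static int
attnum_to_expr_index(AttrNumber attnum)
{
	assert(attnum_is_expression(attnum));
	return -attnum - 1;
}
```

Compared with offsetting by MaxHeapAttributeNumber, the sign alone tells attributes and expressions apart, and plain attributes keep their regular positive attnums.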

0005-WIP-unify-handling-of-attributes-and-expres-20210304.patch

This reworks how we build statistics on attributes and expressions.
Instead of treating attributes and expressions separately, this allows
handling them uniformly.

Until now, the various "build" functions (for different statistics
kinds) extracted attribute values from sampled tuples, but expressions
were pre-calculated in a separate array. This was done both to save CPU
time (not having to evaluate expensive expressions repeatedly) and to keep
the different stats consistent (there might be volatile functions etc.).

So the build functions had to look at the attnum, determine if it's an
attribute or an expression, and in some cases that was tricky / easy to
get wrong.

This patch replaces this "split" view with a simple "consistent"
representation merging values from attributes and expressions, and just
passes that to the build functions. There's no need to check the attnum,
and handle expressions in some special way, so the build functions are
much simpler / easier to understand (at least I think so).

The build data is represented by the "StatsBuildData" struct - not sure if
there's a better name.
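
As a rough illustration of the unified representation, the sketch below shows what such a build-data struct and a consumer could look like. The field names, the per-dimension layout, the StatsBuildDataSketch name, and the count_non_null() helper are all assumptions made for this example; the actual definition is in the 0005 patch. Datum and AttrNumber are local stand-ins for the PostgreSQL types.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for the PostgreSQL types of the same names. */
typedef int16_t AttrNumber;
typedef uintptr_t Datum;

/*
 * Hypothetical layout: one column of values/nulls per statistics
 * dimension, regardless of whether the dimension is a plain attribute
 * or a pre-evaluated expression.
 */
typedef struct StatsBuildDataSketch
{
	int			numrows;	/* number of sampled rows */
	int			nattnums;	/* number of dimensions */
	AttrNumber *attnums;	/* [nattnums] attnum of each dimension */
	Datum	  **values;		/* values[dim][row] */
	bool	  **nulls;		/* nulls[dim][row] */
} StatsBuildDataSketch;

/*
 * Example consumer: count non-NULL values in one dimension. Note that it
 * never inspects attnums[] to decide how to fetch a value.
 */
static int
count_non_null(const StatsBuildDataSketch *data, int dim)
{
	int			count = 0;

	assert(dim >= 0 && dim < data->nattnums);

	for (int row = 0; row < data->numrows; row++)
	{
		if (!data->nulls[dim][row])
			count++;
	}

	return count;
}
```

Because expressions are evaluated once when the arrays are filled, every statistics kind built from the same data sees the same values, even for volatile expressions.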

I'm mostly happy with how this turned out. I'm sure there's a bit more
cleanup needed (e.g. the merging/remapping of dependencies needs some
refactoring, I think) but overall this seems reasonable.

I did some performance testing, and I don't think there's any measurable
performance degradation. I'm actually wondering if we need to transform
the AttrNumber arrays into bitmaps in various places - maybe we should
just do a plain linear search. We don't really expect many elements, as
each statistics object has at most 8 attnums. So maybe building the bitmapsets
is a net loss? The one exception might be functional dependencies, where
we can "merge" multiple statistics together. But even then it'd require
many statistics objects to make a difference.
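
For reference, the plain linear search alternative could be as small as the sketch below. The function name and the SKETCH_MAX_DIMENSIONS constant are hypothetical and only mirror the 8-attnum limit mentioned above; AttrNumber is a local stand-in. This is not code from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in for PostgreSQL's AttrNumber. */
typedef int16_t AttrNumber;

/* Assumed upper bound on attnums per statistics object (8, per the text). */
#define SKETCH_MAX_DIMENSIONS 8

/*
 * Hypothetical helper: membership test by linear scan. With at most
 * eight elements this is a handful of comparisons and needs no
 * allocation, unlike building a bitmap first.
 */
static bool
attnum_in_array(const AttrNumber *attnums, int nattnums, AttrNumber target)
{
	assert(nattnums >= 0 && nattnums <= SKETCH_MAX_DIMENSIONS);

	for (int i = 0; i < nattnums; i++)
	{
		if (attnums[i] == target)
			return true;
	}

	return false;
}
```

The scan works unchanged for negative (expression) attnums, whereas a bitmap needs an offset to store them.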

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment Content-Type Size
0001-bootstrap-convert-Typ-to-a-List-20210304.patch text/x-patch 3.7 KB
0002-Allow-composite-types-in-bootstrap-20210304.patch text/x-patch 1.4 KB
0003-Extended-statistics-on-expressions-20210304.patch text/x-patch 249.2 KB
0004-WIP-rework-tracking-of-expressions-20210304.patch text/x-patch 26.4 KB
0005-WIP-unify-handling-of-attributes-and-expres-20210304.patch text/x-patch 34.6 KB
