Re: [HACKERS] WIP: Aggregation push-down

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Antonin Houska <ah(at)cybertec(dot)at>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] WIP: Aggregation push-down
Date: 2018-01-26 13:58:58
Message-ID: CA+TgmoasTkqJHo73UF_Kp2fMM33ya8cXqwrzp73zB4RgxBN+Dg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jan 26, 2018 at 8:04 AM, Antonin Houska <ah(at)cybertec(dot)at> wrote:
> So one problem is that the grouping expression can be inappropriate for
> partial aggregation even if there's no type change during the
> translation. What I consider typical for this case is that the equality
> operator used to identify groups (SortGroupClause.eqop) can return true even
> if binary (stored) values of the inputs differ. The partial aggregation could
> only take place if we had a special equality operator which distinguishes the
> *binary* values (I don't know yet how to store this operator the catalog ---
> in pg_opclass recors for the hash opclasses?)..

We don't have an operator that tests for binary equality, but it's
certainly testable from C; see datumIsEqual. I'm not sure how much
this really helps, though. I think it would be safe to build an
initial set of groups based on datumIsEqual comparisons and then
arrange to later merge some of the groups. But that's not something
we ever do in the executor today, so it might require quite a lot of
hacking. Also, it seems like it might really suck in some cases. For
instance, consider something like SELECT scale(one.a), sum(two.a) FROM
one, two WHERE one.a = two.a GROUP BY 1. Doing a Partial Aggregate on
two.a using datumIsEqual semantics, joining to a, and then doing a
Finalize Aggregate looks legal, but the Partial Aggregate may produce
a tremendous number of groups compared to the Finalize Aggregate. In
other words, this technique wouldn't merge any groups that shouldn't
be merged, but it might fail to merge groups that really need to be
merged to get good performance.

> One of my ideas is to check whether the source and destination types are
> binary coercible (i.e. pg_cast(castfunc) = 0) but this might be a misuse of
> the binary coercibility.

Yeah, binary coercibility has nothing to do with this; that tells you
whether the two types are the same on disk, not whether they have the
same equality semantics. For instance, I think text and citext are
binary coercible, but their equality semantics are not the same.

> Another idea is to allow only such changes that the
> destination type is in the same operator class as the source, and explicitly
> enumerate the "safe opclasses". But that would mean that user cannot define
> new opclasses within which the translation is possible --- not sure this is a
> serious issue.

Enumerating specific opclasses in the source code is a non-starter --
Tom Lane would roll over in his grave if he weren't still alive. What
we could perhaps consider doing is adding some mechanism for an
opclass or opfamily to say whether its equality semantics happen to be
exactly datumIsEqual() semantics. That's a little grotty because it
leaves data types for which that isn't true out in the cold, but on
the other hand it seems like it would be useful as a way of optimizing
a bunch of things other than this. Maybe it could also include a way
to specify that the comparator happens to have the semantics as C's
built-in < and > operators, which seems like a case that might also
lend itself to some optimizations.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2018-01-26 14:07:05 Re: PATCH: Exclude unlogged tables from base backups
Previous Message Bruce Momjian 2018-01-26 13:56:01 Re: Setting BLCKSZ 4kB