|From:||"Andrey V(dot) Lepikhov" <a(dot)lepikhov(at)postgrespro(dot)ru>|
|To:||Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>|
|Cc:||Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>|
|Subject:||Re: POC: GROUP BY optimization|
|Views:||Raw Message | Whole Thread | Download mbox | Resend email|
I keep work on this patch. Here is intermediate results.
On 7/22/21 3:58 AM, Tomas Vondra wrote:
> in the first loop. Which seems pretty bogus - why would there be just
> two groups? When processing the first expression, it's as if there was
> one big "prev group" with all the tuples, so why not to just use nGroups
> as it is?
I think, heapsort code seems very strange. Look into fallback case. It
based on an output_tuples value. Maybe we should use nGroups value here,
but based on a number of output_tuples?
> 1) I looked at the resources mentioned as sources the formulas came
> from, but I've been unable to really match the algorithm to them. The
> quicksort paper is particularly "dense", the notation seems to be very
> different, and none of the theorems seem like an obvious fit. Would be
> good to make the relationship clearer in comments etc.
Fixed (See attachment).
> 3) I'm getting a bit skeptical about the various magic coefficients that
> are meant to model higher costs with non-uniform distribution. But
> consider that we do this, for example:
> tuplesPerPrevGroup = ceil(1.5 * tuplesPerPrevGroup / nGroups);
> but then in the next loop we call estimate_num_groups_incremental and
> pass this "tweaked" tuplesPerPrevGroup value to it. I'm pretty sure this
> may have various strange consequences - we'll calculate the nGroups
> based on the inflated value, and we'll calculate tuplesPerPrevGroup from
> that again - that seems susceptible to "amplification".
> We inflate tuplesPerPrevGroup by 50%, which means we'll get a higher
> nGroups estimate in the next loop - but not linearly. An then we'll
> calculate the inflated tuplesPerPrevGroup and estimated nGroup ...
Weighting coefficient '1.5' shows our desire to minimize the number of
comparison operations on each next attribute of a pathkeys list.
Increasing this coef we increase a chance, that planner will order
pathkeys by decreasing of uniqueness.
I think, it's ok.
|Next Message||Michael Paquier||2022-01-20 05:32:43||Re: Refactoring of compression options in pg_basebackup|
|Previous Message||Michael Paquier||2022-01-20 04:38:45||Re: pg_upgrade should truncate/remove its logs before running|