Re: POC: GROUP BY optimization

From: Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: POC: GROUP BY optimization
Date: 2022-01-21 11:09:01
Message-ID: 4ee5b15c-1ce0-59c7-8d4a-a1fd68ee0d12@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 7/22/21 3:58 AM, Tomas Vondra wrote:
> 4) I'm not sure it's actually a good idea to pass tuplesPerPrevGroup to
> estimate_num_groups_incremental. In principle yes, if we use "group
> size" from the previous step, then the returned value is the number of
> new groups after adding the "new" pathkey.
> But even if we ignore the issues with amplification mentioned in (3),
> there's an issue with non-linear behavior in estimate_num_groups,
> because at some point it's calculating
>
> D(N,n,p) = n * (1 - ((N-p)/N)^(N/n))
>
> where N - total rows, p - sample size, n - number of distinct values.
> And if we have (N1,n1) and (N2,n2) then the ratio of calculated
> estimated (which is pretty much what calculating group size does)
>
> D(N2,n2,p2) / D(N1,n1,p1)
>
> which will differ depending on p1 and p2. And if we're tweaking the
> tuplesPerPrevGroup all the time, that's really annoying, as it may make
> the groups smaller or larger, which is unpredictable and annoying, and I
> wonder if it might go against the idea of penalizing tuplesPerPrevGroup
> to some extent.
> We could simply use the input "tuples" value here, and then divide the
> current and previous estimate to calculate the number of new groups.
tuplesPerPrevGroup is only a top boundary for the estimation. I think,
we should use result of previous estimation as a limit for the next
during incremental estimation. Maybe we simply limit the
tuplesPerPrevGroup value to ensure of monotonic non-growth of this
value? - see in attachment patch to previous fixes.

--
regards,
Andrey Lepikhov
Postgres Professional

Attachment Content-Type Size
guard_tuplesPerPrevGroup.txt text/plain 775 bytes

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2022-01-21 11:55:22 Re: Skipping logical replication transactions on subscriber side
Previous Message Dilip Kumar 2022-01-21 10:27:02 Re: row filtering for logical replication