Fwd: Merging statistics from children instead of re-sampling everything

From: Damir Belyalov <dam(dot)bel07(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Fwd: Merging statistics from children instead of re-sampling everything
Date: 2022-08-15 11:10:44
Message-ID: CALH1LgvFDa4SehS6wHwE0iZVLrdrm=tuOMkvt7VuUcZ+EoiwuQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

>
> 3) stadistinct - This is quite problematic. We only have the per-child
> estimates, and it's not clear if there's any overlap. For now I've just
> summed it up, because that's safer / similar to what we do for gather
> merge paths etc. Maybe we could improve this by estimating the overlap
> somehow (e.g. from MCV lists / histograms). But honestly, I doubt the
> estimates based on tiny sample of each child are any better. I suppose
> we could introduce a column option, determining how to combine ndistinct
> (similar to how we can override n_distinct itself).
>
> 4) MCV - It's trivial to build a new "parent" MCV list, although it may
> be too large (in which case we cut it at statistics target, and copy the
> remaining bits to the histogram)
>

I think there is one approach to solve the problem with calculating mcv and
distinct statistics.
To do this, you need to calculate the density of the sample distribution
and store it, for example, in some slot.
Then, when merging statistics, we will sum up the densities of all
partitions as functions and get a new density.
According to the new density, you can find out which values are most common
and which are distinct.

To calculate the partition densities, you can use the "Kernel density
Estimation" -
https://www.statsmodels.org/dev/examples/notebooks/generated/kernel_density
html

The approach may be very inaccurate and difficult to implement, but solves
the problem.

Regards,
Damir Belyalov
Postgres Professional

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Melih Mutlu 2022-08-15 11:26:26 Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication
Previous Message Thomas Munro 2022-08-15 10:48:22 Re: Cleaning up historical portability baggage