Re: Merging statistics from children instead of re-sampling everything

From:	Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
To:	Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Merging statistics from children instead of re-sampling everything
Date:	2022-02-10 11:50:31
Message-ID:	bdb0bea2-a0da-1f1d-5c92-96ff90c198eb@postgrespro.ru
Lists:	pgsql-hackers

On 21/1/2022 01:25, Tomas Vondra wrote:
> But I don't have a very good idea what to do about statistics that we
> can't really merge. For some types of statistics it's rather tricky to
> reasonably merge the results - ndistinct is a simple example, although
> we could work around that by building and merging hyperloglog counters.
I think that, as a first step in this direction, we can reduce the number
of tuples pulled. We don't really need to pull all tuples from a remote
server. To construct a reservoir, we only need to pull a sample of tuples.
The reservoir method needs only a few arguments to return a sample as if
the tuples had been read locally. Also, to fetch these partial samples
asynchronously, we can obtain the size of each partition in a preliminary
step of the analysis.
In my opinion, even this solution could reduce the severity of the problem
drastically.
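
To illustrate the idea, here is a minimal sketch in Python rather than
PostgreSQL code. It assumes each remote partition can run a plain
single-pass reservoir sampler (Algorithm R) and that the coordinator
already knows each partition's row count from the preliminary step. The
names reservoir_sample and allocate_sample_sizes are hypothetical, and
splitting the target in proportion to partition size is only one possible
allocation policy, not something the patch prescribes.

```python
import random
from typing import Iterable, List, Sequence, TypeVar

T = TypeVar("T")


def reservoir_sample(rows: Iterable[T], k: int, rng: random.Random) -> List[T]:
    """Algorithm R: uniform sample of up to k rows in one pass over rows."""
    reservoir: List[T] = []
    for seen, row in enumerate(rows, start=1):
        if len(reservoir) < k:
            # Fill the reservoir with the first k rows.
            reservoir.append(row)
        else:
            # Keep the new row with probability k / seen.
            j = rng.randrange(seen)
            if j < k:
                reservoir[j] = row
    return reservoir


def allocate_sample_sizes(partition_sizes: Sequence[int], target: int) -> List[int]:
    """Split target sample rows across partitions in proportion to size."""
    total = sum(partition_sizes)
    if total == 0:
        return [0] * len(partition_sizes)
    target = min(target, total)
    # Integer floor of each proportional share, then hand out the rows
    # lost to rounding to the partitions with the largest remainders.
    quotas = [target * size // total for size in partition_sizes]
    remainders = [target * size % total for size in partition_sizes]
    leftover = target - sum(quotas)
    order = sorted(range(len(quotas)), key=lambda i: remainders[i], reverse=True)
    for i in order[:leftover]:
        quotas[i] += 1
    return quotas


if __name__ == "__main__":
    rng = random.Random(42)
    # Stand-ins for remote partitions; in practice each would be sampled
    # on its own server and only the sampled rows would be transferred.
    partitions = [range(0, 1000), range(1000, 4000), range(4000, 10000)]
    sizes = [len(p) for p in partitions]
    quotas = allocate_sample_sizes(sizes, target=300)
    merged = []
    for part, quota in zip(partitions, quotas):
        merged.extend(reservoir_sample(part, quota, rng))
    print(quotas, len(merged))
```

With this approach each server ships at most its quota of rows instead of
the whole partition, and the per-partition requests are independent, so
they could be issued asynchronously.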

--
regards,
Andrey Lepikhov
Postgres Professional

In response to

Re: Merging statistics from children instead of re-sampling everything at 2022-01-20 20:25:26 from Tomas Vondra

Responses

Re: Merging statistics from children instead of re-sampling everything at 2022-02-10 22:37:26 from Tomas Vondra
