Re: Merging statistics from children instead of re-sampling everything

From: "Andrey V(dot) Lepikhov" <a(dot)lepikhov(at)postgrespro(dot)ru>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Cc: d(dot)belyalov(at)postgrespro(dot)ru
Subject: Re: Merging statistics from children instead of re-sampling everything
Date: 2022-02-14 10:22:30
Message-ID: f1435fb9-e78d-bd43-ec2a-e1477db1ab32@postgrespro.ru
Lists: pgsql-hackers

On 2/11/22 20:12, Tomas Vondra wrote:
>
>
> On 2/11/22 05:29, Andrey V. Lepikhov wrote:
>> On 2/11/22 03:37, Tomas Vondra wrote:
>>> That being said, this thread was not really about foreign partitions,
>>> but about re-analyzing inheritance trees in general. And sampling
>>> foreign partitions doesn't really solve that - we'll still do the
>>> sampling over and over.
>> IMO, to solve the problem we should do two things:
>> 1. Avoid repeated partition scans in the case of an inheritance tree.
>> 2. Avoid re-analyzing everything when only a small subset of
>> partitions is actively changing.
>>
>> For (1) I can imagine a solution like multiplexing: at the stage of
>> deciding which relations to scan, group them and prepare the scan
>> parameters so that multiple samples are built in one shot.
> I'm not sure I understand what you mean by multiplexing. The term
> usually means "sending multiple signals at once" but I'm not sure how
> that applies to this issue. Can you elaborate?

What I have in mind is building a set of samples in one scan: one sample
for the plain table, another for its parent, and so on up the inheritance
tree, and caching these samples in memory. We can calculate all the
parameters of the reservoir method needed to do this.
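
To make the idea concrete, here is a minimal sketch in Python. It is an illustration only, not PostgreSQL code: it uses the textbook row-at-a-time reservoir algorithm (Algorithm R) and ignores block-level sampling entirely. The names Reservoir, sample_tree_in_one_pass, partitions and parent_of are hypothetical. Each row read from a leaf table is offered to that table's reservoir and to the reservoir of every ancestor, so each table is scanned once.

```python
import random


class Reservoir:
    """Fixed-size uniform sample over a stream (textbook Algorithm R)."""

    def __init__(self, size, rng):
        self.size = size
        self.rng = rng
        self.seen = 0      # rows offered so far
        self.rows = []     # current sample

    def offer(self, row):
        self.seen += 1
        if len(self.rows) < self.size:
            # Reservoir not full yet: keep every row.
            self.rows.append(row)
        else:
            # Keep the new row with probability size / seen.
            j = self.rng.randrange(self.seen)
            if j < self.size:
                self.rows[j] = row


def sample_tree_in_one_pass(partitions, parent_of, sample_size, seed=0):
    """Build one sample per table in the tree, scanning each table once.

    partitions: hypothetical dict, table name -> iterable of its own rows
    parent_of:  hypothetical dict, table name -> parent name (absent for root)
    """
    rng = random.Random(seed)
    reservoirs = {}
    for name, rows in partitions.items():
        # Collect the reservoirs of this table and all of its ancestors.
        targets = []
        node = name
        while node is not None:
            if node not in reservoirs:
                reservoirs[node] = Reservoir(sample_size, rng)
            targets.append(reservoirs[node])
            node = parent_of.get(node)
        # Single scan: every row feeds all samples it belongs to.
        for row in rows:
            for reservoir in targets:
                reservoir.offer(row)
    return reservoirs


if __name__ == "__main__":
    parts = {"p1": range(0, 1000), "p2": range(1000, 1500)}
    tree = {"p1": "parent", "p2": "parent"}
    samples = sample_tree_in_one_pass(parts, tree, sample_size=100)
    for table, reservoir in samples.items():
        print(table, reservoir.seen, len(reservoir.rows))
```

The parent's reservoir counts rows from all children, so its sample stays uniform over the whole tree without a second pass over any partition.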

> sample might be used for estimation of clauses directly.
You mean using them in difficult cases, such as estimating grouping
over APPEND?
>
> But it requires storing the sample somewhere, and I haven't found a good
> and simple way to do that. We could serialize that into bytea, or we
> could create a new fork, or something, but what should that do with
> oversized attributes (how would TOAST work for a fork) and/or large
> samples (which might not fit into 1GB bytea)?
This feature looks like meta-information about the database, so it could
be stored in a separate relation. It is not obvious that we need it for
every relation, for example those with large samples. I think this could
be controlled by a table parameter.

--
regards,
Andrey Lepikhov
Postgres Professional
