From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> |
Cc: | Andres Freund <andres(at)anarazel(dot)de>, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: postgres_fdw: using TABLESAMPLE to collect remote sample |
Date: | 2022-07-19 19:27:56 |
Message-ID: | 1297000.1658258876@sss.pgh.pa.us |
Lists: | pgsql-hackers |
Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> writes:
> If we want to improve sampling for partitioned cases (where the foreign
> table is just one of many partitions), I think we'd have to rework how
> we determine sample size for each partition. Now we simply calculate
> that from relpages, which seems quite fragile (different amounts of
> bloat, different tuple densities) and somewhat strange for FDW servers
> that don't use the same "page" concept.
> So we may easily end up with bogus sample sizes for each
> partition. The difficulty of calculating sample_frac is just a
> secondary issue.
> OTOH the concept of a "row" seems way more general, so perhaps
> acquire_inherited_sample_rows should use reltuples, and if we want to do
> correction it should happen at this stage already.
Yeah, there's definitely something to be said for changing that to be
based on rowcount estimates instead of physical size. I think it's
a matter for a different patch though, and not a reason to hold up
this one.
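To make the difference concrete, here is a minimal sketch of splitting a sample target across partitions in proportion to either page counts or row-count estimates. It is an illustration only, not the code in acquire_inherited_sample_rows: the helper name allocate_sample and the partition figures are hypothetical, chosen to show two partitions with equal relpages but very different tuple densities.

```python
def allocate_sample(targrows, weights):
    """Split targrows across partitions in proportion to the given weights.

    Hypothetical helper for illustration; not a PostgreSQL function.
    """
    total = sum(weights)
    if total <= 0:
        return [0] * len(weights)
    return [round(targrows * w / total) for w in weights]


# Made-up partitions as (relpages, reltuples) pairs.
partitions = [
    (1000, 100_000),  # densely packed partition
    (1000, 20_000),   # bloated partition: same pages, far fewer rows
]
targrows = 30_000

# Weighting by physical size treats both partitions alike.
by_pages = allocate_sample(targrows, [pages for pages, _ in partitions])

# Weighting by row estimates follows where the rows actually are.
by_rows = allocate_sample(targrows, [tuples for _, tuples in partitions])

print(by_pages)  # [15000, 15000]
print(by_rows)   # [25000, 5000]
```

With page-based weights the bloated partition is asked for 15000 of its 20000 rows, while the dense one contributes the same 15000 out of 100000, so the combined sample over-represents the bloated partition. Row-based weights ask each partition for the same fraction of its rows.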
regards, tom lane