Re: postgres_fdw: using TABLESAMPLE to collect remote sample

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: postgres_fdw: using TABLESAMPLE to collect remote sample
Date: 2022-07-19 19:07:08
Message-ID: 4e1c9731-8f8a-8b3e-17b8-d4c81b70df70@enterprisedb.com
Lists: pgsql-hackers

On 7/18/22 20:45, Tom Lane wrote:
> Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> writes:
>> Thanks. I'll switch this to "needs review" now.
>
> OK, I looked through this, and attach some review suggestions in the
> form of a delta patch. (0001 below is your two patches merged, 0002
> is my delta.) A lot of the delta is comment-smithing, but not all.
>

Thanks!

> After reflection I think that what you've got, ie use reltuples but
> don't try to sample if reltuples <= 0, is just fine. The remote
> would only have reltuples <= 0 in a never-analyzed table, which
> shouldn't be a situation that persists for long unless the table
> is tiny. Also, if reltuples is in error, the way to bet is that
> it's too small thanks to the table having been enlarged. But
> an error in that direction doesn't hurt us: we'll overestimate
> the required sample_frac and pull back more data than we need,
> but we'll still end up with a valid sample of the right size.
> So I doubt it's worth the complication to try to correct based
> on relpages etc. (Note that any such correction would almost
> certainly end in increasing our estimate of reltuples. But
> it's safer to have an underestimate than an overestimate.)
>

I mostly agree, particularly for the non-partitioned case.
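
Just to spell out the logic we're talking about (hand-waving the
details, the actual patch differs a bit):

    /* targrows = number of rows ANALYZE wants from this table */
    if (reltuples <= 0)
        method = ANALYZE_SAMPLE_OFF;   /* never analyzed on the remote side */
    else
    {
        sample_frac = targrows / reltuples;

        /* keep the fraction in the valid TABLESAMPLE range */
        sample_frac = Min(1.0, Max(0.0, sample_frac));

        /* sampling that saves almost no rows is not worth the trouble */
        if ((1.0 - sample_frac) * reltuples < 100)
            method = ANALYZE_SAMPLE_OFF;
    }

So if reltuples is an underestimate, sample_frac only gets larger, and
the worst case is that we transfer a few more rows than strictly
necessary.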

If we want to improve sampling for partitioned cases (where the foreign
table is just one of many partitions), I think we'd have to rework how
we determine the sample size for each partition. Currently we simply
calculate it from relpages, which seems quite fragile (different amounts
of bloat, different tuple densities) and somewhat strange for FDW
servers that don't use the same "page" concept.

So we may easily end up with bogus sample sizes for the individual
partitions. The difficulty of calculating sample_frac is just a
secondary issue.

OTOH the concept of a "row" seems far more general, so perhaps
acquire_inherited_sample_rows should use reltuples, and if we want to
apply any correction, it should happen at that stage already.
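
FWIW the current logic in acquire_inherited_sample_rows is roughly this
(simplified from analyze.c):

    /* each child gets a share of targrows proportional to its pages */
    childtargrows = (int) rint(targrows * childblocks / totalblocks);

and a reltuples-based variant would presumably just use tuple counts
instead of block counts, which at least means the same thing for local
and foreign partitions.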

> I messed around with the sample_frac choosing logic slightly,
> to make it skip pointless calculations if we decide right off
> the bat to disable sampling. That way we don't need to worry
> about avoiding zero divide, nor do we have to wonder if any
> of the later calculations could misbehave.
>

Thanks.

> I left your logic about "disable if saving fewer than 100 rows"
> alone, but I have to wonder if using an absolute threshold rather
> than a relative one is well-conceived. Sampling at a rate of
> 99.9 percent seems pretty pointless, but this code is perfectly
> capable of accepting that if reltuples is big enough. So
> personally I'd do that more like
>
> if (sample_frac > 0.95)
> method = ANALYZE_SAMPLE_OFF;
>
> which is simpler and would also eliminate the need for the previous
> range-clamp step. I'm not sure what the right cutoff is, but
> your "100 tuples" constant is just as arbitrary.
>

I agree there is probably not much difference between a threshold on
sample_frac directly and a threshold on the number of rows, at least in
general. My reasoning for switching to "100 rows" is that in most cases
the network transfer is probably more costly than the "local costs",
and 5% may be quite a few rows (particularly with a higher statistics
target). I guess the proper approach would be to do some simple
costing, but that seems like overkill.
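
To put some numbers on it: with the maximum statistics target (10000)
ANALYZE asks for 300 * 10000 = 3,000,000 sample rows. If the remote
reltuples is 3,100,000, sample_frac comes out to roughly 0.97, so a
0.95 cutoff disables sampling and we transfer an extra ~100,000 rows
over the network, while the "100 rows saved" rule keeps sampling
enabled. Which of those is the better trade-off really depends on how
expensive the transfer is, hence the remark about costing.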

> I rearranged the docs patch too. Where you had it, analyze_sampling
> was between fdw_startup_cost/fdw_tuple_cost and the following para
> discussing them, which didn't seem to me to flow well at all. I ended
> up putting analyze_sampling in its own separate list. You could almost
> make a case for giving it its own <sect3>, but I concluded that was
> probably overkill.
>

Thanks.

> One thing I'm not happy about, but did not touch here, is the expense
> of the test cases you added. On my machine, that adds a full 10% to
> the already excessively long runtime of postgres_fdw.sql --- and I
> do not think it's buying us anything. It is not this module's job
> to test whether bernoulli sampling works on partitioned tables.
> I think you should just add enough to make sure we exercise the
> relevant code paths in postgres_fdw itself.
>

Right, I should have commented on that. The purpose of those tests was
to verify that if we change the sampling method on the server/table,
the generated query changes accordingly, etc. But that's a bit futile,
because we don't have a good way of verifying which query was actually
used - it only worked during development because I added an
elog(WARNING).

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
