Re: postgres_fdw: using TABLESAMPLE to collect remote sample

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: James Finnerty <jfinnert(at)amazon(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: postgres_fdw: using TABLESAMPLE to collect remote sample
Date: 2023-01-05 21:47:23
Message-ID: 3e8cca01-f102-d4b9-abe9-ed7b5e938323@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 1/5/23 22:05, Tom Lane wrote:
> Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> writes:
>> The second patch adds the relkind check, so that we only issue
>> TABLESAMPLE on valid relation types (tables, matviews). But I'm not sure
>> we actually need it - the example presented earlier was foreign table
>> pointing to a sequence. But that actually works fine even without this
>> patch, because fore sequences we have reltuples=1, which disables
>> sampling due to the check mentioned above (because targrows >= 1).
>
> Seems pretty accidental still.
>
>> The other issue with this patch is that it seems wrong to check the
>> relkind value locally - instead we should check this on the remote
>> server, because that's where we'll run TABLESAMPLE. Currently there are
>> no differences between versions, but what if we implement TABLESAMPLE
>> for another relkind in the future? Then we'll either fail to use
>> sampling or (worse) we'll try using TABLESAMPLE for unsupported relkind,
>> depending on which of the servers has newer version.
>
> We don't have a way to do that, and I don't think we need one. The
> worst case is that we don't try to use TABLESAMPLE when the remote
> server is a newer version that would allow it. If we do extend the
> set of supported relkinds, we'd need a version-specific test in
> postgres_fdw so that we don't wrongly use TABLESAMPLE with an older
> remote server ... but that's hardly complicated, or a new concept.
>
> On the other hand, if we let the code stand, the worst case is that
> ANALYZE fails because we try to apply TABLESAMPLE in an unsupported
> case, presumably with some remote relkind that doesn't exist today.
> I think that's clearly worse than not exploiting TABLESAMPLE
> although we could (with some other new remote relkind).
>

OK

> As far as the patch details go: I'd write 0001 as
>
> + Assert(sample_frac >= 0.0 && sample_frac <= 1.0);
>
> The way you have it seems a bit contorted.

Yeah, that's certainly cleaner.

> As for 0002, personally
> I'd rename the affected functions to reflect their expanded duties,
> and they *definitely* require adjustments to their header comments.
> Functionally it looks fine though.
>

No argument about the header comments, ofc - I just haven't done that in
the WIP patch. As for the renaming, any suggestions for the new names?
The patch tweaks two functions in a way that affects what they return:

1) deparseAnalyzeTuplesSql - adds relkind to the "reltuples" SQL

2) postgresCountTuplesForForeignTable - adds can_sample flag

There are no callers outside postgresAcquireSampleRowsFunc, so what
about renaming them like this?

deparseAnalyzeTuplesSql
-> deparseAnalyzeInfoSql

postgresCountTuplesForForeignTable
-> postgresGetAnalyzeInfoForForeignTable

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Matthias van de Meent 2023-01-05 21:49:23 Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE
Previous Message Thomas Munro 2023-01-05 21:39:59 Re: Cygwin cleanup