Re: postgres_fdw: using TABLESAMPLE to collect remote sample

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: postgres_fdw: using TABLESAMPLE to collect remote sample
Date: 2022-07-18 11:36:23
Message-ID: 113b8f5b-eec6-ea60-8508-439f2692887c@enterprisedb.com
Lists: pgsql-hackers

On 7/16/22 23:57, Tom Lane wrote:
> Andres Freund <andres(at)anarazel(dot)de> writes:
>> On 2022-02-23 00:51:24 +0100, Tomas Vondra wrote:
>>> And here's the slightly simplified patch, without the part adding the
>>> unnecessary GetServerVersion() function.
>
>> Doesn't apply anymore: http://cfbot.cputube.org/patch_37_3535.log
>> Marked as waiting-on-author.
>
> Here's a rebased version that should at least pass regression tests.
>

Thanks. I've been hacking on this over the past few days, and by
coincidence I've been improving exactly the stuff you've pointed out in
the review. 0001 is just the original patch rebased, 0002 includes all
the various changes.

> I've not reviewed it in any detail, but:
>
> * I'm not really on board with defaulting to SYSTEM sample method,
> and definitely not on board with not allowing any other choice.
> We don't know enough about the situation in a remote table to be
> confident that potentially-nonrandom sampling is OK. So personally
> I'd default to BERNOULLI, which is more reliable and seems plenty fast
> enough given your upthread results. It could be an idea to extend the
> sample option to be like "sample [ = methodname ]", if you want more
> flexibility, but I'd be happy leaving that for another time.
>

I agree, I came to roughly the same conclusion, so I replaced the simple
on/off option with these options (the kind of remote query each one
produces is sketched after the list):

off - Disables the remote sampling, so we just fetch everything and do
sampling on the local node, just like today.

random - Remote sampling, but a "naive" implementation using the
random() function. This reduces the amount of data we need to transfer,
but the remote side still reads the whole table. It should work for all
server versions, I believe.

system - TABLESAMPLE with the SYSTEM method.

bernoulli - TABLESAMPLE with the BERNOULLI method (default for 9.5+).

auto - picks bernoulli on 9.5+, random on older servers.
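
To illustrate, here's roughly the kind of query I'd expect each method
to send to the remote server, sketched for a 1% sample of a made-up
table (the actual deparsed SQL may differ in details such as the column
list):

  -- off: fetch everything, sample locally (current behavior)
  SELECT * FROM public.remote_tab;

  -- random: filter rows on the remote side using random()
  SELECT * FROM public.remote_tab WHERE random() < 0.01;

  -- system: page-level sampling via TABLESAMPLE
  SELECT * FROM public.remote_tab TABLESAMPLE SYSTEM (1);

  -- bernoulli: row-level sampling via TABLESAMPLE (9.5+)
  SELECT * FROM public.remote_tab TABLESAMPLE BERNOULLI (1);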

I'm not sure about custom TABLESAMPLE methods - supporting them adds
more complexity (we'd have to detect whether the method is installed on
the remote server), it's trickier for "auto" to decide what the best
choice is, and the parameter is also different (e.g. system_rows takes a
number of rows rather than a sampling rate).
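
For example, system_rows (from the tsm_system_rows contrib module) is
parameterized by an absolute row count rather than a percentage - a
quick sketch against a made-up table:

  -- built-in methods take a sampling percentage (0..100)
  SELECT * FROM public.remote_tab TABLESAMPLE BERNOULLI (1);

  -- tsm_system_rows takes a number of rows instead
  CREATE EXTENSION IF NOT EXISTS tsm_system_rows;
  SELECT * FROM public.remote_tab TABLESAMPLE system_rows(30000);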

> * The code depending on reltuples is broken in recent server versions,
> and might give useless results in older ones too (if reltuples =
> relpages = 0). Ideally we'd retrieve all of reltuples, relpages, and
> pg_relation_size(rel), and do the same calculation the planner does.
> Not sure if pg_relation_size() exists far enough back though.
>

Yes, I noticed that too, and the reworked code should deal with the
reltuples = 0 case (by just disabling remote sampling).

I haven't implemented the reltuples/relpages correction yet, but I don't
think compatibility would be an issue - deparseAnalyzeSizeSql() already
calls pg_relation_size(), after all.
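
To spell out what I understand the suggestion to be, a rough sketch
(hypothetical table name; the exact query and fallback are not what the
patch does yet):

  -- fetch the raw statistics plus the current physical size
  SELECT c.reltuples, c.relpages, pg_relation_size(c.oid)
  FROM pg_class c
  WHERE c.oid = 'public.remote_tab'::regclass;

  -- planner-style correction: scale reltuples by the current page count,
  -- i.e. ntuples ~ reltuples * (pg_relation_size / block_size) / relpages
  SELECT CASE WHEN c.reltuples >= 0 AND c.relpages > 0
              THEN c.reltuples
                   * (pg_relation_size(c.oid) / current_setting('block_size')::int)
                   / c.relpages
              ELSE -1  -- never analyzed (or reltuples = -1 on recent versions)
         END AS estimated_tuples
  FROM pg_class c
  WHERE c.oid = 'public.remote_tab'::regclass;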

FWIW it seems a bit weird to be so careful about adjusting reltuples
when acquire_inherited_sample_rows() only really looks at relpages when
deciding how many rows to sample from each partition. If our goal is to
use a more accurate reltuples, maybe we should do that in the first step
already. Otherwise we may end up building a sample that does not reflect
the sizes of the partitions correctly.

Of course, the sample rate also matters for non-partitioned tables.

> * Copying-and-pasting all of deparseAnalyzeSql (twice!) seems pretty
> bletcherous. Why not call that and then add a WHERE clause to its
> result, or just add some parameters to it so it can do that itself?
>

Right. I ended up refactoring this into a single function, with a
"method" parameter that determines if/how we do the remote sampling.

> * More attention to updating relevant comments would be appropriate,
> eg here you've not bothered to fix the adjacent falsified comment:
>
> /* We've retrieved all living tuples from foreign server. */
> - *totalrows = astate.samplerows;
> + if (do_sample)
> + *totalrows = reltuples;
> + else
> + *totalrows = astate.samplerows;
>

Yep, fixed.

> * Needs docs obviously. I'm not sure if the existing regression
> testing covers the new code adequately or if we need more cases.
>

Yep, I added the "sampling_method" option to postgres-fdw.sgml.
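
I'd expect it to be set like the other postgres_fdw options, e.g.
(made-up server/table names; which object levels accept the option is
defined by the patch, not something I'm restating here):

  ALTER SERVER remote_srv OPTIONS (ADD sampling_method 'bernoulli');
  ALTER FOREIGN TABLE public.remote_tab OPTIONS (ADD sampling_method 'random');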

> Having said that much, I'm going to leave it in Waiting on Author
> state.

Thanks. I'll switch this to "needs review" now.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment Content-Type Size
0001-postgres_fdw-sample-data-on-remote-node-for-20220718.patch text/x-patch 13.6 KB
0002-rework-postgres_fdw-analyze-sampling-20220718.patch text/x-patch 25.2 KB
