Re: postgres_fdw: using TABLESAMPLE to collect remote sample

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: postgres_fdw: using TABLESAMPLE to collect remote sample
Date: 2022-02-18 13:28:48
Message-ID: 84afe85f-2aa0-5aef-fa4a-59759afc03fb@enterprisedb.com
Lists: pgsql-hackers

Hi,

Here's a slightly updated version of the patch series. The 0001 part
adds tracking of server_version_num for the remote connection, so that
features can be enabled depending on the remote server version. In this
case it's used to decide whether TABLESAMPLE is supported.
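
To illustrate the kind of check 0001 enables, here is a minimal Python
sketch (not the patch's C code). TABLESAMPLE first appeared in
PostgreSQL 9.5, which corresponds to server_version_num 90500; the
constant and function names below are hypothetical.

```python
# TABLESAMPLE was added in PostgreSQL 9.5 (server_version_num 90500).
# The names here are illustrative, not taken from the patch.
TABLESAMPLE_MIN_VERSION = 90500


def remote_supports_tablesample(server_version_num: int) -> bool:
    """Return True if the remote server is new enough for TABLESAMPLE."""
    return server_version_num >= TABLESAMPLE_MIN_VERSION


if __name__ == "__main__":
    for version in (90424, 90500, 140002):
        print(version, remote_supports_tablesample(version))
```

The value itself can be read on the remote side with
"SHOW server_version_num".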

The 0002 part modifies the sampling. I realized we can do something
similar even on pre-9.5 releases, by running "WHERE random() < $1". Not
perfect, because the remote server still has to read the whole table,
but still better than also sending the whole table over the network.
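
A rough Python sketch of how the remote query could be chosen, assuming
the sample fraction is derived from the number of rows wanted and an
estimate of the remote row count. This is not the patch's code: the
function names, the inlined literal instead of the $1 parameter, and the
fallback to a full scan are all assumptions made for illustration.

```python
TABLESAMPLE_MIN_VERSION = 90500  # PostgreSQL 9.5


def sample_fraction(target_rows: int, reltuples: float) -> float:
    """Fraction of the remote table to request, capped at 1.0."""
    if reltuples <= 0:
        return 1.0  # no usable estimate: read everything
    return min(1.0, target_rows / reltuples)


def build_sample_query(relation: str, target_rows: int, reltuples: float,
                       server_version_num: int, method: str = "system") -> str:
    """Build a hypothetical remote query that returns a row sample."""
    if method not in ("system", "bernoulli"):
        raise ValueError("method must be 'system' or 'bernoulli'")
    frac = sample_fraction(target_rows, reltuples)
    if frac >= 1.0:
        # Sampling would not reduce anything, so fetch the whole table.
        return f"SELECT * FROM {relation}"
    if server_version_num >= TABLESAMPLE_MIN_VERSION:
        # TABLESAMPLE takes a percentage, not a fraction.
        return (f"SELECT * FROM {relation} "
                f"TABLESAMPLE {method.upper()} ({frac * 100:.6f})")
    # Pre-9.5 fallback: the remote still scans the table, but only the
    # sampled rows travel over the network.
    return f"SELECT * FROM {relation} WHERE random() < {frac:.6f}"


if __name__ == "__main__":
    print(build_sample_query("public.t", 30000, 10_000_000, 140002))
    print(build_sample_query("public.t", 30000, 10_000_000, 90400))
```

The 30000 in the usage lines is just an example target (what ANALYZE
asks for with the default statistics target of 100).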

There's a "sample" option for foreign server/table, which can be used to
disable the sampling if needed.

A simple measurement on a table with 10M rows, on localhost:

  old:         6600ms
  random:       450ms
  tablesample:   40ms (system)
  tablesample:  200ms (bernoulli)

Local analyze takes ~190ms, so that's quite close.
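
For reference, the speedups implied by those numbers can be worked out
with a few lines of Python. The timings are the ones quoted above; the
dictionary keys and the helper name are only illustrative.

```python
# Timings in milliseconds, copied from the measurement above.
timings_ms = {
    "old": 6600,
    "random": 450,
    "tablesample (system)": 40,
    "tablesample (bernoulli)": 200,
}


def speedups(timings: dict, baseline: str = "old") -> dict:
    """Return how many times faster each method is than the baseline."""
    base = timings[baseline]
    return {name: base / ms for name, ms in timings.items() if name != baseline}


if __name__ == "__main__":
    for name, factor in speedups(timings_ms).items():
        print(f"{name}: {factor:.1f}x faster than old")
```

That is roughly 15x for random(), 33x for bernoulli and 165x for system
sampling, on this one localhost measurement.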

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:
  0001-postgres_fdw-track-server-version-for-conne-20220218.patch (text/x-patch, 3.0 KB)
  0002-postgres_fdw-sample-data-on-remote-node-for-20220218.patch (text/x-patch, 13.9 KB)
