Re: postgres_fdw: using TABLESAMPLE to collect remote sample

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, James Finnerty <jfinnert(at)amazon(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Subject: Re: postgres_fdw: using TABLESAMPLE to collect remote sample
Date: 2022-12-21 12:08:08
Message-ID: cd3f2fda-d4cc-4371-210f-fc339d53b885@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 12/15/22 17:46, Tom Lane wrote:
> James Finnerty <jfinnert(at)amazon(dot)com> writes:
>> This patch looks good to me. I have two very minor nits: The inflation
>> of the sample size by 10% is arbitrary but it doesn't seem unreasonable
>> or concerning. It just makes me curious if there are any known cases
>> that motivated adding this logic.
>
> I wondered why, too.
>

Not sure about known cases, but the motivation is explained in the
comment before the 10% is applied.

The repluples value is never going to be spot on, and we use that to
determine what fraction of the table to sample (because all built-in
TABLESAMPLE methods only accept fraction to sample, not expected number
of rows).

If pg_class.reltuples is lower (than the actual row count), we'll end up
with sample fraction too high. That's mostly harmless, as we'll then
discard some of the rows locally.

But if the pg_class.reltuples is higher, we'll end up sampling too few
rows (less than targrows). This can happen e.g. after a bulk delete.

Yes, the 10% value is mostly arbitrary, and maybe it's pointless. How
much may the stats change with 10% larger sample? Probably not much, so
is it really solving anything?

Also, maybe it's fixing the issue at the wrong level - if stale
reltuples are an issue, maybe the right fix is making it more accurate
on the source instance. Decrease autovacuum_vacuum_scale_factor, or
maybe also look at pg_stat_all_tables or something.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message shiy.fnst@fujitsu.com 2022-12-21 13:14:09 RE: Force streaming every change in logical decoding
Previous Message Amit Kapila 2022-12-21 12:05:34 Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication