Re: postgres_fdw: using TABLESAMPLE to collect remote sample

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: James Finnerty <jfinnert(at)amazon(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: postgres_fdw: using TABLESAMPLE to collect remote sample
Date: 2023-01-06 16:58:52
Message-ID: 3013665.1673024332@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> writes:
> The one difference is that I realized the relkind check does not
> actually say we can't do sampling - it just means we can't use
> TABLESAMPLE to do it. We could still use "random()" ...

> Furthermore, I don't think we should silently disable sampling when the
> user explicitly requests TABLESAMPLE by specifying bernoulli/system for
> the table - IMHO it's less surprising to just fail in that case.

Agreed on both points. This patch looks good to me.

> Of course, all relkinds that don't support TABLESAMPLE currently have
> reltuples value that will disable sampling anyway (e.g. views have -1).
> So we won't actually fallback to random() anyway, because we can't
> calculate the sample fraction.
> That's a bit annoying for foreign tables pointing at a view, which is a
> more likely use case than table pointing at a sequence.

Right, that's a case worth being concerned about.

> But I realized we could actually still do "random()" sampling:
> SELECT * FROM t ORDER BY random() LIMIT $X;

Hmm, interesting idea, but it would totally bollix our correlation
estimates. Not sure that those are worth anything for remote views,
but still...

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Banck 2023-01-06 17:21:23 Re: Support load balancing in libpq
Previous Message Tom Lane 2023-01-06 16:29:31 Re: [PATCH] Expand character set for ltree labels