Re: TABLESAMPLE doesn't actually satisfy the SQL spec, does it?

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Petr Jelinek <petr(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: TABLESAMPLE doesn't actually satisfy the SQL spec, does it?
Date: 2015-07-16 14:22:08
Message-ID: 3551.1437056528@sss.pgh.pa.us
Lists: pgsql-hackers

Petr Jelinek <petr(at)2ndquadrant(dot)com> writes:
> On 2015-07-12 18:02, Tom Lane wrote:
>> A possible way around this problem is to redefine the sampling rule so
>> that it is not history-dependent but depends only on the tuple TIDs.
>> For instance, one could hash the TID of a candidate tuple, xor that with
>> a hash of the seed being used for the current query, and then select the
>> tuple if (hash/MAXINT) < P.

> That would work for bernoulli for physical tuples, yes. Only thing that
> worries me is future extensibility for data sources that only provide
> virtual tuples.

Well, repeatability of a TABLESAMPLE attached to a join seems like an
unsolved and possibly unsolvable problem anyway. I don't think we should
assume that the API we define today will cope with that.

But that is another reason why the current API is inadequate: there's no
provision for specifying whether or how a tablesample method can be
applied to non-base-table RTEs. (I re-read the thread and noted that
Peter E. complained about that some time ago, but nothing was done about
it. I'm fine with not supporting the case right now, but nonetheless
it's another reason why we'd better make the API more easily extensible.)

regards, tom lane
