TABLESAMPLE doesn't actually satisfy the SQL spec, does it?

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: TABLESAMPLE doesn't actually satisfy the SQL spec, does it?
Date: 2015-07-12 16:02:07
Message-ID: 9871.1436716927@sss.pgh.pa.us
Lists: pgsql-hackers

As best I can tell (evidence below), the SQL standard requires that if a
single query reads a table with a TABLESAMPLE clause multiple times (say,
because it's on the inside of a nestloop), then the exact same set of
sampled rows is returned each time. The BERNOULLI code, at least, fails
to achieve this. Suppose that a concurrent session adds some tuples to
a page in between scans. When that page is visited the next time, the
sampler RNG will be advanced more times than it was previously, because
it's called a number of times that depends on PageGetMaxOffsetNumber().
So the RNG state will be different at the end of the page, and then
the set of tuples selected from subsequent pages will be completely
different from what it was before. This will happen even if none of the
added rows are committed yet, let alone visible to the query's snapshot;
which makes it a failure of serializability even if you don't believe
the argument that it violates the requirements for TABLESAMPLE.
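For concreteness, here is a small self-contained C simulation of the effect
(this is not the tsm_bernoulli code; libc rand() stands in for the sampler's
RNG, and the page/offset layout is invented). Because one random draw is
consumed per line pointer, adding a single tuple to an early page shifts
every later draw, so later pages yield a different sample even with the same
seed:

    #include <stdio.h>
    #include <stdlib.h>

    #define NPAGES 3
    #define PROB   0.5

    /* One "scan" of the table: one RNG draw per line pointer. */
    static void
    scan(const int *tuples_per_page, unsigned int seed)
    {
        srand(seed);                    /* same seed, as with REPEATABLE */
        for (int page = 0; page < NPAGES; page++)
            for (int off = 1; off <= tuples_per_page[page]; off++)
                if ((double) rand() / RAND_MAX < PROB)
                    printf("(%d,%d) ", page, off);
        printf("\n");
    }

    int
    main(void)
    {
        int before[NPAGES] = {10, 10, 10};
        int after[NPAGES]  = {11, 10, 10};  /* concurrent insert on page 0 */

        scan(before, 42);   /* first pass of the nestloop */
        scan(after, 42);    /* second pass: pages 1 and 2 now differ too */
        return 0;
    }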

Now, as for the language-lawyering aspect of it, I read in SQL2011
7.6 <table reference> general rule 5a:

iv) Case:

1) If <sample method> specifies BERNOULLI, then the result of TF is a
table containing approximately (N*S/100) rows of RT. The probability
of a row of RT being included in result of TF is S/100. Further,
whether a given row of RT is included in result of TF is independent
of whether other rows of RT are included in result of TF.

2) Otherwise, result of TF is a table containing approximately
(N*S/100) rows of RT. The probability of a row of RT being included in
result of TF is S/100.

v) If TF contains outer references, then a table with identical rows is
generated every time TF is evaluated with a given set of values for
outer references.

This seems to me to clearly require that a fixed set of samples (ie, a
"table", whether real or virtual) is selected during any query. Note that
this requirement is the same whether or not REPEATABLE is used.

Also, I found a description of IBM DB2's implementation of this feature,
http://www.almaden.ibm.com/cs/people/peterh/idugjbig.pdf
and in that they say:

Semantically, sampling of a table occurs before any other query
processing, such as applying predicates or performing joins. One can
envision the original tables referenced in a query being initially
replaced by temporary reduced tables containing sampled rows, and then
normal query processing commencing on the reduced tables. (For
performance reasons, actual processing may not occur in exactly this
manner.) It follows, for example, that repeated accesses of a sampled
table within the same query, such as in a nested-loop join or a
correlated subquery, will see the same sample of that table within a
single execution of that query.

They do mention that alterations to a table may result in *successive*
queries giving different results, even when using REPEATABLE. But I do
not think it's okay for this to happen within a single query. That would
mean that vagaries of plan choice, eg nestloop vs some other join method,
would produce inconsistent results.

A possible way around this problem is to redefine the sampling rule so
that it is not history-dependent but depends only on the tuple TIDs.
For instance, one could hash the TID of a candidate tuple, xor that with
a hash of the seed being used for the current query, and then select the
tuple if (hash/MAXINT) < P.
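A minimal sketch of that idea, assuming some 64-bit mixing function (the
splitmix64-style finalizer below is an arbitrary stand-in, not a proposal
for the exact hash to use):

    #include <stdbool.h>
    #include <stdint.h>

    static uint64_t
    mix64(uint64_t x)
    {
        /* splitmix64-style finalizer; any decent hash would do */
        x ^= x >> 30; x *= 0xbf58476d1ce4e5b9ULL;
        x ^= x >> 27; x *= 0x94d049bb133111ebULL;
        x ^= x >> 31;
        return x;
    }

    /* Decide inclusion for one candidate tuple identified by its TID. */
    static bool
    tid_sampled(uint32_t blockno, uint16_t offset, uint64_t query_seed,
                double probability)
    {
        uint64_t tid = ((uint64_t) blockno << 16) | offset;
        uint64_t h   = mix64(tid) ^ mix64(query_seed);

        /* Map the hash onto [0,1) and compare against P (= S/100). */
        return (double) h / (double) UINT64_MAX < probability;
    }

Because the decision is a pure function of the TID and the query's seed,
repeated scans within one query select the same rows no matter how many
tuples earlier pages happened to contain when each scan ran.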

regards, tom lane
