Re: TABLESAMPLE doesn't actually satisfy the SQL spec, does it?

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: TABLESAMPLE doesn't actually satisfy the SQL spec, does it?
Date: 2015-07-12 17:50:51
Message-ID: 12137.1436723451@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Sun, Jul 12, 2015 at 12:02 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> As best I can tell (evidence below), the SQL standard requires that if a
>> single query reads a table with a TABLESAMPLE clause multiple times (say,
>> because it's on the inside of a nestloop), then the exact same set of
>> sampled rows are returned each time.

> Hmm, I tend to agree that it would be good if it behaved that way.
> Otherwise, it seems like the behavior could be quite surprising.

Yeah. As a concrete example, consider

select * from t1, t2 tablesample ... where t1.x = t2.x

and suppose that there are multiple occurences of x = 10 in both tables.
As things stand, if the join is done as a nestloop then a particular t2
row with x = 10 might appear in the output joined with some of the t1 rows
with x = 10 but not with others. On the other hand, the results of a hash
join would not be inconsistent in that way, since t2 would be read only
once.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Shulgin, Oleksandr 2015-07-12 18:11:48 Re: [PATCH] Generalized JSON output functions
Previous Message Merlin Moncure 2015-07-12 16:57:42 Re: [PATCH] Generalized JSON output functions