Re: TABLESAMPLE patch

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Jaime Casanova <jaime(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>
Subject: Re: TABLESAMPLE patch
Date: 2015-04-09 23:47:49
Message-ID: CA+U5nMKjU=KkKLSKUXQr9LrSfWD0maLuyaa_DZ-9c_7Fdn7gBg@mail.gmail.com
Lists: pgsql-hackers

On 9 April 2015 at 15:30, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
> On 4/9/15 5:02 AM, Michael Paquier wrote:
>> Just to be clear, since the example above is misleading... Doing table
>> sampling using SYSTEM at the physical level makes sense. In this case I
>> think that we should properly error out when trying to use this method
>> on something not present at the physical level. But I am not sure that
>> this restriction applies to BERNOULLI: you may want to apply it to
>> things other than physical relations, like views or results of WITH
>> clauses. Also, given that we support custom sampling
>> methods, I think that it should be up to the sampling method to define
>> which kinds of objects it supports sampling on, and how it fetches
>> the sample, be it page-level fetching or analysis of an
>> existing set of tuples. Looking at the patch, TABLESAMPLE is only
>> allowed on tables and matviews; this limitation is too restrictive
>> IMO.
>
> In the SQL standard, the TABLESAMPLE clause is attached to a table
> expression (<table primary>), which includes table functions,
> subqueries, CTEs, etc. In the proposed patch, it is attached to a table
> name, allowing only an ONLY clause. So this is a significant deviation.

There is no deviation from the standard in the current patch.
Currently the feature is 100% unimplemented; the patch would move us
directly towards a fully implemented feature, perhaps even get us all
the way there.

> Obviously, doing block sampling on a physical table is a significant use
> case

Very significant use case, which this patch addresses. Query result
sampling would not be a very interesting use case and would not even
have been thought of without the SQL Standard.

>, but we should be clear about which restrictions and tradeoffs we
> are making now and in the future, especially if we are going to present
> extension interfaces. The fact that physical tables are interchangeable
> with other relation types, at least in data-reading contexts, is a
> feature worth preserving.

Agreed.

This patch does nothing to change that interchangeability. There is no
restriction or removal of current query capability.

It looks trivial to make it work for query results too, but if it is
not, ISTM it is something that can be added in a later release.
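Whether a method can apply to query results depends on what it needs from its input. The toy Python sketch below is not PostgreSQL code; it models a table as a list of pages, and the function names `system_sample` and `bernoulli_sample` are hypothetical. SYSTEM-style sampling keeps or skips whole pages, so it needs a physical page layout, while BERNOULLI-style sampling decides row by row and only needs a stream of rows, which a subquery or CTE could supply just as well.

```python
import random
from typing import Iterable, List, Sequence, TypeVar

T = TypeVar("T")


def system_sample(pages: Sequence[Sequence[T]], percent: float,
                  rng: random.Random) -> List[T]:
    """Block-level sampling: each page is kept or skipped as a whole."""
    p = percent / 100.0
    out: List[T] = []
    for page in pages:
        if rng.random() < p:   # one decision per page
            out.extend(page)   # every row on a selected page is returned
    return out


def bernoulli_sample(rows: Iterable[T], percent: float,
                     rng: random.Random) -> List[T]:
    """Row-level sampling: each row is kept independently with probability percent/100."""
    p = percent / 100.0
    return [row for row in rows if rng.random() < p]  # one decision per row
```

For example, `bernoulli_sample(rows, 10, random.Random(0))` keeps roughly 10% of `rows` no matter where they came from. SYSTEM is cheaper because unselected pages need not be read at all, at the cost of rows on the same page being sampled together.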

> It may be worth thinking about some examples of other sampling methods,
> in order to get a better feeling for whether the interfaces are appropriate.
>
> Earlier in the thread, someone asked about supporting specifying a
> number of rows instead of a percentage. While not essential, that seems
> pretty useful, but I wonder how that could be implemented later on if we
> take the approach that the argument to the sampling method can be an
> arbitrary quantity that is interpreted only by the method.

Not sure I understand that. The method could allow parameters of any unit.

Having a function-based implementation allows stratified sampling or
other approaches suited directly to the user's data.
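As an illustration of what such a function could do, here is a minimal Python sketch of stratified sampling with proportional allocation: rows are grouped by a key and the same percentage is drawn from each group, so each group's share of the sample matches its share of the data instead of varying by chance. The function name and signature are hypothetical and do not correspond to the patch's actual sampling-method API.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, List, TypeVar

T = TypeVar("T")


def stratified_sample(rows: Iterable[T], key: Callable[[T], Hashable],
                      percent: float, rng: random.Random) -> List[T]:
    """Draw the same percentage from every stratum (rows sharing a key)."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    strata: Dict[Hashable, List[T]] = defaultdict(list)
    for row in rows:
        strata[key(row)].append(row)          # group rows by stratum
    out: List[T] = []
    for members in strata.values():
        k = round(len(members) * percent / 100.0)  # proportional allocation
        out.extend(rng.sample(members, k))    # simple random sample within the stratum
    return out
```

With proportional allocation a very small stratum may round down to zero rows; a real method might instead guarantee a minimum per stratum.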

I don't think it's reasonable to force every method to offer limits on
both the number of rows and a percentage. Those may not be applicable.
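To illustrate a method interpreting its own argument, the hypothetical Python sketch below registers two methods behind one dispatcher that passes the argument through untouched: `bernoulli_pct` reads it as a percentage, and `reservoir_rows` reads it as a row count using classic reservoir sampling (Algorithm R). The names `METHODS`, `tablesample`, `bernoulli_pct` and `reservoir_rows` are assumptions for illustration, not part of the patch.

```python
import random
from typing import Any, Callable, Dict, Iterable, List, Optional

# Hypothetical method signature: (rows, argument, rng) -> sampled rows.
# The dispatcher never inspects the argument; each method decides its unit.
SampleMethod = Callable[[Iterable[Any], float, random.Random], List[Any]]


def bernoulli_pct(rows: Iterable[Any], arg: float, rng: random.Random) -> List[Any]:
    """Interprets arg as a percentage: keep each row with probability arg/100."""
    if not 0 <= arg <= 100:
        raise ValueError("percentage must be between 0 and 100")
    p = arg / 100.0
    return [r for r in rows if rng.random() < p]


def reservoir_rows(rows: Iterable[Any], arg: float, rng: random.Random) -> List[Any]:
    """Interprets arg as a row count: uniform sample of that many rows (Algorithm R)."""
    n = int(arg)
    if n < 0 or n != arg:
        raise ValueError("row count must be a non-negative integer")
    reservoir: List[Any] = []
    for i, r in enumerate(rows):
        if i < n:
            reservoir.append(r)       # fill the reservoir first
        else:
            j = rng.randint(0, i)     # uniform over 0..i inclusive
            if j < n:
                reservoir[j] = r      # replace with probability n/(i+1)
    return reservoir


METHODS: Dict[str, SampleMethod] = {
    "bernoulli_pct": bernoulli_pct,
    "reservoir_rows": reservoir_rows,
}


def tablesample(rows: Iterable[Any], method: str, arg: float,
                seed: Optional[int] = None) -> List[Any]:
    """Look up the method by name and hand it the argument unchanged."""
    return METHODS[method](rows, arg, random.Random(seed))
```

Here `tablesample(rows, "reservoir_rows", 100)` returns exactly 100 rows when at least that many exist, while `tablesample(rows, "bernoulli_pct", 100)` returns all of them. Neither method has to understand the other's unit, and a method for which a row count makes no sense can simply reject it.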

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, RemoteDBA, Training & Services
