Re: TABLESAMPLE patch

From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Petr Jelinek <petr(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Jaime Casanova <jaime(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>
Subject: Re: TABLESAMPLE patch
Date: 2015-04-09 19:30:49
Message-ID: 5526D369.1070905@gmx.net
Lists: pgsql-hackers

On 4/9/15 5:02 AM, Michael Paquier wrote:
> Just to be clear, since the example above was misleading: doing table
> sampling using SYSTEM at the physical level makes sense. In this case I
> think that we should properly error out when trying to use this method
> on something not present at the physical level. But I am not sure that
> this restriction applies to BERNOULLI: you may want to apply it to
> things other than physical relations, like views or results of WITH
> clauses. Also, given that we support custom sampling
> methods, I think that it should be up to the sampling method to define
> which kinds of objects it supports sampling on, and how it fetches the
> sample, be it page-level fetching or analysis of an
> existing set of tuples. Looking at the patch, TABLESAMPLE is only
> allowed on tables and matviews; this limitation is too restrictive
> IMO.

In the SQL standard, the TABLESAMPLE clause is attached to a table
expression (<table primary>), which includes table functions,
subqueries, CTEs, etc. In the proposed patch, it is attached to a table
name, allowing only an ONLY clause. So this is a significant deviation.

Obviously, doing block sampling on a physical table is a significant use
case, but we should be clear about which restrictions and tradeoffs we
are making now and in the future, especially if we are going to present
extension interfaces. The fact that physical tables are interchangeable
with other relation types, at least in data-reading contexts, is a
feature worth preserving.
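To make the distinction concrete, here is a minimal sketch in Python, assuming a toy model in which a physical table is a list of pages and each page is a list of rows. The functions `bernoulli_sample` and `system_sample` are hypothetical illustrations, not the patch's C implementation. Bernoulli sampling only needs a stream of rows, so it could apply equally to a subquery, CTE, or function result; SYSTEM-style block sampling needs the page layout, which only physical relations have.

```python
import random
from typing import Iterable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def bernoulli_sample(rows: Iterable[T], percent: float, rng: random.Random) -> Iterator[T]:
    """Row-level sampling: keep each row independently with probability percent/100.

    Only needs an iterable of rows, so any table expression could feed it.
    """
    p = percent / 100.0
    for row in rows:
        if rng.random() < p:  # rng.random() is in [0, 1), so 100% keeps every row
            yield row


def system_sample(pages: Sequence[Sequence[T]], percent: float, rng: random.Random) -> List[T]:
    """Block-level sampling: keep or skip whole pages with probability percent/100.

    Requires the input to be organized into pages (the hypothetical toy model here).
    """
    p = percent / 100.0
    sample: List[T] = []
    for page in pages:
        if rng.random() < p:
            sample.extend(page)  # every row on a chosen page is returned together
    return sample
```

The design difference is visible in the signatures: one consumes any row iterator, the other requires a sequence of pages. That is roughly the tradeoff between attaching TABLESAMPLE to a general table expression and attaching it only to a table name.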

It may be worth thinking about some examples of other sampling methods,
in order to get a better feeling for whether the interfaces are appropriate.

Earlier in the thread, someone asked about supporting a specified
number of rows instead of a percentage. While not essential, that seems
pretty useful, but I wonder how that could be implemented later on if we
take the approach that the argument to the sampling method can be an
arbitrary quantity that is interpreted only by the method.
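One well-known way to honor a fixed row count over a stream whose size is not known in advance is reservoir sampling (Algorithm R). The sketch below uses the same toy Python model. It illustrates that technique only and is not anything proposed in the patch; `reservoir_sample` is a hypothetical name. Its argument is a row count rather than a percentage, which is the kind of parameter a method-interpreted argument design would have to accommodate.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_sample(rows: Iterable[T], n: int, rng: random.Random) -> List[T]:
    """Return a uniform random sample of min(n, total) rows in a single pass,
    without knowing the total row count in advance (Algorithm R)."""
    if n < 0:
        raise ValueError("n must be non-negative")
    reservoir: List[T] = []
    for i, row in enumerate(rows):
        if i < n:
            reservoir.append(row)  # fill the reservoir with the first n rows
        else:
            j = rng.randint(0, i)  # uniform over 0..i inclusive
            if j < n:
                reservoir[j] = row  # row i is kept with probability n/(i+1)
    return reservoir
```

Because it only consumes a row iterator, this approach fits row-level sampling of arbitrary table expressions. A block-level method would presumably need some estimate of the relation size to turn a row count into a page-selection probability.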
