Re: TABLESAMPLE patch

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Petr Jelinek <petr(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Jaime Casanova <jaime(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>
Subject: Re: TABLESAMPLE patch
Date: 2015-04-09 09:02:21
Message-ID: CAB7nPqS-t2pM6aCxurtdrvb0PJ3KYRpxCh20xKoYvGu+_9QrVw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Apr 9, 2015 at 5:12 PM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
> On Thu, Apr 9, 2015 at 4:30 AM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
>> On Mon, Apr 6, 2015 at 11:02 AM, Petr Jelinek <petr(at)2ndquadrant(dot)com> wrote:
>>>
>>> On 06/04/15 14:30, Petr Jelinek wrote:
>>>>
>>>> On 06/04/15 11:02, Simon Riggs wrote:
>>>>
>>>>> Are we ready for a final detailed review and commit?
>>>>>
>>>>
>>>> I plan to send v12 in the evening with some additional changes that came
>>>> up from Amit's comments + some improvements to error reporting. I think
>>>> it will be ready then.
>>>>
>>>
>>> Ok so here it is.
>>>
>>> Changes vs v11:
>>> - changed input parameter list to expr_list
>>> - improved error reporting, particularly when input parameters are wrong
>>> - fixed SELECT docs to show correct syntax and mention that there can be
>>> more sampling methods
>>> - added name of the sampling method to the explain output - I don't like
>>> the code much there as it has to look into RTE but on the other hand I don't
>>> want to create new scan node just so it can hold the name of the sampling
>>> method for explain
>>> - made views containing TABLESAMPLE clause not autoupdatable
>>> - added PageIsAllVisible() check before trying to check for tuple
>>> visibility
>>> - some typo/white space fixes
>>
>>
>>
>> Compiler warnings:
>> explain.c: In function 'ExplainNode':
>> explain.c:861: warning: 'sname' may be used uninitialized in this function
>>
>>
>> Docs spellings:
>>
>> "PostgreSQL distrribution" extra r.
>>
>> "The optional parameter REPEATABLE acceps" accepts. But I don't know that
>> 'accepts' is the right word. It makes the seed value sound optional to
>> REPEATABLE.
>>
>> "each block having same chance" should have "the" before "same".
>>
>> "Both of those sampling methods currently...". I think it should be "these"
>> not "those", as this sentence is immediately after their introduction, not
>> at a distance.
>>
>> "...tuple contents and decides if to return in, or zero if none" Something
>> here is confusing. "return it", not "return in"?
>>
>> Other comments:
>>
>> Do we want tab completions for psql? (I will never remember how to spell
>> BERNOULLI).
>
> Yes. I think so.
>
>> Needs a Cat version bump.
>
> The committer who will pick up this patch will normally do it.
>
> Patch 1 is simple enough and looks fine to me.
>
> Regarding patch 2...
> I found for now some typos:
> + <title><structname>pg_tabesample_method</structname></title>
> + <productname>PostgreSQL</productname> distrribution:
>
> Also, I am wondering if the sampling logic based on block analysis is
> actually correct, for example for now this fails and I think that we
> should support it:
> =# with query_select as (select generate_series(1, 10) as a) select
> query_select.a from query_select tablesample system (100.0/11)
> REPEATABLE (9999);
> ERROR: 42P01: relation "query_select" does not exist
>
> How does the SQL spec define exactly TABLESAMPLE? Shouldn't we get a
> sample from a result set?
> Thoughts?

Just to be clear, the example above being misleading... Doing table
sampling using SYSTEM at physical level makes sense. In this case I
think that we should properly error out when trying to use this method
on something not present at physical level. But I am not sure that
this restriction applies to BERNOUILLI: you may want to apply it on
other things than physical relations, like views or results of WITH
clauses. Also, based on the fact that we support custom sampling
methods, I think that it should be up to the sampling method to define
on what kind of objects it supports sampling, and where it supports
sampling fetching, be it page-level fetching or analysis from an
existing set of tuples. Looking at the patch, TABLESAMPLE is just
allowed on tables and matviews, this limitation is too restrictive
IMO.
--
Michael

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2015-04-09 09:04:06 Re: TABLESAMPLE patch
Previous Message Simon Riggs 2015-04-09 08:52:47 Re: TABLESAMPLE patch