Re: TABLESAMPLE patch

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Petr Jelinek <petr(at)2ndquadrant(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Jaime Casanova <jaime(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>
Subject: Re: TABLESAMPLE patch
Date: 2015-04-13 03:21:29
Message-ID: CAA4eK1LTxiKSmHGPJeFgRqmj934JBNC=COoHHrm3PXrnvvhiKw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Apr 11, 2015 at 12:56 AM, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
>
> On 4/9/15 8:58 PM, Petr Jelinek wrote:
> > Well, you can have two approaches to this, either allow some specific
> > set of keywords that can be used to specify limit, or you let sampling
> > methods interpret parameters, I believe the latter is more flexible.
> > There is nothing stopping somebody writing sampling method which takes
> > limit as number of rows, or anything else.
> >
> > Also for example for BERNOULLI to work correctly you'd need to convert
> > the number of rows to fraction of table anyway (and that's exactly what
> > the one database which has this feature does internally) and then it's
> > no different than passing (SELECT 100/reltuples*number_of_rows FROM
> > tablename) as a parameter.
>
> What is your intended use case for this feature? I know that "give me
> 100 random rows from this table quickly" is a common use case, but
> that's kind of cumbersome if you need to apply formulas like that. I'm
> not sure what the use of a percentage is. Presumably, the main use of
> this features is on large tables. But then you might not even know how
> large it really is, and even saying 0.1% might be more than you wanted
> to handle.
>

The use case for specifying number of rows for sample scan is valid
and can be achieved by other means if required as suggested by Petr
Jelinek, however the current proposed syntax (w.r.t to Sample
Percentage [1]) seems to comply with SQL standard, so why not go
for it and then extend it based on more use-cases?

[1]
SQL Standard (2003) w.r.t Sample Percentage
<sample clause> ::=
TABLESAMPLE <sample method> <left paren> <sample percentage> <right paren>
[ <repeatable clause> ]

<sample percentage> ::= <numeric value expression>

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Etsuro Fujita 2015-04-13 09:58:46 Re: EvalPlanQual behaves oddly for FDW queries involving system columns
Previous Message Amit Kapila 2015-04-13 02:39:10 Re: FPW compression leaks information