Re: TABLESAMPLE patch

From: Petr Jelinek <petr(at)2ndquadrant(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Jaime Casanova <jaime(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>
Subject: Re: TABLESAMPLE patch
Date: 2015-04-10 20:44:32
Message-ID: 55283630.7090201@2ndquadrant.com
Lists: pgsql-hackers

On 10/04/15 22:16, Tomas Vondra wrote:
>
>
> On 04/10/15 21:57, Petr Jelinek wrote:
>> On 10/04/15 21:26, Peter Eisentraut wrote:
>>
>> But this was not really my point, the BERNOULLI just does not work
>> well with row-limit by definition, it applies probability on each
>> individual row and while you can get probability from percentage very
>> easily (just divide by 100), to get it for specific target number of
>> rows you have to know total number of source rows and that's not
>> something we can do very accurately so then you won't get 500 rows
>> but approximately 500 rows.
>
> It's actually even trickier. Even if you happen to know the exact number
> of rows in the table, you can't just convert that into a percentage like
> that and use it for BERNOULLI sampling. It may give you different number
> of result rows, because each row is sampled independently.
>
> That is why we have Vitter's algorithm for reservoir sampling, which
> works very differently from BERNOULLI.
>

Hmm, this actually gives me an idea: perhaps we could expose Vitter's
reservoir sampling as another sampling method for people who want
"give me 500 rows from the table, fast"? We already have it implemented;
it's just a matter of adding the glue.
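
To illustrate the difference discussed above, here is a minimal sketch in Python rather than the backend's C code. The function names are hypothetical, and the reservoir variant is the simple Algorithm R, not Vitter's optimized algorithm that we already have in the tree. It only shows that per-row sampling returns roughly the target number of rows while a reservoir returns exactly that many.

```python
import random


def bernoulli_sample(rows, probability, rng):
    """Keep each row independently with the given probability.

    The result size is random: it is only close to
    probability * len(rows) on average.
    """
    return [row for row in rows if rng.random() < probability]


def reservoir_sample(rows, k, rng):
    """Single-pass uniform sample of k rows (simple Algorithm R).

    Returns exactly k rows, or all rows if the input has fewer than k.
    The total number of rows does not need to be known in advance.
    """
    reservoir = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            reservoir.append(row)
        else:
            # Replace a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = row
    return reservoir


rng = random.Random(42)          # fixed seed, for repeatability only
total_rows = 100_000             # assumed table size for the illustration
target = 500

rows = range(total_rows)
bernoulli_counts = [
    len(bernoulli_sample(rows, target / total_rows, rng)) for _ in range(5)
]
reservoir_counts = [
    len(reservoir_sample(rows, target, rng)) for _ in range(5)
]

print("BERNOULLI row counts:", bernoulli_counts)
print("reservoir row counts:", reservoir_counts)
```

Even with the exact row count known, the BERNOULLI-style counts only land near 500, whereas the reservoir counts are always 500 as long as the table has at least that many rows.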

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
