Re: TABLESAMPLE patch

From: Simon Riggs <simon(dot)riggs(at)2ndquadrant(dot)com>
To: Petr Jelinek <petr(at)2ndquadrant(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Jaime Casanova <jaime(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>
Subject: Re: TABLESAMPLE patch
Date: 2015-04-17 15:09:45
Message-ID: CANP8+jJTY8NV5HoOcgp_jFcw6+NtfcnYwDwcZn+4vYm0gSj8zw@mail.gmail.com
Lists: pgsql-hackers

On 17 April 2015 at 14:54, Petr Jelinek <petr(at)2ndquadrant(dot)com> wrote:

> I agree that the DDL patch is not that important to get in (and I have now
> made it the last patch in the series), which does not mean somebody can't
> write an extension with a new tablesample method.
>
>
> In any case attached another version.
>
> Changes:
> - I addressed the comments from Michael
>
> - I moved the interface between nodeSampleScan and the actual sampling
> method to its own .c file and added a TableSampleDesc struct for it. This
> makes the interface cleaner and will make it more straightforward to extend
> for subqueries in the future (nothing really changes; some functions were
> just renamed and moved). Amit suggested this at some point and I thought
> it was not needed at the time, but with the possible future extension to
> subquery support I changed my mind.
>
> - renamed heap_beginscan_ss to heap_beginscan_sampling to avoid confusion
> with sync scan
>
> - reworded some things and fixed more typos
>
> - Added two sample contrib modules demonstrating row-limited and
> time-limited sampling. I am using linear probing for both of them, as the
> built-in block sampling is not well suited to row-limited or time-limited
> sampling. For row-limited sampling I originally thought of using Vitter's
> reservoir sampling, but that does not fit well with the executor, as it
> needs to keep a reservoir of all the output tuples in memory, which would
> have horrible memory requirements if the limit were high. Linear probing
> seems to work quite well for the use case of "give me 500 random rows from
> a table".
>
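
A minimal sketch of the row-limited "linear probing" idea described in the quoted notes, under my own assumption (not taken from the patch) that it means: pick a random start block and a stride coprime to the table's block count, then step through the blocks modulo that count, so every block is visited exactly once in a scrambled order. Rows are yielded as soon as they are found and the scan stops at the limit, so unlike reservoir sampling no buffer of output tuples is ever kept. The table is modelled as a list of per-block row counts; `rows_per_block`, `sample_rows`, `probe_order` and `random_coprime_step` are hypothetical names used only for illustration.

```python
import math
import random
from typing import Iterator, Sequence, Tuple


def random_coprime_step(nblocks: int, rng: random.Random) -> int:
    """Pick a stride in [1, nblocks) that shares no factor with nblocks.

    Because gcd(step, nblocks) == 1, stepping by `step` modulo nblocks
    visits every block exactly once before returning to the start.
    """
    if nblocks <= 2:
        return 1  # the only usable stride for 1- or 2-block tables
    while True:
        step = rng.randrange(1, nblocks)
        if math.gcd(step, nblocks) == 1:
            return step


def probe_order(nblocks: int, start: int, step: int) -> Iterator[int]:
    """Yield block numbers start, start+step, start+2*step, ... (mod nblocks)."""
    blk = start % nblocks
    for _ in range(nblocks):
        yield blk
        blk = (blk + step) % nblocks


def sample_rows(rows_per_block: Sequence[int], limit: int,
                seed: int = 0) -> Iterator[Tuple[int, int]]:
    """Hypothetical row-limited sampler.

    rows_per_block[b] is the number of rows stored in block b. Rows are
    yielded as (block, offset) pairs in probe order, and the scan stops as
    soon as `limit` rows have been produced; nothing is buffered.
    """
    nblocks = len(rows_per_block)
    if nblocks == 0 or limit <= 0:
        return
    rng = random.Random(seed)  # a fixed seed makes the sample reproducible
    start = rng.randrange(nblocks)
    step = random_coprime_step(nblocks, rng)
    emitted = 0
    for blk in probe_order(nblocks, start, step):
        for off in range(rows_per_block[blk]):
            yield (blk, off)
            emitted += 1
            if emitted >= limit:
                return


if __name__ == "__main__":
    table = [4, 0, 7, 2, 5, 3, 6, 1, 8, 2]  # rows in each of 10 blocks
    print(list(sample_rows(table, 5, seed=1)))
```

Note that this sample is clustered by block (all rows of a visited block are taken together), which is the usual trade-off of block-level sampling against a uniform row-level sample. A time-limited variant could, presumably, use the same loop and compare against a deadline instead of a row count.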

For me, the DDL changes are something we can leave out for now, as a way to
minimize the change surface.

I'm now moving to final review of patches 1-5. Michael requested patch 1 to
be split out. If I commit, I will keep that split, but I am considering all
of this as a single patchset. I've already spent a few days reviewing, so I
don't expect this will take much longer.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
