From: | Simon Riggs <simon(dot)riggs(at)2ndquadrant(dot)com> |
---|---|
To: | Petr Jelinek <petr(at)2ndquadrant(dot)com> |
Cc: | Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Jaime Casanova <jaime(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz> |
Subject: | Re: TABLESAMPLE patch |
Date: | 2015-04-17 15:09:45 |
Message-ID: | CANP8+jJTY8NV5HoOcgp_jFcw6+NtfcnYwDwcZn+4vYm0gSj8zw@mail.gmail.com |
Lists: | pgsql-hackers |
On 17 April 2015 at 14:54, Petr Jelinek <petr(at)2ndquadrant(dot)com> wrote:
> I agree that the DDL patch is not that important to get in (and I have made
> it the last patch in the series now), which does not mean somebody can't
> write an extension with a new tablesample method.
>
>
> In any case, attached is another version.
>
> Changes:
> - I addressed the comments from Michael
>
> - I moved the interface between nodeSampleScan and the actual sampling
> method to its own .c file and added a TableSampleDesc struct for it. This
> makes the interface cleaner and will make it more straightforward to extend
> for subqueries in the future (nothing really changes; some functions were
> just renamed and moved). Amit suggested this at some point and I thought it
> was not needed at the time, but with the possible future extension to
> subquery support I changed my mind.
>
> - renamed heap_beginscan_ss to heap_beginscan_sampling to avoid confusion
> with sync scan
>
> - reworded some things and more typo fixes
>
> - Added two sample contrib modules demonstrating row-limited and
> time-limited sampling. I am using linear probing for both of those, as the
> built-in block sampling is not well suited to row-limited or time-limited
> sampling. For the row-limited case I originally thought of using Vitter's
> reservoir sampling, but that does not fit well with the executor: it needs
> to keep the reservoir of all the output tuples in memory, which would have
> horrible memory requirements if the limit was high. Linear probing seems to
> work quite well for the use case of "give me 500 random rows from the
> table".
>
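To make the trade-off in the quoted paragraph concrete, here is a rough Python sketch rather than anything taken from the patch. It assumes that "linear probing" means starting at a random block and stepping by a stride coprime with the block count, and it models a table as a list of blocks, each a list of rows. The names `reservoir_sample`, `linear_probe_blocks`, and `sample_rows` are hypothetical and exist only for this illustration.

```python
import math
import random


def reservoir_sample(rows, k, rng):
    """Classic reservoir sampling (Algorithm R), shown for contrast.

    All k sampled rows stay in memory until the input is exhausted,
    because any slot may still be overwritten by a later row.
    """
    reservoir = []
    for i, row in enumerate(rows):
        if i < k:
            reservoir.append(row)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = row
    return reservoir


def linear_probe_blocks(nblocks, rng):
    """Yield each block number exactly once (assumed probing scheme).

    A stride coprime with nblocks guarantees the walk covers every
    block before it repeats.
    """
    if nblocks <= 0:
        return
    start = rng.randrange(nblocks)
    step = rng.randrange(1, nblocks + 1)
    while math.gcd(step, nblocks) != 1:
        step = rng.randrange(1, nblocks + 1)
    block = start
    for _ in range(nblocks):
        yield block
        block = (block + step) % nblocks


def sample_rows(table, limit, rng):
    """Stream up to `limit` rows block by block, without buffering."""
    emitted = 0
    for blk in linear_probe_blocks(len(table), rng):
        for row in table[blk]:
            if emitted >= limit:
                return
            yield row
            emitted += 1
```

In this sketch `sample_rows` can hand each row to its caller as soon as it is read, whereas `reservoir_sample` cannot return anything until it has seen the whole input. The price is that rows from the same block arrive together, so the result is not a uniform row-level sample in the way a reservoir is.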
For me, the DDL changes are something we can leave out for now, as a way to
minimize the change surface.

I'm now moving to final review of patches 1-5. Michael requested patch 1 to
be split out. If I commit, I will keep that split, but I am considering all
of this as a single patchset. I've already spent a few days reviewing, so I
don't expect this will take much longer.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services