Re: TABLESAMPLE patch

From: Petr Jelinek <petr(at)2ndquadrant(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Jaime Casanova <jaime(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>
Subject: Re: TABLESAMPLE patch
Date: 2015-03-09 09:38:22
Message-ID: 54FD6A0E.9070000@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 09/03/15 04:51, Amit Kapila wrote:
> On Sat, Mar 7, 2015 at 10:37 PM, Petr Jelinek <petr(at)2ndquadrant(dot)com
> <mailto:petr(at)2ndquadrant(dot)com>> wrote:
> >
> > On 05/03/15 09:21, Amit Kapila wrote:
> >>
> >> On Tue, Feb 17, 2015 at 3:29 AM, Petr Jelinek <petr(at)2ndquadrant(dot)com
> <mailto:petr(at)2ndquadrant(dot)com>
> >> <mailto:petr(at)2ndquadrant(dot)com <mailto:petr(at)2ndquadrant(dot)com>>> wrote:
> >> >
> >> >
> >> > I didn't add the whole page visibility caching as the tuple ids we
> >> get from sampling methods don't map well to the visibility info we get
> >> from heapgetpage (it maps to the values in the rs_vistuples array not to
> >> to its indexes). Commented about it in code also.
> >> >
> >>
> >> I think we should set pagemode for system sampling as it can
> >> have dual benefit, one is it will allow us caching tuples and other
> >> is it can allow us pruning of page which is done in heapgetpage().
> >> Do you see any downside to it?
> >
> >
> > Double checking for tuple visibility is the only downside I can think
> of.
>
> That will happen if we use heapgetpage and the way currently
> code is written in patch, however we can easily avoid double
> checking if we don't call heapgetpage and rather do the required
> work at caller's place.
>

What's the point of pagemode then if the caller code does the visibility
checks still one by one on each call. I thought one of the points of
pagemode was to do this in one step (and one buffer lock).

And if the caller will try to do it in one step and cache the visibility
info then we'll end up with pretty much same structure as rs_vistuples -
there isn't saner way to cache this info other than ordered vector of
tuple offsets, unless we assume that most pages have close to
MaxOffsetNumber of tuples which they don't, so why not just use the
heapgetpage directly and do the binary search over rs_vistuples.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Etsuro Fujita 2015-03-09 10:35:24 Re: EvalPlanQual behaves oddly for FDW queries involving system columns
Previous Message Michael Paquier 2015-03-09 08:16:43 Re: Install shared libs in lib/ and bin/ with MSVC (Was: install libpq.dll in bin directory on Windows / Cygwin)