Re: TABLESAMPLE patch

From: Petr Jelinek <petr(at)2ndquadrant(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Jaime Casanova <jaime(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>
Subject: Re: TABLESAMPLE patch
Date: 2015-03-10 10:05:31
Message-ID: 54FEC1EB.9070503@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 10/03/15 10:54, Amit Kapila wrote:
> On Tue, Mar 10, 2015 at 3:03 PM, Petr Jelinek <petr(at)2ndquadrant(dot)com
> <mailto:petr(at)2ndquadrant(dot)com>> wrote:
> >
> > Ok now I think I finally understand what you are suggesting - you are
> saying let's go over whole page while tsmnexttuple returns something,
> and do the visibility check and other stuff in that code block under the
> buffer lock and cache the resulting valid tuples in some array and then
> return those tuples one by one from that cache?
> >
>
> Yes, this is what I am suggesting.
>
> >> > And if the caller will try to do it in one step and cache the
> >> visibility info then we'll end up with pretty much same structure as
> >> rs_vistuples - there isn't saner way to cache this info other than
> >> ordered vector of tuple offsets, unless we assume that most pages have
> >> close to MaxOffsetNumber of tuples which they don't, so why not just use
> >> the heapgetpage directly and do the binary search over rs_vistuples.
> >> >
> >>
> >> The downside of doing it via heapgetpage is that it will do
> >> visibility test for tuples which we might not even need (I think
> >> we should do visibility test for tuples retrurned by tsmnexttuple).
> >>
> >
> > Well, heapgetpage can either read visibility data for whole page or
> not, depending on if we want pagemode reading or not. So we can use the
> pagemode for sampling methods where it's feasible (like system) and not
> use pagemode where it's not (like bernoulli) and then either use the
> rs_vistuples or call HeapTupleSatisfiesVisibility individually again
> depending if the method is using pagemode or not.
> >
>
> Yeah, but as mentioned above, this has some downside, but go
> for it only if you feel that above suggestion is making code complex,
> which I think should not be the case as we are doing something similar
> in acquire_sample_rows().
>

I think your suggestion is actually simpler code wise, I am just
somewhat worried by the fact that no other scan node uses that kind of
caching and there is probably reason for that.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro HORIGUCHI 2015-03-10 11:40:07 Re: Reduce pinning in btree indexes
Previous Message Amit Kapila 2015-03-10 09:54:02 Re: TABLESAMPLE patch