Re: TABLESAMPLE patch

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Petr Jelinek <petr(at)2ndquadrant(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Jaime Casanova <jaime(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>
Subject: Re: TABLESAMPLE patch
Date: 2015-03-10 09:54:02
Message-ID: CAA4eK1K05d=B=_a7yqXSYLFE0gh04J8Jm_SX1Dwq0t9NqssdgA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Mar 10, 2015 at 3:03 PM, Petr Jelinek <petr(at)2ndquadrant(dot)com> wrote:
>
> On 10/03/15 04:43, Amit Kapila wrote:
>>
>> On Mon, Mar 9, 2015 at 3:08 PM, Petr Jelinek <petr(at)2ndquadrant(dot)com
>> <mailto:petr(at)2ndquadrant(dot)com>> wrote:
>> >
>> > On 09/03/15 04:51, Amit Kapila wrote:
>> >>
>> >> On Sat, Mar 7, 2015 at 10:37 PM, Petr Jelinek <petr(at)2ndquadrant(dot)com
>> <mailto:petr(at)2ndquadrant(dot)com>
>> >> > Double checking for tuple visibility is the only downside I can
think
>> >> of.
>> >>
>> >> That will happen if we use heapgetpage and the way currently
>> >> code is written in patch, however we can easily avoid double
>> >> checking if we don't call heapgetpage and rather do the required
>> >> work at caller's place.
>> >>
>> >
>> > What's the point of pagemode then if the caller code does the
>> visibility checks still one by one on each call. I thought one of the
>> points of pagemode was to do this in one step (and one buffer lock).
>> >
>>
>> You only need one buffer lock for doing at caller's location
>> as well something like we do in acquire_sample_rows().
>>
>
> Ok now I think I finally understand what you are suggesting - you are
saying let's go over whole page while tsmnexttuple returns something, and
do the visibility check and other stuff in that code block under the buffer
lock and cache the resulting valid tuples in some array and then return
those tuples one by one from that cache?
>

Yes, this is what I am suggesting.

>> > And if the caller will try to do it in one step and cache the
>> visibility info then we'll end up with pretty much same structure as
>> rs_vistuples - there isn't saner way to cache this info other than
>> ordered vector of tuple offsets, unless we assume that most pages have
>> close to MaxOffsetNumber of tuples which they don't, so why not just use
>> the heapgetpage directly and do the binary search over rs_vistuples.
>> >
>>
>> The downside of doing it via heapgetpage is that it will do
>> visibility test for tuples which we might not even need (I think
>> we should do visibility test for tuples retrurned by tsmnexttuple).
>>
>
> Well, heapgetpage can either read visibility data for whole page or not,
depending on if we want pagemode reading or not. So we can use the pagemode
for sampling methods where it's feasible (like system) and not use pagemode
where it's not (like bernoulli) and then either use the rs_vistuples or
call HeapTupleSatisfiesVisibility individually again depending if the
method is using pagemode or not.
>

Yeah, but as mentioned above, this has some downside, but go
for it only if you feel that above suggestion is making code complex,
which I think should not be the case as we are doing something similar
in acquire_sample_rows().

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Petr Jelinek 2015-03-10 10:05:31 Re: TABLESAMPLE patch
Previous Message Petr Jelinek 2015-03-10 09:33:21 Re: TABLESAMPLE patch