Re: TABLESAMPLE patch

From: Petr Jelinek <petr(at)2ndquadrant(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Jaime Casanova <jaime(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>
Subject: Re: TABLESAMPLE patch
Date: 2015-03-15 15:21:47
Message-ID: 5505A38B.6060702@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 10/03/15 11:05, Petr Jelinek wrote:
> On 10/03/15 10:54, Amit Kapila wrote:
>> On Tue, Mar 10, 2015 at 3:03 PM, Petr Jelinek <petr(at)2ndquadrant(dot)com
>> <mailto:petr(at)2ndquadrant(dot)com>> wrote:
>> >
>> > Ok now I think I finally understand what you are suggesting - you are
>> saying let's go over whole page while tsmnexttuple returns something,
>> and do the visibility check and other stuff in that code block under the
>> buffer lock and cache the resulting valid tuples in some array and then
>> return those tuples one by one from that cache?
>> >
>>
>> Yes, this is what I am suggesting.
>>
>> >> > And if the caller will try to do it in one step and cache the
>> >> visibility info then we'll end up with pretty much same structure as
>> >> rs_vistuples - there isn't saner way to cache this info other than
>> >> ordered vector of tuple offsets, unless we assume that most pages
>> have
>> >> close to MaxOffsetNumber of tuples which they don't, so why not
>> just use
>> >> the heapgetpage directly and do the binary search over rs_vistuples.
>> >> >
>> >>
>> >> The downside of doing it via heapgetpage is that it will do
>> >> visibility test for tuples which we might not even need (I think
>> >> we should do visibility test for tuples retrurned by tsmnexttuple).
>> >>
>> >
>> > Well, heapgetpage can either read visibility data for whole page or
>> not, depending on if we want pagemode reading or not. So we can use the
>> pagemode for sampling methods where it's feasible (like system) and not
>> use pagemode where it's not (like bernoulli) and then either use the
>> rs_vistuples or call HeapTupleSatisfiesVisibility individually again
>> depending if the method is using pagemode or not.
>> >
>>
>> Yeah, but as mentioned above, this has some downside, but go
>> for it only if you feel that above suggestion is making code complex,
>> which I think should not be the case as we are doing something similar
>> in acquire_sample_rows().
>>
>
> I think your suggestion is actually simpler code wise, I am just
> somewhat worried by the fact that no other scan node uses that kind of
> caching and there is probably reason for that.
>

So it turned out to be simpler code-wise and slightly better performing
to still use the heapgetpage approach (the caching approach introduces a
lot of pallocing and tuple copying which seems to hurt performance a
bit). In general it seems that the sampling scan is different enough
from ANALYZE that same principles just can't be applied well to it.

There are now 2 ways of checking the visibility in sample scan, if the
sample method says that it will read whole pages it will just use the
rs_vistuples and if it says it won't read whole pages than it executes
the HeapTupleSatisfiesVisibility() on individual tuples. The buffer
locking is also done only if the whole page reading does not happen.

We trust the author of sampling method to set this correctly - it only
has performance related implications, everything should still work
correctly even if sampling method sets this wrongly so I think that's
acceptable.

I also did all the other adjustments we talked about up-thread and
rebased against current master (there was conflict with 31eae6028).

In the end I decided to not overload the heap_beginscan_strat even more
but just crate new heap_beginscan_ss interface.

Also while playing with the keywords I realized that not only REPEATABLE
can be moved to type_func_name_keyword but also TABLESAMPLE can be moved
down to col_name_keyword so both keyword were moved one level down
compared to previous versions. It's the best I could do in terms of
keywords.

Attached is new version.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
0003-tablesample-ddl-v5.patch text/x-diff 60.5 KB
0002-tablesample-v10.patch text/x-diff 114.7 KB
0001-separate-block-sampling-functions-v2.patch text/x-diff 21.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2015-03-15 15:30:16 Re: Merge compact/non compact commits, make aborts dynamically sized
Previous Message Tom Lane 2015-03-15 15:09:38 Re: pg_dump quietly ignore missing tables - is it bug?