Re: TABLESAMPLE patch

From: Petr Jelinek <petr(at)2ndquadrant(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Jaime Casanova <jaime(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>
Subject: Re: TABLESAMPLE patch
Date: 2015-01-17 19:16:39
Message-ID: 54BAB517.40308@2ndquadrant.com
Lists: pgsql-hackers

On 17/01/15 13:46, Amit Kapila wrote:
> On Sun, Jan 11, 2015 at 1:29 AM, Petr Jelinek <petr(at)2ndquadrant(dot)com
> <mailto:petr(at)2ndquadrant(dot)com>> wrote:
> >
> >
> > In second patch which implements the TABLESAMPLE itself I changed the
> implementation of random generator because when I looked at the code
> again I realized the old one would produce wrong results if there were
> multiple TABLESAMPLE statements in same query or multiple cursors in
> same transaction.
> >
>
> I have looked into this patch and would like to share my
> findings with you.

Thanks a lot for this.

>
> 1.
> +static void
> +set_tablesample_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
> +                             RangeTblEntry *rte)
> +{
> ..
> +    /*
> +     * There is only one plan to consider but we still need to set
> +     * parameters for RelOptInfo.
> +     */
> +    set_cheapest(rel);
> +}
>
> It seems we already call set_cheapest(rel) in set_rel_pathlist()
> which is a caller of set_tablesample_rel_pathlist(), so why do
> we need it inside set_tablesample_rel_pathlist()?

Ah, this changed after I started working on this patch and I didn't
notice - previously all the set_<something>_rel_pathlist functions called
set_cheapest() individually. I will change the code.

>
> 2.
> +static void
> +set_tablesample_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
> +                             RangeTblEntry *rte)
> +{
> ..
> +    /* We only do sample scan if it was requested */
> +    add_path(rel, (Path *) create_samplescan_path(root, rel, required_outer));
> +}
>
> Do we need to add_path, if there is only one path?

Good point, we can probably just set the pathlist directly in this case.

>
> 3.
> @@ -332,6 +334,11 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
>          /* Foreign table */
>          set_foreign_pathlist(root, rel, rte);
>      }
> +    else if (rte->tablesample != NULL)
> +    {
> +        /* Build sample scan on relation */
> +        set_tablesample_rel_pathlist(root, rel, rte);
> +    }
>
> Have you considered having a different RTE for this?
>

I assume you mean a different RTEKind. Yes, I did consider it, but the
fact is that it's still a relation, just with a different scan type, and
I didn't want to modify every piece of code that deals with RTE_RELATION
to also handle a new RTE kind for sample scans, as that seems
unnecessary. I am not strongly opinionated about this one, though.

> 4.
> SampleNext()
> a.
> Isn't it better to code SampleNext() similar to SeqNext() and
> IndexNext(), which encapsulate the logic to get the tuple in
> another function (tbs_next() or something like that)?

Possibly. My thinking was that unlike index_getnext() and
heap_getnext(), this function would not be called from any other place,
so adding one more layer of abstraction didn't seem useful. It would
also mean creating a new ScanDesc struct, etc.

>
> b.
> Also don't we want to handle pruning of page while
> scanning (heap_page_prune_opt()) and I observed
> in heap scan API's after visibility check we do check
> for serializable conflict (CheckForSerializableConflictOut()).

Both good points, will add.

> Another thing is don't we want to do anything for sync scans
> for these method's as they are doing heap scan?

I don't follow this one tbh.

>
> c.
> for bernoulli method, it will first get the tupoffset with
> the help of function and then apply visibility check, it seems
> that could reduce the sample set way lower than expected
> in case lot of tuples are not visible, shouldn't it be done in reverse
> way(first visibility check, then call function to see if we want to
> include it in the sample)?
> I think something similar is done in acquire_sample_rows which
> seems right to me.

I don't think so. The way bernoulli works is that it returns every tuple
with equal probability, so the visible tuples have the same chance of
being returned as the invisible ones, and the issue should be smoothed
away automatically (IMHO).

acquire_sample_rows has a limit on the number of rows it returns, which
is why it has to do the visibility check first - the problem you
described does apply there.

The reason for using a probability instead of a tuple limit is that
there is no way to accurately guess the number of tuples in the table at
the beginning of the scan. This method should actually be better at
returning the correct number of tuples, independent of how many of them
are visible and how much space in the blocks is used.

>
> 5.
>
> CREATE TABLE test_tablesample (id INT, name text) WITH (fillfactor=10);
> INSERT INTO test_tablesample SELECT i, repeat(i::text, 200) FROM
> generate_series(0, 9) s(i) ORDER BY i;
>
> postgres=# SELECT id FROM test_tablesample TABLESAMPLE BERNOULLI (80);
> id
> ----
> 1
> 3
> 4
> 7
> 8
> 9
> (6 rows)
>
>
> postgres=# SELECT id FROM test_tablesample TABLESAMPLE BERNOULLI (80);
> id
> ----
> 0
> 1
> 2
> 3
> 4
> 5
> 6
> 7
> 8
> 9
> (10 rows)
>
> So above test yield 60% rows first time and 100% rows next time,
> when the test has requested 80%.
> I understand that sample percentage will result an approximate
> number of rows, however I just wanted that we should check if the
> implementation has any problem or not?

I think this is caused by the random generator not producing a smooth
enough distribution on such a small sample (or possibly in general?).

>
> In this regard, I have one question related to below code:
>
> +Datum
> +tsm_bernoulli_nexttuple(PG_FUNCTION_ARGS)
> +{
> ..
> +    /* Every tuple has percent chance of being returned */
> +    while (sampler_random_fract(sampler->randstate) > samplesize)
> +    {
> +        tupoffset++;
> +
> +        if (tupoffset > maxoffset)
> +            break;
> +    }
>
> Why are we not considering tuple in above code
> if tupoffset is less than maxoffset?
>

We do consider it. I will rename samplesize to probability or something
similar to make it clearer what it actually is, and maybe expand the
comment above the loop.

What the loop does is consider each offset on a page and do a "coin
flip": it generates a random number using sampler_random_fract, and if
that number falls within the probability (i.e. is smaller than or equal
to samplesize) the tuple will be returned. The

    if (tupoffset > maxoffset)
        break;

is there just because we need to give control back to the scan so it can
move to another block.

> 6.
> One larger question about the approach used in patch, why do you
> think it is better to have a new node (SampleScan/SamplePath) like
> you have used in patch instead of doing it as part of Sequence Scan.
> I agree that we need some changes in different parts of Sequence Scan
> execution, but still sounds like a viable way. One simple thing that
> occurs to me is that in ExecSeqScan(), we can use something like
> SampleSeqNext instead of SeqNext to scan heap in a slightly different
> way, probably doing it as part of sequence scan will have advantage that
> we can use most of the existing infrastructure (sequence node path)
> and have less discrepancies as mentioned in point-4.
>

I originally started from SeqScan, but it requires a completely
different State structure, it needs to call the sampling methods in
various places (not just for the next tuple), and it needs different
handling in EXPLAIN and in the optimizer (costing). If we added all this
to the sequential scan, it would not end up much different from a new
scan node - we'd save a couple of new copy functions and one struct, but
I'd rather have a clearly distinct scan.

It would also not help with the discrepancies you mentioned, because
those are in heapam, and SampleScan would not use that even if it were
merged with SeqScan. I don't think we want to implement the sampling at
the heapam level - it's too much of a hotspot for adding extra cycles
there to be a good idea.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
