Quick Links

Re: TABLESAMPLE patch

From:	Petr Jelinek <petr(at)2ndquadrant(dot)com>
To:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Jaime Casanova <jaime(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>
Subject:	Re: TABLESAMPLE patch
Date:	2015-03-07 17:07:27
Message-ID:	54FB304F.9020205@2ndquadrant.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On 05/03/15 09:21, Amit Kapila wrote:
> On Tue, Feb 17, 2015 at 3:29 AM, Petr Jelinek <petr(at)2ndquadrant(dot)com
> <mailto:petr(at)2ndquadrant(dot)com>> wrote:
> >
> >
> > I didn't add the whole page visibility caching as the tuple ids we
> get from sampling methods don't map well to the visibility info we get
> from heapgetpage (it maps to the values in the rs_vistuples array not to
> to its indexes). Commented about it in code also.
> >
>
> I think we should set pagemode for system sampling as it can
> have dual benefit, one is it will allow us caching tuples and other
> is it can allow us pruning of page which is done in heapgetpage().
> Do you see any downside to it?

Double checking for tuple visibility is the only downside I can think
of. Plus some added code complexity of course. I guess we could use
binary search on rs_vistuples (it's already sorted) so that info won't
be completely useless. Not sure if worth it compared to normal
visibility check, but not hard to do.

I personally don't see the page pruning in TABLESAMPLE as all that
important though, considering that in practice it will only scan tiny
portion of a relation and usually one that does not get many updates (in
practice the main use-case is BI).

>
> Few other comments:
>
> 1.
> Current patch fails to apply, minor change is required:
> patching file `src/backend/utils/misc/Makefile'
> Hunk #1 FAILED at 15.
> 1 out of 1 hunk FAILED -- saving rejects to
> src/backend/utils/misc/Makefile.rej

Ah bitrot over time.

>
> 2.
> Few warnings in code (compiled on windows(msvc))
> 1>src\backend\utils\tablesample\bernoulli.c(217): warning C4305: '=' :
> truncation from 'double' to 'float4'
> 1>src\backend\utils\tablesample\system.c(177): warning C4305: '=' :
> truncation from 'double' to 'float4'
>

I think this is just compiler stupidity but hopefully fixed (I don't
have msvc to check for it and other compilers I tried don't complain).

> 3.
> +static void
> +InitScanRelation(SampleScanState *node, EState *estate, int eflags,
> +TableSampleClause *tablesample)
> {
> ..
> +/*
> +* Page at a time mode is useless for us as we need to check visibility
> +* of tuples individually because tuple offsets returned by sampling
> +* methods map to rs_vistuples values and not to its indexes.
> +*/
> +node->ss.ss_currentScanDesc->rs_pageatatime = false;
> ..
> }
>
> Modifying scandescriptor in nodeSamplescan.c looks slightly odd,
> Do we modify this way at anyother place?
>
> I think it is better to either teach heap_beginscan_* api's about
> it or expose it via new API in heapam.c
>

Yeah I agree, I taught the heap_beginscan_strat about it as that one is
the advanced API.

>
> 4.
> +Datum
> +tsm_system_cost(PG_FUNCTION_ARGS)
> {
> ..
> +*tuples = path->rows * samplesize;
> }
>
> It seems above calculation considers all rows in table are of
> equal size and hence multiplying directly with samplesize will
> give estimated number of rows for sample method, however if
> row size varies then this calculation might not give right
> results. I think if possible we should consider the possibility
> of rows with varying sizes else we can at least write a comment
> to indicate the possibility of improvement for future.
>

I am not sure how we would know what size would the tuples have in the
random blocks that we are going to read later. That said, I am sure that
costing can be improved even if I don't know how myself.

> 6.
> @@ -13577,6 +13608,7 @@ reserved_keyword:
> | PLACING
> | PRIMARY
> | REFERENCES
> +| REPEATABLE
>
> Have you tried to investigate the reason why it is giving shift/reduce
> error for unreserved category and if we are not able to resolve that,
> then at least we can try to make it in some less restrictive category.
> I tried (on windows) by putting it in (type_func_name_keyword:) and
> it seems to be working.
>
> To investigate, you can try with information at below link:
> http://www.gnu.org/software/bison/manual/html_node/Understanding.html
>
> Basically I think we should try some more before concluding
> to change the category of REPEATABLE and especially if we
> want to make it a RESERVED keyword.

Yes it can be moved to type_func_name_keyword which is not all that much
better but at least something. I did try to play with this already
during first submission but could not find a way to move it to something
less restrictive.

I could not even pinpoint what exactly is the shift/reduce issue except
that it has something to do with the fact that the REPEATABLE clause is
optional (at least I didn't have the problem when it wasn't optional).

>
> On a related note, it seems you have agreed upthread with
> Kyotaro-san for below change, but I don't see the same in latest patch:
>
> diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
> index 4578b5e..8cf09d5 100644
> --- a/src/backend/parser/gram.y
> +++ b/src/backend/parser/gram.y
> @@ -10590,7 +10590,7 @@ tablesample_clause:
> ;
>
> opt_repeatable_clause:
> - REPEATABLE '(' AexprConst ')' { $$ = (Node *)
> $3; }
> + REPEATABLE '(' a_expr ')' { $$ = (Node *)
> $3; }
> | /*EMPTY*/
> { $$ = NULL; }
>

Bah, lost this change during rebase.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Re: TABLESAMPLE patch at 2015-03-05 08:21:16 from Amit Kapila

Responses

Re: TABLESAMPLE patch at 2015-03-09 03:51:41 from Amit Kapila

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Bruce Momjian	2015-03-07 17:22:32	Re: MD5 authentication needs help
Previous Message	Marco Nenciarini	2015-03-07 15:20:17	Re: File based Incremental backup v8