Re: TABLESAMPLE patch

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Petr Jelinek <petr(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Jaime Casanova <jaime(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>
Subject: Re: TABLESAMPLE patch
Date: 2015-04-04 12:57:12
Message-ID: CAA4eK1J85f+Q3WP-3N_Vp2z9p6seRhQj5F6yhJXXKcnJ-0bGpA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Apr 3, 2015 at 3:06 AM, Petr Jelinek <petr(at)2ndquadrant(dot)com> wrote:
>
> Hi,
>
> so here is version 11.
>

Thanks, patch looks much better, but I think still few more
things needs to discussed/fixed.

1.
+tablesample_clause:
+ TABLESAMPLE ColId '(' func_arg_list ')' opt_repeatable_clause

Why do you want to allow func_arg_list?

Basically if user tries to pass multiple arguments in
TABLESAMPLE method's clause like (10,20), then I think
that should be caught in grammer level as an error.

SQL - 2003 specs says that argument to REPAEATABLE and TABLESAMPLE
method is same <numeric value expr>

It seems to me that you want to allow it to make it extendable
to user defined Tablesample methods, but not sure if it is
right to use func_arg_list for the same because sample method
arguments can have different definition than function arguments.
Now even if we want to keep sample method arguments same as
function arguments that sounds like a separate discussion.

In general, I feel you already have good basic infrastructure for
supportting other sample methods, but it is better to keep the new
DDL's for doing the same as a separate patch than this patch, as that
way we can reduce the scope of this patch, OTOH if you or others
feel that it is mandatory to have new DDL's support for other
tablesample methods then we have to deal with this now itself.

2.
postgres=# explain update test_tablesample TABLESAMPLE system(30) set id =
2;
ERROR: syntax error at or near "TABLESAMPLE"
LINE 1: explain update test_tablesample TABLESAMPLE system(30) set i...

postgres=# Delete from test_tablesample TABLESAMPLE system(30);
ERROR: syntax error at or near "TABLESAMPLE"
LINE 1: Delete from test_tablesample TABLESAMPLE system(30);

Isn't TABLESAMPLE clause suppose to work with Update/Delete
statements?

3.
+static RangeTblEntry *
+transformTableSampleEntry(ParseState *pstate, RangeTableSample *r)
..
+ parser_errposition(pstate,
+ exprLocation((Node *) r))));

Better to align exprLocation() with pstate

4.
SampleTupleVisible()
{
..
else
{
/* No pagemode, we have to check the tuple itself. */
Snapshot
snapshot = scan->rs_snapshot;
Buffer buffer = scan->rs_cbuf;

bool visible =
HeapTupleSatisfiesVisibility(tuple, snapshot, buffer);
..
}

I think it is better to check if PageIsAllVisible() in above
code before visibility check as that can avoid visibility checks.

5.
ParseTableSample(ParseState *pstate, char *samplemethod, Node *repeatable,
List
*sampleargs)
{
..
if (con->val.type == T_Null)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("REPEATABLE
clause must be NOT NULL numeric value"),
parser_errposition
(pstate, con->location)));
..
}

InitSamplingMethod(SampleScanState *scanstate, TableSampleClause
*tablesample)
{
..
if (fcinfo.argnull[1])
ereport(ERROR,
(errcode
(ERRCODE_NULL_VALUE_NOT_ALLOWED),
errmsg("REPEATABLE clause cannot be
NULL")));
..
}

I think it would be better if we can have same error message
and probably the same error code in both of the above cases.

6.
extern TableSampleClause *
ParseTableSample(ParseState *pstate, char *samplemethod, Node *repeatable,
List *sampleargs)

It is better to expose function (usage of extern) via header file.
Is there a need to mention extern here?

7.
ParseTableSample()
{
..
arg = coerce_to_specific_type(pstate, arg, INT4OID, "REPEATABLE");
..
}

What is the reason for coercing value of REPEATABLE clause to INT4OID
when numeric value is expected for the clause. If user gives the
input value as -2.3, it will generate a seed which doesn't seem to
be right.

8.
+DATA(insert OID = 3295 ( tsm_system_init PGNSP PGUID 12 1 0 0 0 f f f f t
f v 3 0 2278 "2281
23 700" _null_ _null_ _null_ _null_ tsm_system_init _null_ _null_ _null_ ));
+DATA(insert OID = 3301 ( tsm_bernoulli_init PGNSP PGUID 12 1 0 0 0 f f f
f t f v 3 0 2278 "2281
23 700" _null_ _null_ _null_ _null_ tsm_bernoulli_init _null_ _null_ _null_
));

Datatype for second argument is kept as 700 (Float4), shouldn't
it be 1700 (Numeric)?

9.
postgres=# explain SELECT t1.id FROM test_tablesample as t1 TABLESAMPLE
SYSTEM (
10), test_tablesample as t2 TABLESAMPLE BERNOULLI (20);
QUERY PLAN
----------------------------------------------------------------------------
Nested Loop (cost=0.00..4.05 rows=100 width=4)
-> Sample Scan on test_tablesample t1 (cost=0.00..0.01 rows=1 width=4)
-> Sample Scan on test_tablesample t2 (cost=0.00..4.02 rows=2 width=0)
(3 rows)

Isn't it better to display sample method name in explain.
I think it can help in case of join queries.
We can use below format to display:
Sample Scan (System) on test_tablesample ...

10. Bernoulli.c

+/* State */
+typedef struct
+{
+ uint32 seed; /* random seed */
+ BlockNumber startblock; /* starting block, we use ths for syncscan
support */

typo.
/ths/this

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Sawada Masahiko 2015-04-04 13:21:42 Re: Proposal: knowing detail of config files via SQL
Previous Message David Rowley 2015-04-04 09:19:24 Re: Parallel Seq Scan