From: | Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> |
---|---|
To: | Alexey Chernyshov <a(dot)chernyshov(at)postgrespro(dot)ru> |
Cc: | PostgreSQL-Dev <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: [PATCH] Pageinspect - add functions on GIN and GiST indexes from gevel |
Date: | 2017-09-06 10:56:45 |
Message-ID: | CAEepm=1Vf6edWc33qF5us5Vw4J8QS-3LYtFzFVKYscs6c9+B7g@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Sat, Jul 22, 2017 at 12:05 AM, Alexey Chernyshov
<a(dot)chernyshov(at)postgrespro(dot)ru> wrote:
> the following patch transfers functionality from gevel module
> (http://www.sai.msu.su/~megera/wiki/Gevel) which provides functions for
> analyzing GIN and GiST indexes to pageinspect. Gevel was originally
> designed by Oleg Bartunov, and Teodor Sigaev for developers of GiST and
> GIN indexes.
>
> Functions added:
> - gist_stat(text) - shows statistics on GiST Tree
> - gist_tree(text) - shows GiST tree
> - gist_tree(text, int4) - shows GiST tree up to MAXLEVEL
> - gist_print(text) - prints objects stored in GiST tree
> - spgist_stat(text) - shows statistics on SP-GiST
> - spgist_print(text) - prints objects stored in index
> - gin_value_count() - originally gin_stat(text) - prints estimated counts
> for index values
> - gin_stats() - originally gin_statpage(text) - shows statistics
> - gin_count_estimate(text, tsquery) - shows number of indexed rows matched
> query
>
> Tests also transferred, docs for new functions are added. I run pgindent
> over the code, but the result is different from those I expected, so I leave
> pgindented one.
> The patch is applicable to the commit
> 866f4a7c210857aa342bf901558d170325094dde.
Hi Alexey,
This patch still applies but doesn't build after commits 2cd70845 and c6293249.
ginfuncs.c:91:47: error: invalid type argument of ‘->’ (have
‘FormData_pg_attribute’)
st->index->rd_att->attrs[st->attnum]->attbyval,
...several similar errors...
For example, that expression needs to be changed to:
TupleDescAttr(st->index->rd_att, attnum)->attbyval
Here is some superficial review since I'm here:
+/*
+ * process_tuples
+ * Retrieves the number of TIDs in a tuple.
+ */
+static void
+process_tuple(FuncCallContext *funcctx, GinStatState * st, IndexTuple itup)
"process_tuples" vs "process_tuple"
+ /* do no distiguish various null category */
"do not distinguish various null categories"
+ st->nulls[0] = (st->category == GIN_CAT_NORM_KEY) ? false : true;
That's a long way to write (st->category != GIN_CAT_NORM_KEY)!
+ * We scaned the whole page, so we should take right page
"scanned"
+/*
+ * refind_position
+ * Refinds a previois position, at returns it has correctly
+ * set offset and buffer is locked.
+ */
"previous", s/, at returns/. On return/
+ memset(st->nulls, false, 2 * sizeof(*st->nulls));
"false" looks strange here where an int is expected. The size
expression is weird. I think you should just write:
st->nulls[0] = false;
st->nulls[1] = false;
Investigating the types more closely, I see that 'nulls' is declared
like this (in struct GinStatState):
+ char nulls[2];
Then later you do this:
+ htuple = heap_form_tuple(funcctx->attinmeta->tupdesc,
+ st->dvalues,
+ st->nulls);
But that's not OK, heap_form_tuple takes bool *. bool and char are
not interchangeable (they may have different sizes on some platforms,
among other problems, even though usually they are both 1 byte). So I
think you should declare it as "bool nulls[2]".
Meanwhile in TypeStorage you have a member "bool *nulls", which you
then initialise like so:
+ st->nulls = (char *) palloc((tupdesc->natts + 2) * sizeof(*st->nulls));
That cast is wrong.
+/*
+ * gin_count_estimate
+ * Outputs number of indexed rows matched query. It doesn't touch heap at
+ * all.
Maybe "... number of indexed rows matching query."?
+ if (attnum < 0 || attnum >= st->index->rd_att->natts)
+ elog(ERROR, "Wrong column's number");
+ st->attnum = attnum;
Maybe "invalid column number" or "invalid attribute number"?
+ elog(ERROR, "Column type is not a tsvector");
s/C/c/ (according to convention).
+ * Prints various stat about index internals.
s/stat/stats/
+ elog(ERROR, "Relation %s.%s is not an index",
s/R/r/
+ elog(ERROR, "Index %s.%s has wrong type",
s/I/i/
+ <function>gin_value_count</function> prints estimated counts for each
+ indexed values, single-argument function will return result for a first
+ column of index. For example:
"... for each indexed value. The single argument version will return
results for the first column of an index. For example:"
+ <function>gin_count_estimate</function> outputs number of indexed
+ rows matched query. It doesn't touch heap at all. For example:
"... outputs the number of indexed rows matched by a query. ..."
+ <function>gist_print</function> prints objects stored in
+ <acronym>GiST</acronym> tree, works only if objects in index have
+ textual representation (<function>type_out</function> functions should be
+ implemented for the given object type). It's known to work with R-tree
+ <acronym>GiST</acronym> based index. Note, in example below, objects are
+ of type box. For example:
"... prints objects stored in a GiST tree. it works only if the
objects in the index have textual representation (type_out functions
should be implemented for the given object type). It's known to work
with R-tree GiST-based indexes. Note: in the example below, objects
are of type box. For example:"
+ <function>gin_value_count</function> prints estimated counts for each
+ indexed value for a given column. For example:
Maybe "for a given column (starting from zero)"?
+ <function>spgist_print</function> prints objects stored in
+ <acronym>SP-GiST</acronym> tree, works only if objects in index have
+ textual representation (<function>type_out</function> functions should
+ be implemented for the given object type).
"... tree. It works only if ..."
+ Note 1. in example below we used quad_point_ops which uses point
+ for leaf and prefix value, but doesn't use node_label at all.
+ Use type 'int' as dummy type for prefix or/and node_label.
"... in the example ..."
A few random whitespace changes:
- (errmsg("must be superuser to use raw page functions"))));
+ (errmsg("must be superuser to use raw page functions")))
+ );
- opaq->flags, GIN_META)));
+ opaq->flags, GIN_META))
+ );
- flags[nflags++] = DirectFunctionCall1(to_hex32, Int32GetDatum(flagbits));
+ flags[nflags++] = DirectFunctionCall1(to_hex32,
+ Int32GetDatum(flagbits));
- (GIN_DATA | GIN_LEAF | GIN_COMPRESSED))));
+ (GIN_DATA | GIN_LEAF | GIN_COMPRESSED)))
+ );
--
Thomas Munro
http://www.enterprisedb.com
From | Date | Subject | |
---|---|---|---|
Next Message | Thomas Munro | 2017-09-06 11:11:03 | Re: additional contrib test suites |
Previous Message | Rafia Sabih | 2017-09-06 10:44:45 | Re: Effect of changing the value for PARALLEL_TUPLE_QUEUE_SIZE |