Re: [PATCH] Pageinspect - add functions on GIN and GiST indexes from gevel

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Alexey Chernyshov <a(dot)chernyshov(at)postgrespro(dot)ru>
Cc: PostgreSQL-Dev <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Pageinspect - add functions on GIN and GiST indexes from gevel
Date: 2017-09-06 10:56:45
Message-ID: CAEepm=1Vf6edWc33qF5us5Vw4J8QS-3LYtFzFVKYscs6c9+B7g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Jul 22, 2017 at 12:05 AM, Alexey Chernyshov
<a(dot)chernyshov(at)postgrespro(dot)ru> wrote:
> the following patch transfers functionality from gevel module
> (http://www.sai.msu.su/~megera/wiki/Gevel) which provides functions for
> analyzing GIN and GiST indexes to pageinspect. Gevel was originally
> designed by Oleg Bartunov, and Teodor Sigaev for developers of GiST and
> GIN indexes.
>
> Functions added:
> - gist_stat(text) - shows statistics on GiST Tree
> - gist_tree(text) - shows GiST tree
> - gist_tree(text, int4) - shows GiST tree up to MAXLEVEL
> - gist_print(text) - prints objects stored in GiST tree
> - spgist_stat(text) - shows statistics on SP-GiST
> - spgist_print(text) - prints objects stored in index
> - gin_value_count() - originally gin_stat(text) - prints estimated counts
> for index values
> - gin_stats() - originally gin_statpage(text) - shows statistics
> - gin_count_estimate(text, tsquery) - shows number of indexed rows matched
> query
>
> Tests also transferred, docs for new functions are added. I run pgindent
> over the code, but the result is different from those I expected, so I leave
> pgindented one.
> The patch is applicable to the commit
> 866f4a7c210857aa342bf901558d170325094dde.

Hi Alexey,

This patch still applies but doesn't build after commits 2cd70845 and c6293249.

ginfuncs.c:91:47: error: invalid type argument of ‘->’ (have
‘FormData_pg_attribute’)
st->index->rd_att->attrs[st->attnum]->attbyval,
...several similar errors...

For example, that expression needs to be changed to:

TupleDescAttr(st->index->rd_att, attnum)->attbyval

Here is some superficial review since I'm here:

+/*
+ * process_tuples
+ * Retrieves the number of TIDs in a tuple.
+ */
+static void
+process_tuple(FuncCallContext *funcctx, GinStatState * st, IndexTuple itup)

"process_tuples" vs "process_tuple"

+ /* do no distiguish various null category */

"do not distinguish various null categories"

+ st->nulls[0] = (st->category == GIN_CAT_NORM_KEY) ? false : true;

That's a long way to write (st->category != GIN_CAT_NORM_KEY)!

+ * We scaned the whole page, so we should take right page

"scanned"

+/*
+ * refind_position
+ * Refinds a previois position, at returns it has correctly
+ * set offset and buffer is locked.
+ */

"previous", s/, at returns/. On return/

+ memset(st->nulls, false, 2 * sizeof(*st->nulls));

"false" looks strange here where an int is expected. The size
expression is weird. I think you should just write:

st->nulls[0] = false;
st->nulls[1] = false;

Investigating the types more closely, I see that 'nulls' is declared
like this (in struct GinStatState):

+ char nulls[2];

Then later you do this:

+ htuple = heap_form_tuple(funcctx->attinmeta->tupdesc,
+ st->dvalues,
+ st->nulls);

But that's not OK, heap_form_tuple takes bool *. bool and char are
not interchangeable (they may have different sizes on some platforms,
among other problems, even though usually they are both 1 byte). So I
think you should declare it as "bool nulls[2]".

Meanwhile in TypeStorage you have a member "bool *nulls", which you
then initialise like so:

+ st->nulls = (char *) palloc((tupdesc->natts + 2) * sizeof(*st->nulls));

That cast is wrong.

+/*
+ * gin_count_estimate
+ * Outputs number of indexed rows matched query. It doesn't touch heap at
+ * all.

Maybe "... number of indexed rows matching query."?

+ if (attnum < 0 || attnum >= st->index->rd_att->natts)
+ elog(ERROR, "Wrong column's number");
+ st->attnum = attnum;

Maybe "invalid column number" or "invalid attribute number"?

+ elog(ERROR, "Column type is not a tsvector");

s/C/c/ (according to convention).

+ * Prints various stat about index internals.

s/stat/stats/

+ elog(ERROR, "Relation %s.%s is not an index",

s/R/r/

+ elog(ERROR, "Index %s.%s has wrong type",

s/I/i/

+ <function>gin_value_count</function> prints estimated counts for each
+ indexed values, single-argument function will return result for a first
+ column of index. For example:

"... for each indexed value. The single argument version will return
results for the first column of an index. For example:"

+ <function>gin_count_estimate</function> outputs number of indexed
+ rows matched query. It doesn't touch heap at all. For example:

"... outputs the number of indexed rows matched by a query. ..."

+ <function>gist_print</function> prints objects stored in
+ <acronym>GiST</acronym> tree, works only if objects in index have
+ textual representation (<function>type_out</function> functions should be
+ implemented for the given object type). It's known to work with R-tree
+ <acronym>GiST</acronym> based index. Note, in example below, objects are
+ of type box. For example:

"... prints objects stored in a GiST tree. it works only if the
objects in the index have textual representation (type_out functions
should be implemented for the given object type). It's known to work
with R-tree GiST-based indexes. Note: in the example below, objects
are of type box. For example:"

+ <function>gin_value_count</function> prints estimated counts for each
+ indexed value for a given column. For example:

Maybe "for a given column (starting from zero)"?

+ <function>spgist_print</function> prints objects stored in
+ <acronym>SP-GiST</acronym> tree, works only if objects in index have
+ textual representation (<function>type_out</function> functions should
+ be implemented for the given object type).

"... tree. It works only if ..."

+ Note 1. in example below we used quad_point_ops which uses point
+ for leaf and prefix value, but doesn't use node_label at all.
+ Use type 'int' as dummy type for prefix or/and node_label.

"... in the example ..."

A few random whitespace changes:

- (errmsg("must be superuser to use raw page functions"))));
+ (errmsg("must be superuser to use raw page functions")))
+ );

- opaq->flags, GIN_META)));
+ opaq->flags, GIN_META))
+ );

- flags[nflags++] = DirectFunctionCall1(to_hex32, Int32GetDatum(flagbits));
+ flags[nflags++] = DirectFunctionCall1(to_hex32,
+ Int32GetDatum(flagbits));

- (GIN_DATA | GIN_LEAF | GIN_COMPRESSED))));
+ (GIN_DATA | GIN_LEAF | GIN_COMPRESSED)))
+ );

--
Thomas Munro
http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2017-09-06 11:11:03 Re: additional contrib test suites
Previous Message Rafia Sabih 2017-09-06 10:44:45 Re: Effect of changing the value for PARALLEL_TUPLE_QUEUE_SIZE