Re: pageinspect patch, for showing tuple data

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Nikolay Shaplov <n(dot)shaplov(at)postgrespro(dot)ru>
Cc: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pageinspect patch, for showing tuple data
Date: 2015-09-04 05:58:29
Message-ID: CAB7nPqR2OGCVtO4wS+D6X8AND16wZHLcKa9SmwKo_YY+e95EMA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Sep 2, 2015 at 6:58 PM, Nikolay Shaplov <n(dot)shaplov(at)postgrespro(dot)ru>
wrote:

> В письме от 3 августа 2015 15:35:23 пользователь Alvaro Herrera написал:
> > Nikolay Shaplov wrote:
> > > This patch adds several new functions, available from SQL queries. All
> > > these functions are based on heap_page_items, but accept slightly
> > > different arguments and has one additional column at the result set:
> > >
> > > heap_page_tuples - accepts relation name, and bulkno, and returns usual
> > > heap_page_items set with additional column that contain snapshot of
> tuple
> > > data area stored in bytea.
> >
> > I think the API around get_raw_page is more useful generally. You might
> > have table blocks stored in a bytea column for instance, because you
> > extracted from some remote server and inserted into a local table for
> > examination. If you only accept relname/blkno, there's no way to
> > examine data other than blocks directly in the database dir, which is
> > limiting.
> >
> > > There is also one strange function: _heap_page_items it is useless for
> > > practical purposes. As heap_page_items it accepts page data as bytea,
> but
> > > also get a relation name. It tries to apply tuple descriptor of that
> > > relation to that page data.
> >
> > For BRIN, I added something similar (brin_page_items), but it receives
> > the index OID rather than name, and constructs a tuple descriptor to
> > read the data. I think OID is better than name generally. (You can
> > cast the relation name to regclass).
> >
> > It's easy to misuse, but these functions are superuser-only, so there
> > should be no security issue at least. The possibility of a crash
> > remains real, though, so maybe we should try to come up with a way to
> > harden that.
>
> I've done as you've said: Now patch adds two functions heap_page_item_attrs
> and heap_page_item_detoast_attrs and these function takes raw_page and
> relation OID as arguments. They also have third optional parameter that
> allows
> to change error mode from ERROR to WARNING.
>
> Is it ok now?
>

Yeah, I think that's acceptable to have a switch, defaulting to ERROR if
caller specifies nothing.

Here are some observations after looking at the code, no functional testing.

+ int error_mode = PG_NARGS()==3 ? PG_GETARG_BOOL(2)
:NULL;

When handling additional arguments, it seems to me that it is more readable
to use something like that:
if (PG_NARGS >= 3)
{
arg = PG_GETARG_XXX(2);
//etc.
}

+ error_mode = error_mode?WARNING:ERROR;

This generates warnings at compilation.

+ if (SRF_IS_FIRSTCALL() && (error_mode == WARNING))
+ {
+ ereport(WARNING,
+ (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+ (errmsg("Runnung heap_page_item_attrs in
WARNING mode."))));
+ }

I am not convinced that this is necessary.

+ inter_call_data->raw_page_size = VARSIZE(raw_page) -
VARHDRSZ;
+ if (inter_call_data->raw_page_size < SizeOfPageHeaderData)
Including raw_page_size in the status data does not seem necessary to me.
The page data is already available for each loop so you could just extract
it from it.

+ ereport(inter_call_data->error_level,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("Data corruption: number of
attributes in tuple header is greater than number of attributes in tuple
descripor")));
I'd rather switch the error reports related to data corruption from ereport
to elog, which is more suited for internal errors, and it seems to me that
it is the case here.

heapfuncs.c:370:7: warning: variable 'raw_attr' is used uninitialized
whenever 'if' condition is false [-Wsometimes-uninitialized]
if (!is_null)
^~~~~~~~
heapfuncs.c:419:43: note: uninitialized use occurs here
raw_attrs = accumArrayResult(raw_attrs, raw_attr, is_null,
BYTEAOID, fctx->multi_call_memory_ctx);
^~~~~~~~
heapfuncs.c:370:3: note: remove the 'if' if its condition is always true
if (!is_null)
^~~~~~~~~~~~~
heapfuncs.c:357:17: note: initialize the variable 'raw_attr' to silence
this warning
Datum raw_attr;
My compiler complains about uninitialized variables.

+--
+-- _heap_page_items()
+--
+CREATE FUNCTION _heap_page_items(IN page bytea, IN relname text,
I am not sure why this is necessary and visibly it overlaps with the
existing declaration of heap_page_items. The last argument is different
though, and I recall that we decided not to use anymore the relation name
specified as text, but its OID instead in this module.

Documentation is missing, that would be good to have to understand what
each function is intended to do. Code has some whitespaces.
--
Michael

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tatsuo Ishii 2015-09-04 06:04:59 Re: BRIN INDEX value
Previous Message Atsushi Yoshida 2015-09-04 05:55:53 Re: Too many duplicated condition query return wrong value