Re: pageinspect: Hash index support

From: Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Mithun Cy <mithun(dot)cy(at)enterprisedb(dot)com>, Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pageinspect: Hash index support
Date: 2017-01-29 02:09:03
Message-ID: CAE9k0Pke046HKYfuJGcCtP77NyHrun7hBV-v20a0TW4CUg4H+A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> The heap tuple is on page 3407872 at line pointer offset 12584?
> That's an awfully but not implausibly big page number (about 26GB into
> the relation) and an awfully and implausibly large line pointer offset
> (how do we fit 12584 index tuples into an 8192-byte page?). Also, the
> fact that this example has two index entries with the same hash code
> pointing at the same heap TID seems wrong; wouldn't that result in
> index scans returning duplicates? I think what's actually happening
> here is that (3407872,12584) is actually the attempt to interpret some
> chunk of memory containing the text representation of a TID as an
> actual TID. When I print those numbers as hex, I get 0x343128, and
> those are the digits "4" and "1" and an opening parenthesis ")" as
> ASCII, so that might fit this theory.

I too had a similar observations when debugging hash_page_items() and
I think we were trying to represent tid as a text rather than tid
itself which was not correct. Attched patch fixes this bug. Below is
what i observed and it is somewhat similar to your explanation in the
above comment:

(gdb) p PageGetItemId(uargs->page, uargs->offset)
$2 = (ItemIdData *) 0x1d84754

(gdb) p *(ItemIdData *) 0x1d84754
$3 = {lp_off = 1664, lp_flags = 1, lp_len = 16}

(gdb) p *(IndexTuple) (page + 1664)
$4 = {t_tid = {ip_blkid = {bi_hi = 0, bi_lo = 839}, ip_posid = 137},
t_info = 16}

(gdb) p BlockIdGetBlockNumber(&(itup->t_tid.ip_blkid))
$5 = 839

(gdb) p (itup->t_tid.ip_posid)
$6 = 137

(gdb) p CStringGetTextDatum(psprintf("(%u,%u)", 839, 137))
$7 = 30959896

(gdb) p DatumGetCString(30959896)
$8 = 0x1d86918 "4"

(gdb) p (char *)0x1d86918
$23 = 0x1d86918 "4"

>
> The code that tries to extract the hashcode from the item also looks
> suspicious. I'm not 100% sure whether it's wrong or just
> strangely-written, but it doesn't make much sense to extract the item
> contents, then using sprintf() to turn that into a hex string of
> bytes, then parse the hex string using strtol() to get an integer
> back. I think what it ought to be doing is getting a pointer to the
> index tuple and then calling _hash_get_indextuple_hashkey.
>

I think there is nothing wrong with the hashcode being shown but i do
agree that it is a bit lengthly method to find a hashcode considering
that we already have a extern function _hash_get_indextuple_hashkey()
that can be used to find the hashcode. I have corrected this in the
attached patch.

> Another minor complaint about hash_page_items is that it doesn't
> pfree(uargs->page). I'm not sure it's necessary to pfree anything
> here, but if we're going to do it I think we might as well be
> consistent with the btree case. Anyway it can hardly make sense to
> free the 8-byte structure and not the 8192-byte page to which it's
> pointing.
>

In case of btree, we are copying entire page into uargs->page.

<btreefuncs.c>
memcpy(uargs->page, BufferGetPage(buffer), BLCKSZ);
</btrrefuncs.c>

But in our case we have a raw_page and uargs->page contains pointer to
the page so no need to pfree uargs->page separately.

> In hash_bimap_info(), we go to the trouble of creating a raw page but
> never do anything with it. I guess the idea here is just to error out
> if the supplied page number is not an overflow page, but there are no
> comments explaining that. Anyway, I'm not sure it's a very good idea,
> because it means that if you try to write a query to interrogate the
> status of all the bitmap pages, it's going to read all of the overflow
> pages to which they point, which makes the operation a LOT more
> expensive. I think it would be better to leave this part out; if the
> user wants to know which pages are actually overflow pages, they can
> use hash_page_type() to figure it out. Let's make it the job of this
> function just to check the available/free status of the page in the
> bitmap, without reading the bitmap itself.

Point taken. I am now checking the status of an overflow page without
reading bitmap page.

>
> In doing that, I think it should either return (bitmapblkno,
> bitmapbit, bit) or just bit. Returning bitmapblkno but not bitmapbit
> seems strange. Also, I think it should return bit as a bool, not
> int4.
>

The new bitmap_info() now returns bitmapblkno, bitmapbit and bitstatus as bool.

> Another general note is that, in general, if you use the
> BlahGetDatum() function to construct a datum, the SQL type should be
> match the macro you picked - e.g. if the SQL type is int4, the macro
> used should be Int32GetDatum(), not UInt32GetDatum() or anything else.
> If the SQL type is bool, you should be using BoolGetDatum().

Sorry to mention but I didn't find any SQL datatype equivalent to
uint32 or uint16 in 'C'. So, currently i am using int4 for uint16 and
int8 for uint32.

> Apart from the above, I did a little work to improve the reporting
> when a page of the wrong type is verified. Updated patch with those
> changes attached.

okay. Thanks. I have done changes on top of this patch.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com

Attachment Content-Type Size
0001-Add-support-for-hash-index-in-pageinspect-contrib-mo_v9.patch invalid/octet-stream 34.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ashutosh Sharma 2017-01-29 02:15:21 Re: pageinspect: Hash index support
Previous Message Cat 2017-01-28 22:20:01 Re: One-shot expanded output in psql using \G