Re: pageinspect: Hash index support

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pageinspect: Hash index support
Date: 2017-01-10 18:31:29
Message-ID: CA+TgmoaLV_-aw9Jx2x7E0zqwoWTT=LY9X4-_SkRia1atJ0GPWA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jan 10, 2017 at 12:19 PM, Jesper Pedersen
<jesper(dot)pedersen(at)redhat(dot)com> wrote:
> Revised patched attached.

+ itup = (IndexTuple) PageGetItem(uargs->page, id);
+
+ MemSet(nulls, 0, sizeof(nulls));
+
+ j = 0;
+ values[j++] = UInt16GetDatum(uargs->offset);
+ values[j++] = CStringGetTextDatum(psprintf("(%u,%u)",
+
BlockIdGetBlockNumber(&(itup->t_tid.ip_blkid)),
+ itup->t_tid.ip_posid));
+
+ ptr = (char *) itup + IndexInfoFindDataOffset(itup->t_info);
+ dlen = IndexTupleSize(itup) - IndexInfoFindDataOffset(itup->t_info);

It seems like this could be used to index off the end of the page, if
you feed it invalid data.

+ dump = palloc0(dlen * 3 + 1);

This is wasteful. Just use palloc and install a terminating NUL byte instead.

+ sprintf(dump, "%02x", *(ptr + off) & 0xff);

*(ptr + off) is normally written ptr[off].

+ if (pageopaque->hasho_flag != LH_OVERFLOW_PAGE)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("page is not an overflow page"),
+ errdetail("Expected %08x, got %08x.",
+ LH_OVERFLOW_PAGE, pageopaque->hasho_flag)));

I think this is an unnecessary test given that you've already called
verify_hash_page().

+ if (bitmappage >= metap->hashm_nmaps)
+ elog(ERROR, "invalid overflow bit number %u", ovflbitno);

I think this should be an ereport(), because it's reachable given a
bogus page which a user might construct (or a corrupted page).

+test=# SELECT * FROM hash_page_items(get_raw_page('con_hash_index', 1));
+ itemoffset | ctid | data
+------------+-----------------+-------------------------
+ 1 | (3145728,14376) | 00 c0 ca 3e 00 00 00 00
+ 2 | (3145728,14376) | 00 c0 ca 3e 00 00 00 00
+ 3 | (3407872,14376) | 00 c0 ca 3e 00 00 00 00

Won't the first 4 bytes always be a hash code and the second 4 bytes
always 0? Should we just return the hash code as an int4 or int8
instead of pretending it's a bunch of uninterpretable binary data?

+ <function>hash_bitmap_info</function> returns information about
+ the status of a bit for an overflow page in bitmap page of a
<acronym>HASH</acronym>
+ index. For example:
+<screen>
+test=# SELECT * FROM hash_bitmap_info('con_hash_index', 2050);
+ bitmapblkno | bitmapbit
+-------------+-----------
+ 65 | 1
+</screen>

I find this hard to understand. This says that it returns
information, but the nature of the returned information is unspecified
and in my opinion unclear.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2017-01-10 18:39:14 Re: PoC: Grouped base relation
Previous Message Tom Lane 2017-01-10 18:27:13 Re: Placement of InvokeObjectPostAlterHook calls