Re: pageinspect and hash indexes

From: Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pageinspect and hash indexes
Date: 2017-03-21 04:45:14
Message-ID: CAE9k0Pkcw6iw2RNRo_gmc4AOn-+JLiqZRuwxaJr_yXz=cphWWg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Mar 20, 2017 at 6:53 PM, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com> wrote:
> On Mon, Mar 20, 2017 at 9:31 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>> On Sat, Mar 18, 2017 at 5:13 PM, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com> wrote:
>>> On Sat, Mar 18, 2017 at 1:34 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>>>> On Sat, Mar 18, 2017 at 12:12 AM, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com> wrote:
>>>>> On Fri, Mar 17, 2017 at 10:54 PM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
>>>>>> While trying to figure out some bloating in the newly logged hash indexes,
>>>>>> I'm looking into the type of each page in the index. But I get an error:
>>>>>>
>>>>>> psql -p 9876 -c "select hash_page_type(get_raw_page('foo_index_idx',x)) from
>>>>>> generate_series(1650,1650) f(x)"
>>>>>>
>>>>>> ERROR: page is not a hash page
>>>>>> DETAIL: Expected 0000ff80, got 00000000.
>>>>>>
>>>>>> The contents of the page are:
>>>>>>
>>>>>> \xa4000000d8f203bf65c900001800f01ff01f0420...
>>>>>>
>>>>>> (where the elided characters at the end are all zero)
>>>>>>
>>>>>> What kind of page is that actually?
>>>>>
>>>>> it is basically either a newly allocated bucket page or a freed overflow page.
>>>>>
>>>>
>>>> What makes you think that it can be a newly allocated page?
>>>> Basically, we always initialize the special space of newly allocated
>>>> page, so not sure what makes you deduce that it can be newly allocated
>>>> page.
>>>
>>> I came to know this from the following experiment.
>>>
>>> I created a hash index and kept on inserting data in it till the split happens.
>>>
>>> When split happened, I could see following values for firstblock and
>>> lastblock in _hash_alloc_buckets()
>>>
>>> Breakpoint 1, _hash_alloc_buckets (rel=0x7f6ac951ee30, firstblock=34,
>>> nblocks=32) at hashpage.c:993
>>> (gdb) n
>>> (gdb) p firstblock
>>> $15 = 34
>>> (gdb) p nblocks
>>> $16 = 32
>>> (gdb) n
>>> (gdb) p lastblock
>>> $17 = 65
>>>
>>> AFAIU, this bucket split resulted in creation of new bucket pages from
>>> block number 34 to 65.
>>>
>>> The contents for metap are as follows,
>>>
>>> (gdb) p *metap
>>> $18 = {hashm_magic = 105121344, hashm_version = 2, hashm_ntuples =
>>> 2593, hashm_ffactor = 81, hashm_bsize = 8152, hashm_bmsize = 4096,
>>> hashm_bmshift = 15,
>>> hashm_maxbucket = 32, hashm_highmask = 63, hashm_lowmask = 31,
>>> hashm_ovflpoint = 6, hashm_firstfree = 0, hashm_nmaps = 1,
>>> hashm_procid = 450,
>>> hashm_spares = {0, 0, 0, 0, 0, 1, 1, 0 <repeats 25 times>},
>>> hashm_mapp = {33, 0 <repeats 127 times>}}
>>>
>>> Now, if i try to check the page type for block number 65, this is what i see,
>>>
>>> test=# select * from hash_page_type(get_raw_page('con_hash_index', 65));
>>> ERROR: page is not a hash page
>>> DETAIL: Expected 0000ff80, got 00000000.
>>> test=#
>>>
>>
>> The contents of such a page should be zero and Jeff has reported some
>> valid-looking contents of the page. If you see this page contents as
>> zero, then we can conclude what Jeff is seeing was an freed overflow
>> page.
>
> As shown in the mail thread-[1], the contents of metapage are,
>
> (gdb) p *metap
> $18 = {hashm_magic = 105121344, hashm_version = 2, hashm_ntuples
> =2593, hashm_ffactor = 81, hashm_bsize = 8152, hashm_bmsize = 4096,
> hashm_bmshift = 15, hashm_maxbucket = 32, hashm_highmask = 63,
> hashm_lowmask = 31, hashm_ovflpoint = 6, hashm_firstfree = 0,
> hashm_nmaps = 1,
> hashm_procid = 450, hashm_spares = {0, 0, 0, 0, 0, 1, 1, 0 <repeats
> 25 times>}, hashm_mapp = {33, 0 <repeats 127 times>}}
>
> postgres=# select spares from
> hash_metapage_info(get_raw_page('con_hash_index', 0));
> spares
> -------------------------------------------------------------------
> {0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0}
> (1 row)
>
> Here, if you see the spare page count is just 1 which corresponds to
> bitmap page. So, that means there is no overflow page in hash index
> table and neither I have ran any DELETE statements in my experiment
> that would result in freeing an overflow page.
>
> Also, the page header content for such a page is,
>
> $9 = {pd_lsn = {xlogid = 0, xrecoff = 23638120}, pd_checksum = 0,
> pd_flags = 0, pd_lower = 24, pd_upper = 8176, pd_special = 8176,
> pd_pagesize_version = 8196, pd_prune_xid = 0, pd_linp = 0x1f3aa88}
>
> From values of pd_lower and pd_upper it is clear that it is an empty page.
>
> The content of this page is,
>
> \x00000000b0308a01000000001800f01ff01f042000.....
>
> I think it is not just happening for freed overflow but also for newly
> allocated bucket page. It would be good if we could mark freed
> overflow page as UNUSED page rather than just initialising it's header
> portion and leaving the page type in special area as it is. Attached
> is the patch '0001-mark_freed_ovflpage_as_UNUSED_pagetype.patch' that
> marks a freed overflow page as an unused page.
>
> Also, I have now changed pageinspect module to handle unused and empty
> hash index page. Attached is the patch
> (0002-allow_pageinspect_handle_UNUSED_OR_EMPTY_hash_pages.patch) for
> the same.
>
> [1] - https://www.postgresql.org/message-id/CAE9k0P%3DN%2BJjzqnHqrURE7ZQMgySRpv%3DBkjafbz%3DpeD4cbCgbhA%40mail.gmail.com
>

I think when expanding hash index table we are only initialising the
pages that will be in-use after split or the last block in the index.
The initial few pages where tuples will be moved from old to new pages
has no issues but the last block that is just initialised and is not
marked as hash page needs to be handled along with the freed overflow
page. Please let me know thoughts on this. Thanks.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2017-03-21 05:22:33 Re: exposing wait events for non-backends (was: Tracking wait event for latches)
Previous Message Ashutosh Sharma 2017-03-21 04:41:15 comments in hash_alloc_buckets