Re: pgsql 10: hash indexes testing

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: AP <ap(at)zip(dot)com(dot)au>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pgsql 10: hash indexes testing
Date: 2017-07-06 07:08:38
Message-ID: CAA4eK1+dc70wjjGUGrP5Mgxzix=ie09j=AS0a06fEKdQoi884Q@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jul 6, 2017 at 9:32 AM, AP <ap(at)zip(dot)com(dot)au> wrote:
> On Thu, Jul 06, 2017 at 08:52:03AM +0530, Amit Kapila wrote:
>> On Thu, Jul 6, 2017 at 2:40 AM, AP <ap(at)zip(dot)com(dot)au> wrote:
>> > On Wed, Jul 05, 2017 at 05:52:32PM +0530, Amit Kapila wrote:
>> >> >> > version | bucket_pages | overflow_pages | bitmap_pages | unused_pages | live_items | dead_items | free_percent
>> >> >> > ---------+--------------+----------------+--------------+--------------+------------+------------+------------------
>> >> >> > 3 | 10485760 | 2131192 | 66 | 0 | 2975444240 | 0 | 1065.19942179026
>> >> >> > (1 row)
>> > ...
>> >> >> > And I do appear to have an odd percentage of free space. :)
>> >>
>> >> Are you worried about "unused_pages"? If so, then this is not a major
>> >
>> > Nope. "free_percent" Just a bit weird that I have it at over 1000% free. :)
>> > Shouldn't that number be < 100?
>>
>> Yes, there seems to be some gotcha in free percent calculation. Is it
>> possible for you to debug or in some way share the test?
>
> I can try to debug but I need to know what to look for and how.
>

Okay, you need to debug the function pgstathashindex, set a breakpoint at the
free_percent calculation, and then get the values of nblocks, all the fields
of the stats struct, and total_space. I think after getting this info we can
decide further what to look for.
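For reference, the value should come out of something like the below (a
simplified sketch of the calculation, not the exact source, so variable names
are approximate); that is why those three things are the interesting ones to
print at the breakpoint:

    /*
     * Simplified sketch (not the exact source) of how free_percent is
     * derived in pgstathashindex(): stats.free_space is accumulated while
     * scanning the bucket/overflow pages, and total_space is derived from
     * nblocks, so a value over 100% means one of those inputs is off.
     */
    total_space = (uint64) (nblocks - (stats.bitmap_pages + 1)) * BLCKSZ;
    if (total_space == 0)
        free_percent = 0.0;
    else
        free_percent = 100.0 * stats.free_space / total_space;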

> If it
> requires data reloads then that's around 12-15 hours per hit.
>
> As for sharing the test, that'd mean sharing the data. If it helps I can
> provide the content of that column but you're looking at an sql dump that
> is roughly (2*64+1)*2.3 billion (give or take a (few) billion) in size. :)
>

This is tricky. Will I be able to import those column values by creating a
table? If so, then it is probably worth it.

>> > Well, if this is the cause of my little issue, it might be nice. ATM
>> > my import script bombs out on errors (that I've duplicated! :) It took
>> > 11 hours but it bombed) and it sounds like I'll need to do a manual
>> > VACUUM before it can be run again.
>> >
>>
>> Yeah, I think after manual vacuum you should be able to proceed.
>
>
> That's markedly different. At a rough estimate I should be able to double
> the row count before I hit the "real" limits of a hash index.
>
> When you refer to VACUUM do you mean VACUUM FULL?
>

A normal Vacuum won't work for this case as you don't have dead tuples
(deleted rows in the table).

> That's going to get nasty
> if that's the case as the table is almost 1TB in size.
>

Yeah, I think for this situation REINDEX will be a better option, because
Vacuum Full will rewrite the entire index and heap anyway.

>> From above stats, it is clear that you have hit the maximum number of
>> overflow pages we can support today. Now, here one can argue that we
>> should increase the limit of overflow pages in hash index which we can
>> do, but I think you can again hit such a problem after some more time.
>
> True, though I'm not far off hitting it "for real" at the present limits
> so an increase would be of benefit in other respects (ie not needing to
> have as many tables to manage because we have to bundle a set off due
> to index limits).
>
>> So at this stage, there are two possibilities for you (a) run manual
>> Vacuum in-between (b) create the index after bulk load. In general,
>> whatever I have mentioned in (b) is a better way for bulk loading.
>> Note here again the free_percent seems to be wrong.
>
> If you didn't mean VACUUM FULL then (a) does not appear to work and (b)
> would kill usability of the db during import, which would happen daily
> (though with a vastly reduced data size).
>

If the data size of subsequent imports is much smaller, then you only need
to create the index after the first import and can let it continue like that.

> It also messes with the
> permission model that has been set up for the least-trusted section of
> the project (at the moment that section can only INSERT).
>

As per your permission model, Vacuum Full is allowed, but Create Index is not?

> Still, I may wind up going with (b) if a VACUUM FULL is the only other
> real choice but would prefer to avoid it. The fact that the index is
> around 300GB smaller (so far) than btree may well be worth the pain
> all by its lonesome.
>

As mentioned above, REINDEX might be a better option. I think for such
situations we should have some provision to expose the squeeze functionality
of hash indexes to the user; that would be less costly than REINDEX and might
serve the purpose. Can you try a hack in the code which can at least tell us
whether squeezing the hash index gives any benefit, or whether the index is
in such a state that only REINDEX will help? Basically, while doing a Vacuum
(non-full), you need to somehow bypass the below check in lazy_scan_heap:
lazy_scan_heap
{
..
if (vacrelstats->num_dead_tuples > 0)
..
}
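Something like the below is the kind of change I have in mind (just a rough,
untested sketch against src/backend/commands/vacuumlazy.c; the body of the
block is elided):

    /*
     * Rough, untested sketch: force the index-vacuum pass to run even when
     * no dead tuples were found, so that the hash index's bulkdelete path
     * gets a chance to squeeze the overflow chains.
     */
    if (vacrelstats->num_dead_tuples > 0 || true)  /* XXX hack for testing only */
    {
        ..
        for (i = 0; i < nindexes; i++)
            lazy_vacuum_index(Irel[i], &indstats[i], vacrelstats);
        ..
    }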

I am not sure it will work, but we can give it a try if you are okay with it.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
