Re: pgsql 10: hash indexes testing

From: AP <ap(at)zip(dot)com(dot)au>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pgsql 10: hash indexes testing
Date: 2017-07-07 02:52:18
Message-ID: 20170707025218.7jwlg4ajli46g62u@zip.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jul 06, 2017 at 05:19:59PM +0530, Amit Kapila wrote:
> I think if you are under development, it is always advisable to create
> indexes after initial bulk load. That way it will be faster and will
> take lesser space atleast in case of hash index.

This is a bit of a pickle, actually:
* if I do have a hash index I'll wind up with a bloated one at some stage
that refused to allow more inserts until the index is re-created
* if I don't have an index then I'll wind up with a table where I cannot
create a hash index because it has too many rows for it to handle

I'm at a bit of a loss as to how to deal with this. The DB design does come
with a kind of partitioning where a bundle of tables get put off to the side
and searched seperately as needed but too many of those and the impact on
performance can be noticed so I need to minimise them.

> >> As mentioned above REINDEX might be a better option. I think for such
> >> situation we should have some provision to allow squeeze functionality
> >> of hash exposed to the user, this will be less costly than REINDEX and
> >> might serve the purpose for the user. Hey, can you try some hack in
> >
> > Assuming it does help, would this be something one would need to guess
> > at? "I did a whole bunch of concurrent INSERT heavy transactions so I
> > guess I should do a squeeze now"?
> >
> > Or could it be figured out programmatically?
>
> I think one can refer free_percent and number of overflow pages to
> perform such a command. It won't be 100% correct, but we can make a
> guess. We can even check free space in overflow pages with page
> inspect to make it more accurate.

Does this take much time? Main reason I am asking is that this looks like
something that the db ought to handle underneath (say as part of an autovac
run) and so if there are stats that the index code can maintain that can
then be used by the autovac (or something) code to trigger a cleanup this
I think would be of benefit.

Unless I am being /so/ unusual that it's not worth it. :)

I'll reply to the rest in a separate stream as I'm still poking other
work related things atm so can't do the debug testing as yet.

AP

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2017-07-07 05:31:06 Fix header comment of streamutil.c
Previous Message Amit Langote 2017-07-07 02:21:28 Re: Multi column range partition table