Re: pgsql 10: hash indexes testing

From: AP <ap(at)zip(dot)com(dot)au>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pgsql 10: hash indexes testing
Date: 2017-07-11 01:21:56
Message-ID: 20170711012156.aotofk65eu2itjkj@zip.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jul 07, 2017 at 05:58:25PM +0530, Amit Kapila wrote:
> On Fri, Jul 7, 2017 at 8:22 AM, AP <ap(at)zip(dot)com(dot)au> wrote:
> > On Thu, Jul 06, 2017 at 05:19:59PM +0530, Amit Kapila wrote:
> >> I think if you are under development, it is always advisable to create
> >> indexes after initial bulk load. That way it will be faster and will
> >> take lesser space atleast in case of hash index.
> >
> > This is a bit of a pickle, actually:
> > * if I do have a hash index I'll wind up with a bloated one at some stage
> > that refused to allow more inserts until the index is re-created
> > * if I don't have an index then I'll wind up with a table where I cannot
> > create a hash index because it has too many rows for it to handle
> >
> > I'm at a bit of a loss as to how to deal with this.
>
> I can understand your concerns. To address first concern we need to
> work on one or more of following work items: (a) work on vacuums that
> can be triggered on insert only workload (it should perform index
> vacuum as well) (b) separate utility statement/function to squeeze
> hash index (c) db internally does squeezing like after each split, so
> that chances of such a problem can be reduced, but that will be at the
> cost of performance reduction in other workloads, so not sure if it is
> advisable. Among these (b) is simplest to do but may not be
> convenient for the user.

(a) seems like a good compromise on (c) if it can be done without disruption
and in time.
(b) seems analogous to the path autovcauum took. Unless I misremember, before
autovacuum we had a cronjob to do similar work. It's probably a sane path
to take as a first step on the way to (a)
(c) may not be worth the effort if it compromises general use, though perhaps
it could be used to indicate to (a) that now is a good time to handle
this bit?

> To address your second concern, we need to speed up the creation of
> hash index which is a relatively big project. Having said that, I
> think in your case, this is one-time operation so spending once more
> time might be okay.

Yup. Primarily I just wanted the idea out there that this isn't that easy
to cope with manually and to get it onto a todo list (unless it was an
easy thing to do given a bit of thought but it appears not).

Out of curiosity, and apologies if you explained it already and I missed
the signficance of the words, how does this bloat happen? There tables
obly cop COPY. There is no UPDATE or DELETE; all transactions get COMMITted
so there's no ROLLBACK undoing the COPY and yet the bloat occurs.

AP

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2017-07-11 01:23:05 Re: BUG #14738: ALTER SERVER for foregin servers not working
Previous Message Amit Langote 2017-07-11 00:59:00 Re: New partitioning - some feedback