Re: Building multiple indexes concurrently

From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Rob Wultsch <wultsch(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Building multiple indexes concurrently
Date: 2010-03-17 10:11:29
Message-ID: 4BA0AAD1.5040508@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Rob Wultsch wrote:
> Are there any particular performance optimizations that would be in
> play in such a scenario?
>

You'd want to increase maintenance_work_mem significantly, just for the
sessions that are running these. Something like this:

|SET maintenance_work_mem = '1GB';|

I don't know if that's a huge or tiny number relative to total RAM in
your server, you get the idea though.

Also, you should have a larger than default value for
checkpoint_segments in advance of this. That you can't set per session,
but you can adjust the value in the postgresql.conf and request a
configuration reload--don't actually need to disrupt server operation by
restarting to do it. This will work for that:

pg_ctl reload

> At a minimum I assume that if both of the commands were started at
> about the same time they would each scan the table in the same
> direction and whichever creation was slower would benefit from most of
> the table data it needed being prepopulated in shared buffers. Is this
> the case?
>

This might be optimistic; whether it will be the case depends a lot on
how large your shared_buffers and OS buffer cache are relative to the
table involved. To pick an extreme example to demonstrate what I mean,
if shared_buffers is the common default of <32MB, your table is 1TB, and
you have a giant disk array that reads fast, it's not very likely that
the second scan is going to find anything of interest left behind by the
first one. You could try and make some rough estimates of how long it
will take to fill your RAM with table data at the expected I/O rate and
guess how likely overlap is.

There's a trade-off here, which is that in return for making it possible
the data you need to rebuild the index is more likely to be in RAM when
you need it by building two at once, the resulting indexes are likely to
end up interleaved on disk as they are written out. If you're doing a
lot of index scans, the increased seek penalties for that may ultimately
make you regret having combined the two. Really impossible to predict
which approach is going to be better long term without gathering so much
data that you might as well try and benchmark it on a test system
instead if you can instead. I am not a big fan of presuming one can
predict performance instead of measuring it for complicated cases.

--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg(at)2ndQuadrant(dot)com www.2ndQuadrant.us

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Devrim GÜNDÜZ 2010-03-17 12:30:58 Re: Testing FusionIO
Previous Message Greg Stark 2010-03-17 09:52:13 Re: Block at a time ...