Re: Building multiple indexes concurrently

From: Andres Freund <andres(at)anarazel(dot)de>
To: pgsql-performance(at)postgresql(dot)org
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, Rob Wultsch <wultsch(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Building multiple indexes concurrently
Date: 2010-03-17 19:24:10
Message-ID: 201003172024.10538.andres@anarazel.de
Lists: pgsql-performance

On Wednesday 17 March 2010 19:44:56 Greg Smith wrote:
> Rob Wultsch wrote:
> > On Wed, Mar 17, 2010 at 7:30 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >> No, it's not optimistic in the least, at least not since we implemented
> >> synchronized seqscans (in 8.3 or thereabouts).
> >
> > Where can I find details about this in the documentation?
>
> It's a behind the scenes optimization so it's not really documented on
> the user side very well as far as I know; easy to forget it's even there
> as I did this morning.
> http://j-davis.com/postgresql/syncscan/syncscan.pdf is a presentation
> covering it, and http://j-davis.com/postgresql/83v82_scans.html is also
> helpful.
>
> While my pessimism on this part may have been overwrought, note the
> message interleaved on the list today with this discussion from Bob
> Lunney discussing the other issue I brought up: "When using 8-way
> parallel restore against a six-disk RAID 10 group I found that table and
> index scan performance dropped by about 10x. I/O performance was
> restored by either clustering the tables one at a time, or by dropping
> and restoring them one at a time. The only reason I can come up with
> for this behavior is file fragmentation and increased seek times." Now,
> Bob's situation may very well involve a heavy dose of table
> fragmentation from multiple active loading processes rather than index
> fragmentation, but this class of problem is common when trying to do too
> many things at the same time. I'd hate to see you chase a short-term
> optimization (reduce total index build time) at the expense of long-term
> overhead (resulting indexes are not as efficient to scan).

I find it much easier to believe such issues exist on tables than on
indexes. The likelihood of getting sequential accesses on an index is
small enough on a big table that it is unlikely to matter much.

What's your theory for why it would matter much?

Andres
