Re: Very high effective_cache_size == worse performance?

From: Scott Carey <scott(at)richrelevance(dot)com>
To: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
Cc: David Kerr <dmk(at)mr-paradox(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Very high effective_cache_size == worse performance?
Date: 2010-04-22 01:21:10
Message-ID: 95826A63-5FC1-4AB8-A867-F63EF43942CF@richrelevance.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance


On Apr 20, 2010, at 12:22 PM, Scott Marlowe wrote:

> On Tue, Apr 20, 2010 at 12:47 PM, David Kerr <dmk(at)mr-paradox(dot)net> wrote:
>> On Tue, Apr 20, 2010 at 02:15:19PM -0400, Robert Haas wrote:
>> - On Tue, Apr 20, 2010 at 2:03 PM, David Kerr <dmk(at)mr-paradox(dot)net> wrote:
>> - > that thought occured to me while I was testing this. I ran a vacuumdb -z
>> - > on my database during the load and it didn't impact performance at all.
>> -
>> - The window to run ANALYZE usefully is pretty short. If you run it
>> - before the load is complete, your stats will be wrong. If you run it
>> - after the select statements that hit the table are planned, the
>> - updated stats won't arrive in time to do any good.
>>
>> right, but i'm loading 20 million records in 1000 record increments. so
>> the analyze should affect all subsequent increments, no?
>
> I keep thinking FK checks are taking a long time because they aren't
> cached because in import they went through the ring buffer in pg or
> some other way aren't in a buffer but large effective cache size says
> it's 99.99% chance or better that it's in cache, and chooses a poor
> plan to look them up. Just a guess.
>

Yeah, I was thinking the same thing.

If possible make sure the table either has no indexes and FK's or only the minimum required (PK?) while doing the load, then add the indexes and FK's later.
Whether this is possible depends on what the schema is and what must be known by the app to load the data, but if you can do it its a huge win.

Of course, if its not all in one transaction and there is any other concurrency going on that could be a bad idea. Or, if this is not a load on a fresh table but an append/update it may not be possible to drop some of the indexes first.

Generally speaking, a load on a table without an index followed by index creation is at least twice as fast, and often 5x as fast or more. This is less true if each row is an individual insert and batching or 'insert into foo values (a, b, c, ...), (a2, b2, c2, ...)' multiple row syntax is not used.

> --
> Sent via pgsql-performance mailing list (pgsql-performance(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Robert Haas 2010-04-22 01:21:28 Re: Replacing Cursors with Temporary Tables
Previous Message Mark Wong 2010-04-22 01:10:35 Re: [PERFORM] Dbt2 with postgres issues on CentOS-5.3‏