Skip site navigation (1) Skip section navigation (2)

Re: pg_dumpall affecting performance

From: Steve Crawford <scrawford(at)pinpointresearch(dot)com>
To: Mark Mikulec <mark(at)anticentertainment(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: pg_dumpall affecting performance
Date: 2011-02-15 18:56:44
Message-ID: 4D5ACC6C.50803@pinpointresearch.com (view raw or flat)
Thread:
Lists: pgsql-performance
On 02/15/2011 10:41 AM, Mark Mikulec wrote:
> Hello,
>
> I was under the impression that pg_dumpall didn't affect database
> performance when dumping while the db is live. However I have evidence to
> the contrary now - queries that are run during the pg_dumpall time take 10
> to a 100 times longer to execute than normal while pg_dumpall is running.
> The strange thing is that this started after my database grew by about 25%
> after a large influx of data due to user load. I'm wonder if there is a
> tipping
> point or a config setting I need to change now that the db is larger that
> is
> causing all this to happen.
>
Don't know where that impression came from. It is true that you can 
continue to *use* your database normally while running a dump but you 
are reading the entire database and either transmitting it over the 
network or writing it to a local drive so it shouldn't be surprising 
that performance is impacted.

There are tipping points - one big one is when you move from having all 
your data in RAM to needing to read disk. And it can be a whopper. If 
all your information, through PG or OS caching is in RAM then your dumps 
may run very quickly. The moment you cross the point that things don't 
quite fit you can see a sharp decline.

Consider a least-recently-used algorithm and a very simplistic scenario. 
You read the "start" data. It isn't cached so you go to disk *and* you 
put those blocks into cache pushing others than you would need later out 
of cache. This continues and you potentially end up having to read 
everything from disk plus incur the overhead of checking and updating 
the cache. Meanwhile, the data you needed for your query may have been 
pushed out of cache so there is more contention for disk.

Admittedly an over-simplified example but you see the problem.

Cheers,
Steve


In response to

pgsql-performance by date

Next:From: Strange, John WDate: 2011-02-15 19:08:18
Subject: Re: high user cpu, massive SELECTs, no io waiting problem
Previous:From: Plugge, Joe R.Date: 2011-02-15 18:45:34
Subject: Re: pg_dumpall affecting performance

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group