Re: pg_dump & performance degradation

From: Philip Warner <pjw(at)rhyme(dot)com(dot)au>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Don Baccus <dhogaza(at)pacifier(dot)com>, pgsql-hackers(at)postgresql(dot)org, brianb-pggeneral(at)edsamail(dot)com
Subject: Re: pg_dump & performance degradation
Date: 2000-07-31 03:20:15
Message-ID: 3.0.5.32.20000731132015.0246e100@mail.rhyme.com.au
Lists: pgsql-general, pgsql-hackers

At 11:34 29/07/00 -0400, Tom Lane wrote:
>
>I think Philip's idea of adding some delays into pg_dump is a reasonable
>answer.  I'm just recommending a KISS approach to implementing the
>delay, in the absence of evidence that a more complex mechanism will
>actually buy anything...
>

The results of some experiments:

Unfortunately I have been unable to devise a proper test of the 'fixed'
sleep method: the COPY loop is sufficiently tight so that on a fast machine
even a small delay inside *each* iteration makes for a very slow process
(on my machine, I think the smallest allowed delay is 10ms, and the COPY
loop runs in about 100-300 usec intervals while the COPY buffer is being
dumped). As a result I have had to activate the sleep code only when the
time since it last slept is > 100ms. This means that the 'throttle time'
specified by the user is effectively a ratio anyway, and I have implemented
it as such.

Basically, the user can specify a number of ms to wait per second of
'running'. This ratio is checked on each iteration, and when the amount of
time to sleep exceeds 100ms, the sleep call is made, and the timer is reset.
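
A minimal sketch of the check described above, assuming the caller supplies
timestamps in microseconds (for instance from gettimeofday) and performs the
actual sleep itself. The type and function names are hypothetical and are not
taken from the pg_dump source; the sketch also treats "exceeds 100ms" as
"reaches 100ms", which is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names throughout; this is not the pg_dump implementation. */
#define SLEEP_THRESHOLD_US 100000	/* sleep only once 100 ms is owed */

typedef struct
{
	int64_t		ratio_ms_per_sec;	/* -T value: ms to rest per second run */
	int64_t		last_reset_us;		/* when the timer was last reset */
} Throttle;

/* Start (or restart) the timer; call again after each sleep completes. */
void
throttle_init(Throttle *t, int64_t ratio_ms_per_sec, int64_t now_us)
{
	assert(ratio_ms_per_sec >= 0);
	t->ratio_ms_per_sec = ratio_ms_per_sec;
	t->last_reset_us = now_us;
}

/*
 * Called on each iteration of the copy loop.  Returns the number of
 * microseconds the caller should sleep now, or 0 if less than the
 * threshold is owed so far.
 */
int64_t
throttle_check(const Throttle *t, int64_t now_us)
{
	int64_t		ran_us = now_us - t->last_reset_us;
	int64_t		owed_us = ran_us * t->ratio_ms_per_sec / 1000;

	return (owed_us >= SLEEP_THRESHOLD_US) ? owed_us : 0;
}
```

The per-iteration cost is one subtraction, one multiplication and one
division, so the check itself stays cheap even when the loop runs every
100-300 usec; only the rare non-zero result leads to a sleep call.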

e.g.

    pg_dump dbname -T1000 --tab=big-table > /dev/null

will rest for an average of 1 second for each second running (during COPY);
the actual 'sleeps' will occur every 100ms or so, and last for 100ms.


    pg_dump dbname -T30000 --tab=big-table > /dev/null

will rest for an average of 30 seconds for each second running (during
COPY); the actual 'sleeps' will occur every 3ms or so and last for 100ms.

    pg_dump dbname -T500 --tab=big-table > /dev/null

will rest for an average of 0.5 seconds for each second running (during
COPY); the actual 'sleeps' will occur every 200ms or so and last for 100ms.

etc.
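
The intervals quoted in these examples follow from dividing the 100ms sleep
threshold by the ratio. A small sketch of that arithmetic, using a
hypothetical helper that is not part of pg_dump:

```c
#include <assert.h>

/*
 * Hypothetical helper: running time, in microseconds, between sleeps for
 * a given -T value (ms of rest per second of running), assuming each
 * sleep is triggered once 100 ms of rest has accumulated.
 */
long
run_interval_us(long throttle_ms_per_sec)
{
	assert(throttle_ms_per_sec > 0);
	/* 100 ms of rest accrues after 100 ms * (1000 / T) of running */
	return 100000L * 1000L / throttle_ms_per_sec;
}
```

For -T1000, -T30000 and -T500 this gives 100000, 3333 and 200000 usec
respectively, matching the "every 100ms", "every 3ms or so" and "every 200ms"
figures above.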

This is actually more complex than I had hoped (originally I planned to
just do a simple ratio), but experimentation on a *very* slow machine (P90)
and a fast-ish one (PIII 550) showed that the ratio needed to achieve 50%
CPU utilization (on an unloaded machine) varied from as high as 30:1 down
to 0.3:1 (the CPU boost from IO on a P90 is enormous - postmaster runs at
90% CPU until a ratio of about 15:1).

These times are obviously highly subjective, and the only real conclusions I
can draw from them are:

- It's too hard to predict (as Tom suggested)
- It's important to allow for a very tight loop in coding the 'sleep' code.

This is disappointing in the sense that I had hoped to get a
one-number-suits-all-tables-at-a-given-time-on-a-given-machine solution,
but experiments with tables with lots of columns and tables with large
toasted values reveal quite a wide variation even with this model.

Unless someone has a further suggestion, I'll just clean it up and submit
it...


----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.C.N. 008 659 498)             |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 0500 83 82 82         |                 ___________ |
Http://www.rhyme.com.au          |                /           \|
                                 |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/
