Re: Backup/dump of huge tables and performance

From: Philip Warner <pjw(at)rhyme(dot)com(dot)au>
To: brianb-pggeneral(at)edsamail(dot)com, pgsql-general(at)postgresql(dot)org
Cc: paul(at)edsamail(dot)com
Subject: Re: Backup/dump of huge tables and performance
Date: 2000-07-28 06:47:15
Message-ID: 3.0.5.32.20000728164715.026dc640@mail.rhyme.com.au

At 02:26 28/07/00 GMT, brianb-pggeneral(at)edsamail(dot)com wrote:
>
>1. pg_dump or COPY take up a lot of I/O resources. That's not surprising
>considering the size of the tables, but the impact on the overall
>production system's performance is not acceptable.

Firstly, you are not using '--inserts', I hope; it is very slow for both
backup and restore. Also, do you know whether pg_dump is the I/O bottleneck,
or the backend?
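
If you want to confirm which process is doing the work, something along
these lines should tell you (standard Unix tools; the exact flags vary by
platform, so treat it as a sketch):

    # watch the pg_dump client and the postgres backend while the dump runs
    top

    # overall disk throughput, sampled every 5 seconds
    iostat 5

    # blocks in/out and CPU wait, also every 5 seconds
    vmstat 5

If the backend accounts for most of the activity, the cost is in the COPY
itself rather than in pg_dump's handling of the output.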

Other than that, I'm not sure what can be done about it; the I/O has to be
done at some point. One possibility would be to make pg_dump (optionally)
pause between records, but that seems like a bad idea, especially with tens
of millions of records.

Maybe someone who knows more about backend processing can suggest whether a
different kind of retrieval in the COPY command would help.

>2. I don't know how to estimate the size of the resulting dump files from
>the database files.

Not very easy, unless you have very similar data in each tuple...and in the
future pg_dump will support compression, so the size will be even harder to
estimate.
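
As a very rough guide you can look at the on-disk size of the table;
assuming the default 8K block size, something like this (the table name is
just a placeholder) gives the heap size in bytes:

    SELECT relname, relpages, relpages * 8192 AS approx_bytes
    FROM pg_class
    WHERE relname = 'your_big_table';

Bear in mind that relpages is only refreshed by VACUUM, and the text dump
can come out smaller or larger than that depending on the data, so it's a
ballpark figure at best.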

>I would very much prefer to have the backup files in little 10-20MB chunks,
>rather than one humongous dumpfile.

Maybe: pg_dump | split --bytes=10m
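
For example (untested, and assuming GNU split; 'mydb' and the file prefix
are placeholders):

    pg_dump mydb | split --bytes=10m - mydb_dump.
    # produces mydb_dump.aa, mydb_dump.ab, ...

    # to restore, reassemble the pieces and feed them to psql:
    cat mydb_dump.* | psql newdb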

> I also want to be able to run the
>backup without shutting down the service that uses the database.

AFAIK, you don't need to shut it down, or are you referring to the
performance problems?

>As noted above, this particular table is no longer being used in
>transactions. I figure I could write a C program to declare a cursor and
>write it out in n-MB or m-thousand row chunks, with rest pauses in between.

You could always do a file-based backup of the database, restore it
somewhere else, and then drop the tables you don't need. Not very elegant, I
realize.
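
For what it's worth, the cursor approach doesn't necessarily need a C
program; a rough sketch in plain SQL via psql (the table name, cursor name
and batch size are all placeholders) would be:

    BEGIN;
    DECLARE bigdump CURSOR FOR SELECT * FROM big_table;
    FETCH 10000 FROM bigdump;  -- write this batch out, pause, then repeat
    FETCH 10000 FROM bigdump;  -- ...until a FETCH returns no rows
    CLOSE bigdump;
    COMMIT;

though it has the same drawback I mention below: the transaction stays open
for the whole run.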

>Any better ideas? Would this be a good feature to incorporate into future
>versions of pg_dump?

I'm not sure what should be changed in pg_dump; delaying between records
seems like a bad idea since it does the dump in a single TX, and besides,
sleeping while a TX is open seems evil to me.

I think making multiple files can be handled by 'split', so the real issue
is where the I/O problem comes from, and how to reduce it. If pg_dump is the
source of the I/O, then I can try to address it; but if the COPY command is
the problem, that needs to be addressed by someone else...

----------------------------------------------------------------
Philip Warner | __---_____
Albatross Consulting Pty. Ltd. |----/ - \
(A.C.N. 008 659 498) | /(@) ______---_
Tel: (+61) 0500 83 82 81 | _________ \
Fax: (+61) 0500 83 82 82 | ___________ |
Http://www.rhyme.com.au | / \|
| --________--
PGP key available upon request, | /
and from pgp5.ai.mit.edu:11371 |/
