VS: Backup/dump of huge tables and performance

From: Janne Blomqvist <janne(dot)blomqvist(at)hansel(dot)fi>
To: pgsql-general(at)postgresql(dot)org
Subject: VS: Backup/dump of huge tables and performance
Date: 2000-07-31 11:11:25
Message-ID: 2A80FDF6CF54D211A97D00A0C992F5906CE81B@hanselx.hansel.fi
Lists: pgsql-general

>At 10:11 28/07/00 GMT, brianb-pggeneral(at)edsamail(dot)com wrote:
>>
>>Philip Warner writes:
>>> At 02:26 28/07/00 GMT, brianb-pggeneral(at)edsamail(dot)com wrote:
>>> >1. pg_dump or COPY take up a lot of I/O resources. That's not
>>> >surprising considering the size of the tables, but the impact on the
>>> >overall production system's performance is not acceptable.
>>>
>>> Firstly, you are not using '--inserts', I hope. It is very slow for both
>>> backup & restore. Also, do you know if pg_dump is the IO bottleneck, or
>>> the backend?
>>
>>Nope, I don't use --inserts. The bottleneck seems to be at the backend,
>>since it also happens with COPY.
>
>I'm not sure you can conclude that, but you are probably right. If possible
>can you use whatever tools you have to see which process *is* doing the
>I/O? Just to be sure.
>
>
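As a point of reference, the two dump modes being discussed look roughly
like this (database and file names are placeholders), and while either one
runs you can watch top or vmstat on the server to see whether it is pg_dump
or the backend that is grinding the disks:

    # default mode: table data is written as COPY blocks
    # (fast to dump and to restore)
    pg_dump big_db > big_db.dump

    # --inserts: one INSERT statement per row; more portable, but much
    # slower, particularly on restore
    pg_dump --inserts big_db > big_db_inserts.dump

    # in another session, see which process is actually doing the work
    vmstat 1
    top
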
>>> Other than that, I'm not sure what can be done about it - the I/O has to
>>> be done some time. Possibly causing pg_dump to (optionally) pause between
>>> records, but that seems like a bad idea, especially with 10s of millions
>>> of records.
>>
>>> You could always do a file-based backup of the database, and restore it
>>> somewhere else, and drop the tables you don't need. Not very elegant, I
>>> realize.
>>
>>What file-based backup are you referring to? From reading previous posts,
>>I'd inferred that I couldn't just copy my /var/lib/pgsql over to another
>>server with Postgres and start postmaster.

>You probably can't/shouldn't. To make it work, I suspect you'd have to kick
>all the users off the DB, which is not going to help.

>>>
>>> I'm not sure what should be changed in pg_dump; delaying between records
>>> seems like a bad idea since it does the dump in a single TX, and
>>> besides, sleeping while a TX is open seems evil to me.
>>
>>I suppose it's a special case, then. I just need to move the table, and
>>I'd rather have the I/O load spread over a longer period of time to soften
>>the impact on the rest of the system.

>It might bear thinking about...especially if anyone else has an opinion on
>the options.
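
One crude way to spread the load without changing pg_dump at all is to pull
the big table out in key ranges with a pause between slices. A rough sketch,
assuming the table has a numeric key to range over (table, column and sizes
here are made up); note that unlike pg_dump this is not a single consistent
snapshot, since every slice runs in its own transaction, and the output is
plain rows rather than a restorable dump:

    # pull a ~20-million-row table out in 1-million-row slices,
    # sleeping five minutes between slices to soften the I/O load
    step=1000000
    for start in `seq 0 $step 19000000`; do
        end=`expr $start + $step`
        query="SELECT * FROM big_table WHERE id >= $start AND id < $end"
        psql -A -t -c "$query" big_db >> big_table.out
        sleep 300
    done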

One option which is sometimes recommended for larger Oracle installations is
to use RAID 1 (i.e. mirroring, which you should do anyway for performance
and data security reasons). Then, to take a backup, you:

1. Split the mirror. Let's call the volumes primary and secondary (they can
of course span multiple disks). The database continues to run unaffected on
the primary volume (apart from a performance hit), so the secondary volume
contains a snapshot of the database at the moment of the split.
2. Back up the files on the secondary volume.
3. Recombine the mirror. The RAID software takes care of bringing the
secondary volume back up to date.

Of course this is less secure than a real online backup (if one of the
primary disks fails during the backup window, you have no redundancy left),
but it is *a lot* faster than an online backup and you can stay up and
running the whole time.
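
Just to make the sequence concrete: with Linux software RAID the three steps
would look something like the sketch below. The device names are
placeholders, the exact commands depend entirely on your RAID software
(other stacks have their own equivalents), and it assumes the mirror halves
are directly mountable:

    # 1. split the mirror: mark one half failed and pull it out of the array
    mdadm /dev/md0 --fail /dev/sdb1
    mdadm /dev/md0 --remove /dev/sdb1

    # 2. back up the files on the detached half, mounted read-only elsewhere
    mount -o ro /dev/sdb1 /mnt/snapshot
    tar cf /backup/pgdata.tar -C /mnt/snapshot .
    umount /mnt/snapshot

    # 3. recombine: re-add the disk and let the RAID layer resync it
    mdadm /dev/md0 --add /dev/sdb1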

Trading House Hansel
Janne Blomqvist
janne(dot)blomqvist(at)hansel(dot)fi
