Re: Proposal: More flexible backup/restore via pg_dump

From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Philip Warner <pjw(at)rhyme(dot)com(dot)au>
Cc: Zeugswetter Andreas SB <ZeugswetterA(at)wien(dot)spardat(dot)at>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposal: More flexible backup/restore via pg_dump
Date: 2000-10-11 00:49:52
Message-ID: 200010110049.UAA18367@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Can I ask on a status?

> At 10:17 26/06/00 +0200, Zeugswetter Andreas SB wrote:
> >
> >A problem I see with an index at file end is, that you will need to read the
> >file twice, and that may be very undesireable if e.g the backup is on tape
> >or a compressed file.
>
> The proposal has actually come a fairly long way after extensive
> discussions with Tom Lane, and I have added the current plans at the end of
> this message. The TOC-at-end problem is an issue that I am trying to deal
> with; I am planning a 'custom' format that has the large parts (data dumps)
> compressed, to avoid the need of compressing the entire file. This means
> that you would not need to uncompress the entire file to get to the TOC, or
> to restore just the schema. It also allows good random access to defns and
> data. I'm also considering putting the dumped data at the end of the file,
> but this has issues when you want to restore table data before defining
> indexes, for example.
>
> I must admit that I've been working on the assumption that people using
> PostgreSQL don't have multi-GB (compressed) database dumps, so that (in
> theory) a restore can be loaded onto disk from tape before being used. I
> know this is pretty evil, but it will cover 95% of users. For those people
> with huge backups, they will have to suffer tapes that go backward and
> forwards a bit. From the details below, you will see that this is unavoidable.
>
> Sanity Check: does fseek work on tapes? If not, what is the correct way to
> read a particular block/byte from a file on a tape?
>
> -----------------------------------------------------------
> Updated Proposal:
> -------------------------
>
> For the sake of argument, call the new utilities pg_backup and pg_restore.
>
> pg_backup
> ---------
>
> Dump schema [and data] in OID order (to try to make restores sequential,
> for when tar/tape storage is used). Each dumped item has a TOC entry which
> includes the OID and description, and for those items for which we know
> some dependencies (functions for types & aggregates; types for tables;
> superclasses for classes; - any more?), it will also dump the dependency OIDs.
>
> Each object (table defn, table data, function defn, type defn etc) is
> dumped to a separate file/thing in the output file. The TOC entries go into
> a separate file/thing (probably only one file/thing for the whole TOC).
>
> The output scheme will be encapsulated, and in the initial version will be
> a custom format (since I can't see an API for tar files), and a
> dump-to-a-directory format. Future use of tar, DB, PostgreSQL or even a
> Make file should not be excluded in the IO design. This last goal *may* not
> be achieved, but I don't see why it can't be at this stage. Hopefully
> someone with appropriate skills & motivation can do a tar archive 8-}.
>
> The result of a pg_backup should be a single file with metadata and
> optional data, along with whatever dependency and extra data is available
> pg_backup, or provided by the DBA.
>
>
> pg_restore
> ----------
>
> Reads a backup file and dumps SQL suitable for sending to psql.
>
> Options will include:
>
> - No Data (--no-data? -nd? -s?)
> - No metadata (--no-schema? -ns? -d?)
> - Specification of items to dump from an input file; this allows custom
> ordering AND custom selection of multiple items. Basically, I will allow
> the user to dump part of the TOC, edit it, and tell pg_restore to use the
> edited partial TOC. (--item-list=<file>? -l=<file>?)
> - Dump TOC (--toc-only? -c?)
>
> [Wish List]
> - Data For a single table (--table=<name>? -t=<name>)
> - Defn/Data for a single OID; (--oid=<oid>? -o=<oid>?)
> - User definied dependencies. Allow the DB developer to specify once for
> thier DB what the dependencies are, then use that files as a guide to the
> restore process. (--deps=<file> -D=<file>)
>
> pg_restore will use the same custom IO routines to allow IO to
> tar/directory/custom files. In the first pass, I will do custom file IO.
>
> If a user selects to restore the entire metadata, then it will be dumped
> according to the defaul policy (OID order). If they select to specify the
> items from an input file, then the file ordering is used.
>
>
> -------
>
> Typical backup procedure:
>
> pg_backup mydb mydb.bkp
>
> or *maybe*
>
> pg_backup mydb > mydb.bkp
>
> BUT AFAIK, fseek does not work on STDOUT, and at the current time pg_backup
> will use fseek.
>
>
> Typical restore procedure:
>
> pg_restore mydb mydb.bkp | psql
>
> A user will be able to extract only the schema (-s), only the data (-d), a
> specific table (-t=name), or even edit the object order and selection via:
>
> pg_restore --dump-toc mydb.bkp > mytoc.txt
> vi mytoc.txt {ie. reorder TOC elements as per known dependency problems}
> pg_restore --item-list=mytoc.txt mydb.bkp | psql
>
> FWIW, I envisage the ---dump-toc output to look like:
>
> ID; FUNCTION FRED(INT4)
> ID; TYPE MY_TYPE
> ID; TABLE MY_TABLE
> ID; DATA MY_TABLE
> ID; INDEX MY_TABLE_IX1
> ...etc.
>
> so editing and reordering the dump plan should not be too onerous.
>
>
> ----------------------------------------------------------------
> Philip Warner | __---_____
> Albatross Consulting Pty. Ltd. |----/ - \
> (A.C.N. 008 659 498) | /(@) ______---_
> Tel: (+61) 0500 83 82 81 | _________ \
> Fax: (+61) 0500 83 82 82 | ___________ |
> Http://www.rhyme.com.au | / \|
> | --________--
> PGP key available upon request, | /
> and from pgp5.ai.mit.edu:11371 |/
>

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2000-10-11 00:56:27 Re: 7.0.2 on Solaris
Previous Message Bruce Momjian 2000-10-11 00:48:36 Re: Re: [HACKERS] My new job