Re: Proposal: More flexible backup/restore via pg_dump

From: Philip Warner <pjw(at)rhyme(dot)com(dot)au>
To: Zeugswetter Andreas SB <ZeugswetterA(at)wien(dot)spardat(dot)at>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposal: More flexible backup/restore via pg_dump
Date: 2000-06-26 14:30:30
Message-ID: 3.0.5.32.20000627003030.02573640@mail.rhyme.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

At 10:17 26/06/00 +0200, Zeugswetter Andreas SB wrote:
>
>A problem I see with an index at file end is, that you will need to read the
>file twice, and that may be very undesireable if e.g the backup is on tape
>or a compressed file.

The proposal has actually come a fairly long way after extensive
discussions with Tom Lane, and I have added the current plans at the end of
this message. The TOC-at-end problem is an issue that I am trying to deal
with; I am planning a 'custom' format that has the large parts (data dumps)
compressed, to avoid the need of compressing the entire file. This means
that you would not need to uncompress the entire file to get to the TOC, or
to restore just the schema. It also allows good random access to defns and
data. I'm also considering putting the dumped data at the end of the file,
but this has issues when you want to restore table data before defining
indexes, for example.

I must admit that I've been working on the assumption that people using
PostgreSQL don't have multi-GB (compressed) database dumps, so that (in
theory) a restore can be loaded onto disk from tape before being used. I
know this is pretty evil, but it will cover 95% of users. For those people
with huge backups, they will have to suffer tapes that go backward and
forwards a bit. From the details below, you will see that this is unavoidable.

Sanity Check: does fseek work on tapes? If not, what is the correct way to
read a particular block/byte from a file on a tape?

-----------------------------------------------------------
Updated Proposal:
-------------------------

For the sake of argument, call the new utilities pg_backup and pg_restore.

pg_backup
---------

Dump schema [and data] in OID order (to try to make restores sequential,
for when tar/tape storage is used). Each dumped item has a TOC entry which
includes the OID and description, and for those items for which we know
some dependencies (functions for types & aggregates; types for tables;
superclasses for classes; - any more?), it will also dump the dependency OIDs.

Each object (table defn, table data, function defn, type defn etc) is
dumped to a separate file/thing in the output file. The TOC entries go into
a separate file/thing (probably only one file/thing for the whole TOC).

The output scheme will be encapsulated, and in the initial version will be
a custom format (since I can't see an API for tar files), and a
dump-to-a-directory format. Future use of tar, DB, PostgreSQL or even a
Make file should not be excluded in the IO design. This last goal *may* not
be achieved, but I don't see why it can't be at this stage. Hopefully
someone with appropriate skills & motivation can do a tar archive 8-}.

The result of a pg_backup should be a single file with metadata and
optional data, along with whatever dependency and extra data is available
pg_backup, or provided by the DBA.

pg_restore
----------

Reads a backup file and dumps SQL suitable for sending to psql.

Options will include:

- No Data (--no-data? -nd? -s?)
- No metadata (--no-schema? -ns? -d?)
- Specification of items to dump from an input file; this allows custom
ordering AND custom selection of multiple items. Basically, I will allow
the user to dump part of the TOC, edit it, and tell pg_restore to use the
edited partial TOC. (--item-list=<file>? -l=<file>?)
- Dump TOC (--toc-only? -c?)

[Wish List]
- Data For a single table (--table=<name>? -t=<name>)
- Defn/Data for a single OID; (--oid=<oid>? -o=<oid>?)
- User definied dependencies. Allow the DB developer to specify once for
thier DB what the dependencies are, then use that files as a guide to the
restore process. (--deps=<file> -D=<file>)

pg_restore will use the same custom IO routines to allow IO to
tar/directory/custom files. In the first pass, I will do custom file IO.

If a user selects to restore the entire metadata, then it will be dumped
according to the defaul policy (OID order). If they select to specify the
items from an input file, then the file ordering is used.

-------

Typical backup procedure:

pg_backup mydb mydb.bkp

or *maybe*

pg_backup mydb > mydb.bkp

BUT AFAIK, fseek does not work on STDOUT, and at the current time pg_backup
will use fseek.

Typical restore procedure:

pg_restore mydb mydb.bkp | psql

A user will be able to extract only the schema (-s), only the data (-d), a
specific table (-t=name), or even edit the object order and selection via:

pg_restore --dump-toc mydb.bkp > mytoc.txt
vi mytoc.txt {ie. reorder TOC elements as per known dependency problems}
pg_restore --item-list=mytoc.txt mydb.bkp | psql

FWIW, I envisage the ---dump-toc output to look like:

ID; FUNCTION FRED(INT4)
ID; TYPE MY_TYPE
ID; TABLE MY_TABLE
ID; DATA MY_TABLE
ID; INDEX MY_TABLE_IX1
...etc.

so editing and reordering the dump plan should not be too onerous.

----------------------------------------------------------------
Philip Warner | __---_____
Albatross Consulting Pty. Ltd. |----/ - \
(A.C.N. 008 659 498) | /(@) ______---_
Tel: (+61) 0500 83 82 81 | _________ \
Fax: (+61) 0500 83 82 82 | ___________ |
Http://www.rhyme.com.au | / \|
| --________--
PGP key available upon request, | /
and from pgp5.ai.mit.edu:11371 |/

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2000-06-26 14:42:31 Re: File versioning (was: Big 7.1 open items)
Previous Message Philip Warner 2000-06-26 13:58:21 Re: physical backup of PostgreSQL