Skip site navigation (1) Skip section navigation (2)

binary vs. txt dumps with pg

From: Dick Visser <dick(dot)visser(at)tienhuis(dot)nl>
To: pgsql-admin(at)postgresql(dot)org
Subject: binary vs. txt dumps with pg
Date: 2005-12-17 11:28:54
Message-ID: (view raw, whole thread or download thread mbox)
Lists: pgsql-admin
Hi all

This is my first post to the pgsal-admin list.
I am doing IT support for a small company, and we have been running PG
for a few years now.

I have recently upgraded our PG-8.0.3 server to 8.1.0.
Everything went OK, and now I am working on the backup schema.

I seems that since 8.1 pg_dump does not include a switch to dump blobs
anymore. As a consequence, it is now also possible to dump db's with
blobs in txt format. This is great, only the dumps become rather large
because the blobs have to be escaped etc etc.

Since I use rysnc to a remote host for backups, I have created a
script hat dumps each db to a file, and then checks if the dump from 
yesterday is different. If not, the new dump is deleted and the old dump 
stays in place. This means a significant smaller backup, since most of 
our db's are pretty big but do not change that much.

I have been fiddling with all kinds of techniques to accomplish this.
The only way that seems to work is to make txt based dumps, and strip 
out all the comments and empty lines:

pg_dump -Fp db_name | sed 's/^--//g' > db_name.txt.dump

If no data has changed, doing this one day later will yield a dump that 
is 100% the same (as per md5sum).

Since the txt dump is really big (in my case way over 2 Gb) I decided to 
compress it as well:

pg_dump -Fp db_name | sed 's/^--//g' | gzip -n > db_name.txt.dump.gz

This yields a file of 850 Mb, and the md5sums of unchanged db's are stil 
the same.

When I try to dump to custom format, the dumps are *not* the same 
anymore, even though the data has *not* changed. Example: dumping a db 
that is not in use twice will yielf different files:

$ pg_dump -Fc test > test1.dump
$ pg_dump -Fc test > test2.dump
$ md5sum test*.dump
ec158ad86aabc7322c4b4dca58d6e4de  test1.dump
9156ce40ae5f7c54c64382e078a2bb67  test2.dump

I suspect it is because the -Fc format also contains comments etc.

Is there a way to make -Fc dumps that do not contain any comments, 
timestamps or other stuff that makes consequent dumps of the same db 
different? This would be very helpfull. The txt dumps I use now are OK 
but are really big and slow to dump/restore.

Best regards,

   *    ***     Dick Visser         TIENHUIS Networking
  **   *   *    Touwbaan 68             P: +31206843731
   *   * ***    1018 HS Amsterdam       F: +31208641420
   *   * * *    The Netherlands         M: +31622698108
   *   **  *    IP-phone (SIP)/email:  dick(at)tienhuis(dot)nl
   *   *   *    PGP-key:
   *   *   *    Webcam:
  ***   ***


pgsql-admin by date

Next:From: olive MckenzieDate: 2005-12-17 14:39:59
Subject: psql
Previous:From: Christopher BrowneDate: 2005-12-17 04:09:14
Subject: Re: Moving a database

Privacy Policy | About PostgreSQL
Copyright © 1996-2017 The PostgreSQL Global Development Group