Database file copy

From: Srini Raghavan <sixersrini(at)yahoo(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Database file copy
Date: 2010-12-23 00:35:12
Message-ID: 655913.69697.qm@web80807.mail.mud.yahoo.com
Lists: pgsql-hackers

Hello,

[Tried the general forum and didn't hear from anyone so far, so trying this forum
now; please review, thanks]

We are looking to distribute Postgres databases to our customers along with our
application; we are currently evaluating Postgres 8.4.4. A database can be around
25 GB (the compressed files fit on a few DVDs, and the product is distributed on
DVDs). pg_restore of such a database takes several hours on the low-end machines
running Windows that our customers use. pg_restore runs during our product
install, and the projected install time is not acceptable. Customers can purchase
additional databases over time, and the application makes transactional updates
to the databases after installation. Copying the entire data folder instead of
using pg_restore is therefore not an option, as those transactional updates would
be lost.

I have read the documentation and the few posts available that discourage
file-copy-based restores of individual databases, but I have found a way to do
this. I would appreciate it if the experts could read this and advise whether the
approach will work, given our environment and usage boundaries.

On the master Postgres instance (this is where we create the data; we have
complete control of this environment):
1. Create the database and populate the data.
2. Set vacuum_freeze_table_age to 0 in postgresql.conf.
3. Run VACUUM FULL - this resets each row's xmin to FrozenXid.
4. Shut down Postgres and take a copy of the files for the given database (see
the sketch below).
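
For concreteness, here is a minimal sketch of steps 2-4 as shell commands; the
database name "mydb", the data directory paths, and the archive name are
placeholders, and <oid> stands for the value returned by the query:

    # Step 2: in postgresql.conf (or per-session with SET):
    #   vacuum_freeze_table_age = 0

    # Step 3: freeze the rows. In 8.4, VACUUM FULL FREEZE additionally
    # forces vacuum_freeze_min_age to 0, which may be worth considering
    # to make sure every tuple actually gets frozen.
    psql -d mydb -c "VACUUM FULL"

    # Step 4: note the database's OID (it names the directory under
    # base/), stop the server, and archive that directory.
    psql -At -d postgres -c "SELECT oid FROM pg_database WHERE datname = 'mydb'"
    pg_ctl -D /path/to/master/data stop -m fast
    tar czf mydb-files.tar.gz -C /path/to/master/data/base <oid>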

On the deploy instance at the customer site:
1. Create the new database.
2. Shut down the Postgres instance and copy the database files created on the
master instance into that database's folder (sketched below).
3. Start the Postgres instance.
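
Under the same placeholder names, the deploy side might look like this; the key
point is that the directory under base/ is named by the new database's OID,
which will generally differ from the OID on the master:

    # Step 1: create an empty database and note its OID.
    createdb mydb
    psql -At -d postgres -c "SELECT oid FROM pg_database WHERE datname = 'mydb'"

    # Step 2: stop the server and replace the directory contents with the
    # master's files. The copy includes the per-database system catalogs,
    # so the relation files match the catalog entries.
    pg_ctl -D /path/to/customer/data stop -m fast
    rm -rf /path/to/customer/data/base/<oid>/*
    tar xzf mydb-files.tar.gz --strip-components=1 \
        -C /path/to/customer/data/base/<oid>

    # Step 3: restart.
    pg_ctl -D /path/to/customer/data start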

We don't use table row OIDs. If the cluster-wide OID counter collides with an
OID in the copied database files during subsequent DDL operations, Postgres
resolves this by skipping to the next available OID. There will be a delay while
it finds the next available OID, which is acceptable in our case, as DDL
operations at the customer site are rare. And the VACUUM FULL with
vacuum_freeze_table_age set to 0 on the master instance takes care of the xmin,
allowing the copied rows to remain visible and further transactions at the
customer site to continue without colliding.
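
As a spot check on the master before taking the copy (the table name is a
placeholder): in 8.4 a frozen row reports xmin = 2, the literal value of
FrozenXid, so any other value indicates an unfrozen row.

    psql -d mydb -c "SELECT xmin, count(*) FROM some_table GROUP BY xmin"
    # Expect a single group with xmin = 2 if the freeze pass worked.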

I have tested this and it works, and I am continuing to test it further. I would
like validation of this idea from the experts and the community, to make sure I
haven't overlooked something obvious that might cause issues.

Thank you,
Srini
