Re: pg_dump and thousands of schemas

From: "Hugo <Nabble>" <hugo(dot)tech(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: pg_dump and thousands of schemas
Date: 2012-08-01 05:33:04
Message-ID: 1343799184312-5718532.post@n5.nabble.com
Lists: pgsql-hackers pgsql-performance

Hi guys,

I just want to let you know that we have created our own solution to dump
and restore our databases. The tool is written in Java and the source is
now on GitHub (MIT license): https://github.com/tig100/JdbcPgBackup

The main approach is to cache all database objects (schemas, tables,
indexes, etc.): instead of having Postgres do joins between the pg_catalog
tables (which in our case hold thousands of schemas in pg_namespace and
millions of columns in pg_attribute), we do a full scan of each catalog
table and then resolve which schema or table an object belongs to by
looking it up in a Java hash map keyed by schema and table OIDs (a sketch
of the idea follows the statistics below). The dump is not transactionally
safe, so it should be performed only on a replica database with WAL replay
disabled, not on a live database. Some statistics:

Dump 11,000 schemas = 3 hours.
Dump 28,000 schemas = 8 hours.
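
For anyone curious, the core idea looks roughly like this. This is a
minimal sketch in plain JDBC; the class name and queries are illustrative
only, not copied from JdbcPgBackup:

import java.sql.*;
import java.util.HashMap;
import java.util.Map;

public class CatalogCache {
    // Namespace OID -> schema name, filled by one full scan of pg_namespace.
    private final Map<Long, String> schemasByOid = new HashMap<>();
    // Table OID -> "schema.table", filled by one full scan of pg_class.
    private final Map<Long, String> tablesByOid = new HashMap<>();

    public void load(Connection conn) throws SQLException {
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(
                     "SELECT oid, nspname FROM pg_namespace")) {
            while (rs.next()) {
                schemasByOid.put(rs.getLong(1), rs.getString(2));
            }
        }
        // Note: no SQL join against pg_namespace here. The schema name is
        // resolved from the in-memory map rather than by the planner.
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(
                     "SELECT oid, relnamespace, relname FROM pg_class"
                     + " WHERE relkind = 'r'")) {
            while (rs.next()) {
                String schema = schemasByOid.get(rs.getLong(2));
                tablesByOid.put(rs.getLong(1), schema + "." + rs.getString(3));
            }
        }
    }

    public String qualifiedTableName(long tableOid) {
        return tablesByOid.get(tableOid);
    }
}

The same trick extends to the other catalogs (pg_attribute, pg_index,
etc.): one sequential scan per table, with all the relationship
resolution done in memory.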

You can read more about the tool on the GitHub page.

Best regards,
Hugo

