From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | pgsql-committers(at)postgresql(dot)org |
Subject: | pgsql: Eliminate O(N^2) behavior in parallel restore with many blobs. |
Date: | 2010-12-09 18:04:55 |
Message-ID: | E1PQkrP-0003za-FV@gemulon.postgresql.org |
Lists: | pgsql-committers |
Eliminate O(N^2) behavior in parallel restore with many blobs.

With hundreds of thousands of TOC entries, the repeated searches in
reduce_dependencies() become the dominant cost. Get rid of that searching
by constructing reverse-dependency lists, which we can do in O(N) time
during the fix_dependencies() preprocessing. I chose to store the reverse
dependencies as DumpId arrays for consistency with the forward-dependency
representation, and keep the previously-transient tocsByDumpId[] array
around to locate actual TOC entry structs quickly from dump IDs.
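
To illustrate the idea (this is a minimal sketch, not the committed code): the TocEntry struct below is a hypothetical, heavily simplified stand-in, and build_reverse_deps() and reduce_deps() are illustrative names rather than the functions in pg_backup_archiver.c. The sketch assumes an array indexed by dump ID, in the spirit of tocsByDumpId[], and shows how reverse-dependency arrays can be built in linear passes so that finishing one entry only touches the entries that depend on it.

```c
#include <assert.h>
#include <stdlib.h>

typedef int DumpId;

/* Hypothetical, simplified TOC entry; the real struct has many more fields. */
typedef struct TocEntry
{
    DumpId  dumpId;
    DumpId *dependencies;   /* forward deps: IDs this entry depends on */
    int     nDeps;
    DumpId *revDeps;        /* reverse deps: IDs of entries depending on this one */
    int     nRevDeps;
    int     depCount;       /* forward deps not yet satisfied */
} TocEntry;

/*
 * Build reverse-dependency arrays in a few linear passes.
 * byId[] is indexed by dump ID (slot 0 unused); maxId is the largest ID.
 * Assumption: every dependency refers to an entry that exists in byId[].
 */
static void
build_reverse_deps(TocEntry **byId, DumpId maxId)
{
    DumpId id;
    int    i;

    /* Reset counters so the function does not rely on prior state. */
    for (id = 1; id <= maxId; id++)
        if (byId[id] != NULL)
            byId[id]->nRevDeps = 0;

    /* Count how many entries depend on each entry. */
    for (id = 1; id <= maxId; id++)
    {
        TocEntry *te = byId[id];

        if (te == NULL)
            continue;
        te->depCount = te->nDeps;
        for (i = 0; i < te->nDeps; i++)
        {
            TocEntry *dep = byId[te->dependencies[i]];

            assert(dep != NULL);
            dep->nRevDeps++;
        }
    }

    /* Allocate each array at its exact size; reuse nRevDeps as fill index. */
    for (id = 1; id <= maxId; id++)
    {
        TocEntry *te = byId[id];

        if (te == NULL)
            continue;
        te->revDeps = NULL;
        if (te->nRevDeps > 0)
        {
            te->revDeps = malloc(te->nRevDeps * sizeof(DumpId));
            if (te->revDeps == NULL)
                abort();
        }
        te->nRevDeps = 0;
    }

    /* Fill the arrays: each forward edge yields one reverse edge. */
    for (id = 1; id <= maxId; id++)
    {
        TocEntry *te = byId[id];

        if (te == NULL)
            continue;
        for (i = 0; i < te->nDeps; i++)
        {
            TocEntry *dep = byId[te->dependencies[i]];

            dep->revDeps[dep->nRevDeps++] = te->dumpId;
        }
    }
}

/*
 * After "done" completes, decrement depCount only for its dependents,
 * with no scan over all entries. Returns how many became ready.
 */
static int
reduce_deps(TocEntry **byId, const TocEntry *done)
{
    int i;
    int ready = 0;

    for (i = 0; i < done->nRevDeps; i++)
    {
        TocEntry *other = byId[done->revDeps[i]];

        if (--other->depCount == 0)
            ready++;
    }
    return ready;
}
```

Storing dump IDs rather than pointers mirrors the forward-dependency representation, at the price of one array lookup per reverse edge, which is why the ID-indexed array has to stay around after preprocessing.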

While this fixes the slow case reported by Vlad Arkhipov, there is still
a potential for O(N^2) behavior with sufficiently many tables:
fix_dependencies itself, as well as mark_create_done and
inhibit_data_for_failed_table, are doing repeated searches to deal with
table-to-table-data dependencies. Possibly this work could be extended
to deal with that, although the latter two functions are also used in
non-parallel restore where we currently don't run fix_dependencies.

Another TODO is that we fail to parallelize restore of multiple blobs
at all. This appears to require changes in the archive format to fix.

Back-patch to 9.0 where the problem was reported. 8.4 has potential issues
as well; but since it doesn't create a separate TOC entry for each blob,
it's at much less risk of having enough TOC entries to cause real problems.

Branch
------
REL9_0_STABLE

Details
-------
http://git.postgresql.org/gitweb?p=postgresql.git;a=commitdiff;h=2ffcb0cb6a5bf97de22f0ce58f55537ce1c87653

Modified Files
--------------
src/bin/pg_dump/pg_backup_archiver.c | 116 +++++++++++++++++++++------------
src/bin/pg_dump/pg_backup_archiver.h | 2 +
2 files changed, 76 insertions(+), 42 deletions(-)