pg_dump --load-via-partition-root vs. parallel restore

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: pg_dump --load-via-partition-root vs. parallel restore
Date: 2018-08-28 19:53:39
Message-ID: 13624.1535486019@sss.pgh.pa.us
Lists: pgsql-hackers

Parallel pg_restore generally assumes that the archive file is telling it
the truth about data dependencies; in particular, that a TABLE DATA item
naming a particular target table is loading data into exactly that table.
--load-via-partition-root makes it quite likely that this assumption is
wrong, at least in scenarios where the data really does get redirected
into partitions other than the original one. This can result
in inefficiencies (e.g., index rebuild started before a table's data is
really all loaded) or outright failures (foreign keys or RLS policies
applied before the data is all loaded). I suspect that deadlock failures
during restore are also possible, since identify_locking_dependencies
is not going to be anywhere close to the truth about which operations
might hold which locks.
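
To make the hazard concrete, here is a minimal Python sketch of the mismatch between what the archive claims and what actually happens. It is a toy model, not pg_restore's scheduler: the function name, the dictionaries, and the item labels are all hypothetical, and it assumes each post-data item lists the tables whose TABLE DATA items it nominally depends on.

```python
from typing import Dict, List, Set


def premature_items(
    post_data_deps: Dict[str, Set[str]],
    actual_targets: Dict[str, Set[str]],
    finished: Set[str],
) -> List[str]:
    """Return post-data items a dependency-trusting scheduler would start,
    although some unfinished data item may still write into their tables.

    post_data_deps: post-data item -> tables whose TABLE DATA it nominally needs
    actual_targets: TABLE DATA item (named by table) -> tables it really loads
    finished:       TABLE DATA items that have already completed
    """
    result = []
    for item, deps in post_data_deps.items():
        # The scheduler's view: every declared dependency is done.
        if not deps <= finished:
            continue
        # Reality: is any still-running data item able to touch those tables?
        still_writing = any(
            targets & deps
            for data_item, targets in actual_targets.items()
            if data_item not in finished
        )
        if still_writing:
            result.append(item)
    return sorted(result)


# Hypothetical archive: with load-via-root, rows dumped from part_b can be
# routed into part_a (and vice versa) when they are reloaded.
actual = {"part_a": {"part_a", "part_b"}, "part_b": {"part_a", "part_b"}}
post = {"INDEX part_a_idx": {"part_a"}, "INDEX part_b_idx": {"part_b"}}

print(premature_items(post, actual, finished={"part_a"}))
```

In this toy case the index on part_a looks ready as soon as the part_a data item finishes, even though the part_b item may still be inserting rows into part_a.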

This could possibly be fixed by changing around the dependencies shown
in the archive file so that POST_DATA objects that're nominally dependent
on any one of a partitioned table's members are shown as dependent on all
of them. I'm not particularly eager to write that patch though.
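
The dependency change described above could be sketched roughly as follows. This is only an illustration in Python of the idea, not a patch against pg_dump: the function name and data structures are hypothetical, and it assumes the input map contains only POST_DATA items, each with the set of tables whose TABLE DATA it depends on.

```python
from typing import Dict, List, Set


def widen_partition_deps(
    deps: Dict[str, Set[str]],
    partitions_of: Dict[str, List[str]],
) -> Dict[str, Set[str]]:
    """Return a copy of deps in which a dependency on any one partition
    becomes a dependency on all partitions of the same partitioned table.

    deps:          POST_DATA item -> tables whose data it depends on
    partitions_of: partitioned (root) table -> its leaf partitions
    """
    # Map each partition to the full set of its siblings (itself included).
    siblings: Dict[str, Set[str]] = {}
    for members in partitions_of.values():
        group = set(members)
        for member in members:
            siblings[member] = group

    widened: Dict[str, Set[str]] = {}
    for item, item_deps in deps.items():
        new_deps = set(item_deps)
        for dep in item_deps:
            # Tables that are not partitions are left as they are.
            new_deps |= siblings.get(dep, set())
        widened[item] = new_deps
    return widened


# Hypothetical example: one partitioned table and one ordinary table.
partitions = {"measurement": ["measurement_y2017", "measurement_y2018"]}
original = {
    "INDEX measurement_y2017_idx": {"measurement_y2017"},
    "INDEX other_idx": {"other"},
}

for item, item_deps in widen_partition_deps(original, partitions).items():
    print(item, sorted(item_deps))
```

The cost of this approach is reduced parallelism: nothing that depends on one partition can start until every sibling partition's data has been loaded.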

For the moment I'm inclined to just document the problem, e.g. "It's
recommended that parallel restore not be used with archives generated
with this option."

regards, tom lane
