More parallel pg_dump bogosities

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Cc: Stephen Frost <sfrost(at)snowman(dot)net>
Subject: More parallel pg_dump bogosities
Date: 2018-08-27 17:28:22
Message-ID: 19784.1535390902@sss.pgh.pa.us
Lists: pgsql-hackers

So I started poking at the idea of sorting by size during parallel
restore instead of sorting pg_dump's TOC that way. While investigating
just where to do that, I discovered that, using the regression database
as test case, restore_toc_entries_parallel() finds these objects to
be *immediately* ready to restore at the start of the parallel phase:

all TABLE DATA objects --- as expected
all SEQUENCE SET objects --- as expected
BLOBS --- as expected
CONSTRAINT idxpart_another idxpart_another_pkey
INDEX mvtest_aa
INDEX mvtest_tm_type
INDEX mvtest_tvmm_expr
INDEX mvtest_tvmm_pred
ROW SECURITY ec1
ROW SECURITY rls_tbl
ROW SECURITY rls_tbl_force
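To make "ready" concrete, here is a minimal Python model. It assumes a simplified TOC entry made of a dump ID, a description, a tag, a set of dependency IDs and a size estimate. `TocEntry`, `ready_entries` and `ready_queue_by_size` are hypothetical names, and this is not pg_dump's C code, whose bookkeeping is more involved. An entry counts as ready once every one of its dependencies has been restored. Sorting that ready list largest-first is one possible way to do the size ordering at restore time rather than in pg_dump's TOC.

```python
from dataclasses import dataclass, field


@dataclass
class TocEntry:
    """Hypothetical, simplified stand-in for an archive TOC entry."""
    dump_id: int
    desc: str                    # e.g. "TABLE DATA", "INDEX", "ROW SECURITY"
    tag: str
    deps: set[int] = field(default_factory=set)
    size: int = 0                # assumed size estimate, used only for ordering


def ready_entries(entries, done_ids):
    """Entries not yet restored whose dependencies have all been restored."""
    return [te for te in entries
            if te.dump_id not in done_ids and te.deps <= done_ids]


def ready_queue_by_size(entries, done_ids):
    """Order the ready entries largest-first, so big jobs start early."""
    return sorted(ready_entries(entries, done_ids),
                  key=lambda te: te.size, reverse=True)


# Dump IDs 1 and 2 stand for tables created before the parallel phase.
done = {1, 2}
toc = [
    TocEntry(3, "TABLE DATA", "small_tbl", {1}, size=10),
    TocEntry(4, "TABLE DATA", "big_tbl", {2}, size=1000),
    TocEntry(5, "INDEX", "big_tbl_idx", {4}),        # waits for big_tbl's data
    TocEntry(6, "ROW SECURITY", "rls_tbl", set()),    # no dependencies at all
]
queue = [f"{te.desc} {te.tag}" for te in ready_queue_by_size(toc, done)]
print(queue)
# ['TABLE DATA big_tbl', 'TABLE DATA small_tbl', 'ROW SECURITY rls_tbl']
```

In this toy model the ROW SECURITY entry lands in the initial ready queue for exactly one reason: its dependency set is empty.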

I wasn't expecting any POST_DATA objects to be ready at this point,
so I dug into the reasons why these other ones are ready, and found
that:

idxpart_another_pkey is an index on a partitioned table (new feature
in v11). According to the dump archive, it has a dependency on the
partitioned table. Normally, repoint_table_dependencies() would change
an index's table dependency to reference the table's TABLE DATA item,
preventing it from being restored before the data is loaded. But a
partitioned table has no TABLE DATA item, so that doesn't happen.
I guess this is okay, really, but it's a bit surprising.
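The repointing step can be sketched as follows. The sketch assumes that POST_DATA items carry a set of dependency IDs and that each TABLE DATA item depends on its TABLE item. The function name mirrors repoint_table_dependencies(), but the body is a simplified illustration, not the real code, and `plain_tbl` is a made-up table. A partitioned table has no TABLE DATA item to repoint to, so its index keeps depending on the table itself. Because the table was already created before the parallel phase, that index is ready immediately.

```python
from dataclasses import dataclass, field


@dataclass
class TocEntry:
    """Hypothetical, simplified stand-in for an archive TOC entry."""
    dump_id: int
    desc: str
    tag: str
    section: str                 # "PRE_DATA", "DATA" or "POST_DATA"
    deps: set[int] = field(default_factory=set)


def repoint_table_dependencies(entries):
    """Simplified sketch: POST_DATA items that depend on a table are made
    to depend on that table's TABLE DATA item instead, if it has one."""
    by_id = {te.dump_id: te for te in entries}
    data_of = {}                 # table dump ID -> its TABLE DATA dump ID
    for te in entries:
        if te.desc == "TABLE DATA":
            for dep in te.deps:
                if by_id[dep].desc == "TABLE":
                    data_of[dep] = te.dump_id
    for te in entries:
        if te.section == "POST_DATA":
            # No TABLE DATA item (e.g. a partitioned table): keep the dep.
            te.deps = {data_of.get(dep, dep) for dep in te.deps}


toc = [
    TocEntry(1, "TABLE", "plain_tbl", "PRE_DATA"),
    TocEntry(2, "TABLE", "idxpart_another", "PRE_DATA"),   # partitioned
    TocEntry(3, "TABLE DATA", "plain_tbl", "DATA", {1}),
    TocEntry(4, "INDEX", "plain_tbl_idx", "POST_DATA", {1}),
    TocEntry(5, "CONSTRAINT", "idxpart_another_pkey", "POST_DATA", {2}),
]
repoint_table_dependencies(toc)

done = {1, 2}                    # tables are created before the parallel phase
ready_post_data = [te.tag for te in toc
                   if te.section == "POST_DATA" and te.deps <= done]
print(ready_post_data)           # ['idxpart_another_pkey']
```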

The other four indexes are on materialized views, which likewise don't
have TABLE DATA items. This means that when restoring materialized
views, we make their indexes before we REFRESH the matviews. I guess
that's probably functionally okay (the same thing happens in non-parallel
restores) but it's leaving some parallelism on the table, because it means
more work gets crammed into the REFRESH action. Maybe somebody would like
to fix that. I'm not volunteering right now, though.
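One hypothetical way to recover that parallelism would be to let MATERIALIZED VIEW DATA items (the REFRESH step) play the same role as TABLE DATA items when index dependencies are repointed. The sketch below compares the two choices on a tiny made-up TOC. The object names are illustrative, the dependency sets are assumptions, and `repoint_index_deps` is an invented helper, not anything in pg_dump.

```python
from dataclasses import dataclass, field


@dataclass
class TocEntry:
    """Hypothetical, simplified stand-in for an archive TOC entry."""
    dump_id: int
    desc: str
    tag: str
    deps: set[int] = field(default_factory=set)


RELATION_DESCS = {"TABLE", "MATERIALIZED VIEW"}
CURRENT_DATA_DESCS = {"TABLE DATA"}                             # behavior described above
PROPOSED_DATA_DESCS = {"TABLE DATA", "MATERIALIZED VIEW DATA"}  # hypothetical change


def repoint_index_deps(entries, data_descs):
    """Point each INDEX at its relation's data item, if one in data_descs exists."""
    by_id = {te.dump_id: te for te in entries}
    data_of = {}                 # relation dump ID -> its data item's dump ID
    for te in entries:
        if te.desc in data_descs:
            for dep in te.deps:
                if by_id[dep].desc in RELATION_DESCS:
                    data_of[dep] = te.dump_id
    for te in entries:
        if te.desc == "INDEX":
            te.deps = {data_of.get(dep, dep) for dep in te.deps}


def build_toc():
    return [
        TocEntry(1, "TABLE", "mvtest_t"),
        TocEntry(2, "MATERIALIZED VIEW", "mvtest_tm", {1}),
        TocEntry(3, "TABLE DATA", "mvtest_t", {1}),
        TocEntry(4, "MATERIALIZED VIEW DATA", "mvtest_tm", {2, 3}),  # the REFRESH
        TocEntry(5, "INDEX", "mvtest_tm_type", {2}),
    ]


done = {1, 2}                    # pre-data objects already created
results = {}
for name, descs in (("current", CURRENT_DATA_DESCS),
                    ("proposed", PROPOSED_DATA_DESCS)):
    toc = build_toc()
    repoint_index_deps(toc, descs)
    results[name] = [f"{te.desc} {te.tag}" for te in toc
                     if te.dump_id not in done and te.deps <= done]
    print(name, results[name])
```

Under the current mapping, the index is ready at the start, ahead of the REFRESH. Under the proposed mapping, it waits for the MATERIALIZED VIEW DATA item and could then run as a separate job. The model only shows ordering and says nothing about the relative cost of the two orders.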

And lastly, the ROW SECURITY items are ready because they are not marked
with any dependency at all, none, nada. This seems bad. In principle
it could mean that parallel restore would try to emit "ALTER TABLE ENABLE
ROW LEVEL SECURITY" before it has created the table :-(. I think that in
practice that can't happen today, because CREATE TABLE commands get
emitted before we've switched into parallel restore mode at all. But it's
definitely possible that ENABLE ROW LEVEL SECURITY could be emitted before
we've restored the table's data. Won't that break things?

I think this is easy enough to fix: just attach a dependency on the table
to each ROW SECURITY item. But I wanted to confirm my conclusion that we
need one.
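The proposed fix can be modeled the same way. This sketch makes three assumptions: the ROW SECURITY item's tag is its table's name (as in the list of ready items above), a simple name-to-ID lookup is available, and the usual repointing from TABLE to TABLE DATA also applies to ROW SECURITY items. `add_row_security_deps`, `repoint_to_table_data` and the lookup map are hypothetical helpers for illustration only. With the added dependency, the ROW SECURITY item waits for the table's data instead of being ready at once.

```python
from dataclasses import dataclass, field


@dataclass
class TocEntry:
    """Hypothetical, simplified stand-in for an archive TOC entry."""
    dump_id: int
    desc: str
    tag: str
    deps: set[int] = field(default_factory=set)


def add_row_security_deps(entries, table_id_by_name):
    """Proposed fix, sketched: give each ROW SECURITY item a dependency
    on its table (looked up by tag via a hypothetical name->ID map)."""
    for te in entries:
        if te.desc == "ROW SECURITY":
            te.deps.add(table_id_by_name[te.tag])


def repoint_to_table_data(entries):
    """Simplified repointing: deps on a TABLE move to its TABLE DATA item."""
    by_id = {te.dump_id: te for te in entries}
    data_of = {dep: te.dump_id
               for te in entries if te.desc == "TABLE DATA"
               for dep in te.deps if by_id[dep].desc == "TABLE"}
    for te in entries:
        if te.desc == "ROW SECURITY":
            te.deps = {data_of.get(dep, dep) for dep in te.deps}


def ready_at_start(entries, done_ids):
    return [f"{te.desc} {te.tag}" for te in entries
            if te.dump_id not in done_ids and te.deps <= done_ids]


def build_toc():
    return [
        TocEntry(1, "TABLE", "rls_tbl"),
        TocEntry(2, "TABLE DATA", "rls_tbl", {1}),
        TocEntry(3, "ROW SECURITY", "rls_tbl"),   # no dependencies, as observed
    ]


done = {1}                       # CREATE TABLE already ran before the parallel phase

before = build_toc()
repoint_to_table_data(before)
before_ready = ready_at_start(before, done)
print(before_ready)              # ['TABLE DATA rls_tbl', 'ROW SECURITY rls_tbl']

after = build_toc()
add_row_security_deps(after, {"rls_tbl": 1})
repoint_to_table_data(after)
after_ready = ready_at_start(after, done)
print(after_ready)               # ['TABLE DATA rls_tbl']
```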

regards, tom lane
