Re: More parallel pg_dump bogosities

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: More parallel pg_dump bogosities
Date: 2018-08-27 17:39:33
Message-ID: 20180827173932.GY3326@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Greetings,

* Tom Lane (tgl(at)sss(dot)pgh(dot)pa(dot)us) wrote:
> So I started poking at the idea of sorting by size during parallel
> restore instead of sorting pg_dump's TOC that way. While investigating
> just where to do that, I discovered that, using the regression database
> as test case, restore_toc_entries_parallel() finds these objects to
> be *immediately* ready to restore at the start of the parallel phase:
>
> all TABLE DATA objects --- as expected
> all SEQUENCE SET objects --- as expected
> BLOBS --- as expected
> CONSTRAINT idxpart_another idxpart_another_pkey
> INDEX mvtest_aa
> INDEX mvtest_tm_type
> INDEX mvtest_tvmm_expr
> INDEX mvtest_tvmm_pred
> ROW SECURITY ec1
> ROW SECURITY rls_tbl
> ROW SECURITY rls_tbl_force
>
> I wasn't expecting any POST_DATA objects to be ready at this point,
> so I dug into the reasons why these other ones are ready, and found
> that:
>
> idxpart_another_pkey is an index on a partitioned table (new feature
> in v11). According to the dump archive, it has a dependency on the
> partitioned table. Normally, repoint_table_dependencies() would change
> an index's table dependency to reference the table's TABLE DATA item,
> preventing it from being restored before the data is loaded. But a
> partitioned table has no TABLE DATA item, so that doesn't happen.
> I guess this is okay, really, but it's a bit surprising.
>
> The other four indexes are on materialized views, which likewise don't
> have TABLE DATA items. This means that when restoring materialized
> views, we make their indexes before we REFRESH the matviews. I guess
> that's probably functionally okay (the same thing happens in non-parallel
> restores) but it's leaving some parallelism on the table, because it means
> more work gets crammed into the REFRESH action. Maybe somebody would like
> to fix that. I'm not volunteering right now, though.

Hrmpf, I agree, that certainly doesn't seem ideal.

> And lastly, the ROW SECURITY items are ready because they are not marked
> with any dependency at all, none, nada. This seems bad. In principle
> it could mean that parallel restore would try to emit "ALTER TABLE ENABLE
> ROW LEVEL SECURITY" before it's created the table :-(. I think that in
> practice that can't happen today, because CREATE TABLE commands get
> emitted before we've switched into parallel restore mode at all. But it's
> definitely possible that ENABLE ROW LEVEL SECURITY could be emitted before
> we've restored the table's data. Won't that break things?

We certainly wouldn't want the ROW SECURITY items to be emitted before
the table itself, that wouldn't work. Emitting ENABLE RLS before
restoring the table's data shouldn't actually break anything though as
the owner of the table (which is who we're restoring the data as,
at that point, right?) will bypass RLS. Now, if what's actually set is
FORCE RLS, then we may have an issue if that is emitted before the table
data since that would cause RLS to be in effect even if we're the owner
of the table.

> I think this is easy enough to fix, just force a dependency on the table
> to be attached to a ROW SECURITY item; but I wanted to confirm my
> conclusion that we need one.

I do think we should have one. In an ideal world, I think we'd want:

CREATE TABLE
TABLE DATA
ENABLE RLS
ADD RLS POLICIES
GRANT RIGHTS

Though, ultimately, those last two could be flipped since RLS has a
'default deny' policy if no policies exist on the table but RLS is
enabled. We really shouldn't issue GRANT statements before ENABLE'ing
RLS though, since that might allow someone to access all the rows in the
table when they really should be limited to only some subset. This all
only applies in the cases where we aren't running the restore in a
single transaction, of course, but that's implied by talking about
paralell restore. I'm not sure about how much fun it would be to make
that all work that way though. I do think we need the ENABLE RLS bits
to depend on the table to exist though.

Thanks!

Stephen

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2018-08-27 17:58:11 Re: More parallel pg_dump bogosities
Previous Message Tom Lane 2018-08-27 17:28:22 More parallel pg_dump bogosities