Re: Improving test coverage of extensions with pg_dump

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Euler Taveira <euler(at)timbira(dot)com(dot)br>, Andreas Karlsson <andreas(at)proxel(dot)se>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Improving test coverage of extensions with pg_dump
Date: 2015-09-29 12:39:07
Message-ID: CAB7nPqRz9UcQJjUM_gXOGMt1dkqHp09GB=N_brLPdjMUg-_SGw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Sep 26, 2015 at 10:22 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> On 2015-09-26 21:54:46 +0900, Michael Paquier wrote:
>> On Wed, Sep 23, 2015 at 1:04 AM, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
>> wrote:
>> > We discussed this in some other thread, not long ago. I looked briefly
>> > in the archives but couldn't find it. I think the conclusion was
>> > something along the lines of "hmm, tough".
>
>> That's hours, even days of fun ahead for sure.
>
> To me that's a somewhat serious bug, and something that we probably need
> to address at some point.

Well, addressing it at the root of the problem is... Let's say...
Really tough. I have put my head on this stuff for a couple of hours
and tried to draw up a couple of ways to fix this. First, even if
pg_dump relies heavily on the assumption that attributes need to be
ordered by attnum, I thought that it would have been possible to use
attinhcount to order the columns in the same way when taking the dump
from a source db and a target (where dump of source has been restored
to). This would work if there is no more than 1 level of child
relations, but with grand-child relations the order gets messed up
again.

Locating a fix on the backend side would make things for pg_dump
easier, an idea would be to simply reorder the attribute numbers when
a column is added to a parent table. For example with something like
that:
CREATE TABLE test_parent (c1 integer, c2 integer);
CREATE TABLE test_child_1 (c3 integer) inherits (test_parent);
CREATE TABLE test_child_2 (c4 integer) inherits (test_child_1);
ALTER TABLE test_parent ADD COLUMN c5 integer;
We get the following attribute ordering:
=# SELECT c.relname, a.attname, a.attnum
FROM pg_attribute a
JOIN pg_class c ON (a.attrelid = c.oid)
WHERE a.attrelid IN ('test_parent'::regclass,
'test_child_1'::regclass,
'test_child_2'::regclass)
AND a.attnum > 0 ORDER BY c.relname, a.attnum;
relname | attname | attnum
--------------+---------+--------
test_child_1 | c1 | 1
test_child_1 | c2 | 2
test_child_1 | c3 | 3
test_child_1 | c5 | 4
test_child_2 | c1 | 1
test_child_2 | c2 | 2
test_child_2 | c3 | 3
test_child_2 | c4 | 4
test_child_2 | c5 | 5
test_parent | c1 | 1
test_parent | c2 | 2
test_parent | c5 | 3
(12 rows)

Now, what we could do on a child relation when a new attribute in its
parent relation is to increment the attnum of the existing columns
with attinhcount = 0 by 1, and insert the new column at the end of the
existing ones where attinhcount > 0, to give something like that:
relname | attname | attnum
--------------+---------+--------
test_child_1 | c1 | 1
test_child_1 | c2 | 2
test_child_1 | c5 | 3
test_child_1 | c3 | 4
test_child_2 | c1 | 1
test_child_2 | c2 | 2
test_child_2 | c3 | 3
test_child_2 | c5 | 4
test_child_2 | c4 | 5
test_parent | c1 | 1
test_parent | c2 | 2
test_parent | c5 | 3
(12 rows)
Looking at tablecmds.c, this could be intuitively done as a part of
ATExecAddColumn by scanning the attributes of the child relation from
the end. But it is of course not that simple, a lot of code paths rely
on the attnum of the current attributes. One is CookedConstraint, but
that's a can of worms for back branches.

Another bandaid solution, that would be less invasive for a backpatch
is to reproduce the sequence of DDL commands within the dump itself:
we would need to dump attinhcount in getTableAttrs and use it to guess
what are the columns on the child relations that have been added on a
parent relation after the children have been created. This would not
solve the case of data-only dumps containing INSERT queries that have
no column names on databases restored from past schema dumps though.

Perhaps you did not look at the last patch I sent on this thread, but
I changed it so as a schedule is used with a call to pg_regress.
That's a more scalable approach as you were concerned about as we can
plug in more easily new modules and new regression tests without
modifying the perl script used to kick the tests, though it does not
run the main regression test suite and it actually cannot run it,
except with a modified schedule or with a filter of the child-parent
tables. Patch is attached for consideration, which looks like a good
base for future support, feel free to disagree :)
Thanks,
--
Michael

Attachment Content-Type Size
20150929_pgdump_restore_test.patch text/x-patch 9.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Petr Jelinek 2015-09-29 13:59:59 Re: track_commit_timestamp and COMMIT PREPARED
Previous Message Kouhei Kaigai 2015-09-29 12:38:54 Re: Foreign join pushdown vs EvalPlanQual