Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robins Tharakan <tharakan(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Fabrízio Mello <fabriziomello(at)gmail(dot)com>, David Fetter <david(at)fetter(dot)org>, "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Subject: Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump
Date: 2018-01-26 16:38:29
Message-ID: CA+TgmoanYAWbtCq8xomaXPUUbB3S=j4RhphF_moatROp0oDbXg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jan 26, 2018 at 11:09 AM, Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> * Robert Haas (robertmhaas(at)gmail(dot)com) wrote:
>> On Thu, Jan 25, 2018 at 4:52 PM, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
>> > My proposal is that instead of looking at three hundred lines, you'd
>> > look for 50 lines of `pg_restore -l` output -- is element XYZ in there
>> > or not. Quite a bit simpler for the guy adding a new test. This tests
>> > combinations of pg_dump switches: are we dumping the right set of
>> > objects.
>>
>> My counter-proposal is that we remove the test entirely. It looks
>> like an unmaintainable and undocumented mess to me, and I doubt
>> whether the testing value is sufficient to justify the effort of
>> updating it every time anyone wants to change something in pg_dump.
>
> Considering it turned up multiple serious bugs, particularly in the
> binary upgrade path, I can't disagree more. If you have a counter
> proposal which actually results in better test coverage, that'd be
> fantastic, but I wholly reject the notion that we should be considering
> reducing our test coverage of pg_dump.

I figured you would, but it's still my opinion. I guess my basic
objection here is to the idea that we somehow know that the 6000+ line
test case file actually contains only correct tests. That vastly
exceeds the ability of any normal human being to verify correctness,
especially given what's already been said about the interdependencies
between different parts of the file and the lack of adequate
documentation.

For example, I notice that the file contains 166 copies of
only_dump_test_schema => 1 (with 4 different variations as to how
much whitespace is included). Some of those are in the "like" clause
and some are in the "unlike" clause. If one of them were misplaced,
and pg_dump is actually producing the wrong output, the two errors
would cancel out and I suspect nobody would notice. If somebody makes
a change to pg_dump that changes which things get dumped when --schema
is used, then a lot of those lines would need to moved around. That
would be a huge nuisance. If the author of such a patch just stuck
those lines where they make the tests pass, they might fail to notice
if one of them were actually in the wrong place, and a bug would go
undiscovered. Some people probably have the mental stamina to audit
166 separate cases and make sure that each one is properly positioned,
but some people might not; and that's really just one test of many.
Really, every pg_dump change that alters behavior needs to
individually reconsider the position of every one of ~6000 lines in
this file, and nobody is really going to do that.

There's some rule that articulates what the effect of --schema is
supposed to be. I don't know the exact rule, but it's probably
something like "global objects shouldn't get dumped and schema objects
should only get dumped if they're in that schema". That rule ought to
be encoded into this file in some recognizable form. It's encoded
there, but only in the positioning of hundreds of separate lines of
code that look identical but must be positioned differently based on a
human programmer's interpretation of how that rule applies to each
object type. That's not a good way of implementing the rule; nobody
can look at this and say "oh, well I changed the rules for --schema,
so here's which things need to be updated". You're not going to be
able to look at the diff somebody produces and have any idea whether
it's right, or at least not without a lot of very time-consuming
manual verification. If you had just saved the output of pg_dump in a
file (maybe with OIDs and other variable bits stripped out) and
compared the new output against the old, it would give you just as
much code coverage but probably be a lot easier to maintain. If you
had implemented the rules programmatically instead of encoding a giant
manually precomputed data structure in the test case file it would be
a lot more clearly correct and again easier to maintain.

I think you've chosen a terrible design and ought to throw the whole
thing away and start over.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Stephen Frost 2018-01-26 16:56:09 Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump
Previous Message Tom Lane 2018-01-26 16:31:47 Re: Boolean partitions syntax