Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robins Tharakan <tharakan(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Fabrízio Mello <fabriziomello(at)gmail(dot)com>, David Fetter <david(at)fetter(dot)org>, "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Subject: Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump
Date: 2018-01-26 16:56:09
Message-ID: 20180126165609.GI2416@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Robert,

* Robert Haas (robertmhaas(at)gmail(dot)com) wrote:
> On Fri, Jan 26, 2018 at 11:09 AM, Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> > * Robert Haas (robertmhaas(at)gmail(dot)com) wrote:
> >> On Thu, Jan 25, 2018 at 4:52 PM, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
> >> > My proposal is that instead of looking at three hundred lines, you'd
> >> > look for 50 lines of `pg_restore -l` output -- is element XYZ in there
> >> > or not. Quite a bit simpler for the guy adding a new test. This tests
> >> > combinations of pg_dump switches: are we dumping the right set of
> >> > objects.
> >>
> >> My counter-proposal is that we remove the test entirely. It looks
> >> like an unmaintainable and undocumented mess to me, and I doubt
> >> whether the testing value is sufficient to justify the effort of
> >> updating it every time anyone wants to change something in pg_dump.
> >
> > Considering it turned up multiple serious bugs, particularly in the
> > binary upgrade path, I can't disagree more. If you have a counter
> > proposal which actually results in better test coverage, that'd be
> > fantastic, but I wholly reject the notion that we should be considering
> > reducing our test coverage of pg_dump.
>
> I figured you would, but it's still my opinion. I guess my basic
> objection here is to the idea that we somehow know that the 6000+ line
> test case file actually contains only correct tests. That vastly
> exceeds the ability of any normal human being to verify correctness,
> especially given what's already been said about the interdependencies
> between different parts of the file and the lack of adequate
> documentation.

I don't mean to discount your opinion at all, but I'm quite concerned
about the code coverage of pg_dump. I certainly agree that the testing
of pg_dump can and should be improved and I'd like to find time to work
on that, but simply throwing this out strikes me as a step backwards,
not forwards, even for its difficulty.

> For example, I notice that the file contains 166 copies of
> only_dump_test_schema => 1 (with 4 different variations as to how
> much whitespace is included). Some of those are in the "like" clause
> and some are in the "unlike" clause. If one of them were misplaced,
> and pg_dump is actually producing the wrong output, the two errors
> would cancel out and I suspect nobody would notice. If somebody makes
> a change to pg_dump that changes which things get dumped when --schema
> is used, then a lot of those lines would need to moved around. That
> would be a huge nuisance. If the author of such a patch just stuck
> those lines where they make the tests pass, they might fail to notice
> if one of them were actually in the wrong place, and a bug would go
> undiscovered. Some people probably have the mental stamina to audit
> 166 separate cases and make sure that each one is properly positioned,
> but some people might not; and that's really just one test of many.
> Really, every pg_dump change that alters behavior needs to
> individually reconsider the position of every one of ~6000 lines in
> this file, and nobody is really going to do that.
>
> There's some rule that articulates what the effect of --schema is
> supposed to be. I don't know the exact rule, but it's probably
> something like "global objects shouldn't get dumped and schema objects
> should only get dumped if they're in that schema". That rule ought to
> be encoded into this file in some recognizable form. It's encoded
> there, but only in the positioning of hundreds of separate lines of
> code that look identical but must be positioned differently based on a
> human programmer's interpretation of how that rule applies to each
> object type. That's not a good way of implementing the rule; nobody
> can look at this and say "oh, well I changed the rules for --schema,
> so here's which things need to be updated". You're not going to be
> able to look at the diff somebody produces and have any idea whether
> it's right, or at least not without a lot of very time-consuming
> manual verification. If you had just saved the output of pg_dump in a
> file (maybe with OIDs and other variable bits stripped out) and
> compared the new output against the old, it would give you just as
> much code coverage but probably be a lot easier to maintain. If you
> had implemented the rules programmatically instead of encoding a giant
> manually precomputed data structure in the test case file it would be
> a lot more clearly correct and again easier to maintain.

The existing code already has some of these type of rules encoded in it,
specifically to reduce the number of like/unlike cases where it's
possible to do so- see the discussion of catch-all tests, described on
about line 282. While there may be other such rules which can be
encoded, I don't believe they'd end up being nearly as clean as you're
suggesting unless we go through and make changes to pg_dump itself to
consistently behave according to those rules. Perhaps that's a way to
go, I'll admit that I was focusing more on making sure that the
documented behavior is what pg_dump was actually doing, and so I hadn't
considered how we could simplify the testing if pg_dump was stipulated
to operate in a more specific manner.

The notion of using the TOC and then splitting up the output as
suggested by Alvaro seems like a good approach to me. Perhaps there's a
way to do that and to also encode more rules about how pg_dump is
expected to operate.

> I think you've chosen a terrible design and ought to throw the whole
> thing away and start over.

I'll all for throwing away the existing test once we've got something
that covers at least what it does (ideally more, of course).

Thanks!

Stephen

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2018-01-26 17:09:51 Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump
Previous Message Robert Haas 2018-01-26 16:38:29 Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump