Re: Thoughts on unit testing?

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Thoughts on unit testing?
Date: 2017-08-13 19:19:32
Message-ID: CAH2-WzkJ4tkSN0dK32jtW-RVauoQcOX-Tafg3qO72vZW72QKdA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Aug 10, 2017 at 2:53 PM, Thomas Munro
<thomas(dot)munro(at)enterprisedb(dot)com> wrote:
> The current regression tests, isolation tests and TAP tests are very
> good (though I admit my experience with TAP is limited), but IMHO we
> are lacking support for C-level unit testing. Complicated, fiddly
> things with many states, interactions, edge cases etc can be hard to
> get full test coverage on from the outside. Consider
> src/backend/utils/mmgr/freepage.c as a case in point.

It is my understanding that much of the benefit of unit testing comes
from maintainability. It's something that goes well with design by
contract. Writing unit tests is a forcing function. It requires
testable units, making the units more composable. The programmer must
be very deliberate about state, and must delineate code as testable
units. Unit tests are often supposed to be minimal, to serve as a kind
of design document.

In order to unit test something like an index access method, or
VACUUM, or the transaction manager, it would be necessary to mock the
buffer manager, and have a large amount of scaffolding code. This
would be an awful lot of work to do as an afterthought. I wouldn't
know where to draw the line between one unit and another, because the
code just wasn't written with that in mind to begin with. Is snapshot
acquisition a unit that includes heapam.c? If so, does that mean that
heapam.c has to have its own unit for other functionality, or does
that get lumped in with the snapshot acquisition unit?

As I've said a couple of times already, one thought behind amcheck was
that it would allow us to formalize the design of access methods in a
way that somewhat resembles unit testing. The code that we already
have for nbtree is extremely well commented, and is intended to
document how nbtree really works in terms of invariants. Although the
tests are very comprehensive, there really isn't all that much code,
because it's supposed to minimally describe correct states for a
B-Tree. It manages to do this well, perhaps because the Lehman & Yao
design formalizes everything as a small number of per-page atomic
operations, and is very clear about legal and illegal states.

It would be nice if amcheck eventually *guided* the development of the
heap access method. If comprehensive testing of invariants for heapam
becomes very complicated, it could be instructive to consider how to
refactor heapam to make the required verification code simpler but no
less comprehensive. Mostly, this is desirable because it will lead to
an overall simpler, more robust design, and because it lowers the
barrier to entry to assess the correctness of the code. Of course,
this requires buy-in from patch authors to work.

To put it another way, a patch author whose patch touches storage
implicitly asserts that their patch is correct. It would be useful if
they provided a precise falsifiable statement about its correctness up
front, in the form of verification code.

--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Noah Misch 2017-08-13 19:39:20 Re: ICU collation variant keywords and pg_collation entries (Was: [BUGS] Crash report for some ICU-52 (debian8) COLLATE and work_mem values)
Previous Message Tom Lane 2017-08-13 19:05:13 Re: Server crash (FailedAssertion) due to catcache refcount mis-handling