Re: Fuzz testing COPY FROM parsing

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Fuzz testing COPY FROM parsing
Date: 2021-02-05 20:06:46
Message-ID: 20210205200646.GS27507@tamriel.snowman.net
Lists: pgsql-hackers

Greetings,

* Heikki Linnakangas (hlinnaka(at)iki(dot)fi) wrote:
> On 05/02/2021 21:16, Andrew Dunstan wrote:
> >On 2/5/21 10:54 AM, Stephen Frost wrote:
> >>* Heikki Linnakangas (hlinnaka(at)iki(dot)fi) wrote:
> >>>I ran it for about 2 h on my laptop with the patch I was working on [2]. It
> >>>didn't find any crashes, but it generated about 1300 input files that it
> >>>considered "interesting" based on code coverage analysis. When I took those
> >>>generated inputs, and ran them against unpatched and patched server, some
> >>>inputs produced different results. So that revealed a couple of bugs in the
> >>>patch. (I'll post a fixed patched version on that thread soon.)
> >>>
> >>>I hope others find this useful, too.
> >>Nice! I wonder if there's a way to have a buildfarm member or other
> >>system doing this automatically on new commits and perhaps adding
> >>coverage for other things like the JSON code..
> >
> >Not easily in the buildfarm as it is today. We can easily create modules
> >for extensions and other things that don't require modification of core
> >code, but things that require patching core code are a whole different
> >story.
>
> It might be possible to call the fuzzer's HF_ITER() function from a C
> extension instead. So you would run a query like "SELECT next_fuzz_iter()"
> in a loop, and next_fuzz_iter() would be a C function that calls HF_ITER(),
> and executes the actual query with SPI.

I wonder how much we could fuzz with that approach...
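
To make the idea concrete, here is a rough sketch of the control flow
such a function might have. It is not a working extension and makes
several assumptions: fake_hf_iter() is a hypothetical stand-in whose
shape is modeled on the fuzzer's HF_ITER(), fake_run_copy() is a
hypothetical placeholder for the part that would go through SPI
(SPI_connect, SPI_execute, SPI_finish), and the three sample inputs are
made up. How the bytes would actually reach COPY FROM is left open.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Made-up sample inputs standing in for what the fuzzer would generate. */
static const char *demo_inputs[] = { "1,foo\n", "2,\"bar\n", "\\.\n" };
static const size_t demo_ninputs = sizeof(demo_inputs) / sizeof(demo_inputs[0]);
static size_t demo_pos = 0;

/*
 * Hypothetical stand-in for HF_ITER(): hands back the next input buffer
 * and its length. A real build would link against the fuzzer instead.
 */
static void
fake_hf_iter(const uint8_t **buf, size_t *len)
{
    const char *s = demo_inputs[demo_pos % demo_ninputs];

    demo_pos++;
    *buf = (const uint8_t *) s;
    *len = strlen(s);
}

/*
 * Hypothetical placeholder for executing the query with SPI. In an
 * extension this is where the input would be handed to COPY FROM; here
 * it only reports how many bytes it was given.
 */
static int
fake_run_copy(const uint8_t *data, size_t len)
{
    (void) data;
    return (int) len;
}

/*
 * One fuzz iteration: fetch the next input, then run it.
 * Returns the number of input bytes handed to the (fake) COPY step.
 */
int
next_fuzz_iter(void)
{
    const uint8_t *buf;
    size_t      len;

    fake_hf_iter(&buf, &len);
    return fake_run_copy(buf, len);
}
```

In a real extension the function would be exposed at the SQL level and
driven by running "SELECT next_fuzz_iter()" in a loop, as described
above.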

> That said, I don't think it's important to run the fuzzer in the buildfarm.
> It should be enough to do that every once in a while, when you modify the
> COPY FROM code (or something else that you want to fuzz test). But we could
> easily include the test inputs generated by the fuzzer in the regular tests.
> We've usually been very frugal in adding tests, though, to keep the time it
> takes to run all the tests short.

If we could be sure that everyone who might ever modify the COPY FROM or
JSON parser, or other code that we arrange to get fuzz testing on with
this approach, would actually run the fuzzer, that would be great, but I
wouldn't make a bet on that happening, which is why having it done
(however it's done) in an automated fashion would be good. Also, doing it
on the buildfarm, or
using a CI tool, means we can allow it to run longer since it won't be
directly impacting developers. I'd love to see us do more of that in
general. It's great that we have good regression tests that can be run
fast and catch some things, but it seems likely that there'll always be
things that just take longer to test and having that done in an
automated fashion essentially 'in the background' would be great, so we
can get reports back and fix anything they find before release.

Thanks,

Stephen
