Re: Suppressing occasional failures in copy2 regression test

From: Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <greg(dot)stark(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Suppressing occasional failures in copy2 regression test
Date: 2009-06-15 14:11:29
Message-ID: 65937bea0906150711sfcb499fh235707397dcd547@mail.gmail.com
Lists: pgsql-hackers

On Sun, Jun 14, 2009 at 12:39 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Sat, Jun 13, 2009 at 2:48 PM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > Greg Stark <greg(dot)stark(at)enterprisedb(dot)com> writes:
> >> I'm not sure about that. It seems like race conditions with autovacuum
> >> are a real potential bug that it would be nice to be testing for.
> >
> > It's not a bug; it's a limitation of our testing framework that it sees
> > this as a failure. Serious testing for autovac race conditions would
> > indeed be interesting, but you're never going to get anything meaningful
> > in that direction out of the current framework.
>
> The elephant in the room here may be moving to some more
> flexible/powerful testing framework, but the difficulty will almost
> certainly be in agreeing what it should look like. The actual writing
> of said test framework will take some work too, but to some degree
> that's a SMOP.
>
> This tuple-ordering issue seems to be one that comes up over and over
> again, but in the short term, making it a TEMP table seems like a
> reasonable fix.
>

I am forwarding a mail containing a Perl script and a pair of sample files that I
developed about a year ago. The forwarded mail text explains what the
script is trying to do. A line beginning with '?' in the expected file is
treated specially.

If a line begins with '?' then the rest of the line is treated as a regular
expression which will be used to match the corresponding line from the
actual output.
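A minimal Python sketch of that per-line rule follows. It is not code from the attached neurodiff.pl: the name `line_matches` is hypothetical, and it assumes the pattern must match the whole actual line (the script may anchor its REs differently). Marker lines such as '?unordered' are assumed to be handled before this check.

```python
import re


def line_matches(expected: str, actual: str) -> bool:
    """Compare one expected line with the corresponding actual line."""
    if expected.startswith("?"):
        # Everything after the '?' is a regular expression; assumption:
        # it has to match the entire actual line.
        return re.fullmatch(expected[1:], actual) is not None
    # Lines without the marker must match exactly, as with plain diff.
    return expected == actual


print(line_matches(r"?\d+ \| KING", "1 | KING"))  # True
print(line_matches(r"?\d+ \| KING", "x | KING"))  # False
print(line_matches("2 | SCOTT", "2 | SCOTT"))     # True
```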

If '?' is immediately followed by the word 'unordered', all the lines until a
line containing '?/unordered' are buffered and compared against
corresponding lines from the result file ignoring the order of the result
lines.
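For blocks made only of literal lines, the 'unordered' rule amounts to comparing two multisets of lines. The sketch below assumes exactly that; lines carrying '?' patterns would need a pairing search instead, which this version does not attempt. The function name is hypothetical and not taken from neurodiff.pl.

```python
from collections import Counter


def unordered_block_matches(expected_block, actual_block):
    """True if actual_block holds exactly the expected lines, in any order.

    Literal lines only: every distinct line must occur the same number of
    times in both blocks, so a missing or extra line is still reported.
    """
    return Counter(expected_block) == Counter(actual_block)


expected = ["1 | KING", "2 | SCOTT", "3 | MILLER"]
print(unordered_block_matches(expected, ["3 | MILLER", "1 | KING", "2 | SCOTT"]))  # True
print(unordered_block_matches(expected, ["3 | MILLER", "1 | KING"]))              # False
```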

Although we at EnterpriseDB have resolved these issues with alternate expected files
etc., and do not use this script, I think it might be useful for community
regression tests.

Best regards,

---------- Forwarded message ----------
From: Gurjeet Singh <gurjeet(dot)singh(at)enterprisedb(dot)com>
Date: Fri, Aug 8, 2008 at 1:45 AM
Subject: neurodiff: a new diff utility for our regression test suites

Hi All,

PFA a Perl script that implements a new kind of comparison, which might
help us in situations like the one we recently encountered with differing plan
costs in the hints patch. This script implements two new kinds of
comparisons:

i) Regular Expression (RE) based comparison, and
ii) Comparison of unordered group of lines.

The input for this script, just like for regular diff, is two files: the
expected output and the actual output. Lines in the expected output file
that are expected to vary in any way should start with a '?' character
followed by an RE that the line should match.

For example, if we wish to compare a line of EXPLAIN output, that has the
cost component too, then it might look like:

? Index Scan using accounts_i1 on accounts \(cost=\d+\.\d+\.\.\d+\.\d+ rows=\d+ width=\d+\)

The above RE would help us match any line that matches the pattern, such
as:

Index Scan using accounts_i1 on accounts (cost=0.00..8.28 rows=1 width=106)
or
Index Scan using accounts_i1 on accounts (cost=1000.9999..2000.20008 rows=10000 width=1000)
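The RE above can be checked directly with Python's `re` module, which accepts the same `\d`, `\(` and `\.` escapes as Perl here. This sketch drops the leading '?' marker and the space after it, and requires the whole line to match; both are assumptions about how the expected line is laid out, not a description of the attached script.

```python
import re

# The RE from the expected-file line, without the "? " prefix.
PLAN_RE = re.compile(
    r"Index Scan using accounts_i1 on accounts "
    r"\(cost=\d+\.\d+\.\.\d+\.\d+ rows=\d+ width=\d+\)"
)

samples = [
    "Index Scan using accounts_i1 on accounts (cost=0.00..8.28 rows=1 width=106)",
    "Index Scan using accounts_i1 on accounts (cost=1000.9999..2000.20008 rows=10000 width=1000)",
    # A different plan shape must still show up as a difference.
    "Seq Scan on accounts (cost=0.00..8.28 rows=1 width=106)",
]

for line in samples:
    print(bool(PLAN_RE.fullmatch(line)), line)  # True, True, then False
```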

Apart from this, the SQL standard does not guarantee any order of
results unless the query has an explicit ORDER BY clause. We often encounter
cases in our result files where the output differs from the expected only in
the order of the result. To bypass this effect, and to keep the 'diff'
quiet, I have seen people invariably add an ORDER BY clause to the query,
and modify the expected file accordingly. There is a remote possibility of
the ORDER BY clause masking an issue/bug that would otherwise have shown up
in the diffs, or that might have caused a crash.

Using this script, we can put special markers in the expected output
that denote the boundaries of a set of lines that are expected to be
produced in an undefined order. The script would not complain unless there's
an actual missing or extra line in the output.

Suppose that we have the following result-set to compare:

4 | JACK
5 | CATHY
2 | SCOTT
1 | KING
3 | MILLER

The expected file would look like this:

?unordered
1 | KING
2 | SCOTT
?\d \| MILLER
4 | JACK
5 | CATHY
?/unordered

This expected file will also succeed for both of the following variations
of the result set:

5 | CATHY
4 | JACK
3 | MILLER
2 | SCOTT
1 | KING

or

1 | KING
4 | JACK
3 | MILLER
2 | SCOTT
5 | CATHY

Also, as shown in the above example, the RE based matching works for the
lines within the 'unordered' set too.
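Putting both rules together, here is a Python sketch that walks the expected file, handles a '?unordered' ... '?/unordered' block, and runs against the example above. It is not a port of neurodiff.pl: the function names are hypothetical, REs are assumed to match whole lines, and lines inside a block are paired greedily (literal lines first), so overlapping REs could in rare cases reject an ordering that is actually valid.

```python
import re


def line_matches(expected, actual):
    # A leading '?' turns the rest of the line into a regex (assumed anchored).
    if expected.startswith("?"):
        return re.fullmatch(expected[1:], actual) is not None
    return expected == actual


def block_matches(expected_block, actual_block):
    # Greedy pairing, trying literal lines before regex lines.
    if len(expected_block) != len(actual_block):
        return False
    pending = sorted(expected_block, key=lambda e: e.startswith("?"))
    for actual in actual_block:
        for i, expected in enumerate(pending):
            if line_matches(expected, actual):
                del pending[i]
                break
        else:
            return False  # no expected line accounts for this actual line
    return not pending


def compare(expected_lines, actual_lines):
    """Return True if actual_lines satisfy the expected file."""
    e = a = 0
    while e < len(expected_lines):
        line = expected_lines[e]
        if line == "?unordered":
            # Buffer lines up to the closing marker (ValueError if it is missing).
            end = expected_lines.index("?/unordered", e + 1)
            block = expected_lines[e + 1:end]
            if not block_matches(block, actual_lines[a:a + len(block)]):
                return False
            e, a = end + 1, a + len(block)
        elif a < len(actual_lines) and line_matches(line, actual_lines[a]):
            e, a = e + 1, a + 1
        else:
            return False  # missing or mismatching line
    return a == len(actual_lines)  # leftover actual lines are extra output


EXPECTED = r"""?unordered
1 | KING
2 | SCOTT
?\d \| MILLER
4 | JACK
5 | CATHY
?/unordered""".splitlines()

first = ["4 | JACK", "5 | CATHY", "2 | SCOTT", "1 | KING", "3 | MILLER"]
second = ["5 | CATHY", "4 | JACK", "3 | MILLER", "2 | SCOTT", "1 | KING"]
third = ["1 | KING", "4 | JACK", "3 | MILLER", "2 | SCOTT", "5 | CATHY"]
missing = ["4 | JACK", "2 | SCOTT", "1 | KING", "3 | MILLER"]

for result in (first, second, third, missing):
    print(compare(EXPECTED, result))  # True, True, True, False
```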

The beauty of this approach for testing pattern matches and unordered
results is that we don't have to modify the test cases in any way; we only
need to adjust the expected output files.

I am no Perl guru, so I definitely see a lot of possible performance/semantic
improvements (Perl gurus, take a stab); and maybe that's the reason
the script looks more like a C program than a wacky Perl script full of
~!$^ and whatnot.

This script cannot identify hunks the way 'diff' can, which means that
even if a single line is missing, or if there is an extra line somewhere in the
result file, all the remaining lines from both files will show up in
the diff. But I think we do not need the hunk identification as much as we
need the features this script provides.
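To illustrate that trade-off, the sketch below compares a hypothetical lockstep comparer (line i against line i, never resynchronising, so it cannot find hunks) with Python's standard difflib, which stands in for diff here. Neither function is taken from the attached script.

```python
import difflib

expected = ["a", "b", "c", "d", "e"]
actual = ["a", "c", "d", "e"]  # "b" is missing


def lockstep_diff(exp, act):
    # Report every position where the two files disagree; after one
    # missing line, every later position is out of step.
    out = []
    for i in range(max(len(exp), len(act))):
        e = exp[i] if i < len(exp) else None
        a = act[i] if i < len(act) else None
        if e != a:
            out.append((i, e, a))
    return out


print(len(lockstep_diff(expected, actual)))  # 4 mismatching positions

# difflib resynchronises and reports only the real change.
hunk = [line for line in difflib.unified_diff(expected, actual, lineterm="")
        if line.startswith(("-", "+")) and not line.startswith(("---", "+++"))]
print(hunk)  # ['-b']
```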

Some time ago I had attempted to implement these very features in
diffutils (diff et al.), but gave up too early! Then Dave's mention, two
days ago, of trying to remove MinGW dependencies and move to Perl
prompted me to start afresh in Perl, and it was amazingly simple
(though time-consuming, as I am a complete newbie)!

Best regards,
--
Lets call it Postgres

EnterpriseDB http://www.enterprisedb.com

gurjeet[(dot)singh](at)EnterpriseDB(dot)com
singh(dot)gurjeet(at){ gmail | hotmail | indiatimes | yahoo }.com
Mail sent from my BlackLaptop device

Attachment      Content-Type               Size
neurodiff.pl    application/octet-stream   6.1 KB
expected.out    application/octet-stream   293 bytes
result.out      application/octet-stream   241 bytes
