Re: tuplesort test coverage

From: Andres Freund <andres(at)anarazel(dot)de>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)iki(dot)fi>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: tuplesort test coverage
Date: 2019-10-24 18:10:34
Message-ID: 20191024181034.b44nr5lugqvfcev2@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2019-10-15 13:05:32 +0100, Peter Geoghegan wrote:
> > - aborted abbreviated keys (lines 1522, 1608, 1774, 3620, 3739, 3867, 4266)
>
> Also hard to test -- there was a bug here when abbreviated keys first
> went in -- that was detected by amcheck.
>
> All of the places where we abort are essentially the same, though.

Why is it that hard? Seems fairly easy to create cases that reliably
abort.

I really don't think it's ok to have as many abbrev abort related paths
without any coverage - the relevant code isn't that trivial. And
something like amcheck really doesn't strike me as sufficient. For one,
it doesn't provide any coverage either. For another, plenty sorts don't
end up in a form that amcheck sees.

Tests aren't just there to verify that the current behaviour isn't
broken, they're also there to allow to develop with some confidence. And
I don't think tuplesort as is really allows that, and e.g. abbreviated
keys made that substantially worse. That happens, but I think it'd be
good if you could help improving the situation.

E.g.
SELECT * FROM (SELECT ('00000000-0000-0000-0000-'||to_char(g.i, '000000000000FM'))::uuid uuid FROM generate_series(15000, 0, -1) g(i)) d ORDER BY uuid
reliably triggers abbreviated keys, and it looks to me like that should
be portable. With a few tweaks it'd be fairly easy to use that to
provide some OK coverage for most the abbrev key cases.

> > - in memory backwards scans (lines 1936, 3042) - *any* coverage for
> > TSS_SORTEDONTAPE (line 1964)
>
> That used to exist, but it went away when we killed replacement selection sort.

Yes, that's kind of my point? Either that patch reduced coverage, or it
created dead code. Neither is good.

> > - mergeruns without abbrev key (line 2582)
>
> mergeruns() doesn't use abbreviated keys -- this code disables their
> use in the standard way.

Well, then reformulate the point that we should have coverage for
mergeruns() when initially abbreviated keys were set up.

> > - disk sorts with more than one run (lines 2707, 2789)
> > - any disk based tuplesort_begin_heap() (lines 3649, 3676)
>
> I had to convince Tom to get the coverage of external sorts we have
> now. Apparently there are buildfarm animals that are very sensitive to
> that cost, that could have substantially increased test runtimes were
> we to do more. Perhaps this could be revisited.

Hm. I'm a bit confused. Isn't all that's required to set a tiny amount
of work_mem? Then it's easy to trigger many passes without a lot of IO?

> > I'm pretty unhappy that tuplesort has been whacked around pretty heavily
> > in the last few years, while *reducing* effective test coverage
> > noticeably, rather than increasing it.
>
> I don't think that that's true, on balance. There are only 1,000 extra
> lines of code in tuplesort.c in master compared to 9.4, even though we
> added parallel sorts and abbreviated keys, two huge enhancements. In
> many ways, tuplesort is now simpler than ever.

I'm not saying that tuplesort has gotten worse or anything. Just that
there's been too much development without adding tests.

> > There's pretty substantial and
> > nontrivial areas without any tests - do we have actually have any
> > confidence that they work?
>
> Everything that you're talking about has existed since v11 came out a
> year ago, and most of it is a year or two older than that. So yeah,
> I'm pretty confident that it works.

That's may be true, but there's also basically no way to discover bugs
except manual testing, and users encountering the bugs. That's not good
enough.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message James Coleman 2019-10-24 18:38:33 Re: Consider low startup cost in add_partial_path
Previous Message Tom Lane 2019-10-24 17:06:54 Re: Add json_object(text[], json[])?