CLOBBER_CACHE_ALWAYS regression instability

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: CLOBBER_CACHE_ALWAYS regression instability
Date: 2020-04-05 18:33:19
Message-ID: 30675.1586111599@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Over in the thread at [1] we were wondering how come buildfarm member
hyrax has suddenly started to fail like this:

diff -U3 /home/buildfarm/buildroot/HEAD/pgsql.build/src/test/regress/expected/errors.out /home/buildfarm/buildroot/HEAD/pgsql.build/src/test/regress/results/errors.out
--- /home/buildfarm/buildroot/HEAD/pgsql.build/src/test/regress/expected/errors.out 2018-11-21 13:48:48.340000000 -0500
+++ /home/buildfarm/buildroot/HEAD/pgsql.build/src/test/regress/results/errors.out 2020-04-04 04:48:16.704699045 -0400
@@ -446,4 +446,4 @@
'select infinite_recurse()' language sql;
\set VERBOSITY terse
select infinite_recurse();
-ERROR: stack depth limit exceeded
+ERROR: stack depth limit exceeded at character 8

I've now looked into this, and found that it's not at all hard to
duplicate; compile HEAD with -DCLOBBER_CACHE_ALWAYS, and run "select
infinite_recurse()", and you'll likely get the changed error message.
(The lack of other buildfarm failures is probably just because we
have so few animals doing CLOBBER_CACHE_ALWAYS builds frequently.)

The issue seems indeed to have been triggered by 8f59f6b9c0, because
that inserted an equal() call into recomputeNamespacePath(). equal()
includes a check_stack_depth() call. We get the observed message if
this call is the one where the stack limit is hit, and it is invoked
inside ParseFuncOrColumn(), which has set up a parser error callback
to point at the infinite_recurse() call that it's trying to resolve.
That callback's been there a long time of course, so we may conclude
that no other code path reached from ParseFuncOrColumn contains a
stack depth check, or we'd likely have seen this before.

It's a bit surprising perhaps that we run out of stack here and not
somewhere else; but ParseFuncOrColumn and its subroutines consume
quite a lot of stack, because of FUNC_MAX_ARGS-sized local arrays,
so it's not *that* surprising.

So, what to do to re-stabilize the regression tests? Even if
we wanted to revert 8f59f6b9c0, that'd be kind of a band-aid fix,
because there are lots of other possible ways that a parser error
callback could be active at the point of the error. A few other
possibilities are:

1. Change the test to do "\set VERBOSITY sqlstate" so that all that
is printed is
ERROR: 54001
ERRCODE_STATEMENT_TOO_COMPLEX is used in few enough places that
this wouldn't be too much of a loss of specificity. (Or we could
give stack overflow its very own ERRCODE.)

2. Hack pcb_error_callback so that it suppresses the error position
report for ERRCODE_STATEMENT_TOO_COMPLEX, as it already does
for ERRCODE_QUERY_CANCELED. That seems pretty unpleasant though.

3. Create a separate expected-file to match the variant output.
This would be a maintenance problem, but we could ameliorate that
by moving the test to its own regression script, which was something
that'd already been proposed to get around the PPC64 Linux kernel
signal-handling bug that's been causing intermittent failures on
most of the PPC64 buildfarm animals [2].

On the whole I find #1 the least unpleasant, as well as the most
likely to forestall future variants of this issue. It won't dodge
the PPC64 problem, but I'm willing to keep living with that one
for now.

Thoughts?

regards, tom lane

[1] https://www.postgresql.org/message-id/CA%2BTgmoaUOS5X64nKgFxNV7JHN4sRkNAJYW2gHz-LMb0Ej4xHig%40mail.gmail.com
[2] https://www.postgresql.org/message-id/27924.1571068231%40sss.pgh.pa.us

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2020-04-05 18:57:49 Re: CLOBBER_CACHE_ALWAYS regression instability
Previous Message Michail Nikolaev 2020-04-05 17:04:32 Re: [PATCH] Btree BackwardScan race condition on Standby during VACUUM