Re: parallel.c is not marked as test covered

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
Cc: Noah Misch <noah(at)leadboat(dot)com>, Clément Prévost <prevostclement(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: parallel.c is not marked as test covered
Date: 2016-06-14 16:37:06
Message-ID: CA+Tgmob78-4NEYsqA7mapdqLgkAyVuOAKmtgQH9qiMhand-acg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jun 10, 2016 at 4:12 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Fri, Jun 10, 2016 at 1:49 PM, Peter Eisentraut
> <peter(dot)eisentraut(at)2ndquadrant(dot)com> wrote:
>> Regarding the patch that ended up being committed, I wonder if it is
>> intentional that PL/pgSQL overwrites the context from the parallel worker.
>> Shouldn't the context effectively look like
>>
>> ERROR: message
>> CONTEXT: parallel worker
>> CONTEXT: PL/pgSQL function
>>
>> Elsewhere in this thread I suggested getting rid of the parallel worker
>> context by default (except for debugging), but if we do want to keep it,
>> then it seems to be a bug that a PL/pgSQL function can just eliminate it.
>
> Several people have suggested getting rid of that now, so maybe we
> should just go ahead and do it.
>
> How would we make it available for debugging, though? That seems like
> something people will want.

This is currently listed as an open item, but it doesn't seem very
actionable to me. The fact that PL/plpgsql chucks the existing
context instead of appending to it is presumably a property of
PL/plpgsql, not parallel query, and changing that seems like it ought
to be out of scope for 9.6.

As far as the context is concerned, the reason why that's there is
because I felt (and still feel, to some extent) that users might
experience errors that happen inside a parallel worker and it might be
important for debugging purposes to know that. Suppose, for example,
that you mislabel a function as PARALLEL SAFE when in fact it relies
on some backend-local state (and should therefore be PARALLEL
RESTRICTED). When it runs in a worker, it might generate an error
message like this:

ERROR: can't generate random numbers because you haven't specified a seed

...to which the user will reply, "oh yes I did; in fact I ran SELECT
magic_setseed(42) just before I ran the offending query!". They'll
then contact an expert (hopefully) who will very possibly spend a lot
of time troubleshooting the wrong thing. If the message says:

ERROR: can't generate random numbers because you haven't specified a seed
CONTEXT: parallel worker, PID 62413

...then the expert has a very good chance of guessing what has gone
wrong right away.

Now, against this scenario, there is every possibility that this
message will just annoy a lot of people. It is certainly annoying for
regression test authors because the PID changes and, even if we took
the PID out, any given error might sometimes happen in the leader
rather than the worker, depending on timing. I am not aware of a
concrete reason why it will annoy people in other situations, but
there may well be one (or more).

The simplest thing we could do here is nothing. We can wait to see
what happens with PostgreSQL 9.6 and then decide whether to make a
change. Or we can do something now. We can eliminate the context
entirely and take our chances with scenarios like the one mentioned
above. Or we can add a new GUC, something like hide_parallel_context,
which suppresses it and turn that off when running regression tests,
or even by default. It's all just a matter of how much it bothers you
to sometimes get an extra context line vs. how much annoyance you
think will be caused by people wasting time troubleshooting because
they didn't realize parallel query was involved.

Personally, I'm very doubtful about our ability to paper over
artifacts like this. Sophisticated users are going to end up having
to understand that there are parallel workers and that they propagate
their messages back to the leader, which appends its own context. We
often seem to assume users won't find out about internal details like
that and so we don't make an effort to expose and document them, but I
don't think that often works out. There are a whole lot of people who
know that they need to run EXPLAIN 6 times to find out what's really
going on. I don't expect this case to be any different. I'm actually
sort of flattered that parallel query is working well enough that
multiple people think they might not ever need to know whether an
error comes from the worker or from the leader, but I'm rather
doubtful that will be true in practice.

All that having been said, I don't know everything and this is just a
judgement call. I exercised my judgement and that's how we got here.
If there's a consensus that my judgement should be overruled, I'm fine
with that.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2016-06-14 16:51:25 Re: parallel.c is not marked as test covered
Previous Message Martín Marqués 2016-06-14 16:34:22 PgQ and pg_dump