Re: proposal: additional error fields

From: "David Johnston" <polobo(at)yahoo(dot)com>
To: "'Peter Geoghegan'" <peter(at)2ndquadrant(dot)com>, "'Tom Lane'" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "'Pavel Stehule'" <pavel(dot)stehule(at)gmail(dot)com>, "'PostgreSQL Hackers'" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: proposal: additional error fields
Date: 2012-05-01 21:20:15
Message-ID: 008d01cd27e0$3462a510$9d27ef30$@yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> -----Original Message-----
> From: pgsql-hackers-owner(at)postgresql(dot)org [mailto:pgsql-hackers-
> owner(at)postgresql(dot)org] On Behalf Of Peter Geoghegan
> Sent: Tuesday, May 01, 2012 4:37 PM
> To: Tom Lane
> Cc: Pavel Stehule; PostgreSQL Hackers
> Subject: Re: [HACKERS] proposal: additional error fields
>
> On 1 May 2012 21:14, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > Peter Geoghegan <peter(at)2ndquadrant(dot)com> writes:
> >> Maybe no one is convinced by any of this, but the fact is that the
> >> SQLSTATE argument falls down when one considers that we aren't using
> >> it in many cases of errors that clearly are severe.
> >
> > The reason that argument isn't convincing is that we *are* using a
> > SQLSTATE for every such message; it's just defaulted to XX000.
> > AFAICT, it would be reasonable to treat all XX000 as alarm conditions
> > until proven different.  If a given message is, in fact, not supposed
> > to be "can't happen", then it shouldn't be going through elog().  We'd
> > probably be needing to fix some places that were lazily coded as
> > elogs, but under your proposal we would also have to touch every such
> > place ... and thousands more besides.
>
> Fair enough. Adjusting all of those elog calls may be excessive. The
argument
> could be made that what I've characterised as severe (which is, as I've
said,
> not entirely clear-cut) could be deduced from SQLSTATE if we were to
> formalise the "can't happen errors are only allowed to use elog()"
convention
> into a hard rule. However, I think it's critically important to make all
of this
> easy and well-documented. Severity should probably be part of the default
> log_line_prefix.
>
> Sorry for high-jacking your thread, Pavel.
>

So the apparent desire is to promote proper usage of SQLSTATE but
simultaneously add and encode a default SQLSTATE_PG_SEVERITY value for each
class/code that can be used for external monitoring and notification.
Ideally customization could be done so that differing opinions on such
severity classification could be made on a client-per-client basis without
having to resort to outputting the SQLSTATE code itself and then requiring
external software to maintain such an association. To that end any
"severity" on the class itself would act as a default and specific codes
that want to share the same severity can be skipped while those needing a
different code can have an override specified. Since the codes are neither
exhaustive nor mandatory such a default would apply to any user-chosen code
not previously defined.

Simply adding in more high-level categories avoids the issue that the
current system has insufficient information encoded to facilitate desired
reporting requirements. If we encode our messages with a sufficient level
of detail then internally or externally adding categories and meta-data on
top of those layers is simple and new ideas and techniques can be tried
without having to modify the system in the future.

Supplemental context information such as table and constraint names can be
useful if the cost of recording such data is low enough and the value
sufficient. That said, knowing the SQL that caused the error and which
process it is implementing should be sufficient to identify possible causes
and resolutions without requiring the specific columns and tables involved.
Since constraint violations already expose the name of the violated
constraint that particular situation seems to have a sufficient solution.
Given that you should not be giving end-users that kind of implementation
artifact anyway the developer and DBA should be able to identify the root
cause and either avoid it themselves or code an application interface to
present to the end-user. So, at least from my perspective, the bar to move
this forward is pretty high - either it must be fairly simple to implement
(which it is not) or there needs to be more value to it than I am seeing
currently. This ignores whether normal runtime performance costs will be a
significant factor.

Looking at the SQLSTATE error classes I am somewhat concerned with the
number of items found under "HV" and the apparent intermixing of "client"
and "internal" error types.

As for an upgrade path how about something along the lines of:

1) Make a best-attempt effort at identifying existing elog and ereport calls
and modifying them to output specific SQLSTATE codes
2) Modify elog/ereport to catch and log (stack trace) any calls that do not
set SQLSTATE to a specific value.
3) During development, beta, and RC phases keep such code in place and ask
people to look at their logs for missed elog/ereport calls
4) Remove the stack trace (logging) within ereport/elog from the final
released code

David J.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2012-05-01 21:26:37 Re: psql omits row count under "\x auto"
Previous Message Stephen Frost 2012-05-01 21:15:03 Re: extending relations more efficiently