Re: Error message style guide

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Error message style guide
Date: 2003-03-15 16:46:20
Message-ID: 28760.1047746780@sss.pgh.pa.us
Lists: pgsql-hackers

Peter Eisentraut <peter_e(at)gmx(dot)net> writes:
> Some people were mentioning an error message style guide. Here's a start
> of one that I put together a while ago. Feel free to consider it.

Looks like a good start. But you expected quibbles, right? ;-)

> The main part of a message should be at most 72 characters long. For
> embedded format specifiers (%s, %d, etc.), a reasonable estimate of
> the expected string should be taken into account. The rest should be
> distributed to the detail and the hint parts.

This is not really workable to adhere to strictly. For example, a
message that includes more than one user identifier (e.g., a table and
column name) fails the test immediately, since each name might be
NAMEDATALEN-1 characters long. Even with only one identifier, I have nine
characters left for the error text; less quotes and a space
makes six; less "ERROR:" leaves me with nothing. Okay, so you said
"reasonable estimate", not "worst case", but unless you want to specify
what you think a reasonable estimate is, this guideline is useless.

I think a style guide should just say "Keep primary messages short".
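
To make the worst-case arithmetic concrete, here is a minimal sketch in C.
It assumes NAMEDATALEN is 64 (the default), that each identifier costs two
quote characters plus a space, and that the "ERROR:" prefix counts against
the 72-character limit. The function name wording_budget is made up for
illustration and exists nowhere in the sources.

```c
#include <string.h>

#define NAMEDATALEN   64    /* assumed default; names hold NAMEDATALEN-1 chars */
#define PRIMARY_LIMIT 72    /* limit proposed in the draft guide */

/*
 * Hypothetical helper: characters left for the fixed wording of a primary
 * message once n_idents worst-case identifiers (each with two quote
 * characters and a separating space) and the severity prefix are counted.
 * A negative result means the limit is exceeded before any wording is added.
 */
int
wording_budget(int n_idents, const char *prefix)
{
    int per_ident = (NAMEDATALEN - 1) + 2 + 1;  /* name + quotes + space */

    return PRIMARY_LIMIT - n_idents * per_ident - (int) strlen(prefix);
}
```

With one identifier and the "ERROR:" prefix the budget is zero; with a table
and a column name it is well below zero, which is the point made above.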

> A message may not contain a newline or a tab.

This might work for primary messages given the "keep it short" dictum,
but it's quite unworkable for detail and hint messages --- we have some
of the latter that run to many lines.

How about something like "Avoid tabs. Insert newlines as needed to keep
message lines shorter than X characters. Keep in mind that client
code might reformat long messages for its own purposes, so don't rely on
text layout for legibility."

> Use quotes always to denote files, database objects, and other
> variables of a character-string nature. Do not use them to mark up
> nonvariable items.

One thing that's been annoying me recently is that some of our messages
exhibit double quoting, e.g.,

    regression=# select 'a' ### 'b';
    ERROR: Unable to identify an operator '###' for types '"unknown"' and '"unknown"'
    You will have to retype this query using an explicit cast

The reason this particular case happens is that the elog call puts
(single) quotes around the result of format_type_be --- and the latter
puts double quotes around names that seem to need it, which include
mixed-case names and (as in this case) names that are also SQL keywords.
Individually each of these choices seems defensible, but the result is
mighty ugly. How can we fix it?
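
The interaction can be reproduced with a small sketch in C. Note that
quote_if_needed below is only a stand-in for the quoting behavior of
format_type_be, not the real function: it assumes a name needs double quotes
when it contains anything other than lower-case letters, digits, or
underscores, and it checks a single keyword ("unknown") for brevity. Both
function names are hypothetical.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/*
 * Stand-in for format_type_be's quoting (NOT the real implementation):
 * wrap the name in double quotes if it has characters outside [a-z0-9_]
 * or if it is the one keyword this sketch knows about.
 */
void
quote_if_needed(const char *name, char *out, size_t outlen)
{
    const char *p;
    int         needs_quotes = (strcmp(name, "unknown") == 0);

    for (p = name; *p; p++)
    {
        unsigned char c = (unsigned char) *p;

        if (!(islower(c) || isdigit(c) || c == '_'))
            needs_quotes = 1;
    }
    snprintf(out, outlen, needs_quotes ? "\"%s\"" : "%s", name);
}

/*
 * The caller then adds its own single quotes, as the elog call does,
 * without knowing whether the name was already quoted.
 */
void
format_operand(const char *type_name, char *out, size_t outlen)
{
    char quoted[128];

    quote_if_needed(type_name, quoted, sizeof(quoted));
    snprintf(out, outlen, "'%s'", quoted);
}
```

Each layer is reasonable by itself; the doubled quoting appears only because
neither layer knows what the other has done.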

> NOTE: This format encourages embedding data items into the message in
> grammatical positions instead of the old style 'invalid value: bar'.

I'm not sure that I like making messages utterly dependent on the
presence of quotes to be decipherable. Would you consider the above
message to be better phrased as, say, the following?

    ERROR: Unable to identify an infix operator "unknown" "###" "unknown"

Throw a few spaces and random characters into the type names, and this
gets very unreadable very fast. The "invalid value: bar" style has the
advantage that the message text is pretty clearly separated from the
object being complained about.

> Do not end the message with a period. Do not even think about ending
> a message with an exclamation point.

> RATIONALE: Avoiding punctuation makes it easier for client
> applications to embed the message into a variety of grammatical
> contexts. Often, messages are not grammatically complete sentences
> anyway. (And if they're long enough to be more than one sentence,
> split them up.)

This works for primary messages, I think, but not detail and hint
messages. Can we use a different rule for detail/hint messages?

> Use lower case for message wording, including the first letter of the
> message. Use upper case for SQL commands and key words if the message
> refers to the command string.

Again, this falls down for multi-sentence hints.

> Instead of multiple sentences, consider using semicolons or commas.

Here's an example of an actual hint in the present sources. Do you
really want to convert it into one run-on sentence?

    This error does *not* mean that you have run out of disk space.

    It occurs when either the system limit for the maximum number of
    semaphore sets (SEMMNI), or the system wide maximum number of
    semaphores (SEMMNS), would be exceeded. You need to raise the
    respective kernel parameter. Alternatively, reduce PostgreSQL's
    consumption of semaphores by reducing its max_connections parameter
    (currently %d).

    The PostgreSQL Administrator's Guide contains more information about
    configuring your system for PostgreSQL.

> | could not open file %s (%m)

> RATIONALE: It would be difficult to account for all possible error codes
> to paste this into a single smooth sentence. It also looks better and is
> more flexible than colons or dashes to separate the sentences

We almost uniformly use "could not open file %s: %m" for this now. Is
the parenthesis style really better? I don't find it more natural. In
most cases, the %m part is the information that is actually useful, so it
seems odd to put it in parentheses. Parentheses normally indicate a
subsidiary, less-important part of a sentence.
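
For reference, a minimal sketch of the colon style in plain C. Standard
printf has no %m conversion (it is a glibc extension), so this sketch
substitutes strerror() on a saved errno value. The function name
format_open_error is hypothetical.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build "could not open file <path>: <reason>".
 * The caller passes the errno value it saved right after the failed call,
 * so that later library calls cannot clobber it.
 */
int
format_open_error(const char *path, int saved_errno, char *out, size_t outlen)
{
    return snprintf(out, outlen, "could not open file %s: %s",
                    path, strerror(saved_errno));
}
```

The reason text lands at the end of the message, after the colon, where it
reads as the main point rather than as an aside.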

> Try to avoid "unknown". Consider, "error: unknown response". If you
> don't know what the response is, how do you know it's erroneous? If,
> however, the error lies in the fact that you don't know the response,
> this wording is clearly confusing.

But suggest an alternative. "Unrecognized" might be the desired word
here. Also, please recommend that such a message should show the actual
value it's unhappy with, e.g.,

    ERROR: Unrecognized node type: 42
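
A minimal sketch of that pattern in C, assuming a made-up pair of node tags;
the enum values and the function name node_type_name are hypothetical and
are not taken from the sources. The default branch reports the offending
value instead of a bare "unknown".

```c
#include <stdio.h>

/* Hypothetical node tags, for illustration only. */
enum
{
    T_EXAMPLE_SELECT = 1,
    T_EXAMPLE_INSERT = 2
};

/*
 * Write a name for the tag into out and return 0, or write an error
 * message that includes the unrecognized value and return -1.
 */
int
node_type_name(int tag, char *out, size_t outlen)
{
    switch (tag)
    {
        case T_EXAMPLE_SELECT:
            snprintf(out, outlen, "select");
            return 0;
        case T_EXAMPLE_INSERT:
            snprintf(out, outlen, "insert");
            return 0;
        default:
            /* Show the actual value so the report is actionable. */
            snprintf(out, outlen, "Unrecognized node type: %d", tag);
            return -1;
    }
}
```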

> Rather than mentioning what the function or system call was that
> failed, describe what the function was trying to do, e.g., "could not
> open file". This may admittedly be difficult to do with candidates
> such as "select()".

> RATIONALE: Users don't know what all those functions do.

We have done this in the past at least partly for debugging reasons;
but the availability of file/line number info should reduce the pressure
to phrase error messages in a way that exposes exactly which call
failed. Nonetheless I'm not sure that avoiding references to system
calls will improve matters. In particular, for cases that are really
"can't happen" situations (e.g., we are normally not expecting select(2)
to fail), I'm not seeing the advantage of avoiding the reference.

regards, tom lane
