Re: Frontend error logging style

From: Daniel Gustafsson <daniel(at)yesql(dot)se>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Frontend error logging style
Date: 2022-03-29 10:13:38
Message-ID: 191BF874-CB92-4F0A-9F0B-414465DE35CB@yesql.se
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On 27 Mar 2022, at 22:19, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Here's a rebase up to today's HEAD. I've fixed the merge problems,
> but there may be some stray new error calls that could be converted
> to use pg_fatal() and haven't been. I don't want to do a full
> fresh scan of the code until we're about ready to commit this.

I focused on finding any changes in logic (exiting where it previously kept
going after error, or vice versa) and was unable to find any which inspires
confidence. This was a large patch to read through, so I'm fairly sure I've
missed something.

Below are comments (nitpicks to a large extent) which either relates to this
patch or to the messages in general. For the latter, feel free to deem them
out of scope and we can take those comments separately from this work on
-hackers.

As a side-note realization from reading this, there is a tremendous amount of
code paths in pg_dump which leads to a fatal error.. thats not necessarily a
bad thing, it's just striking when reading them all at once.

> @@ -508,24 +502,15 @@ writefile(char *path, char **lines)

> if (fclose(out_file))
> - {
> - pg_log_error("could not write file \"%s\": %m", path);
> - exit(1);
> - }
> + pg_fatal("could not write file \"%s\": %m", path);
> }

Should we update this message to differentiate it from the fwrite error case?
Something like "an error occurred during writing.."

> @@ -2057,10 +2004,7 @@ check_locale_name(int category, const char *locale, char **canonname)
>
> save = setlocale(category, NULL);
> if (!save)
> - {
> - pg_log_error("setlocale() failed");
> - exit(1);
> - }
> + pg_fatal("setlocale() failed");

Should this gain a hint message for those users who have no idea what this
really means?

> @@ -2127,15 +2067,14 @@ check_locale_encoding(const char *locale, int user_enc)
> user_enc == PG_SQL_ASCII))
> {
> pg_log_error("encoding mismatch");
> - fprintf(stderr,
> - _("The encoding you selected (%s) and the encoding that the\n"
> - "selected locale uses (%s) do not match. This would lead to\n"
> - "misbehavior in various character string processing functions.\n"
> - "Rerun %s and either do not specify an encoding explicitly,\n"
> - "or choose a matching combination.\n"),
> - pg_encoding_to_char(user_enc),
> - pg_encoding_to_char(locale_enc),
> - progname);
> + pg_log_error_detail("The encoding you selected (%s) and the encoding that the "
> + "selected locale uses (%s) do not match. This would lead to "

I will be sad to see double space after period go, but I think I'm rare in
preferring that.

> @@ -2344,10 +2276,7 @@ setup_pgdata(void)
> * have embedded spaces.
> */
> if (setenv("PGDATA", pg_data, 1) != 0)
> - {
> - pg_log_error("could not set environment");
> - exit(1);
> - }
> + pg_fatal("could not set environment");

Should we be explicit about which env var?

> @@ -3089,18 +2979,14 @@ main(int argc, char *argv[])
> else if (strcmp(optarg, "libc") == 0)
> locale_provider = COLLPROVIDER_LIBC;
> else
> - {
> - pg_log_error("unrecognized locale provider: %s", optarg);
> - exit(1);
> - }
> + pg_fatal("unrecognized locale provider: %s", optarg);

Should this %s be within quotes to match how we usually emit user-input?

> @@ -1123,9 +1097,9 @@ verify_btree_slot_handler(PGresult *res, PGconn *conn, void *context)
> pg_log_warning("btree index \"%s.%s.%s\": btree checking function returned unexpected number of rows: %d",
> rel->datinfo->datname, rel->nspname, rel->relname, ntups);
> if (opts.verbose)
> - pg_log_info("query was: %s", rel->sql);
> - pg_log_warning("Are %s's and amcheck's versions compatible?",
> - progname);
> + pg_log_warning_detail("Query was: %s", rel->sql);
> + pg_log_warning_hint("Are %s's and amcheck's versions compatible?",
> + progname);

Should "amcheck's" be a %s parameter to make translation reusable (which it
miht never be) and possibly avoid translation mistake?

> --- a/src/bin/pg_basebackup/receivelog.c
> +++ b/src/bin/pg_basebackup/receivelog.c
> @@ -140,7 +140,7 @@ open_walfile(StreamCtl *stream, XLogRecPtr startpoint)
> /* fsync file in case of a previous crash */
> if (stream->walmethod->sync(f) != 0)
> {
> - pg_log_fatal("could not fsync existing write-ahead log file \"%s\": %s",
> + pg_log_error("could not fsync existing write-ahead log file \"%s\": %s",
> fn, stream->walmethod->getlasterror());
> stream->walmethod->close(f, CLOSE_UNLINK);
> exit(1);

In the case where we already have IO related errors, couldn't the close() call
cause an additional error message which potentially could be helpful for
debugging?

> @@ -597,31 +570,19 @@ main(int argc, char *argv[])

> if (ControlFile->data_checksum_version == 0 &&
> mode == PG_MODE_CHECK)
> - {
> - pg_log_error("data checksums are not enabled in cluster");
> - exit(1);
> - }
> + pg_fatal("data checksums are not enabled in cluster");
>
> if (ControlFile->data_checksum_version == 0 &&
> mode == PG_MODE_DISABLE)
> - {
> - pg_log_error("data checksums are already disabled in cluster");
> - exit(1);
> - }
> + pg_fatal("data checksums are already disabled in cluster");
>
> if (ControlFile->data_checksum_version > 0 &&
> mode == PG_MODE_ENABLE)
> - {
> - pg_log_error("data checksums are already enabled in cluster");
> - exit(1);
> - }
> + pg_fatal("data checksums are already enabled in cluster");

Fatal seems sort of out place here, it's not really a case of "something
terrible happened" but rather "the preconditions weren't met". Couldn't these
be a single pg_log_error erroring out with the reason in a pg_log_detail?

> @@ -1323,10 +1320,10 @@ lockTableForWorker(ArchiveHandle *AH, TocEntry *te)
> res = PQexec(AH->connection, query->data);
>
> if (!res || PQresultStatus(res) != PGRES_COMMAND_OK)
> - fatal("could not obtain lock on relation \"%s\"\n"
> - "This usually means that someone requested an ACCESS EXCLUSIVE lock "
> - "on the table after the pg_dump parent process had gotten the "
> - "initial ACCESS SHARE lock on the table.", qualId);
> + pg_fatal("could not obtain lock on relation \"%s\"\n"
> + "This usually means that someone requested an ACCESS EXCLUSIVE lock "
> + "on the table after the pg_dump parent process had gotten the "
> + "initial ACCESS SHARE lock on the table.", qualId);

Punctuation and formatting in a pg_fatal feels a bit wrong, and smells of need
for _detail/_hint. I guess there isn't a way to do that here though, but still
a shame.

> @@ -50,9 +50,10 @@ _check_database_version(ArchiveHandle *AH)
> && (remoteversion < AH->public.minRemoteVersion ||
> remoteversion > AH->public.maxRemoteVersion))
> {
> - pg_log_error("server version: %s; %s version: %s",
> - remoteversion_str, progname, PG_VERSION);
> - fatal("aborting because of server version mismatch");
> + pg_log_error("aborting because of server version mismatch");
> + pg_log_error_detail("server version: %s; %s version: %s",
> + remoteversion_str, progname, PG_VERSION);
> + exit(1);

Capitalization of the _detail message?

> @@ -166,7 +167,7 @@ ConnectDatabase(Archive *AHX,
> AH->connection = PQconnectdbParams(keywords, values, true);
>
> if (!AH->connection)
> - fatal("could not connect to database");
> + pg_fatal("could not connect to database");

Shouldn't this print which database it failed to connect to?

> @@ -721,7 +721,7 @@ setFilePath(ArchiveHandle *AH, char *buf, const char *relativeFilename)
> dname = ctx->directory;
>
> if (strlen(dname) + 1 + strlen(relativeFilename) + 1 > MAXPGPATH)

Unrelated, but shouldn't that be >= MAXPGPATH?

> @@ -2033,8 +2027,8 @@ dumpTableData_copy(Archive *fout, const void *dcontext)
> {
> /* copy data transfer failed */
> pg_log_error("Dumping the contents of table \"%s\" failed: PQgetCopyData() failed.", classname);
> - pg_log_error("Error message from server: %s", PQerrorMessage(conn));
> - pg_log_error("The command was: %s", q->data);
> + pg_log_error_detail("Error message from server: %s", PQerrorMessage(conn));
> + pg_log_error_detail("Command was: %s", q->data);

Shouldn't the pg_log_error() call be with leading lowercase and no punctuation?
(same in the following hunk)

> @@ -11756,8 +11747,8 @@ dumpFunc(Archive *fout, const FuncInfo *finfo)
> else if (provolatile[0] == PROVOLATILE_STABLE)
> appendPQExpBufferStr(q, " STABLE");
> else if (provolatile[0] != PROVOLATILE_VOLATILE)
> - fatal("unrecognized provolatile value for function \"%s\"",
> - finfo->dobj.name);
> + pg_fatal("unrecognized provolatile value for function \"%s\"",
> + finfo->dobj.name);
> }
>
> if (proisstrict[0] == 't')
> @@ -11806,8 +11797,8 @@ dumpFunc(Archive *fout, const FuncInfo *finfo)
> else if (proparallel[0] == PROPARALLEL_RESTRICTED)
> appendPQExpBufferStr(q, " PARALLEL RESTRICTED");
> else if (proparallel[0] != PROPARALLEL_UNSAFE)
> - fatal("unrecognized proparallel value for function \"%s\"",
> - finfo->dobj.name);
> + pg_fatal("unrecognized proparallel value for function \"%s\"",
> + finfo->dobj.name);

We should probably move the provolatile and propallel keywords to a %s param so
we can reuse the translation.

> @@ -14951,18 +14942,18 @@ createViewAsClause(Archive *fout, const TableInfo *tbinfo)

> - fatal("definition of view \"%s\" appears to be empty (length zero)",
> - tbinfo->dobj.name);
> + pg_fatal("definition of view \"%s\" appears to be empty (length zero)",
> + tbinfo->dobj.name);

I'm not sure we need to provide a definition of empty here, most readers will
probably understand that already =)

> @@ -16602,13 +16593,10 @@ dumpSequence(Archive *fout, const TableInfo *tbinfo)
> res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
>
> if (PQntuples(res) != 1)
> - {
> - pg_log_error(ngettext("query to get data of sequence \"%s\" returned %d row (expected 1)",
> - "query to get data of sequence \"%s\" returned %d rows (expected 1)",
> - PQntuples(res)),
> - tbinfo->dobj.name, PQntuples(res));
> - exit_nicely(1);
> - }
> + pg_fatal(ngettext("query to get data of sequence \"%s\" returned %d row (expected 1)",
> + "query to get data of sequence \"%s\" returned %d rows (expected 1)",
> + PQntuples(res)),
> + tbinfo->dobj.name, PQntuples(res));

The ngettext() call seems a bit out of place here since we already know that
second form will be taken given the check on PQntuples(res).

> @@ -16824,13 +16812,10 @@ dumpSequenceData(Archive *fout, const TableDataInfo *tdinfo)
> res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
>
> if (PQntuples(res) != 1)
> - {
> - pg_log_error(ngettext("query to get data of sequence \"%s\" returned %d row (expected 1)",
> - "query to get data of sequence \"%s\" returned %d rows (expected 1)",
> - PQntuples(res)),
> - tbinfo->dobj.name, PQntuples(res));
> - exit_nicely(1);
> - }
> + pg_fatal(ngettext("query to get data of sequence \"%s\" returned %d row (expected 1)",
> + "query to get data of sequence \"%s\" returned %d rows (expected 1)",
> + PQntuples(res)),
> + tbinfo->dobj.name, PQntuples(res));

Same as above, PQntuples(res) cannot be 1 so there os just one form used no?

> --- a/src/bin/pg_dump/pg_dumpall.c
> +++ b/src/bin/pg_dump/pg_dumpall.c
> @@ -200,16 +200,15 @@ main(int argc, char *argv[])
> strlcpy(full_path, progname, sizeof(full_path));
>
> if (ret == -1)
> - pg_log_error("The program \"%s\" is needed by %s but was not found in the\n"
> - "same directory as \"%s\".\n"
> - "Check your installation.",
> - "pg_dump", progname, full_path);
> + pg_fatal("The program \"%s\" is needed by %s but was not found in the\n"
> + "same directory as \"%s\".\n"
> + "Check your installation.",
> + "pg_dump", progname, full_path);
> else
> - pg_log_error("The program \"%s\" was found by \"%s\"\n"
> - "but was not the same version as %s.\n"
> - "Check your installation.",
> - "pg_dump", full_path, progname);
> - exit_nicely(1);
> + pg_fatal("The program \"%s\" was found by \"%s\"\n"
> + "but was not the same version as %s.\n"
> + "Check your installation.",

"Check your installation" seems superfluous here given the error. (we do this
in a number of places)

> @@ -1611,9 +1572,9 @@ connectDatabase(const char *dbname, const char *connection_string,
> && (server_version < 90200 ||
> (server_version / 100) > (my_version / 100)))
> {
> - pg_log_error("server version: %s; %s version: %s",
> - remoteversion_str, progname, PG_VERSION);
> pg_log_error("aborting because of server version mismatch");
> + pg_log_error_detail("server version: %s; %s version: %s",
> + remoteversion_str, progname, PG_VERSION);

Capitalization?

> @@ -388,13 +387,13 @@ WALDumpReadPage(XLogReaderState *state, XLogRecPtr targetPagePtr, int reqLen,
> if (errinfo.wre_errno != 0)
> {
> errno = errinfo.wre_errno;
> - fatal_error("could not read from file %s, offset %d: %m",
> - fname, errinfo.wre_off);
> + pg_fatal("could not read from file %s, offset %d: %m",
> + fname, errinfo.wre_off);
> }
> else
> - fatal_error("could not read from file %s, offset %d: read %d of %d",
> - fname, errinfo.wre_off, errinfo.wre_read,
> - errinfo.wre_req);
> + pg_fatal("could not read from file %s, offset %d: read %d of %d",
> + fname, errinfo.wre_off, errinfo.wre_read,
> + errinfo.wre_req);

Filename %s should be quoted like \"%s\".

> @@ -7379,7 +7254,7 @@ main(int argc, char **argv)
> THREAD_BARRIER_DESTROY(&barrier);
>
> if (exit_code != 0)
> - pg_log_fatal("Run was aborted; the above results are incomplete.");
> + pg_log_error("Run was aborted; the above results are incomplete.");

Capitalization and punctuation?

> @@ -781,10 +772,7 @@ process_psqlrc(char *argv0)
> char *envrc = getenv("PSQLRC");
>
> if (find_my_exec(argv0, my_exec_path) < 0)
> - {
> - pg_log_fatal("could not find own program executable");
> - exit(EXIT_FAILURE);
> - }
> + pg_fatal("could not find own program executable");

I have a feeling this will be confusing to a fair few users. Not sure how to
improve it, but there seems to be some room for it.

> @@ -144,16 +145,10 @@ main(int argc, char *argv[])
> if (alldb)
> {
> if (dbname)
> - {
> - pg_log_error("cannot cluster all databases and a specific one at the same time");
> - exit(1);
> - }
> + pg_fatal("cannot cluster all databases and a specific one at the same time");
>
> if (tables.head != NULL)
> - {
> - pg_log_error("cannot cluster specific table(s) in all databases");
> - exit(1);
> - }
> + pg_fatal("cannot cluster specific table(s) in all databases");

An ngettext() candidate perhaps? There are more like this in main() hunks further down omitted for brevity here.

> diff --git a/src/fe_utils/connect_utils.c b/src/fe_utils/connect_utils.c
> index a30c66f13a..f2e583f9fa 100644
> --- a/src/fe_utils/connect_utils.c
> +++ b/src/fe_utils/connect_utils.c
> @@ -88,11 +88,8 @@ connectDatabase(const ConnParams *cparams, const char *progname,
> conn = PQconnectdbParams(keywords, values, true);
>
> if (!conn)
> - {
> - pg_log_error("could not connect to database %s: out of memory",
> - cparams->dbname);
> - exit(1);
> - }
> + pg_fatal("could not connect to database %s: out of memory",
> + cparams->dbname);

Database name quoted like \"%s\" to match other messages?

--
Daniel Gustafsson https://vmware.com/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2022-03-29 10:42:04 Re: Identify missing publications from publisher while create/alter subscription.
Previous Message Julien Rouhaud 2022-03-29 10:08:46 Re: Add parameter jit_warn_above_fraction