Re: Multiple hosts in connection string failed to failover in non-hot standby mode

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Hubert Zhang <zhubert(at)vmware(dot)com>, tsunakawa(dot)takay(at)fujitsu(dot)com, pgsql-hackers(at)postgresql(dot)org, Andreas Seltenreich <seltenreich(at)gmx(dot)de>, Bruce Momjian <bruce(at)momjian(dot)us>
Subject: Re: Multiple hosts in connection string failed to failover in non-hot standby mode
Date: 2021-05-06 16:26:51
Message-ID: 20210506162651.GJ27406@telsasoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Jan 10, 2021 at 05:38:50PM -0500, Tom Lane wrote:
> I wrote:
> > The problems that I see in this area are first that there's no
> > real standardization in libpq as to whether to append error messages
> > together or just flush preceding messages; and second that no effort
> > is made in multi-connection-attempt cases to separate the errors from
> > different attempts, much less identify which error goes with which
> > host or IP address. I think we really need to put some work into
> > that.
>
> I spent some time on this, and here is a patch set that tries to
> improve matters.
>
> 0001 changes the libpq coding rules to be that all error messages should
> be appended to conn->errorMessage, never overwritten (there are a couple
> of exceptions in fe-lobj.c) and we reset conn->errorMessage to empty only
> at the start of a connection request or new query. This is something
> that's been bugging me for a long time and I'm glad to get it cleaned up.
> Formerly it seemed that a dartboard had been used to decide whether to use
> "printfPQExpBuffer" or "appendPQExpBuffer"; now it's always the latter.
> We can also get rid of several hacks that were used to get around the
> mess and force appending behavior.
>
> 0002 then changes the reporting rules in fe-connect.c as I suggested,
> so that you might get errors like this:
>
> $ psql -h localhost,/tmp -p 12345
> psql: error: could not connect to host "localhost" (::1), port 12345: Connection refused
> Is the server running on that host and accepting TCP/IP connections?
> could not connect to host "localhost" (127.0.0.1), port 12345: Connection refused
> Is the server running on that host and accepting TCP/IP connections?
> could not connect to socket "/tmp/.s.PGSQL.12345": No such file or directory
> Is the server running locally and accepting connections on that socket?
>
> and we have a pretty uniform rule that errors coming back from a
> connection attempt will be prefixed with the server address.

52a10224 broke sqlsmith, of all things.

It was using errmsg as a test of success, instead of checking if the connection
result wasn't null:

conn = PQconnectdb(conninfo.c_str());
char *errmsg = PQerrorMessage(conn);
if (strlen(errmsg))
throw dut::broken(errmsg, "08001");

That's clearly the wrong thing to do, but maybe this should be described in the
release notes as a compatibility issue, in case other people had the same idea.
Clearing errorMessage during success is an option..

--
Justin

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message 盏一 2021-05-06 16:30:13 use `proc->pgxactoff` as the value of `index` in `ProcArrayRemove()`
Previous Message Dilip Kumar 2021-05-06 16:03:53 Re: Small issues with CREATE TABLE COMPRESSION