Re: PQHost() undefined behavior if connecting string contains both host and hostaddr types

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: david(dot)g(dot)johnston(at)gmail(dot)com
Cc: kommi(dot)haribabu(at)gmail(dot)com, michael(at)paquier(dot)xyz, peter(dot)eisentraut(at)2ndquadrant(dot)com, pgsql-hackers(at)postgresql(dot)org, michael(dot)paquier(at)gmail(dot)com, robertmhaas(at)gmail(dot)com
Subject: Re: PQHost() undefined behavior if connecting string contains both host and hostaddr types
Date: 2018-03-27 09:21:44
Message-ID: 20180327.182144.210938935.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I apologize in advance that I'm not proper for wordsmithing.

At Tue, 27 Mar 2018 00:24:07 -0700, "David G. Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com> wrote in <CAKFQuwaats6dndVT73xrFqBu+RJ4tsBopCNON5XuV+xT8tGVkQ(at)mail(dot)gmail(dot)com>
> On Mon, Mar 26, 2018 at 10:47 PM, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>
> wrote:
>
> > updated patch attached with additional doc updates as per the suggestion
> > from the upthreads.
> >
> ​---------------------------------------------------------
> Some comments if the patch remains in-tact:
> ​
> ​Lower-case "i" in "It is not" in the second paragraphs
>
> The ... function returns NULL when the input conn parameter is NULL or an
> empty string if conn cannot be evaluated. Otherwise, the return value is
> the first non-NULL, non-empty, value of either the host or hostaddr
> property of conn. <omit the entire last sentence, Callers of this
> function..., for PQport too>

Mmm. Does the second sentense mean that "host precedes to
hostaddr if any"? The code is written as the above but what users
observe is a connection string. I suppose that that accuracy is
rather confusing. I agree that the sentense "Callers of this.."
is needless since anyway the case is illegal regardsless of this
function.

That is something like the follows.

| The PQhost function returns NULL when the input conn parameter
| is NULL or an empty string if conn cannot be evaluated.
| Otherwise, it returns host if any or otherwise hostaddr of
| connection property.

I personally don't think it is not necessary to mention the NULL
case since other similiar funcions don't have such description.

> Omit "properly" after established - if the connection is established one
> can assume it was done "properly".
>
> Add "the" after calling: "can be checked by calling the
> <function>PQstatus</function> function.
>
> ------------------------------------------------------
> I'm having a bit of concern about the actual specification and
> wording...but don't have time right now to thoroughly read the thread
> history, docs, and/or code at this moment to know whether its just
> inexperience on my part or an actual confusion. But here's where I'm at
> presently...
>
> I'm not sure why there is confusion here in the first place...the docs[1]
> say:
>
> https://www.postgresql.org/docs/10/static/libpq-status.html
> "The following functions return parameter values established at connection."
>
> At which point there is only one meaningful value for them - the value that
> pertains to the established connection.

True.

> As far as "or an empty string if conn cannot be evaluated" should just
> become "an empty string if conn is inactive".

The description tries to clarify the behavior while the
connection is not established. I'm not sure how "evaluate" sounds
exactly but I understand it as "PQhost can return any pausible
value". An inactive (before establising) connection can return a
value, but it can be wrong. It means that we shouldn't ask
PQhost(conn) whether the connection is established or not. So
without deriberate wording we could say;

| The PQhost function returns host if any or otherwise hostaddr
| of connection property. It may return a wrong value for
| connections before establishing and may return NULL if NULL is
| given as the parameter.

The meaning of "establising a connection" is intentionally
obfuscated since it should mean a CONNECTION_OK PGconn for
ordinary users and "after establishing the underlying connection"
for advanced users who looks into the code.

> ----------------------------
> Another odd piece is:
>
> https://www.postgresql.org/docs/10/static/libpq-connect.html#LIBPQ-MULTIPLE-HOSTS
> "The value for host is ignored unless the authentication method requires
> it, in which case it will be used as the host name."
>
> There is no coverage of "1 host, many hostaddr; n host, n hostaddr; and n
> host, m hostaddr" combinations - namely which are valid and how the NxM
> combination would even work if it is even allowed.

In the multi-host case, host and hostaddr *must* have the same
number of elements. Otherwise you get an error.

> psql: could not match 2 host names to 1 hostaddr values

It is described in the same page as below.

| In the Keyword/Value format, the host, hostaddr, and port options
| accept a comma-separated list of values. The same number of
| elements must be given in each option, such that e.g. the first
| hostaddr corresponds to the first host name, the second hostaddr
| corresponds to the second host name, and so forth. As an
| exception, if only one port is specified, it applies to all the
| hosts.

> Then, if we do connect using hostaddr, and authenticate with host, should
> we indicate that fact in the PQhost output or not?

We don't need to identify the case. Authentication is always
performed using the return value of PQhost. Connection is made to
hostaddr if any, otherwise to host.

> Potentially PQhost would be defined to return an actual hostname if it can
> figure one out - even if the active connection was made using hostaddr. My
> take is that PQhost is meant to be user-informative rather than technical,
> and should ever only return the single most appropriate value it can
> compute. Just need to decide on the logic for determining "appropriate"
> then document and implement it.

hostaddr is defined that "it can be specified in order to get rid
of host name resolution". So generating host from hostaddr is out
of the intention. The description for hostaddr in the page is;

| Using hostaddr instead of host allows the application to avoid a
| host name look-up, which might be important in applications with
| time constraints.

And the value of host should be used for authentication
unmodified since it is the user's intention.

> -----------------------------
> Also at: https://www.postgresql.org/docs/10/static/libpq-status.html
> "The following functions return parameter values established at connection.
> These values are fixed for the life of the connection. If a multi-host
> connection string is used, the values of PQhost, PQport, and PQpass can
> change if a new connection is established using the same PGconn object.
> Other values are fixed for the lifetime of the PGconn object."
>
> Maybe something like:
>
> The following functions return parameters values established at connection
> and, except for the multi-valued fail-over accessors PQhost and PQport, as
> well as PQpass, cannot change during the lifetime of the PGconn object.

It is not directly related to fail-over (if the word menas
reconnection to the next available server after an involuntary
disconnection).

Putting that aside, I'm not sure it makes any difference..

> PQpass is a bit odd here given its not multi-valued...not sure what if
> anything to make of that.

password can vary with host and port using .pgpass.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro HORIGUCHI 2018-03-27 09:23:59 Re: PostgreSQL crashes with SIGSEGV
Previous Message Simon Riggs 2018-03-27 09:19:37 Re: [HACKERS] logical decoding of two-phase transactions