Re: Reconnect a single connection used by multiple threads in embedded SQL in C application causes error.

From: Noah Misch <noah(at)leadboat(dot)com>
To: "egashira(dot)yusuke(at)fujitsu(dot)com" <egashira(dot)yusuke(at)fujitsu(dot)com>
Cc: 'Kyotaro Horiguchi' <horikyota(dot)ntt(at)gmail(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: Reconnect a single connection used by multiple threads in embedded SQL in C application causes error.
Date: 2022-03-14 07:08:07
Message-ID: 20220314070807.GA2317912@rfd.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Fri, Feb 25, 2022 at 11:29:55AM +0000, egashira(dot)yusuke(at)fujitsu(dot)com wrote:
> > Would you like to propose a patch?
>
> Sure. I will work on creating a document patch.
>
> However, I found another problem while testing to make sure that the use cases were valid.
>
> > - Each thread does its own CONNECT, DISCONNECT, and other commands.
> > Each thread has its own default connection. No sharing at all."
>
> In that case, if I execute CONNECT with same connection-string on each threads without connection-name, the later CONNECT was skipped.
> Then, ECPGdebug shows following message.
>
> ECPGconnect: connection identifier (null) is already in use
>
> This seems to indicate that CONNECT need to specify connection-name even if we use the default connection in each thread.
> This also seems to me to be an undesirable behavior in multiple thread.

Agreed. It might have been a nice feature for every thread to have its own
nameless connection.

> Or do we need to name all connections if we use multiple connections, even in multiple thread?

Yes.

On Fri, Mar 04, 2022 at 11:49:39AM +0000, egashira(dot)yusuke(at)fujitsu(dot)com wrote:
> > It looks like too verbose. Couldn't it be summarized as something
> > like this?
> >
> > > Current connection is managed in a per-thread manner. For the first
> > > use on a thread, it is the most recently used one in the process.

There's no behavior specific to "the first use on a thread", and "most
recently used" is not a factor. Avoid those phrases. (In a thread that has
never opened a connection, each command that retrieves the current connection
will retrieve the non-thread-specific current connection.)

> Yes.
> I thought that I needed a detailed explanation of the current connection for the note below, but this certainly seems to be summable.

For this particular paragraph, I liked your v1's level of verbosity more than
this compact version.

> > This doesn't look like a API documentation but a procedual
> > documentation. IMO there's no particular reason that we should
> > officially concretely suggest how mulitple connectinos are managed on
> > somebody's application. And part of the section is already in the
> > documentation.

I'm okay with your removal of that paragraph. As a general matter of
documentation strategy, it's valid for our manual to contain both API docs and
higher-level advice about known strategies for combining the APIs into an
effective program.

> > | <para>
> > | If your application uses multiple threads of execution, they cannot share a
> > | connection concurrently. You must either explicitly control access to the connection
> > | (using mutexes) or use a connection for each thread.
> > | </para>
> >
> > So, it would boil down to the caution about disconnection.
> >
> > > Note that connections are shared among all threads. Be careful not
> > > to DISCONNECT a connection that is to be used in another thread.
>
> I wanted to make a document that could avoid the use cases that I reported, so I tried to write a note according to the occurrence conditions.
> However, this was also too verbose, so I rewrote the note as you suggested.
> And since there is no mention of the current connection, I changed the section to add it to "36.2.3. Closing a Connection".
>
> I think I might be better to add "In the worst case, this may cause an application crash." to this note so that users can avoid the problem more strongly.
> However, this sentence can also be read as claiming that there is an implementation problem.
> Should not I add this sentence?

It's okay to admit implementation problems. I might write "Behavior is
undefined if the current thread is not the thread that opened the connection."
Referring to undefined behavior is a customary way for API docs to imply that
a crash is possible. If one wanted to be even more precise, "Behavior is
undefined if the connection is the thread-specific current connection of any
other thread." However, the more-precise wording is harder to understand.
Also, being more precise now could limit our freedom to change behavior later.

> --- a/doc/src/sgml/ecpg.sgml
> +++ b/doc/src/sgml/ecpg.sgml
> @@ -222,10 +222,15 @@ EXEC SQL CONNECT TO <replaceable>target</replaceable> <optional>AS <replaceable>
> <para>
> The <replaceable>connection-name</replaceable> is used to handle
> multiple connections in one program. It can be omitted if a
> - program uses only one connection. The most recently opened
> - connection becomes the current connection, which is used by default
> - when an SQL statement is to be executed (see later in this
> - chapter).
> + program uses only one connection.
> + </para>
> +
> + <para>
> + The most recently opened connection becomes the current connection.

This is true. Disconnecting also changes the current connection, to another
one of the open connections. Please document that somewhere appropriate.

> + The current connection is managed in a per-thread manner. For the
> + first use on a thread, it is the most recently opened one in the
> + process. The current connection is used by default when an SQL
> + statement is to be executed (see later in this chapter).
> </para>
>
> <para>
> @@ -276,8 +281,8 @@ EXEC SQL CONNECT TO :target USER :user USING :passwd;
>
> <para>
> SQL statements in embedded SQL programs are by default executed on
> - the current connection, that is, the most recently opened one. If
> - an application needs to manage multiple connections, then there are
> + the current connection, that is, the most recently opened one in each thread.

Likewise, this is not accurate if a disconnect of the current connection
happened after the most recent open. You could simply delete this claim,
which would avoid defining the current connection in two places.

> + If an application needs to manage multiple connections, then there are
> three ways to handle this.
> </para>

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Michael Paquier 2022-03-14 07:51:10 Re: BUG #17409: Unable to alter data type of clustered column which is referenced by foreign key
Previous Message David G. Johnston 2022-03-14 00:15:02 Docs for psql regarding default database name are incorrect