Re: GSSENC'ed connection stalls while reconnection attempts.

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: GSSENC'ed connection stalls while reconnection attempts.
Date: 2020-07-10 16:01:10
Message-ID: 2101368.1594396870@sss.pgh.pa.us
Lists: pgsql-hackers

Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> writes:
> If psql is connected using GSSAPI auth and the server restarts, the
> reconnection sequence stalls and never returns.

Yeah, reproduced here. (I wonder if there's any reasonable way to
exercise this scenario in src/test/kerberos/.)

> I found that psql (libpq) sends the startup packet via GSS
> encryption. conn->gssenc should be reset when the encryption state is
> freed.

Actually, it looks to me like the GSS support was wedged in by somebody
who was paying no attention to how SSL is managed, or else we forgot
to pay attention to GSS the last time we rearranged SSL support. It's
completely broken for the multiple-host-addresses scenario as well,
because try_gss is being set and cleared in the wrong places altogether.
conn->gcred is not being handled correctly either, I think --- we need
to make sure that it's dropped in pqDropConnection.

The attached patch makes this all act more like the way SSL is handled,
and for me it resolves the reconnection problem.
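
To illustrate the shape of that fix, here is a toy sketch. It is not the
attached patch and not libpq code: ToyConn and the toy_* functions are
hypothetical stand-ins, and only the names try_gss, gssenc and gcred come
from the discussion above. The assumption it encodes is that encryption
state and credentials belong to one physical connection and are cleared
whenever that connection is dropped, while try_gss is re-armed at the
start of each attempt (each host address, or each reset).

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the GSS-related part of a connection object. */
typedef struct ToyConn
{
    bool  gss_allowed; /* user setting: may GSS encryption be tried at all? */
    bool  try_gss;     /* per-attempt: should this attempt try GSS encryption? */
    bool  gssenc;      /* per-connection: is GSS encryption currently active? */
    void *gcred;       /* per-connection: stand-in for a credential handle */
    int   sock;        /* per-connection: socket, -1 when closed */
} ToyConn;

/* Drop everything tied to the current physical connection. */
static void
toy_drop_connection(ToyConn *conn)
{
    free(conn->gcred);      /* free(NULL) is a no-op */
    conn->gcred = NULL;
    conn->gssenc = false;   /* next startup packet must not be encrypted */
    conn->sock = -1;
}

/* Start a fresh attempt: clear old state, then re-arm try_gss. */
static void
toy_begin_attempt(ToyConn *conn, int sock)
{
    toy_drop_connection(conn);
    conn->try_gss = conn->gss_allowed;
    conn->sock = sock;
}

/* Simulate a successful GSS encryption negotiation on this connection. */
static void
toy_enable_gss(ToyConn *conn)
{
    conn->gcred = malloc(1);
    conn->gssenc = true;
    conn->try_gss = false;  /* already negotiated for this attempt */
}

/* Would a startup packet sent now go through the encryption layer? */
static bool
toy_startup_is_encrypted(const ToyConn *conn)
{
    return conn->gssenc;
}
```

With this arrangement, a reconnect after a server restart begins with
gssenc false, so the first packet goes out in the clear and encryption is
negotiated again from scratch.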

> The reason that psql doesn't notice the error is that pqPacketSend
> returns STATUS_OK when a write error occurs. That behavior contradicts
> the function's comment. The function is used only while making a
> connection, so it's OK to error out immediately on write failure. I
> think other uses of pqFlush while making a connection should report
> write failure the same way.

I'm disinclined to mess with that, because (a) I don't think it's the
actual source of the problem, and (b) it would affect way more than
just GSS mode.

> Finally, it would be user-friendly if psql showed an error message
> when a reset attempt fails. (This is perhaps arguable.)

Meh, that's changing fairly longstanding behavior that I don't think
we've had many complaints about.

regards, tom lane

Attachment: fix-bogus-GSS-connection-management-1.patch (text/x-diff, 2.7 KB)
