Re: BUG: possible busy loop when connection is closed

From: Hannu Krosing <hannu(at)tm(dot)ee>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: BUG: possible busy loop when connection is closed
Date: 2004-09-23 08:12:48
Message-ID: 1095927167.3552.11.camel@fuji.krosing.net
Lists: pgsql-hackers

On Thu, 2004-09-23 at 06:41, Tom Lane wrote:
> Hannu Krosing <hannu(at)tm(dot)ee> writes:
> > We were bitten by the following bug a few times, when our server tried
> > to reestablish connections under bad network conditions:
> >
> > if connection is closed while trying to get response to SSL setup packet
> > (i.e. conn->status is CONNECTION_SSL_STARTUP), we get a busy loop, as
> > line 1035 in 8.0.0.beta2:
> >
> > if (pqWaitTimed(1, 0, conn, finish_time)) {
> >
> > tells that there is data to read (returns 0) while actually it is error
> > (POLLERR & POLLHUP) and not POLLIN returned from poll() and

At least on Linux it does. We got the following trace:
poll([{fd=11, events=POLLIN|POLLERR, revents=POLLIN|POLLERR|POLLHUP}], 1, -1) = 1
recv(11, "", 1, 0) = 0
poll([{fd=11, events=POLLIN|POLLERR, revents=POLLIN|POLLERR|POLLHUP}], 1, -1) = 1
recv(11, "", 1, 0) = 0
poll([{fd=11, events=POLLIN|POLLERR, revents=POLLIN|POLLERR|POLLHUP}], 1, -1) = 1
recv(11, "", 1, 0) = 0
This seems to say that poll() came back because of POLLHUP, and since
there is just one fd, that fd must be closed. But this may be
non-portable.
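
To make the pattern in the trace concrete, here is a minimal sketch of a
check that does not rely on the POLLHUP bit at all. It lets recv() decide,
treating a zero-length read as end of stream. The names probe_peer and
peer_state are made up for illustration and are not libpq code, and
MSG_PEEK is used only so that the probe does not consume the byte.

```c
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical result type for this sketch (not part of libpq). */
typedef enum {
    PEER_NO_DATA,   /* poll() timed out, or the read would block */
    PEER_HAS_DATA,  /* at least one byte is waiting */
    PEER_CLOSED,    /* zero-length read: the peer closed the stream */
    PEER_ERROR      /* poll() or recv() failed; errno holds the reason */
} peer_state;

/*
 * Wait up to timeout_ms for the socket to become readable, then peek at
 * one byte.  POLLERR and POLLHUP are always reported by poll(), so they
 * are not requested in events; whatever woke us up, recv() is asked to
 * say what actually happened.
 */
static peer_state probe_peer(int fd, int timeout_ms)
{
    struct pollfd p;
    char c;
    ssize_t n;
    int rc;

    p.fd = fd;
    p.events = POLLIN;
    p.revents = 0;

    rc = poll(&p, 1, timeout_ms);
    if (rc < 0)
        return (errno == EINTR) ? PEER_NO_DATA : PEER_ERROR;
    if (rc == 0)
        return PEER_NO_DATA;

    /* MSG_PEEK leaves the byte queued for the real reader. */
    n = recv(fd, &c, 1, MSG_PEEK);
    if (n > 0)
        return PEER_HAS_DATA;
    if (n == 0)
        return PEER_CLOSED;
    if (errno == EINTR || errno == EAGAIN || errno == EWOULDBLOCK)
        return PEER_NO_DATA;
    return PEER_ERROR;
}
```

With something like this, the three identical poll()/recv() pairs in the
trace would end after the first one, because the zero-length recv() is
reported as PEER_CLOSED instead of "no data yet".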

> This is intentional: the idea is that we should go ahead and do the read
> (or write), which will detect the error condition on the socket. poll()
> in itself doesn't give enough information to determine what the error
> condition is, so it's not appropriate to fail here.
>
> > after that the check on line 1462:
> >
> > if (nread == 0)
> > /* caller failed to wait for data */
> > return PGRES_POLLING_READING;
> >
> > resumes the busy loop
>
> This seems to me to be the bug. pqReadData jumps through hoops to
> determine whether a zero-length read means EOF or not, and I think we
> need to expend some effort to determine that here too.
>
> One possibility is to forget the direct call to recv() and use
> pqReadData --- since conn->ssl isn't set yet, and we aren't expecting
> the server to send more than one byte, this should in theory be safe.

I was scared by the comment before recv(...,1,0), which says we must be
careful not to read more than 1 byte.

Is it really impossible to accidentally read more than one byte and screw
up the SSL handshake?
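
For reference, the property the direct call relies on can be sketched as
follows: recv() with a length of 1 takes at most one byte off the socket,
and anything the peer sent after it stays queued in the kernel for the
next reader. A reader that fills a larger buffer gives no such guarantee
by itself. The helper name read_reply_byte is hypothetical and this is
not the libpq code, only an illustration of the one-byte read with EOF
reported separately.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper: read exactly one reply byte from fd.
 * Returns  1 if a byte was stored in *reply,
 *          0 if the peer closed the connection (zero-length read),
 *         -1 on error (including EAGAIN on a nonblocking socket);
 *            errno holds the reason.
 */
static int read_reply_byte(int fd, char *reply)
{
    ssize_t n;

    do {
        /* Length 1: at most one byte leaves the kernel queue. */
        n = recv(fd, reply, 1, 0);
    } while (n < 0 && errno == EINTR);

    if (n == 1)
        return 1;
    if (n == 0)
        return 0;
    return -1;
}
```

A caller would treat the return value 0 as a failed connection rather
than as "caller failed to wait for data", which is the distinction the
busy loop is missing.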

-------------
Hannu
