Re: Rare SSL failures on eelpout

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Rare SSL failures on eelpout
Date: 2019-03-05 17:07:58
Message-ID: 6920.1551805678@sss.pgh.pa.us
Lists: pgsql-hackers

Thomas Munro <thomas(dot)munro(at)gmail(dot)com> writes:
> You can see that poll() already knew the other end had closed the
> socket. Since this is clearly timing... let's see, yeah, I can make
> it fail every time by adding sleep(1) before the comment "Send the
> startup packet.". I assume that'll work on any Linux machine?
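
[Illustration, not part of the original message: a minimal sketch of how poll() can already report that the peer has closed a socket before the local side has written anything. It uses Python and a local socketpair rather than libpq's C code; the helper name peer_closed is hypothetical.]

```python
import select
import socket


def peer_closed(sock, timeout_ms=1000):
    """Return True if poll() reports an event on sock and a peek reads EOF.

    Hypothetical helper for illustration only; libpq does not contain it.
    """
    poller = select.poll()
    poller.register(sock, select.POLLIN)
    events = poller.poll(timeout_ms)
    if not events:
        # Nothing pending within the timeout: peer is open and silent.
        return False
    # MSG_PEEK leaves any real data in the buffer; an empty read means EOF.
    return sock.recv(1, socket.MSG_PEEK) == b""


if __name__ == "__main__":
    client, server = socket.socketpair()
    server.close()  # the other end goes away first
    print(peer_closed(client))
    client.close()
```

[If the client sleeps before sending, as with the sleep(1) suggested above, the peer has more time to reach this closed state before the write happens.]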

Great idea, but no cigar --- doesn't do anything for me except make
the ssl test really slow. (I tried it on RHEL6 and Fedora 28 and, just
for luck, current macOS.) What this seems to prove is that the thing
that's different about eelpout is the particular kernel it's running,
and that that kernel has some weird timing behavior in this situation.

I've also been experimenting with reducing libpq's SO_SNDBUF setting
on the socket, with more or less the same idea of making the sending
of the startup packet slower. No joy there either.
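
[Illustration, not part of the original message: a minimal sketch of requesting a smaller send buffer with SO_SNDBUF and reading back what the kernel actually granted. It is written in Python rather than libpq's C, and the helper name shrink_send_buffer and the 1024-byte request are assumptions chosen for the example. Kernels may round the request up or enforce a minimum, so the effective size can differ from the requested one.]

```python
import socket


def shrink_send_buffer(sock, requested_bytes):
    """Request a send buffer of requested_bytes and return the effective size.

    Hypothetical helper for illustration only; the kernel is free to adjust
    the value, so callers should check the returned size.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, requested_bytes)
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        before = s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
        after = shrink_send_buffer(s, 1024)
        print("default:", before, "effective after request:", after)
```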

Annoying. I'd be happier about writing code to fix this if I could
reproduce it :-(

regards, tom lane

PS: but now I'm wondering about trying other non-Linux kernels.
