Re: Cygwin PostgreSQL Regression Test Problems (Revisited)

From: Jason Tishler <Jason(dot)Tishler(at)dothill(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Hiroshi Inoue <Inoue(at)tpf(dot)co(dot)jp>, pgsql-ports(at)postgresql(dot)org
Subject: Re: Cygwin PostgreSQL Regression Test Problems (Revisited)
Date: 2001-04-02 17:19:17
Message-ID: 20010402131917.C798@dothill.com
Lists: pgsql-ports

Tom,

On Sun, Apr 01, 2001 at 01:57:35PM -0400, Tom Lane wrote:
> Jason Tishler <Jason(dot)Tishler(at)dothill(dot)com> writes:
> > I'm glad that you agree. Please post to the list when the change is in
> > CVS and I will test that this solves the Cygwin regression test (i.e.,
> > psql) hangs.
>
> Done as of yesterday; should be in this morning's snapshot.

Thanks.

> > Actually, the blocking connect() change for Cygwin is obviated by the
> > pqWait() fix. So, I am now no longer recommending making the blocking
> > connect() change for Cygwin. Unless, you do so for other Unixes too.
>
> I made both changes in the hope that the blocking connect change would
> suppress your problem with connection-refused failures. If it does not,
> then we may as well reverse out the fe-connect.c change. Let me know.

With both changes or only the fe-connect.c one, psql does not hang and
displays the following error message when the connection is refused:

    psql: connectDBStart() -- connect() failed: Connection refused
    Is the postmaster running locally
    and accepting connections on Unix socket '/tmp/.s.PGSQL.65432'?

With only the fe-misc.c change, psql does not hang and displays the
following error message when the connection is refused:

    psql: PQconnectPoll() -- connect() failed: error 10061
    Is the postmaster running locally
    and accepting connections on Unix socket '/tmp/.s.PGSQL.65432'?

In both cases there are no hangs; only the error messages differ.
Unfortunately, for the non-blocking case the error message is cryptic.
I tried tracking down error "10061", which comes from getsockopt(), but
I was unsuccessful. Is there any way to improve the readability of this
error message?
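
For reference, 10061 matches the Winsock error code WSAECONNREFUSED, so one
possible approach is a small lookup that turns the raw number into text.
The following is only a minimal sketch: winsock_error_text() is a
hypothetical helper, not an existing libpq function, and the table covers
just a handful of well-known Winsock codes.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not part of libpq): map a few well-known Winsock
 * error numbers to readable text.  Returns NULL for codes it does not know,
 * so the caller can fall back to printing "error <number>".
 */
const char *
winsock_error_text(int code)
{
    switch (code)
    {
        case 10048: return "Address already in use";    /* WSAEADDRINUSE */
        case 10051: return "Network is unreachable";    /* WSAENETUNREACH */
        case 10054: return "Connection reset by peer";  /* WSAECONNRESET */
        case 10060: return "Connection timed out";      /* WSAETIMEDOUT */
        case 10061: return "Connection refused";        /* WSAECONNREFUSED */
        case 10065: return "No route to host";          /* WSAEHOSTUNREACH */
        default:    return NULL;
    }
}
```

With something like this, the non-blocking path could print "Connection
refused" whenever the lookup succeeds and keep the numeric form otherwise.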

Also, the blocking connect change did *not* fix the connection refused
(spurious) regression test failures. So this change should probably be
backed out.

> > I'm wondering whether it makes sense to add a simple connection retry
> > policy as suggested above by Hiroshi?
>
> I do not think it is appropriate for libpq to do that.

When I made my suggestion above, I was concerned that maybe libpq was not
the right layer for implementing connection policies and that psql was
possibly the better place.

> For one thing, where would you stop --- why exactly two tries?

This was another one of my concerns too.
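
To make the layering concern concrete, a retry policy kept outside libpq,
with the number of tries left to the caller rather than hard-wired, might
look roughly like the sketch below. All names here (connect_attempt_fn,
connect_with_retry, flaky_attempt) are hypothetical and exist only for
illustration; nothing like this is in psql or libpq.

```c
#include <stdbool.h>

/* Hypothetical callback type: returns true when one connection attempt succeeds. */
typedef bool (*connect_attempt_fn)(void *arg);

/*
 * Try up to max_attempts times.  Returns the number of the attempt that
 * succeeded (1-based), or 0 if every attempt failed.
 */
int
connect_with_retry(connect_attempt_fn attempt, void *arg, int max_attempts)
{
    for (int i = 1; i <= max_attempts; i++)
    {
        if (attempt(arg))
            return i;
        /* A real client would probably pause here before trying again. */
    }
    return 0;
}

/* Demonstration callback: fails until the counter behind arg reaches zero. */
bool
flaky_attempt(void *arg)
{
    int *remaining_failures = arg;

    if (*remaining_failures > 0)
    {
        (*remaining_failures)--;
        return false;
    }
    return true;
}
```

The caller still has to pick max_attempts, so the sketch does not answer
the "why exactly two tries?" question; it only moves the decision out of
the library.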

> > 2. Change the backlog parameter to listen() in src/backend/libpq/pqcomm.c
> > to a number that will "ensure" that the parallel_schedule version of the
> > regression test does not generate connection refused conditions. Note
> > that I'm not even sure this will really work on all (or any) platforms.
>
> We already use SOMAXCONN which is supposed to be defined by the system
> as the maximum allowed queue depth. If Cygwin fails to define it, or
> defines it as something less than it should be, then we might consider
> installing a Cygwin-specific hack to redefine SOMAXCONN.

Cygwin defines SOMAXCONN to be 5. However, winsock.h defines it to be 5
while winsock2.h defines it to be 0x7fffffff. So, I'm not sure what the
real Cygwin (i.e., Windows) maximum is.
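
A quick way to see which value a given set of headers actually supplies is
to compile a tiny function against them. This is only a sketch, assuming a
POSIX-style <sys/socket.h>; header_somaxconn() is a hypothetical name, and
the result says nothing about the limit the underlying OS really enforces.

```c
#include <sys/socket.h>

/* Report the backlog limit advertised by the headers this file was built with. */
int
header_somaxconn(void)
{
    return SOMAXCONN;
}
```

Printing the return value from a small main() on each platform would show
whether the build picks up 5 or something larger.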

> However Hiroshi says later that he already tried this.

Even if it worked, this would have just pushed the problem further out
instead of really fixing it.

> I'm inclined to think
> that Cygwin simply has a problem with servicing concurrent connection
> requests, perhaps even before the alleged SOMAXCONN value is reached.

You meant Windows. Right? :,)

In summary, I feel that the fe-connect.c change should be backed out so
that Cygwin will be consistent with other UNIXes. I also hope that the
non-blocking connection failure message can be made more readable and
that make check will not generate spurious failure messages under Cygwin
on slow machines.

Thanks,
Jason

--
Jason Tishler
Director, Software Engineering       Phone: +1 (732) 264-8770 x235
Dot Hill Systems Corp.               Fax:   +1 (732) 264-8798
82 Bethany Road, Suite 7             Email: Jason(dot)Tishler(at)dothill(dot)com
Hazlet, NJ 07730 USA                 WWW:   http://www.dothill.com
