Re: Streaming replication and non-blocking I/O

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication and non-blocking I/O
Date: 2010-01-13 10:27:10
Message-ID: 4B4D9FFE.6090401@enterprisedb.com
Lists: pgsql-hackers

Fujii Masao wrote:
> Done. Currently there is no new libpq function for replication. The
> walreceiver uses only existing functions like PQconnectdb, PQexec,
> PQgetCopyData, etc.
>
> git://git.postgresql.org/git/users/fujii/postgres.git
> branch: replication

Thanks!

I'm afraid we haven't quite nailed the select/poll issue yet. You copied
pq_wait() from libpq's pqSocketCheck(), but there's one big difference
between the backend and the frontend: the frontend always puts the
connection into non-blocking mode, while the backend uses blocking mode.
At least with SSL, I think it's possible for pq_wait() to return false
positives if the SSL layer decides to renegotiate the connection,
causing data to flow in the other direction in the underlying TCP
connection. A false positive would cause walsender to block
indefinitely on the pq_getbyte() call.

I don't even want to think about the changes required to put the backend
socket into non-blocking mode; I don't know that code well enough. Maybe
we could temporarily put it into non-blocking mode, read to see if there's
any data available, and put it back into blocking mode. But even then I
think we'd need to modify at least secure_read() to work correctly with
SSL in non-blocking mode.
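
To make the "temporarily non-blocking" idea concrete, here is a rough
sketch for a plain (non-SSL) socket only. The helper name
peek_socket_nonblocking() is made up for illustration and is not
existing backend code. It peeks with recv(MSG_PEEK) rather than going
through secure_read(), so it does not address the SSL case at all.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not PostgreSQL code: check whether a plain
 * socket has data to read without ever blocking.
 *
 * Returns 1 if data is available, 0 if there is nothing to read yet,
 * and -1 on EOF or error.
 */
static int
peek_socket_nonblocking(int fd)
{
    char    c;
    ssize_t n;
    int     flags;
    int     saved_errno;

    assert(fd >= 0);

    /* Remember the current flags so blocking mode can be restored. */
    flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    if (fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        return -1;

    /* Peek at one byte; MSG_PEEK leaves it in the receive buffer. */
    n = recv(fd, &c, 1, MSG_PEEK);
    saved_errno = errno;

    /* Put the socket back the way we found it. */
    if (fcntl(fd, F_SETFL, flags) < 0)
        return -1;

    if (n > 0)
        return 1;
    if (n == 0)
        return -1;          /* orderly shutdown by the peer */
    if (saved_errno == EAGAIN || saved_errno == EWOULDBLOCK ||
        saved_errno == EINTR)
        return 0;           /* nothing to read right now */
    return -1;
}
```

A caller would only go on to the blocking read when this returns 1.
With SSL the raw socket can be readable without any application data
being available, which is why secure_read() would still need attention.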

Another idea is to use poll() to check for POLLHUP on those platforms
that have poll(). AFAICS there is no equivalent for that in select(), so
for platforms that don't have poll() we would have to simply ignore the
issue or write some other platform-specific workaround (on Windows,
WSAEventSelect() seems to have an FD_CLOSE event for that). That would be
quite a localized change.
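
A minimal sketch of the poll() variant might look like the following.
The function name peer_has_hung_up() is hypothetical, and how reliably
POLLHUP is reported when the peer closes varies by platform and socket
type, so this would need testing on the platforms we care about.

```c
#include <assert.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper, not PostgreSQL code: ask poll() whether the
 * other end of the connection has gone away, without waiting.
 *
 * Returns 1 if a hangup or error condition is reported, 0 if not,
 * and -1 if poll() itself failed.
 */
static int
peer_has_hung_up(int fd)
{
    struct pollfd pfd;
    int           rc;

    assert(fd >= 0);

    pfd.fd = fd;
    pfd.events = 0;     /* POLLHUP, POLLERR and POLLNVAL are always
                         * reported, even when not requested */
    pfd.revents = 0;

    rc = poll(&pfd, 1, 0);  /* timeout 0: return immediately */
    if (rc < 0)
        return -1;

    return (pfd.revents & (POLLHUP | POLLERR | POLLNVAL)) != 0;
}
```

Because it never reads from the socket, a check like this cannot get
stuck in a blocking read, whatever the SSL layer is doing underneath.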

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
