Backend doesn't catch the next command, after SIGUSR2

From: Patrick Samson <p_samson(at)yahoo(dot)com>
To: pgsql-cygwin(at)postgresql(dot)org
Subject: Backend doesn't catch the next command, after SIGUSR2
Date: 2004-03-09 15:25:22
Message-ID: 20040309152522.39029.qmail@web60303.mail.yahoo.com
Lists: pgsql-cygwin

If I run a test script enough times, it eventually
freezes in the following deadlock:

The client sends a command to a backend and waits
for an answer. It waits forever, because the backend
is not aware that the request has arrived and keeps
waiting for the next command.

What happens in the loop is:
SIInsertDataEntry: table is 70% full, signaling postmaster

In response, the postmaster signals its children
(signal 31, i.e. SIGUSR2 on this platform):
SignalChildren: sending signal 31 to process <pid>

Most of the time, it works. But at an unpredictable
iteration, it freezes.

This problem first appeared in a replication
setup, so I reduced the number of components
involved to get a simpler test case:
a pgtcl script running a loop with:
- create table from another-table
- copy table to file
- drop table

The 'create table' regularly fires the '70% full'
event, and at some point, the 'copy' never gets
answered.
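
For readers without pgtcl at hand, the loop described above can be sketched in Python. This is not the attached test.tcl, only a rough rendering of the three steps. The exact SQL spelling of "create table from another-table", the work table name freeze_test, and the output path are assumptions made for illustration. The `execute` argument is a hypothetical callable that sends one statement and blocks until the backend answers (for instance the `execute` method of a DB-API cursor on an autocommit connection).

```python
def iteration_statements(srctable, worktable, outfile):
    """Return the three SQL statements of one loop iteration."""
    return [
        # Assumed spelling of "create table from another-table".
        f"CREATE TABLE {worktable} AS SELECT * FROM {srctable}",
        # Server-side COPY: the backend itself writes the file.
        f"COPY {worktable} TO '{outfile}'",
        f"DROP TABLE {worktable}",
    ]


def run_loop(execute, srctable, runs,
             worktable="freeze_test", outfile="/tmp/freeze_test.out"):
    """Send the three statements `runs` times through `execute`.

    Returns the number of completed iterations. With the problem
    described in this message, one call to `execute` (the COPY)
    would simply never return.
    """
    completed = 0
    for _ in range(runs):
        for sql in iteration_statements(srctable, worktable, outfile):
            execute(sql)
        completed += 1
    return completed


if __name__ == "__main__":
    # Dry run: print the statements instead of talking to a server.
    run_loop(print, "pgr_qryengine_log", runs=1)
```

Passing `print` as `execute` only shows the statements; a real run needs a live connection and a timeout on the client side to detect the hang.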

I attached these files:
- test.tcl: the script to run.
Change these values to match your environment:

set srctable pgr_qryengine_log
set dbname euronetUsers

The source table can be any empty table.
In my case, it's:
CREATE TABLE public.pgr_qryengine_log
(
    pgr_sid int4 NOT NULL,
    tablename varchar(50),
    pgr_gfid int8 NOT NULL,
    pgr_grid int8 NOT NULL,
    pgr_optype varchar(2),
    pgr_when timestamp,
    pgr_username varchar(30),
    qry_result text
) WITH OIDS;

- postmaster-ok.log
The traces of a successful iteration.
- postmaster-ko.log
The traces of the iteration that waits forever.
The EOF at the end is received after a Ctrl-C
on the client side.

Comparing the traces shows that the signals
are processed, but the backend never calls
StartTransactionCommand for the expected 'copy'.

I don't know the exact conditions under which the
freeze arises. I just noticed that chances are higher
if there are many postgres.exe processes alive.
Without any extra backends, I could complete 10000
runs without a freeze.
So I opened a pgAdmin III session to have many
connections (to multiple databases, with different
accounts). With 7 to 10 processes, the freeze occurred
after 3392, 2027, 6729, 272, and 1871 runs.
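
To put these numbers in perspective, here is a rough back-of-the-envelope calculation in Python. It assumes (and this is only an assumption) that each iteration freezes independently with the same small probability, estimated as one over the mean number of runs before a freeze. With only five samples the estimate is crude.

```python
# Run counts at which the freeze occurred (from the message above).
runs_to_freeze = [3392, 2027, 6729, 272, 1871]

mean_runs = sum(runs_to_freeze) / len(runs_to_freeze)

# Assumption: every iteration freezes independently with probability p.
p_freeze = 1 / mean_runs


def prob_no_freeze(n, p):
    """Probability that n independent iterations all complete."""
    return (1 - p) ** n


clean_10000 = prob_no_freeze(10000, p_freeze)

print(f"mean runs before freeze: {mean_runs:.1f}")
print(f"estimated freeze probability per run: {p_freeze:.6f}")
print(f"chance of 10000 clean runs at that rate: {clean_10000:.3f}")
```

The mean is about 2858 runs, and at that rate 10000 consecutive clean runs would happen only about 3% of the time. Under the stated assumption, this is consistent with the observation that the number of live backends affects the failure rate.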

I tried to strace the postmaster, but never managed
to reproduce the problem that way. I guess strace
slows the system down too much.
I only have a strace of a correct iteration.

Done on:
- postgres 7.3.5, W2000 SP2, cygwin 1.5.5-1
- postgres 7.3.5, NT SP6, cygwin 1.5.7-1

I can't tell if the source of the problem is in
cygwin or in postgres, so I am posting to both lists.

It would be helpful if anybody could reproduce the
problem, or offer advice on how to make progress
with the debugging.

Patrick


Attachment          Content-Type               Size
test.tcl            application/octet-stream   1.3 KB
postmaster-ok.log   application/octet-stream   4.8 KB
postmaster-ko.log   application/octet-stream   3.2 KB
