Backend doesn't catch the next command, after SIGUSR2

From: Patrick Samson <p_samson(at)yahoo(dot)com>
To: pgsql-cygwin(at)postgresql(dot)org
Subject: Backend doesn't catch the next command, after SIGUSR2
Date: 2004-03-09 15:25:22
Message-ID: 20040309152522.39029.qmail@web60303.mail.yahoo.com
Lists: pgsql-cygwin
If I run a test script enough times, it eventually
freezes in this deadlock situation:

The client sends a command to a backend and waits
for an answer. It will wait forever, because the
backend is not aware that the request has arrived
and is itself waiting for the next command.

What happens in the loop is:
 SIInsertDataEntry: table is 70% full,
 signaling postmaster

 In reaction, the postmaster sends to its children:
 SignalChildren: sending signal 31 to process <pid>

Most of the time, it works. But at an unpredictable
iteration, it freezes.

This problem appeared first in a replication
machinery, so I reduced the number of components
involved, to get a simpler test case:
A pgtcl script, running a loop with:
 create table from another-table
 copy table to file
 drop table

The 'create table' regularly fires the '70% full'
event, and at some point, the 'copy' never gets
answered.
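
For anyone who wants to reproduce this without pgtcl,
the loop can be sketched as below. This is not the
attached test.tcl. The SQL text, the scratch table
name loop_tmp, and the helper names make_iteration
and run_until_hang are assumptions made for
illustration; execute is any caller-supplied
function that sends one statement and waits for its
result.

```python
import threading


def make_iteration(execute, srctable, outfile):
    """Build one loop iteration from a caller-supplied execute(sql) function.

    The SQL below is an assumption: the post only describes the steps as
    "create table from another-table", "copy table to file", "drop table".
    """
    def iteration(run):
        execute(f"CREATE TABLE loop_tmp AS SELECT * FROM {srctable}")
        execute(f"COPY loop_tmp TO '{outfile}'")
        execute("DROP TABLE loop_tmp")
    return iteration


def run_until_hang(iteration, max_runs, timeout_s):
    """Run iteration(run) for run = 1..max_runs.

    Returns the number of the first run that has not finished after
    timeout_s seconds, or None if every run completed in time.
    """
    for run in range(1, max_runs + 1):
        # A daemon thread lets the driver exit even if the run never returns.
        worker = threading.Thread(target=iteration, args=(run,), daemon=True)
        worker.start()
        worker.join(timeout_s)
        if worker.is_alive():
            return run
    return None
```

The timeout is what turns "waits forever" into a
reported run number; a freeze like the one described
here would show up as a return value instead of a
stuck client.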

I attached these files:
- test.tcl: the script to run.
  Change these values to meet your context:

 set srctable pgr_qryengine_log
 set dbname euronetUsers

  The source table can be any empty table.
  In my case, it's:
CREATE TABLE public.pgr_qryengine_log
(
  pgr_sid int4 NOT NULL,
  tablename varchar(50),
  pgr_gfid int8 NOT NULL,
  pgr_grid int8 NOT NULL,
  pgr_optype varchar(2),
  pgr_when timestamp,
  pgr_username varchar(30),
  qry_result text
) WITH OIDS;

- postmaster-ok.log
 The traces of a successful iteration.
- postmaster-ko.log
 The traces of the forever waiting iteration.
 EOF is received on a ctrl/c on the client side.

Comparison of the traces shows that the signals
are processed, but the backend doesn't start a
StartTransactionCommand for the expected 'copy'.
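
To make the expected ordering concrete, here is a
small POSIX-only sketch of a process that is blocked
waiting for its next command when SIGUSR2 arrives.
It is an illustration of the behaviour one would
expect, not a diagnosis of the freeze and not
PostgreSQL code: the pipe stands in for the client
connection, and the function name
read_command_after_signal is made up for this
example.

```python
import os
import signal
import threading
import time


def read_command_after_signal(command=b"copy", delay_s=0.1):
    """Block in read(), receive SIGUSR2, then receive a command.

    Must be called from the main thread (signal.signal requires it).
    Returns (number_of_signals_handled, bytes_read).
    """
    signals_seen = []
    previous = signal.signal(
        signal.SIGUSR2, lambda signum, frame: signals_seen.append(signum)
    )
    read_fd, write_fd = os.pipe()
    main_ident = threading.main_thread().ident

    def client():
        time.sleep(delay_s)
        # Stands in for the postmaster signalling its children.
        signal.pthread_kill(main_ident, signal.SIGUSR2)
        time.sleep(delay_s)
        # Stands in for the client sending its next command.
        os.write(write_fd, command)

    sender = threading.Thread(target=client)
    sender.start()
    try:
        # The signal interrupts this read; Python runs the handler and
        # then retries the read automatically (PEP 475).
        received = os.read(read_fd, 1024)
    finally:
        sender.join()
        signal.signal(signal.SIGUSR2, previous)
        os.close(read_fd)
        os.close(write_fd)
    return len(signals_seen), received
```

On a system where interrupted reads are restarted
correctly, the handler runs once and the command is
still delivered afterwards. The ko trace looks like
the case where the second half never happens.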

I don't know the exact conditions for the freeze to
arise. I just noticed that the chances are higher if
there are a lot of postgres.exe processes alive.
Without any extra backends, I could run 10000
iterations with no freeze.
So I opened a pgAdmin III session to have many
connections (on multiple databases, with different
accounts).
With 7 to 10 processes, I reached the freeze at
3392, 2027, 6729, 272, 1871 runs.

I tried to strace the postmaster, but never managed
to reproduce the problem that way. I guess strace
slows down the system too much.
I only have a strace of a correct iteration.

Done on:
- postgres 7.3.5, W2000 SP2, cygwin 1.5.5-1
- postgres 7.3.5, NT SP6, cygwin 1.5.7-1

I can't tell whether the source of the problem is in
cygwin or in postgres, so I am posting to both lists.

It would be helpful if anybody could reproduce the
problem, or offer advice on how to make progress
with the debugging.

Patrick




Attachment: postmaster-ko.log
Description: application/octet-stream (3.2 KB)
Attachment: postmaster-ok.log
Description: application/octet-stream (4.8 KB)
Attachment: test.tcl
Description: application/octet-stream (1.3 KB)
