Re: psql query gets stuck indefinitely

From: tamanna madaan <tamanna(dot)madaan(at)globallogic(dot)com>
To: Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: psql query gets stuck indefinitely
Date: 2011-12-01 11:57:50
Message-ID: CAD4qJ_+SOGALGZemEeyXZALuCqThngRMQiwf6c4PESK=L4LDFw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi Craig
I am able to reproduce the issue now . I have postgres-8.1.2 installed in
cluster setup.

I have started the below query from one system let say A to system B in
cluster .
psql -U<dbname> -h<ip of system B> -c "select sleep(300);"

while this command is going on , system B is stopped abruptly by taking out
the power cable from it . This caused the above query on system A to hang.
This is still showing in 'ps -eaf' output after one day. I think the tcp
keepalive mechanism which has been set at system level should have closed
this connection. But it didnt . Following keepalive values have been set on
system A :

net.ipv4.tcp_keepalive_intvl = 75
net.ipv4.tcp_keepalive_probes = 9
net.ipv4.tcp_keepalive_time = 7200
Why system level keepalive is not working in this case. Well, I learnt
, from the link you have provided, that programs must request keepalive
control for their sockets using the setsockopt interface. I wonder if
postgres8.1.2 supports / request for system level keepalive control ?? If
not, then which release/version of postgres supports that ??

Thanks...
Tamanna

On Tue, Nov 29, 2011 at 4:56 PM, tamanna madaan <
tamanna(dot)madaan(at)globallogic(dot)com> wrote:

> well, one question : Is tcp-keep-alive enabled by default in postgres-8.1.2 .
>
> I am using postgres on linux platform .
>
>
>
> On Tue, Nov 29, 2011 at 8:51 AM, tamanna madaan <
> tamanna(dot)madaan(at)globallogic(dot)com> wrote:
>
>> Hi Craig
>>
>> Thanks for your reply . But unfortunately I dont have that process
>> running right now. I have already killed that process . But I have seen
>> this problem sometimes on my setup.
>> It generally happens when the remote system is going slow for some reason
>> (CPU utilization high etc.) . But whatever is the reason , I would assume
>> that the query should return with some error or so
>> in case the system, the query is running on , is rebooted . But it
>> doesn't return and remain stuck. Moreover, the same query sometimes hangs
>> even if it is run on local postgres database so I dont think
>> network issues have any role in that . Please help.
>>
>> Thanks....
>>
>> Regards
>> Tamanna
>>
>>
>> On Tue, Nov 29, 2011 at 7:58 AM, Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>wrote:
>>
>>> On 11/28/2011 05:30 PM, tamanna madaan wrote:
>>>
>>>> Hi All
>>>> I have postgres installed in cluster setup. My system has a script
>>>> which executes the below query on remote system in cluster.
>>>> psql -t -q -Uslon -h<hostip> -d<dbname> -c"select 1;"
>>>> But somehow this query got stuck. It didnt return even after the remote
>>>> system( on which this query was supposed to execute) is rebooted . What
>>>> could be the reason ??
>>>>
>>>
>>> I relised just after sending my last message:
>>>
>>> You should use ps to find out what exactly psql is doing and which
>>> system call it's blocked in in the kernel (if it's waiting on a syscall).
>>> As you didn't mention your OS I'll assume you're on Linux, where you'd use:
>>>
>>> ps -C psql -o wchan:80=
>>>
>>> or
>>>
>>> ps -p 1234 -o wchan:80=
>>>
>>> ... where "1234" is the pid of the stuck psql process. In a psql waiting
>>> for command line input I see it blocked in the kernel routine "n_tty_read"
>>> for example.
>>>
>>>
>>> If you really want to know what it's doing you can also attach gdb and
>>> get a backtrace to see what code it's paused in inside psql:
>>>
>>> gdb -q -p 1234 <<__END__
>>> bt
>>> q
>>> __END__
>>>
>>> If you get a message about "missing debuginfos", lots of lines reading
>>> "no debugging symbols found" or lots of lines ending in "?? ()" then you
>>> need to install debug symbols. How to do that depends on your OS/distro so
>>> I won't go into that; it's documented on the PostgreSQL wiki under "how to
>>> get a stack trace" but you probably won't want to bother if this is just
>>> for curiosity's sake.
>>>
>>> You're looking for output that looks like:
>>>
>>> #1 0x000000369d22a131 in rl_getc () from /lib64/libreadline.so.6
>>> #2 0x000000369d22a8e9 in rl_read_key () from /lib64/libreadline.so.6
>>> #3 0x000000369d215b11 in readline_internal_char () from
>>> /lib64/libreadline.so.6
>>> #4 0x000000369d216065 in readline () from /lib64/libreadline.so.6
>>>
>>> ... etc ...
>>>
>>>
>>> --
>>> Craig Ringer
>>>
>>
>>
>>
>> --
>> Tamanna Madaan | Associate Consultant | GlobalLogic Inc.
>> Leaders in Software R&D Services
>> ARGENTINA | CHILE | CHINA | GERMANY | INDIA | ISRAEL | UKRAINE | UK | USA
>>
>> Office: +0-120-406-2000 x 2971
>>
>> www.globallogic.com
>>
>>
>>
>
>
> --
> Tamanna Madaan | Associate Consultant | GlobalLogic Inc.
> Leaders in Software R&D Services
> ARGENTINA | CHILE | CHINA | GERMANY | INDIA | ISRAEL | UKRAINE | UK | USA
>
> Office: +0-120-406-2000 x 2971
>
> www.globallogic.com
>
>
>

--
Tamanna Madaan | Associate Consultant | GlobalLogic Inc.
Leaders in Software R&D Services
ARGENTINA | CHILE | CHINA | GERMANY | INDIA | ISRAEL | UKRAINE | UK | USA

Office: +0-120-406-2000 x 2971

www.globallogic.com

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Amitabh Kant 2011-12-01 12:16:38 Conditional left join
Previous Message Gracjan Polak 2011-12-01 11:50:20 Problem with binary data transfer format of TEXT in 8.4