Re: emergency outage requiring database restart

From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Oskari Saarenmaa <os(at)ohmu(dot)fi>
Cc: Ants Aasma <ants(dot)aasma(at)eesti(dot)ee>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: emergency outage requiring database restart
Date: 2016-11-01 13:35:15
Message-ID: CAHyXU0xdigh6u6NbQ3k9THf6wT5Ds3KAFJTVb1AXu8Ev17+i3A@mail.gmail.com
Lists: pgsql-hackers

On Mon, Oct 31, 2016 at 10:32 AM, Oskari Saarenmaa <os(at)ohmu(dot)fi> wrote:
> 27.10.2016, 21:53, Merlin Moncure kirjoitti:
>>
>> As noted earlier, I was not able to reproduce the issue with
>> crashme.sh, which was:
>>
>> NUM_FORKS=16
>> do_parallel psql -p 5432 -c"select PushMarketSample('1740')" castaging_test
>> do_parallel psql -p 5432 -c"select PushMarketSample('4400')" castaging_test
>> do_parallel psql -p 5432 -c"select PushMarketSample('2160')" castaging_test
>> do_parallel psql -p 5432 -c"select PushMarketSample('6680')" castaging_test
>> <snip>
>>
>> (do_parallel is a simple wrapper that executes the command in
>> parallel, up to NUM_FORKS at a time). This is on the same server and
>> cluster as above.
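>> A rough sketch of that kind of wrapper (assuming bash >= 4.3 for
>> wait -n; the real one may differ in detail):
>>
>> do_parallel() {
>>     # block until fewer than NUM_FORKS jobs are still running
>>     while [ "$(jobs -rp | wc -l)" -ge "$NUM_FORKS" ]; do
>>         wait -n
>>     done
>>     "$@" &    # launch the command in the background
>> }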
>> This kind of suggests that either
>> A) there is some concurrent activity from another process that is
>> tripping the issue, or
>> B) there is something particular to the session invoking the function
>> that is participating in the problem. As the application is
>> structured, a single-threaded node.js app is issuing the high-traffic
>> query over a long-lived connection. It's still running, in fact, and
>> I'm kind of tempted to find some downtime to see if I can still
>> reproduce via the UI.
>
> Your production system's postgres backends probably have a lot more
> open files associated with them than the simple test case does. Since
> Postgres likes to keep files open as long as possible, and only closes
> them when it needs to free up fds to open new files, it's possible
> that your production backends have used up almost all of their allowed
> fds by the time you execute your pl/sh function.
>
> If that's the case, the sqsh process that gets executed may not have
> enough free fds to do what it wants to, and because of busted error
> handling it could end up writing to fds that were opened by Postgres
> and point at $PGDATA files.
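> A quick way to sanity-check that theory on the production system
> (hypothetical pid; Linux /proc assumed):
>
>   # find the long-lived backend's pid
>   psql -c 'select pid, application_name from pg_stat_activity;'
>   # then compare its open fd count against its limit
>   ls /proc/12345/fd | wc -l
>   grep 'open files' /proc/12345/limits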

Does that apply here? The mechanics are a pl/sh function that basically
does:

cat foo.sql | sqsh <args>

Pipe redirection opens a new process, right?
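(A quick local check suggests the child does inherit the parent's open
fds -- bash sketch, not the actual sqsh invocation:

exec 9>/tmp/parent.out                # "backend" opens an fd
bash -c 'readlink /proc/self/fd/9'    # forked+exec'd child still sees fd 9 -> /tmp/parent.out

So sqsh would presumably start with whatever fds the backend had open,
minus any marked close-on-exec.)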

merlin
