Re: How to cripple a postgres server

From: Stephen Robert Norris <srn(at)commsecure(dot)com(dot)au>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: How to cripple a postgres server
Date: 2002-05-28 23:27:19
Message-ID: 1022628439.25604.2.camel@chinstrap
Lists: pgsql-general

On Wed, 2002-05-29 at 09:08, Tom Lane wrote:
> Stephen Robert Norris <srn(at)commsecure(dot)com(dot)au> writes:
> > I've already strace'ed the idle backend, and I can see the SIGUSR2 being
> > delivered just before everything goes bad.
>
> >> Yes, but what happens after that?
>
> > The strace stops until I manually kill the connecting process - the
> > machine stops in general until then (vmstat 1 stops producing output,
> > shells stop responding ...). So who knows what happens :(
>
> Hmm, I hadn't quite understood that you were complaining of a
> system-wide lockup and not just Postgres getting wedged. I think the
> chances are very good that this *is* a kernel bug. In any case, no
> self-respecting kernel hacker would be happy with the notion that
> a completely unprivileged user program can lock up the whole machine.
> So even if Postgres has got a problem, the kernel is clearly failing
> to defend itself adequately.
>
> Are you able to reproduce the problem with fewer than 800 backends?
> How about if you try it on a smaller machine?

Yep, on a PIII-800 with 256MB I can do it with fewer backends (I forget
how many) and only a few vacuums. It's much easier to trigger there,
basically, though that machine has much less CPU. It also locks the
machine up for several minutes...

> Another thing that would be entertaining to try is other ways of
> releasing 800 queries at once. For example, on connection 1 do
> BEGIN; LOCK TABLE foo;
> then issue a "SELECT COUNT(*) FROM foo" on each other connection,
> and finally COMMIT on connection 1. If that creates similar misbehavior
> then I think the SI-overrun mechanism is probably not to be blamed.
>
> > ... Sometimes, the
> > SIGUSR2 does just create a very brief load spike (vmstat shows >500
> > processes on the run queue, but the next second everything is back to
> > normal and no unusual amount of CPU is consumed).
>
> That's the behavior I'd expect. We need to figure out what's different
> between that case and the cases where it locks up.
>
> regards, tom lane

Yeah. I'll try your suggestion above and report back.

Stephen
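
Tom's LOCK TABLE experiment, quoted above, could be scripted roughly as follows. This is a hypothetical sketch, not something posted in the thread. It assumes the psycopg2 driver, a database reachable through a DSN passed on the command line, an existing table named foo, and a server whose max_connections allows 800 backends. Connection 1 opens a transaction and takes the lock. Every other connection issues SELECT COUNT(*) from its own thread and blocks on that lock. A single COMMIT then releases them all at once.

```python
"""Hypothetical harness for the 'release 800 queries at once' test.

Assumptions (not from the thread): psycopg2 is installed, the DSN is
given as argv[1], table "foo" exists, max_connections >= N_BACKENDS.
"""
import sys
import threading

TABLE = "foo"
N_BACKENDS = 800  # figure used in the thread


def blocker_statements(table):
    # Connection 1: open a transaction and take the (default
    # ACCESS EXCLUSIVE) table lock so later readers must wait.
    return ["BEGIN", f"LOCK TABLE {table}"]


def waiter_statement(table):
    # Each other connection: blocks until connection 1 commits.
    return f"SELECT COUNT(*) FROM {table}"


def run(dsn, table=TABLE, n=N_BACKENDS):
    import psycopg2  # imported lazily so the helpers above need no driver

    blocker = psycopg2.connect(dsn)
    blocker.autocommit = True  # we send BEGIN/COMMIT explicitly
    cur = blocker.cursor()
    for stmt in blocker_statements(table):
        cur.execute(stmt)

    # One connection per waiter; psycopg2 connections must not be shared
    # across concurrent queries, so each thread gets its own.
    waiters = [psycopg2.connect(dsn) for _ in range(n - 1)]
    threads = []
    for conn in waiters:
        t = threading.Thread(
            target=lambda c=conn: c.cursor().execute(waiter_statement(table))
        )
        t.start()
        threads.append(t)

    # Nothing guarantees every SELECT is already waiting on the lock;
    # check pg_stat_activity or ps before pressing Enter.
    input("SELECTs issued; press Enter to COMMIT and release them...")
    cur.execute("COMMIT")  # releases all queued SELECTs at once

    for t in threads:
        t.join()
    for conn in waiters:
        conn.close()
    blocker.close()


if __name__ == "__main__" and len(sys.argv) > 1:
    run(sys.argv[1])
```

Run it as `python lockstorm.py "dbname=test"` (the file name and DSN are placeholders) with vmstat running in another shell. If the box wedges here as well, that would point away from the SI-overrun/SIGUSR2 path, which is the distinction Tom is after.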
