Re: a heavy duty operation on an "unused" table kills my server

From: Eduardo Piombino <drakorg(at)gmail(dot)com>
To: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: a heavy duty operation on an "unused" table kills my server
Date: 2010-01-13 17:54:36
Message-ID: e24c1d9d1001130954y3553f145raf2a0a367b8a5397@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

> OK, I'm not entirely sure this table is not still locking something
> else. If you make a copy by doing something like:
>
> select * into test_table from a;
>
> and then alter test_table do you still get the same problems? If so,
> then it is an IO issue, most likely. If not, then there is some
> client connection still referencing this table or something and that
> could cause this type of behaviour as well.
>

I can guarantee you that the table is not being referenced by any other
thread, table or process, and that it is totally unrelated to everything
else in the system.

Its just a plain table, with 1.8 million records, that no thread knows it
exists. It has no foreign keys that would allow thinking of a possible
"lock" on the parent table, nor it is being referenced by any other table in
the model. It has no triggers associated, and no indexes. It could very well
even be on another database on the same physical server, and still do the
same damage. I did not try this, but I'm pretty sure of the outcome. I
would'nt like to bring the server down just to prove this, but I will do it
if I find it necessary.

The only things that are common to this table and other tables in the
system, as I see are:
RAM, IO, and CPU, at a very low level. One of these is being stressed out by
the thread executing the ALTER, and the other threads (not just pgsql
application threads, but system processes in general) suffer from the lack
of this resource. All the previous discussions tend to induce that the
resource we are talking about is IO.

The fact that the Task Manager does not come up, would also not be explained
by a lock in a client thread.
Besides all that, all the client queries are NO WAIT, thus any lock would
just return immediately, and no retry would be done until the response gets
back to the user and the user confirms it. In that case, all the errors
presented to the final users would be "The element is being processed some
other place", as my default handler to pgsql error code "55P03", instead of
the horrible "Operation timed out", that is what final users got during the
huge slowdown/downtime.

--
> Sent via pgsql-performance mailing list (pgsql-performance(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Scott Marlowe 2010-01-13 18:11:32 Re: a heavy duty operation on an "unused" table kills my server
Previous Message Robert Haas 2010-01-13 17:02:34 Re: [PERFORMANCE] work_mem vs temp files issue