Re: a heavy duty operation on an "unused" table kills my server

From: Eduardo Piombino <drakorg(at)gmail(dot)com>
To: Craig James <craig_james(at)emolecules(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: a heavy duty operation on an "unused" table kills my server
Date: 2010-01-13 05:47:30
Message-ID: e24c1d9d1001122147l1c0e2111kce25dc2c7f393087@mail.gmail.com
Lists: pgsql-performance

I'm sorry.

The server is a production HP ProLiant server. I don't remember the exact
model, but the key features were:
4 cores, over 2 GHz each (I'm sorry, I don't remember the actual specs), and I
think it had 16 GB of RAM (if that is possible?).
It has two 320 GB disks in RAID 1 (mirrored).

I don't even have the emails with the specs here, but I can give you the
exact configuration by tomorrow.

Operating system: Windows Server 2003, with the latest patches.
Postgres version: 8.2.4, with all defaults except DateStyle and TimeZone.

In normal operation, the server sits at 0% CPU load at any given time, with
peaks of 1% or 2% at most.

I've been digging a little in the archives, and one thing it helped me realize
is that I don't really remember seeing high CPU usage (which surprised me), but
I do remember seeing high IO activity. I'm sorry, it's pretty late here.
I know this single statement is enough to almost change everything I've just
asked.
Please try interpreting my original mail again, considering that where I said
"high CPU usage" it might very well be "high IO usage".

The final effect was that the server went non-responsive for all practical
purposes: not even Task Manager would come up when I hit CTRL-ALT-DEL, and of
course every client suffered horrific waits (20+ seconds) for the simplest
operations, like SELECT NOW();

I've just made a little modification to my original questions, to extend them
to the possibility of an IO usage issue instead of just a CPU one.

> Bottom line is, I can't seem to do any heavy processing on the database
> (or any operation that would require the server to enter into high CPU usage
> OR IO USAGE), and still expect the server to behave normally. Whatever
> heavy duty operation, DDL or DML, on whatever table (related or unrelated),
> on whatever thread, would tear down my server's integrity.
>
> My question then is: is there a way to limit the CPU or IO USAGE assigned
> to a specific connection?
>
> I mean, I don't care if my ALTER TABLE takes 4 days instead of 4 hours.
>
> Something like:
> pg_set_max_cpu_or_io_usage(2/100);
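
If nothing like that exists, the only workaround I can think of is to throttle
the work myself from the client side: instead of one big ALTER TABLE, add the
new column, backfill it in small batches, and pause between batches, so the IO
load is spread over time instead of hitting the disks all at once. Just a rough
sketch of the idea (the "id" primary key, the "some_field" column name, the
batch size and the pause are all made up):

    -- hypothetical batched rewrite instead of a single ALTER TABLE
    -- (adding a column with no DEFAULT is a catalog-only change, no rewrite)
    ALTER TABLE a_full ADD COLUMN some_field_new char(250);

    -- run this pair repeatedly from the application until 0 rows are updated
    UPDATE a_full
       SET some_field_new = substring(some_field from 1 for 250)
     WHERE id IN (SELECT id FROM a_full
                   WHERE some_field_new IS NULL
                   LIMIT 1000);   -- small batch = short IO burst
    SELECT pg_sleep(1);           -- let the other connections breathe

    -- once the backfill is complete:
    -- ALTER TABLE a_full DROP COLUMN some_field;
    -- ALTER TABLE a_full RENAME COLUMN some_field_new TO some_field;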

On Wed, Jan 13, 2010 at 2:14 AM, Craig James <craig_james(at)emolecules(dot)com> wrote:

> Eduardo Piombino wrote:
>
>> Hi list, I'm having a problem when dealing with operations that ask too
>> much CPU from the server.
>> The scenario is this:
>>
>
> A nice description below, but ... you give no information about your
> system: number of CPUs, disk types and configuration, how much memory, what
> have you changed in your Postgres configuration? And what operating system,
> what version of Postgres, etc., etc. The more information you give, the
> better the answer.
>
> If you're operating on a single disk with a tiny amount of memory, and an old,
> misconfigured Postgres on a laptop computer, that's a whole different
> problem than if you're on a big system with 16 CPUs and a huge RAID 1+0 with
> battery-backed cache.
>

> Craig
>
>> I have a multithreaded server, each thread with its own connection to the
>> database. Everything is working fine, actually great, actually
>> outstandingly, in normal operation.
>>
>> I've a table named "a" with 1.8 million records, and growing, but I'm OK
>> with it, at least for the moment. Maybe in the near future we will cut it
>> down, back up old data, and free it up. But this is not the issue; as I said,
>> everything is working great. I have a couple of indexes to help some queries,
>> and that's it.
>>
>> Now my problem started when I tried to do some model refactoring on this
>> production table.
>>
>> First I tried a dumb approach.
>> I connected from pgAdmin and opened a new session.
>> I tried an ALTER TABLE on this table just to turn a char(255) field into a
>> char(250), and it locked up my system.
>>
>> No surprise, since I had many threads waiting for this ALTER TABLE to
>> finish. What I did not foresee was that it would take so much time. OK, my
>> fault for not having estimated how long the ALTER TABLE would take to
>> complete.
>>
>> Now, with this experience, I tried a simple workaround.
>> Created an empty version of "a" named "a_empty", identical in every sense,
>> then renamed "a" to "a_full" and "a_empty" to "a". This procedure cost me
>> about 0 seconds of downtime, and everything kept working smoothly. Maybe a
>> couple of operations could have failed if they tried to write in the very
>> second that there was actually no table named "a", but since the operation
>> was transactional, the worst-case scenario was that if an operation had
>> failed, the client application would just report the error and ask the
>> user to retry. No big deal.
>>
>> Now, this table, which is totally unattached to the system in every way (no
>> one references it; it's like a dumpster for old records), is not being
>> accessed by any other thread in the system, so an ALTER TABLE on it, to
>> turn a char(255) into a char(250), should have no effect on the system.
>>
>> So, with this in mind, I tried the ALTER TABLE, this time on the "a_full"
>> (totally unrelated) table.
>> The system went non-responsive again, and this time it had nothing to do
>> with threads waiting for the ALTER TABLE to complete. The pgAdmin GUI went
>> non-responsive, as did the application server's GUI, whose threads kept
>> working in the background but started to take more and more time for every
>> client request (up to 25 seconds, which is just ridiculous and completely
>> unacceptable in normal conditions).
>>
>> This resulted in my client applications starting to disconnect after their
>> operations failed due to timeouts, and the system basically went down again
>> from a user's point of view.
>>
>> This time, since I saw no relation between my operation on a totally
>> unrelated table and the server's BIG slowdown, I blamed the server's memory.
>>
>> After some tests, I came to the conclusion that any heavy duty operation on
>> any thread (an ALTER TABLE on a 1.8-million-record table, updates on this
>> table, or an infinite loop, just to make my point) would affect the whole
>> server.
>>
>> Bottom line is, I can't seem to do any heavy processing on the database
>> (or any operation that would require the server to enter into high CPU usage
>> OR IO USAGE), and still expect the server to behave normally. Whatever
>> heavy duty operation, DDL or DML, on whatever table (related or unrelated),
>> on whatever thread, would tear down my server's integrity.
>>
>> My question then is: is there a way to limit the CPU or IO USAGE assigned
>> to a specific connection?
>> I mean, I don't care if my ALTER TABLE takes 4 days instead of 4 hours.
>>
>> Something like:
>> pg_set_max_cpu_or_io_usage(2/100);
>>
>> and rest assured that no matter what that thread is asking the database to
>> do, it just won't affect the other running threads. Obviously, assuming that
>> the process itself does not involve any locking of the other threads.
>>
>> Is something like that possible?
>>
>> Thanks in advance,
>> Eduardo.
>>
>>
>
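
P.S.: for completeness, the table swap described above was essentially just two
renames inside a transaction (the "a_empty" copy had been created beforehand
with the same DDL as "a"), and the statement that later brought the server down
was the plain column type change, which, as far as I understand, rewrites the
whole 1.8-million-row table. Roughly, from memory ("some_field" is a
placeholder for the real column name):

    -- the zero-downtime swap
    BEGIN;
    ALTER TABLE a RENAME TO a_full;
    ALTER TABLE a_empty RENAME TO a;
    COMMIT;

    -- the later attempt on the now-"unused" copy; this rewrites the entire
    -- table, which is presumably where the sustained IO comes from
    ALTER TABLE a_full ALTER COLUMN some_field TYPE char(250);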
