Excellent, lots of useful information in your message.
I will follow your advices, and keep you posted on any progress. I have yet
to confirm you with some technical details of my setup, but I'm pretty sure
you hit the nail in any case.
One last question, this IO issue I'm facing, do you think it is just a
matter of RAID configuration speed, or a matter of queue gluttony (and not
leaving time for other processes to get into the IO queue in a reasonable
Because if it was just a matter of speed, ok, with my actual RAID
configuration lets say it takes 10 minutes to process the ALTER TABLE
(leaving no space to other IOs until the ALTER TABLE is done), lets say then
i put the fastest possible RAID setup, or even remove RAID for the sake of
speed, and it completes in lets say again, 10 seconds (an unreal
assumption). But if my table now grows 60 times, I would be facing the very
same problem again, even with the best RAID configuration.
The problem would seem to be in the way the OS (or hardware, or someone
else, or all of them) is/are inserting the IO requests into the queue.
What can I do to control the order in which these IO requests are finally
entered into the queue?
I mean .. what i would like to obtain is:
Considering the ALTER TABLE as a sequence of 100.000 READ/WRITE OPERATIONS
Considering the SELECT * FROM xxx as a sequence of 100 READ OPERATIONS
(totally unrelated in disk)
First i run the ALTER TABLE on a thread...
Lets say by the time it generates 1.000 READ/WRITE OPERATIONS, the other
thread starts with the SELECT * FROM xxx ...
I would expect the IO system to give chance to the those 100 READ OPERATIONS
to execute immediately (with no need to wait for the remaining 990.000
READ/WRITE OPERATIONS finish), that is, to enter the queue at *almost* the
very same moment the IO request were issued.
If I can not guarantee that, I'm kinda doomed, because the largest the
amount of IO operations requested by a "heavy duty operation", the longest
it will take any other thread to start doing anything.
What cards do I have to manipulate the order the IO requests are entered
into the "queue"?
Can I disable this queue?
Should I turn disk's IO operation caches off?
Not use some specific disk/RAID vendor, for instance?
I think I have some serious reading to do on this matter, google will help
of course, but as always, every advice for small it may seem, will be very
Nonetheless, thanks a lot for all the light you already brought me on this
I really appreciate it.
On Wed, Jan 13, 2010 at 3:02 AM, Craig Ringer
> On 13/01/2010 1:47 PM, Eduardo Piombino wrote:
>> I'm sorry.
>> The server is a production server HP Proliant, I don't remember the
>> exact model, but the key features were:
>> 4 cores, over 2GHz each (I'm sorry I don't remember the actual specs), I
>> think it had 16G of RAM (if that is possible?)
>> It has two 320G disks in RAID (mirrored).
> Plain 'ol SATA disks in RAID-1?
> Hardware RAID (and if so, controller model)? With battery backup? Write
> cache on or off?
> Or software RAID? If so, Windows build-in sw raid, or some vendor's
> fakeraid (Highpoint, Promise, Adaptec, etc) ?
> Anyway, with two disks in RAID-1 I'm not surprised you're seeing some
> performance issues with heavy writes, especially since it seems unlikely
> that you have a BBU hardware RAID controller. In RAID-1 a write must hit
> both disks, so a 1Mb write effectively costs twice as much as a 1Mb read.
> Since many controllers try for high throughput (because it looks good in
> benchmarks) at the expense of latency they also tend to try to batch writes
> into long blocks, which keeps the disks busy in extended bursts. That
> slaughters read latencies.
> I had this sort of issue with a 3Ware 8500-8, and landed up modifying and
> recompiling the driver to reduce its built-in queue depth. I also increased
> readahead. It was still pretty awful as I was working with RAID 5 on SATA
> disks, but it made a big difference and more importantly meant that my Linux
> server was able to honour `ionice' priorities and feed more important
> requests to the controller first.
> On windows, I really don't know what to do about it beyond getting a better
> I/O subsystem. Google may help - look into I/O priorities, queue depths,
> reducing read latencies, etc.
> I don't even have the emails with the specs here, but I can give you the
>> exact configuration by tomorrow.
>> Operating system: Windows 2003 server, with latest patches.
>> Postgres version: 8.2.4, with all defaults, except DateStyle and TimeZone.
> Urk. 8.2 ?
> Pg on Windows improves a lot with each release, and that's an old buggy
> version of 8.2 at that. Looking into an upgrade would be a really, REALLY
> good idea.
> Please try interpreting again my original mail, considering that when I
>> said "high CPU usage" It might very well be "high IO usage".
>> The final effect was that the server went non-responsive, for all
>> matters, not even the TaskManager would come up when i hit CTRL-ALT-DEL,
>> and of course, every client would suffer horrific (+20 secs) for the
>> simplest operations like SELECT NOW();
> That sounds a LOT like horrible read latencies caused by total I/O
> overload. It could also be running out of memory and swapping heavily, so do
> keep an eye out for that, but I wouldn't expect to see that with an ALTER
> TABLE - especially on a 16GB server.
> / My question then is: is there a way to limit the CPU* or **IO
>> USAGE* assigned to a specific connection?/
> In win32 you can set CPU priorities manually in Task Manager, but only once
> you already know the process ID of the Pg backend that's going to be
> hammering the machine. Not helpful.
> I don't know of any way to do per-process I/O priorities in Win32, but I
> only use win32 reluctantly and don't use it for anything I care about (like
> a production Pg server) so I'm far from a definitive source.
> Craig Ringer
In response to
pgsql-performance by date
|Next:||From: Greg Smith||Date: 2010-01-13 07:39:30|
|Subject: Re: a heavy duty operation on an "unused" table kills my
|Previous:||From: Jaime Casanova||Date: 2010-01-13 06:31:43|
|Subject: Re: [PERFORMANCE] work_mem vs temp files issue|