performance enhancements for PostgreSQL

From: "Johnson, Shaunn" <SJohnson6(at)bcbsm(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: performance enhancements for PostgreSQL
Date: 2002-11-18 19:21:04
Message-ID: 73309C2FDD95D11192E60008C7B1D5BB04C74253@snt452.corp.bcbsm.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Howdy:

I have a Linux server running PostgreSQL 7.2.1. with about 1 Gig of
memory. The proc speed is about 1.14 GHz.

I'm getting more and more concerned about how often
the database gets used and the days (like today) where I
wonder if buying more memory rather than buying a 2nd CPU
was such a great idea.

In an effort to enhance / streamline performance, I've done
the following:

* memory upgrade from 512M to 1G
* move RAID5 to scsi drives (10K RPM)
* set up cron script to vacuum database weekly
* set number of client connects ( i.e., /usr/bin/postmaster -i -B 128 -N 64
-d 4 )

But I'm at the point now that I can't kill some jobs. Yes, I know
I shouldn't use 'kill' in any forceful way, but just a kill seems to do
nothing (or, if it is doing something, it's not fast enough
for the user community and it's stopping production).

I got this from a co-worker:

[snip]

the server is blocking on access to the metadata tables. not even
logins are being processed. I'm not sure what caused the
problem, but I think a database restart is the best course,
which I have been trying to do. Do not kill -9, as it will corrupt the
WAL.

[/snip]

I see some errors in the messages file regarding the RAID drives
(the filesystem where the database lives)

[snip error]

Nov 18 12:33:51 hmp kernel: scsi : aborting command due to timeout : pid 0,
scsi
2, channel 2, id 0, lun 1 Write (10) 20 11 d2 76 9f 00 00 40 00
Nov 18 12:33:51 hmp kernel: scsi : aborting command due to timeout : pid 0,
scsi
2, channel 2, id 0, lun 1 Write (10) 20 11 d2 76 df 00 00 08 00
Nov 18 12:33:51 hmp kernel: scsi : aborting command due to timeout : pid 0,
scsi
2, channel 2, id 0, lun 1 Write (10) 20 11 d2 76 ef 00 00 08 00
Nov 18 12:33:52 hmp kernel: scsi : aborting command due to timeout : pid 0,
scsi
2, channel 2, id 0, lun 1 Write (10) 20 11 d2 77 27 00 00 40 00
Nov 18 12:33:52 hmp kernel: scsi : aborting command due to timeout : pid 0,
scsi
2, channel 2, id 0, lun 1 Write (10) 20 11 d2 77 67 00 00 40 00
Nov 18 12:33:53 hmp kernel: scsi : aborting command due to timeout : pid 0,
scsi

[/snip error]

my questions are:

* have can I figure out how access to the metadata tables
are being stopped? (my guess is the error on the scsi drive, but ... )
* how to restart PostgreSQL without running the risk of corrupting data?
* what are the benefits to adding a 2nd CPU over, say, more memory?

I'm sorry that I don't have enough information at this time ... I'm getting
swamped by users as I type this.

Thanks!

-X

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Bruce Momjian 2002-11-18 19:23:01 Re: [GENERAL] DECLARE CURSOR
Previous Message Gregory Seidman 2002-11-18 19:19:31 Re: More time manipulation..