We recently ran into a similar problem with 7.1.
Let me start by saying: knowing the SQL you are running, knowing EXPLAIN,
and knowing pgmonitor go a LONG way in helping you identify the problem.
In our case, the problem came down to this: we had a table ("TABLE_A") with
~200,000 rows that was heavily used (1000 rows added per day). The most
common access to this table was via a foreign key of type int4 where any
select for a specific value would return normally 1 and never more than 10
rows. Obviously, there was an index on this foreign key and the index was
used (resulting in an "INDEXED SCAN") in most cases. However, the problem
was that half the rows in "TABLE_A" had no foreign key relationship and so
had this field set to zero. No one would ever do a select asking for all
rows where foreign key = 0, as that didn't make "business sense". But,
since the database isn't aware of the business meaning of our data, the
stats gathered by ANALYZE concluded that any given value of this column
matched between 2,000 and 3,000 rows on average. Mathematically correct,
but in reality not useful.
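If you suspect this kind of skew in your own data, it's easy to see with a
quick GROUP BY (table and column names here are just a sketch of our
hypothetical schema, not anything Postgres provides):

```sql
-- Shows how one "dummy" value (0 in our case) can dominate the
-- distribution and drag the planner's average rows-per-value way up.
SELECT fk_id, COUNT(*) AS rows_for_value
FROM table_a
GROUP BY fk_id
ORDER BY rows_for_value DESC
LIMIT 10;
```

In our case the first row of that output would have dwarfed all the rest,
which is exactly the shape of data the single-number average hides.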
The result was that as the table grew and/or based on differences in the
sample of rows from the most recent nightly VACUUM, the cost of an INDEX
SCAN (# of estimated rows * random_page_cost) would exceed the cost of a
straight SEQ SCAN (# of data pages in table). Thus, many applications
would start scanning the entire table. While any one scan would take 4
seconds (vs. fractions of a second using the index) and that wasn't too
bad, the result of many users doing this many times quickly made our
machine exhibit the same behavior that you describe.
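You can watch this tipping point happen with EXPLAIN. A sketch of what the
two plans look like (names and cost numbers are illustrative, not our
actual output, and the exact 7.1 formatting differs slightly):

```sql
-- While the row estimate stays low, the planner picks the index:
EXPLAIN SELECT * FROM table_a WHERE fk_id = 12345;
-- Index Scan using table_a_fk_idx on table_a
--   (cost=0.00..35.12 rows=9 width=48)

-- Once the estimated rows * random_page_cost exceeds the page count,
-- the same query flips to reading the whole table:
-- Seq Scan on table_a  (cost=0.00..4750.00 rows=2500 width=48)
```

The query text never changes; only the statistics behind it do, which is
why the slowdown creeps in over time rather than arriving all at once.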
Our short term fix was to turn down the value of random_page_cost.
However, as Tom Lane very rightly noted in response to a similar posting,
this is a total hack. Our goal is to switch to 7.2 in the hopes that the
"WHERE" extension to the "CREATE INDEX" command coupled with greater
control of the sample space used in statistics will be the true answer. We
also intend to look into whether setting this optional foreign key to be
nullable may affect things in 7.1.
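For the record, both the short-term hack and the hoped-for 7.2 fix are
one-liners (index and column names below are our hypothetical schema):

```sql
-- The hack: make random page reads look cheaper so the planner keeps
-- choosing the index (the default random_page_cost is 4).
SET random_page_cost = 2;

-- The 7.2 partial-index approach: index only rows anyone will actually
-- search for, so the fk_id = 0 dead weight never enters the picture.
CREATE INDEX table_a_fk_idx ON table_a (fk_id) WHERE fk_id <> 0;
```

The appeal of the partial index is that it fixes the estimate at the
source instead of globally distorting the cost model for every query.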
Overall, my point is that we were in a position where Postgres was
operating exactly as it was intended. There really wasn't any tuning to be
corrected or log file with a silver-bullet message stating what to change.
It was all a matter of seeing what the load on the system was and why it
was taking as long as it was.
My advice: EXPLAIN is your friend. pgmonitor is your friend.
Of course, that's just my opinion - I could be wrong...
Marc Mitchell - Senior Technical Architect
Enterprise Information Solutions, Inc.
4910 Main Street
Downers Grove, IL 60515
----- Original Message -----
From: "Robert M. Meyer" <rmeyer(at)installs(dot)com>
To: "Naomi Walker" <nwalker(at)eldocomp(dot)com>
Sent: Tuesday, July 23, 2002 1:29 PM
Subject: Re: [ADMIN] Postgres performance slowly gets worse over a month
> Well, we're running a hardware three-disk RAID5 on an
> sym53c896-0-<2,*>: FAST-20 WIDE SCSI 40.0 MB/s interface. This is a
> Compaq 3500 system with a CR3500 raid controller. An md5sum of a 1.2Gig
> file takes less than two minutes.
> We tried rebuilding the indices but that didn't seem to help. We had an
> outside consultant do the rebuild and he's not available now so I don't
> know what command he ran to do it.
> I've never used 'sar'. If you can believe it, I've been poking around
> with Unix for the last 20 years and I've never even seen the 'man' page
> for 'sar'. I probably should look into it. What flags would give me
> the most information to help figger out what's going on here?
> Of course the troubleshooting increment is going to be a month or more
> so this will probably take some time to resolve :-)
> On Tue, 2002-07-23 at 14:08, Naomi Walker wrote:
> > >Nightly, we're doing a 'vacuumdb -a -z' after stopping and restarting
> > >the database. The performance will get so bad after a month that we
> > >start to see load spikes in excess of 30. Normally, we don't see load
> > >over 2.5 during the heaviest activity and generally less than 1.0 most
> > >of the time.
> > Typically, performance is linked to your I/O, but my first guess in
> > this case has to do with your indices. As a test, next time performance
> > is really rotten, drop your indices and rebuild them. It cannot hurt, and
> > might just help.
> > The trick here is to see what is happening while it is tanking. What
> > does your disk configuration look like? Is it a RAID or stripe where reads
> > are spread out among more than one controller? Do sar reports point to
> > anything in particular?
> > ----------------------------------
> > Naomi Walker
> > Eldorado Computing, Inc
> > Chief Information Officer
> > nwalker(at)eldocomp(dot)com
> > 602-604-3100 x242
> Robert M. Meyer
> Sr. Network Administrator
> DigiVision Satellite Services
> 14 Lafayette Sq, Ste 410
> Buffalo, NY 14203-1904