Re: When should I worry?

From: Tom Allison <tom(at)tacocat(dot)net>
To: Greg Smith <gsmith(at)gregsmith(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: When should I worry?
Date: 2007-06-12 01:44:05
Message-ID: 466DFA65.8080604@tacocat.net
Lists: pgsql-general

Greg Smith wrote:
>
> On Mon, 11 Jun 2007, Steve Crawford wrote:
>
>> In my experience the more common situation is to "go off a cliff."
>
> Yeah, I think the idea that you'll notice performance degrading and be
> able to extrapolate future trends using statistical techniques is a
> bit...optimistic.
>
> Anyway, back to the original question here. If you're worried about
> catching when performance starts becoming an issue, you need to do some
> sort of logging of how long statements are taking to execute. The main
> choice is whether to log everything, at which point the logging and
> sorting through all the data generated may become its own performance
> concern, or whether to just log statements that take a long time and
> then count how many of them show up. Either way will give you some sort
> of early warning once you get a baseline; it may take a bit of tweaking
> to figure out where to draw the line for what constitutes a "long"
> statement if you only want to see how many of those you get.
>
> There are two tools you should look at initially to help process the
> logging information you get back: pgFouine and PQA. Here are intros to
> each that also mention how to configure the postgresql.conf file:
>
> http://pgfouine.projects.postgresql.org/tutorial.html
> http://www.databasejournal.com/features/postgresql/article.php/3323561
>
> As they're similar programs, which would work better for you is hard to
> say; check out both and see which seems more practical or easier to get
> running. For example, if you only have one of PHP/Ruby installed, that
> may make one tool or the other preferable.
>
> If you can get yourself to the point where you can confidently say
> something like "yesterday we had 346 statements that took more then
> 200ms to execute, which is 25% above this month's average", you'll be in
> a positition to catch performance issues before they completely
> blindside you; makes you look good in meetings, too.
>

Starting to sound like a sane idea.
I've been running a test job for almost 24 hours and have accumulated only 8
million rows. That's another 125 days to get to the big 'B'. I think by then
I'll have blown a hard drive or worse. I'm running this on some very old
hardware that I have available (more on this at the bottom).

However, at this point the machine is running all of the SQL at < 0.2 seconds
each, which I consider just fine for 7,599,519 rows.
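
If I do go the logging route Greg describes, my starting point would probably be
something like this in postgresql.conf (a rough sketch only -- the 200 ms cutoff
is just a guess based on the timings above, and the exact log_line_prefix the
report tools want is covered in the tutorials he linked):

log_min_duration_statement = 200   # log any statement slower than 200 ms; -1 disables, 0 logs everything
log_line_prefix = '%t '            # timestamp each log line so daily counts can be built from the log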

Here are some specifics about the tables:
count() from headers: 890300
count() from tokens: 890000
count() from header_token: 7599519

CREATE TABLE header_token (
    header_idx integer NOT NULL,
    token_idx integer NOT NULL
);

CREATE TABLE headers (
    idx serial NOT NULL,
    hash character varying(64) NOT NULL
);

CREATE TABLE tokens (
    idx bigserial NOT NULL,
    hash character varying(64) NOT NULL
);

ALTER TABLE ONLY headers
    ADD CONSTRAINT headers_hash_key UNIQUE (hash);
ALTER TABLE ONLY headers
    ADD CONSTRAINT headers_pkey PRIMARY KEY (idx);
ALTER TABLE ONLY header_token
    ADD CONSTRAINT pkey_header_token PRIMARY KEY (header_idx, token_idx);
ALTER TABLE ONLY tokens
    ADD CONSTRAINT tokens_hash_key UNIQUE (hash);
ALTER TABLE ONLY tokens
    ADD CONSTRAINT tokens_pkey PRIMARY KEY (idx);
ALTER TABLE ONLY header_token
    ADD CONSTRAINT header_token_header_idx_fkey FOREIGN KEY (header_idx)
    REFERENCES headers(idx) ON DELETE CASCADE;
ALTER TABLE ONLY header_token
    ADD CONSTRAINT header_token_token_idx_fkey FOREIGN KEY (token_idx)
    REFERENCES tokens(idx) ON DELETE CASCADE;
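
Since growth is what worries me, I'll probably also keep an eye on the on-disk
size of the big table; assuming the server is recent enough to have the
pg_total_relation_size/pg_size_pretty functions (8.1+), something like:

select pg_size_pretty(pg_total_relation_size('header_token'));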

The SQL statements I was timing were:
select t.hash, h.hash
from headers h, header_token ht, tokens t
where h.idx = ht.header_idx
and ht.token_idx = t.idx
and h.idx = ?

insert into header_token
select $header, idx from tokens where idx in (...)

The SELECT was < 0.2 seconds.
The INSERT was easily < 0.7 seconds most of the time -- it ranged because the
number of idx values in the IN (...) list varied from 200 to 700. The min was
< 0.2 and the max was > 1.0 seconds over a few minutes of observation.
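
When one of these does start creeping up, I figure the first thing to look at
is the plan; for example (with 12345 standing in for a real header idx):

explain analyze
select t.hash, h.hash
from headers h, header_token ht, tokens t
where h.idx = ht.header_idx
and ht.token_idx = t.idx
and h.idx = 12345;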

All of this was run on a Pentium II 450 MHz with 412MB RAM and a software linear
0 pair of UDMA 66 7200RPM 8MB cache drives (really old) on separate IDE channels,
formatted with ReiserFS. The actual script was running on a separate machine
across a 100Base-T full-duplex network, through a firewall machine between the
two subnets.

I can't imagine how long it would take to run:
delete from tokens;
with the CASCADE option...
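
If I ever did try it, I suspect most of the time would go into finding the
matching header_token rows for each deleted token; the (header_idx, token_idx)
primary key doesn't help a lookup by token_idx alone, so I'd probably want an
index like the following first (untested guess on my part):

create index header_token_token_idx on header_token (token_idx);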
