Re: CPU 0.1% IOWAIT 99% for decisonnal queries

From: Richard Huxton <dev(at)archonet(dot)com>
To: Patrick Vedrines <patrick(dot)vedrines(at)adpcl(dot)com>
Cc: performance pgsql <pgsql-performance(at)postgresql(dot)org>
Subject: Re: CPU 0.1% IOWAIT 99% for decisonnal queries
Date: 2005-03-22 18:57:35
Message-ID: 42406A9F.9040404@archonet.com
Lists: pgsql-performance

Patrick Vedrines wrote:
> Hi everyone,
>
> I'm developing a web decision-support application based on: Red Hat 3 ES,
> PostgreSQL 8.0.1, and a Dell PowerEdge 2850 with 2 GB RAM, 2 CPUs at
> 3 GHz, 1 MB cache, and 4 ext3 disks at 10,000 rpm. I am alone on the
> box and there is no crontab running.
>
> I have 2 databases (A and B) with exactly the same schemas: one main
> table called "aggregate" having no indexes and supporting only SELECT
> statements (loaded once a month with a new bundle of data).

Perhaps look into clustering the tables.
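On the 8.0 series that suggestion looks roughly like this (the index name, and the choice of issue_date as the clustering column, are illustrative, not from the original post):

```sql
-- Hypothetical index on the column the customers filter on most often.
CREATE INDEX aggregate_issue_date_idx ON aggregate (issue_date);

-- 8.0-era syntax: physically rewrite the table in index order.
-- Takes an exclusive lock, so run it right after the monthly load.
CLUSTER aggregate_issue_date_idx ON aggregate;

-- Refresh the planner's statistics afterwards.
ANALYZE aggregate;
```

Clustering is a one-shot operation - later inserts are not kept in order - but that matters little here since the table is only reloaded once a month.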

> Row size is about 200 bytes (50 columns of type char(x) or integer),
> plus several small 'reference' tables not shown in the following
> example for clarity. Database A: aggregate contains 2,300,000 records
> (500 MB). Database B: aggregate contains 9,000,000 records (2 GB).
>
> There is no index on the aggregate table since the criteria, their
> number and their scope are freely chosen by the customers.

Hmm... not convinced this is a good idea.

> The query: select sum(ca) from aggregate where (issue_date >=
> '2004-01' and issue_date <= '2004-02'); takes 5 s on database A
> (5 min 30 s* the first time, probably to fill the cache) and 21 min*
> on database B (whether or not it is the first time).

Because A fits in the cache and B doesn't.
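One way to confirm this is to look at the plan for the example query; with no index, PostgreSQL has only one option:

```sql
EXPLAIN SELECT sum(ca) FROM aggregate
WHERE issue_date >= '2004-01' AND issue_date <= '2004-02';
-- Expect a Seq Scan over the whole table: on database B that means
-- reading roughly 2 GB from disk, which cannot stay cached in 2 GB
-- of RAM alongside everything else.
```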

> - I guess this is a cache issue but how can I manage/control it ? Is
> Postgres managing it's own cache or does it use the OS cache ?

Both: PostgreSQL keeps recently used pages in its own shared buffer
cache (shared_buffers), and beyond that it relies on the operating
system's page cache for everything it reads from disk.

> - Is using the cache a good approach? It does not seem to work for
> large databases: I tried several different values in postgresql.conf
> and /proc/sys/kernel/shmmax without detecting any response-time
> improvement (for example: shared_buffers = 190000, sort_mem = 4096,
> effective_cache_size = 37000 and kernel/shmmax = 1200000000). Do I
> have to upgrade the RAM to 6 GB and/or buy faster disks (of what
> type)? Moreover, a query on database B will destroy the cache
> previously built for database A, increasing the response time for the
> next query on database A. And I have in fact 15 databases!

If you don't have any indexes and the table isn't clustered then PG has
no choice but to scan the entire table for every query. As you note,
that's going to destroy your cache. You can increase the RAM but sooner
or later, you'll get the same problem.

> - In my case, what would be the best combination of parameters
> between postgresql.conf and /proc/sys/kernel/shmmax?

http://www.varlena.com/varlena/GeneralBits/Tidbits/index.php
http://www.powerpostgresql.com/PerfList
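As a rough, illustrative starting point for a dedicated 2 GB box on 8.0 - where shared_buffers and effective_cache_size are counted in 8 kB pages and sort_mem in kB - something like the sketch below; the exact numbers are assumptions to tune from, not figures taken from the pages above:

```
# postgresql.conf sketch for a dedicated 2 GB machine (8.0-era units)
shared_buffers = 25000          # ~200 MB in 8 kB pages; the 190000 tried above
                                # would claim ~1.5 GB and starve the OS cache
sort_mem = 8192                 # 8 MB per sort operation, in kB
effective_cache_size = 125000   # ~1 GB of expected OS cache, in 8 kB pages
```

kernel/shmmax then only needs to be a little larger than the shared memory segment PG actually requests (~250 MB here), not 1.2 GB.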

> - Is there a way to reduce the size of the "aggregate" table files
> (1 GB + 1 GB + 1 GB + 0.8 GB = 3.8 GB for the "aggregate" table
> instead of the expected 2 GB = 200 bytes * 9,000,000 records) by
> playing with the data types or other parameters (fillfactor?)?
> Vacuum (even full) seems to be useless since the aggregate table
> supports only 'copy aggregate from' and 'select'.

You can replace int4 with int2 and so on (where possible) but that will
only delay problems.
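A sketch of that kind of narrowing, which 8.0 can do in place (the column names are hypothetical):

```sql
-- int4 -> int2 halves the column when values fit in -32768..32767.
ALTER TABLE aggregate ALTER COLUMN month_num TYPE int2;

-- char(n) is space-padded to its full width; varchar(n) stores only
-- the characters actually present.
ALTER TABLE aggregate ALTER COLUMN product_code TYPE varchar(10);
```

Each ALTER ... TYPE rewrites the table, so it is best done once, right after the monthly load.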

> - Is it possible to define a sort of RAM filesystem (as exists in
> DOS/Windows) which I could create and populate my databases into?
> ...since the databases do not support updates for this application.

Won't help - your cache is already doing that. Some things you can do
(in order of effort):

1. Cluster the large tables.
2. Analyse your customers' queries and try a couple of indexes - some
choices will be more common than others.
3. Split your tables in two - common fields in one, uncommon fields in
the other - so that filtering on the common fields reads less data.
4. Split your tables by date, one table per month or year, then rewrite
your customers' queries on the fly to select from the right table. This
will only help with queries that filter on date, of course.
5. Place each database on its own machine or virtual machine so they
don't interfere with each other.
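Item 4 can be sketched as follows on 8.0, which has no built-in partitioning (the table names are illustrative; constraint exclusion only arrives in 8.1):

```sql
-- One physical table per month, filled by the monthly COPY job.
CREATE TABLE aggregate_2004_01 (LIKE aggregate);
CREATE TABLE aggregate_2004_02 (LIKE aggregate);

-- Queries restricted to one month go straight to that table:
--   SELECT sum(ca) FROM aggregate_2004_02;

-- A view can serve queries that span months. It still scans every
-- table listed, so the application should rewrite single-month
-- queries to hit the monthly table directly.
CREATE VIEW aggregate_all AS
  SELECT * FROM aggregate_2004_01
  UNION ALL
  SELECT * FROM aggregate_2004_02;
```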

I'd start with items 1 and 2, though, and see if that helps.

PS - it might make sense to use an unusually large shared_buffers
setting for PG, but I'm not familiar enough with the changes in cache
handling in 8.0 to say for sure.
PPS - there are more changes coming for 8.1, but I know even less about
those.

--
Richard Huxton
Archonet Ltd
