Re: optimization ideas for frequent, large(ish) updates in frequently accessed DB?

From: Manfred Koizar <mkoi-pg(at)aon(dot)at>
To: "Marinos J(dot) Yannikos" <mjy(at)geizhals(dot)at>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: optimization ideas for frequent, large(ish) updates in frequently accessed DB?
Date: 2004-02-13 09:26:16
Message-ID: dc2p20dgmji9ol4fngtao2pbfhvlr5io7j@email.aon.at
Lists: pgsql-performance

On Fri, 13 Feb 2004 01:58:34 +0100, "Marinos J. Yannikos"
<mjy(at)geizhals(dot)at> wrote:
>I'm looking for ideas that might improve the interactive performance of
>the system, without slowing down the updates too much.

IOW, you could accept slower updates. Did you actually try throttling
down the insert rate?

> Here are the
>characteristics of the table and its use:
>
>- approx. 2 million rows

Doesn't sound worrying. What's the min/max/average size of these rows?
How large is this table?
SELECT relpages FROM pg_class WHERE relname='...';

What else is in this database, how many tables, how large is the
database (du $PGDATA)?

>- approx. 4-5 million rows per day are replaced in short bursts of
>1-200k rows (average ~3000 rows per update)

How often do you VACUUM [ANALYSE]?
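With 4-5 million rows replaced per day in a ~2 million row table, dead
tuples pile up quickly. As a sketch, something like

  VACUUM ANALYSE t;

after each burst (or from a frequent cron job), and an occasional

  VACUUM VERBOSE t;

to see how many dead tuples each run reclaims.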

>- the table needs 6 indexes (not all indexes are used all the time, but
>keeping them all the time slows the system down less than re-creating
>some of them just before they're needed and dropping them afterwards)

I agree.

>- an "update" means that 1-200k rows with a common value in a particular
>field are replaced with an arbitrary number of new rows (with the same
>value in that field), i.e.:
>
>begin transaction;
> delete from t where id=5;
> insert into t (id,...) values (5,...);
> ... [1-200k rows]
>end;

This is a wide variation in the number of rows. You told us the average
batch size is 3000 rows; is that also a *typical* batch size? And at
what batch size do you start to get the feeling that other sessions are
being slowed down?

Where do the new values come from? I don't think they are typed in :-)
Do they come from external sources or from the same database? If the
latter, INSERT INTO ... SELECT ... might help.
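I.e. something like this (staging_table and the column names are made
up for illustration):

  INSERT INTO t (id, col1, col2)
  SELECT 5, s.col1, s.col2
    FROM staging_table s;

That way the rows never make a round trip through the client.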

>The problem is, that a large update of this kind can delay SELECT
>queries running in parallel for several seconds, so the web interface
>used by several people will be unusable for a short while.

Silly question: by SELECT, do you mean pure SELECT transactions and not
transactions that *mostly* read from the database? I.e., are you sure
your SELECT queries are slowed down rather than blocked by the
"updates"?

Show us the EXPLAIN ANALYSE output for the same SELECT, once when it is
fast and once when it is slow. BTW, what is fast and what is slow?
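E.g. (with one of your real queries substituted; the WHERE clause here
is only a placeholder):

  EXPLAIN ANALYSE
  SELECT * FROM t WHERE id = 5;

run once during normal operation and once while a batch is active.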

>Currently, I'm using temporary tables:
> [...]
>This is slightly faster than inserting directly into t (and probably
>faster than using COPY, even though using that might reduce the overall
>load on the database).

You might try using a prepared INSERT statement or COPY.
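A sketch of both (col1 is a made-up column name):

  PREPARE ins (int, text) AS
      INSERT INTO t (id, col1) VALUES ($1, $2);
  EXECUTE ins (5, 'some value');

  -- or, from a file readable by the backend (superuser only);
  -- client libraries can also do COPY t FROM STDIN:
  COPY t FROM '/path/to/datafile';

PREPARE/EXECUTE saves the parse and plan overhead per row; COPY avoids
per-row statement overhead altogether.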

>shared_buffers=100000
>(I tried many values, this seems to work well for us - 12GB RAM)
>wal_buffers=500
>sort_mem=800000
>checkpoint_segments=16
>effective_cache_size=1000000

See Josh's comments.
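One thing to add: sort_mem is allocated per sort operation, per backend,
and the unit is kB, so 800000 means up to ~800 MB for *each* concurrent
sort. Keeping the global value modest and raising it only where a
session really needs it is safer:

  SET sort_mem = 65536;  -- 64 MB, for this session only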

>Any help/suggestions would be greatly appreciated... Even if it's
>something like "you need a faster db box, there's no other way" ;-)

We have to find out what the bottleneck is. Tell us about your
environment (hardware, OS, ...). Run top and/or vmstat and look for
significant differences between times of normal processing and slow
phases. Post top/vmstat output here if you need help.

Servus
Manfred
