From: "Marinos J(dot) Yannikos" <mjy(at)geizhals(dot)at>
To: pgsql-performance(at)postgresql(dot)org
Subject: optimization ideas for frequent, large(ish) updates in frequently accessed DB?
Date: 2004-02-13 00:58:34
Message-ID: 402C213A.4030105@geizhals.at
Lists: pgsql-performance

Hi,

one of our tables has to be updated frequently, but concurrently running
SELECT queries must also keep low latency (the table is accessed
through a web interface).

I'm looking for ideas that might improve the interactive performance of
the system, without slowing down the updates too much. Here are the
characteristics of the table and its use:

- approx. 2 million rows

- approx. 4-5 million rows per day are replaced in short bursts of
1-200k rows (average ~3000 rows per update)

- the table needs 6 indexes (not all of them are used all the time, but
keeping them permanently slows the system down less than re-creating
some of them just before they're needed and dropping them afterwards)

- an "update" means that 1-200k rows with a common value in a particular
field are replaced with an arbitrary number of new rows (with the same
value in that field), i.e.:

begin transaction;
delete from t where id=5;
insert into t (id,...) values (5,...);
... [1-200k rows]
end;

The problem is that a large update of this kind can delay SELECT
queries running in parallel for several seconds, so the web interface
used by several people becomes unusable for a short while.

Currently, I'm staging the new rows in a separate throwaway table first:

create table xyz as select * from t limit 0;
insert into xyz ...
...
begin transaction;
delete from t where id=5;
insert into t select * from xyz;
end;
drop table xyz;

This is slightly faster than inserting directly into t (and probably
faster than using COPY, even though using that might reduce the overall
load on the database).
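
The COPY variant would look roughly like this (a sketch only - the file
path is made up for illustration, and from a client one would use psql's
\copy instead of a server-side file):

-- stage the replacement rows via COPY instead of individual INSERTs
create table xyz as select * from t limit 0;
copy xyz from '/tmp/new_rows_for_id_5.dat';
begin transaction;
delete from t where id=5;
insert into t select * from xyz;
end;
drop table xyz;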

What other possibilities are there, other than splitting up the 15
columns of that table into several smaller tables, which is something
I'd like to avoid? Would replication help? (I doubt it, but I haven't
tried it yet...) Writing to one table (without indexes), then creating
the indexes and renaming it to the "read table" periodically in a
double-buffering-like fashion (roughly as sketched below) probably
wouldn't work either(?), since the views and triggers would have to be
re-created every time as well, and other problems might arise.
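
For clarity, this is roughly what I mean by the double-buffering idea
(the names are made up; the last comment is exactly the part that makes
it unattractive):

-- rebuild an unindexed copy, index it, then swap it in
begin transaction;
create table t_new as select * from t where id<>5;
insert into t_new (id,...) values (5,...);
... [the replacement rows]
create index t_new_id_idx on t_new (id);
-- ... the other five indexes ...
alter table t rename to t_old;
alter table t_new rename to t;
end;
drop table t_old;
-- views and triggers still reference the renamed t_old and would have
-- to be re-created against the new t every time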

The postgresql.conf options are already reasonably tweaked for
performance (IMHO), but perhaps some settings are particularly critical:

shared_buffers=100000
(I tried many values, this seems to work well for us - 12GB RAM)
wal_buffers=500
sort_mem=800000
checkpoint_segments=16
effective_cache_size=1000000
etc.

Any help/suggestions would be greatly appreciated... Even if it's
something like "you need a faster db box, there's no other way" ;-)

Regards,
Marinos
