Re: performance of insert/delete/update

From: Josh Berkus <josh(at)agliodbs(dot)com>
To: "scott(dot)marlowe" <scott(dot)marlowe(at)ihs(dot)com>
Cc: Wei Weng <wweng(at)kencast(dot)com>, <pgsql-performance(at)postgresql(dot)org>
Subject: Re: performance of insert/delete/update
Date: 2002-11-21 23:34:53
Message-ID: 200211211534.53358.josh@agliodbs.com
Lists: pgsql-hackers pgsql-performance


Scott,

> > This only works up to the limit of the memory you have available for
> > Postgres. If the updates in one transaction exceed your available
> > memory, you'll see a lot of swaps to disk log that will slow things
> > down by a factor of 10-50 times.
>
> Sorry, but that isn't true. MVCC means we don't have to hold all the data
> in memory, we can have multiple versions of the same tuples on disk, and
> use memory for what it's meant for, buffering.

Sorry, you're absolutely correct. I don't know what I was thinking of; that's
the problem with an off-the-cuff response.

Please disregard the previous quote. Instead:

Doing several large updates in a single transaction can lower performance if
the number of updates is large enough to affect index usability, so that a
VACUUM is really needed between them. For example, a series of large data
transformation statements on a single table, or on a set of related tables,
should have VACUUM statements between them, which prevents you from putting
them all in a single transaction.

For example, the series:
1. INSERT 10,000 rows INTO table_a;
2. UPDATE 100,000 rows IN table_a, joining against table_b;
3. UPDATE 100,000 rows IN table_c, joining against table_a;

will almost certainly need a VACUUM, or even a VACUUM FULL, on table_a after
step 2, requiring you to split the series into two transactions. Otherwise,
the join against table_a in step 3 will be extremely slow.

> Also note that many folks have replaced foreign keys with triggers and
> gained in performance, as fks in pgsql still have some deadlock issues to
> be worked out.

Yeah. I think Neil Conway is overhauling FKs, which everyone considers a bit
of a hack in the current implementation, including Jan, who wrote it.
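
For reference, the trigger approach looks something like this (hypothetical
parent/child tables and names; note that unlike a real FK, this sketch only
checks inserts and updates on the child, not deletes from the parent):

    CREATE FUNCTION check_parent() RETURNS trigger AS '
    BEGIN
        PERFORM 1 FROM parent WHERE id = NEW.parent_id;
        IF NOT FOUND THEN
            RAISE EXCEPTION ''no parent row for id %'', NEW.parent_id;
        END IF;
        RETURN NEW;
    END;
    ' LANGUAGE 'plpgsql';

    CREATE TRIGGER child_parent_check
        BEFORE INSERT OR UPDATE ON child
        FOR EACH ROW EXECUTE PROCEDURE check_parent();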

> > It can be dangerous though ... in the event of a power outage, for
> > example, your database could be corrupted and difficult to recover. So
> > ... "at your own risk".
>
> No, the database will not be corrupted, at least not in my experience.
> however, you MAY lose data from transactions that you thought were
> committed. I think Tom posted something about this a few days back.

Hmmm ... have you done this? I'd like the performance gain, but I don't want
to risk my data integrity. I've seen some awful things in databases (such as
duplicate primary keys) from yanking a power cord repeatedly.
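
For anyone following along: I'm assuming the setting in question is fsync in
postgresql.conf, i.e. something like:

    fsync = false    # skip forcing WAL writes to disk at commit; faster,
                     # but a power failure can lose transactions that
                     # appeared to be committed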

> update table set field1=field1+1
>
> are killers in an MVCC database as well.

Yeah -- don't I know it.
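
For the archives, the reason such statements hurt: under MVCC every updated
row leaves its old version behind as a dead tuple, so a statement like this
(made-up table name):

    UPDATE counters SET field1 = field1 + 1;   -- rewrites every row;
                                               -- old versions stay on disk
    VACUUM counters;                           -- reclaims the dead versions

roughly doubles the table's on-disk size until the VACUUM runs.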

--
-Josh Berkus
Aglio Database Solutions
San Francisco
