Re: Massive table (500M rows) update nightmare

From: "Carlo Stonebanks" <stonec(dot)register(at)sympatico(dot)ca>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Massive table (500M rows) update nightmare
Date: 2010-01-07 21:48:19
Message-ID: hi5kpr$184c$1@news.hub.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

> What is the rationale behind this? How about doing 10k rows in 1
> update, and committing every time?

When we did 10K updates, the application would sometimes appear to have
frozen, and we were concerned that there was a deadlock condition because of
the number of locked rows. While we may have the patience to sit around and
wait five minutes to see if the update would continue, we couldn't risk
having other applications appear frozen if that was the case. In fact, there
is no reason for any user or application to write to the same records we are
writing to - but the audit table is read frequently. We are not explicitly
locking anything, or writing any additional code to arbitrate the lcoking
model anywhere -it's all default UPDATE and SELECT syntax.

Doing the updates in smaller chunks resolved these apparent freezes - or,
more specifically, when the application DID freeze, it didn't do it for more
than 30 seconds. In all likelyhood, this is the OS and the DB thrashing.

We have since modified the updates to process 1000 rows at a time with a
commit every 10 pages. Just this morning, though, the IS manager asked me to
stop the backfill because of the load affect on other processes.

> You could try making the condition on the ctid column, to not have to
> use the index on ID, and process the rows in physical order.

An interesting idea, if I can confirm that the performance problem is
because of the WHERE clause, not the UPDATE.

>'where new_column is null' to the conditions.

Already being done, albeit with a coalesce(val, '') = '' - it's quite
possible that this is hurting the WHERE clause; the EXPLAIN shows the table
using the pkey and then filtering on the COALESCE as one would expect.

Carlo

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Greg Smith 2010-01-07 22:21:47 Re: Air-traffic benchmark
Previous Message marcin mank 2010-01-07 21:05:14 Re: Massive table (500M rows) update nightmare