Skip site navigation (1) Skip section navigation (2)

Re: Massive table (500M rows) update nightmare

From: "Carlo Stonebanks" <stonec(dot)register(at)sympatico(dot)ca>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Massive table (500M rows) update nightmare
Date: 2010-01-07 21:48:19
Message-ID: hi5kpr$184c$1@news.hub.org (view raw or flat)
Thread:
Lists: pgsql-performance
> What is the rationale behind this? How about doing 10k rows in 1
> update, and committing every time?

When we did 10K updates, the application would sometimes appear to have 
frozen, and we were concerned that there was a deadlock condition because of 
the number of locked rows. While we may have the patience to sit around and 
wait five minutes to see if the update would continue, we couldn't risk 
having other applications appear frozen if that was the case. In fact, there 
is no reason for any user or application to write to the same records we are 
writing to - but the audit table is read frequently. We are not explicitly 
locking anything, or writing any additional code to arbitrate the lcoking 
model anywhere -it's all default UPDATE and SELECT syntax.

Doing the updates in smaller chunks resolved these apparent freezes - or, 
more specifically, when the application DID freeze, it didn't do it for more 
than 30 seconds. In all likelyhood, this is the OS and the DB thrashing.

We have since modified the updates to process 1000 rows at a time with a 
commit every 10 pages. Just this morning, though, the IS manager asked me to 
stop the backfill because of the load affect on other processes.

> You could try making the condition on the ctid column, to not have to
> use the index on ID, and process the rows in physical order.

An interesting idea, if I can confirm that the performance problem is 
because of the WHERE clause, not the UPDATE.

>'where new_column is null' to the conditions.

Already being done, albeit with a coalesce(val, '') = '' - it's quite 
possible that this is hurting the WHERE clause; the EXPLAIN shows the table 
using the pkey and then filtering on the COALESCE as one would expect.

Carlo 


In response to

Responses

pgsql-performance by date

Next:From: Greg SmithDate: 2010-01-07 22:21:47
Subject: Re: Air-traffic benchmark
Previous:From: marcin mankDate: 2010-01-07 21:05:14
Subject: Re: Massive table (500M rows) update nightmare

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group