Re: Need to update all my 60 million rows at once without transactional integrity

From: Tom Allison <tom(at)tacocat(dot)net>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: "christian_behrens(at)gmx(dot)net" <christian_behrens(at)gmx(dot)net>, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Need to update all my 60 million rows at once without transactional integrity
Date: 2008-04-23 21:35:14
Message-ID: BB3A6D34-4DBD-4347-BAA5-9947FE132BA9@tacocat.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Far from being an expert on postgres, but there are two ideas--
assuming that you cannot afford the time it would take to simply
UPDATE and wait...

Write a script to update all the rows, one at a time. Lowest impact to
operations but would take a very long time.

Assuming you have a sequence primary key value on each row, update by
ID blocks on the order of 10,000's or 100,000's at a time (or more).
This is a balancing act between time to complete and immediate load on
the server.

I've used both options but I don't think I've exceeded 15 million rows.

Sent from my iPhone.

On Apr 23, 2008, at 2:15 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:

> On Mon, 2008-04-21 at 00:19 +0200, christian_behrens(at)gmx(dot)net wrote:
>
>> How can I make a Update of a column in a very large table for all
>> rows
>> without using the double amount of disc space and without any need
>> for
>> atomic operation?
>>
>> I have a very large table with about 60 million rows. I sometimes
>> need
>> to do a simple update to ALL rows that resets a status-flag to zero.
>>
>> I don't need to have transactional integrity (but of course if the
>> system crashes, there should be no data corruption.
>
> No such thing. Without transactions you have no sensible definition of
> what constitutes data corruption.
>
>> A separate flag in the file system can well save the fact that that
>> bulk update was in progress) for this, I don't care or want a abort
>> or
>> "all or nothing" like SQL mandates. The idea is basically that either
>> this update succeeds or it succeeds or - there is no "not". It must
>> succeed. It must be tried until it works. If the update is halfway
>> finished, that's okay.
>
> Don't reset them to zero, just redefine the meaning of the counter.
> Take
> the max value in the table and then have the app understand that
> anything <= the previous max value means the same thing as whatever
> "status = 0" means now. The app would need to read the new baseline
> value before performing any work.
>
> This is roughly the technique used by Slony to avoid needing to update
> every row in the log table to show that it has successfully replicated
> it. It's also the technique MVCC relies upon internally.
>
> It's less work and crash safe in all cases.
>
> --
> Simon Riggs
> 2ndQuadrant http://www.2ndQuadrant.com
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Robert Treat 2008-04-23 21:51:48 Re: How to modify ENUM datatypes?
Previous Message Steve Crawford 2008-04-23 21:26:31 Re: Create temporary function