Re: Massive table (500M rows) update nightmare

From: "Carlo Stonebanks" <stonec(dot)register(at)sympatico(dot)ca>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Massive table (500M rows) update nightmare
Date: 2010-01-07 17:56:37
Message-ID: hi577c$1lc2$1@news.hub.org
Lists: pgsql-performance

> If it is possible to lock this audit table exclusively (maybe during
> off-peak hours) I would look into
> - create new_audit_table as select col1, col2, col3 ... col9,
> 'new_col_value' from old_audit_table;
> - create all indexes
> - drop old_audit_table
> - rename new_audit_table to old_audit_table
>
> That is probably the fastest method you can do, even if you have to join
> the "new_col_value" from an extra helper-table with the corresponding id.
> Remember, databases are born to join.
>

This has all been done before. The production team was crippled while they
waited for it, and the SECOND the table was available again they jumped on
it, even though that meant recreating only the bare minimum of indexes.
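
For anyone reading this in the archives, the rebuild being described is
roughly the following sketch (the table, column and index names here are
placeholders for illustration, not our actual schema):

BEGIN;
LOCK TABLE old_audit_table IN ACCESS EXCLUSIVE MODE;

CREATE TABLE new_audit_table AS
  SELECT col1, col2, col3, col4, col5, col6, col7, col8, col9,
         'new_col_value' AS new_col
  FROM old_audit_table;

-- recreate only what production cannot live without, the rest later
CREATE INDEX new_audit_table_col1_idx ON new_audit_table (col1);

DROP TABLE old_audit_table;
ALTER TABLE new_audit_table RENAME TO old_audit_table;
COMMIT;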

> You could also try to just update the whole table in one go, it is
> probably faster than you expect.

Possibly, but with such a large table you have no idea of the progress, and
you cannot interrupt it without rolling back everything. Worse, you have
applications stalling and users wanting to know what is going on: is the OS
or the DB/MVCC thrashing while it does internal maintenance? Have you
reached some sort of deadlock condition that you can't see, because the
server status is not helpful with so many uncommitted pending updates?

And of course, there is the file bloat.
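
To be clear, the one-go version is literally a single statement, which is
exactly the problem (the column name is again a placeholder):

-- one transaction covering all 500M rows: no progress feedback,
-- a cancel rolls back everything, and every updated row leaves
-- behind a dead tuple, hence the bloat
UPDATE old_audit_table SET new_col = 'new_col_value';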
