Re: Long Running Update

From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: <pgsql-performance(at)postgresql(dot)org>, "Harry Mantheakis" <harry(dot)mantheakis(at)riskcontrollimited(dot)com>
Subject: Re: Long Running Update
Date: 2011-06-23 19:32:24
Message-ID: 4E034E78020000250003EB37@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Harry Mantheakis <harry(dot)mantheakis(at)riskcontrollimited(dot)com> wrote:

> UPDATE
> table_A
> SET
> (
> field_1
> , field_2
> ) = (
> table_B.field_1
> , table_B.field_2
> )
> FROM
> table_B
> WHERE
> table_B.id = table_A.id
> ;

I would have just done:

SET field_1 = table_B.field_1, field_2 = table_B.field_2

instead of using row value constructors. That might be slowing
things down a bit.

> I tested (the logic of) this statement with a very small sample,
> and it worked correctly.

Always a good sign. :-)

> The statement has been running for 18+ hours so far.

> My question is: can I reasonably expect a statement like this to
> complete with such a large data-set, even if it takes several
> days?

If it's not leaking memory, I expect that it will complete.

To get some sense of what it's doing, you could log on to another
connection and EXPLAIN the statement. (NOTE: Be careful *not* to
use EXPLAIN ANALYZE.)

Another thing to consider if you run something like this again is
that an UPDATE is an awful lot like an INSERT combined with a
DELETE. The way PostgreSQL MVCC works, the old version of each row
must remain until the updating transaction completes. If you were
to divide this update into a series of updates by key range, the new
versions of the rows from later updates could re-use the space
previously occupied by the old version of rows from earlier updates.
For similar reasons, you might want to add something like this to
your WHERE clause, to prevent unnecessary updates:

AND (table_B.field_1 IS DISTINCT FROM table_A.field_1
OR table_B.field_2 IS DISTINCT FROM table_A.field_2);

-Kevin

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Harry Mantheakis 2011-06-24 11:16:32 Re: Long Running Update
Previous Message Claudio Freire 2011-06-23 17:18:23 Re: Long Running Update