RE: Updating large tables without dead tuples

From: "ldh(at)laurent-hasson(dot)com" <ldh(at)laurent-hasson(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: "pgsql-performance(at)lists(dot)postgresql(dot)org" <pgsql-performance(at)lists(dot)postgresql(dot)org>
Subject: RE: Updating large tables without dead tuples
Date: 2018-02-24 17:19:21
Message-ID: BY2PR15MB0872D2208E95046125EC577385C30@BY2PR15MB0872.namprd15.prod.outlook.com
Lists: pgsql-performance

> -----Original Message-----
> From: Stephen Frost [mailto:sfrost(at)snowman(dot)net]
> Sent: Friday, February 23, 2018 19:10
> To: ldh(at)laurent-hasson(dot)com
> Cc: pgsql-performance(at)lists(dot)postgresql(dot)org
> Subject: Re: Updating large tables without dead tuples
>
> Greetings,
>
> * ldh(at)laurent-hasson(dot)com (ldh(at)laurent-hasson(dot)com) wrote:
> > This was done during a maintenance window, and that table is read-only
> except when we ETL data to it on a weekly basis, and so I was just wondering
> why I should pay the "bloat" penalty for this type of transaction. Is there a trick
> that could be used here?
>
> Yes, create a new table and INSERT the data into that table, then swap the new
> table into place as the old table. Another option, if you don't mind the
> exclusive lock taken on the table, is to dump the data to another table, then
> TRUNCATE the current one and then INSERT into it.
>
> There's other options too, involving triggers and such to allow updates and
> other changes to be captured during this process, avoiding the need to lock the
> table, but that gets a bit complicated.
>
> > More generally, I suspect that the MVCC architecture is so deep that
> something like LOCK TABLE, which would guarantee that there won't be
> contention, couldn't be used as a heuristic to avoid creating dead tuples? That
> would make quite a performance improvement for this type of work though.
>
> I'm afraid it wouldn't be quite that simple; in particular, you have to think
> about what happens when you issue a rollback...
>
> Thanks!
>
> Stephen

[Laurent Hasson]
This table has several other tables with foreign keys pointing into it... So any physical replacement of the table wouldn't work, I believe. I'd have to drop/disable the foreign keys across the other tables, do this work, and then re-create the foreign keys. The overall time in aggregate may not be much shorter than the current implementation.
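
If I'm reading your suggestion right, the whole dance would look roughly like the sketch below (the referencing table, columns and constraint names are made up for illustration; pretend Visits has just these four columns):

    BEGIN;

    -- Stage the transformed rows first, so the rewrite happens as fresh
    -- inserts instead of in-place updates that leave dead tuples behind.
    CREATE TABLE visits_staging AS
    SELECT visit_id, patient_id, admit_date,
           upper(discharge_status) AS discharge_status  -- the column being rewritten
    FROM visits;

    -- Incoming foreign keys block TRUNCATE, so they have to come off first.
    ALTER TABLE diagnoses DROP CONSTRAINT diagnoses_visit_id_fkey;

    -- TRUNCATE takes an ACCESS EXCLUSIVE lock, as you noted.
    TRUNCATE visits;
    INSERT INTO visits (visit_id, patient_id, admit_date, discharge_status)
    SELECT visit_id, patient_id, admit_date, discharge_status
    FROM visits_staging;

    -- Re-create the foreign keys; this re-validates every referencing row.
    ALTER TABLE diagnoses ADD CONSTRAINT diagnoses_visit_id_fkey
        FOREIGN KEY (visit_id) REFERENCES visits (visit_id);

    DROP TABLE visits_staging;
    COMMIT;

If re-validating the constraints turns out to be the slow part, I gather adding them with NOT VALID and running a separate VALIDATE CONSTRAINT afterwards would move that scan outside the lock window.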

This table represents Hospital visits, off of which a lot of other information hangs. The updated column in that Visits table is not part of the key.

As for the rollback, I hadn't thought about it because in our case, short of a db/hardware failure, this operation wouldn't fail... But the risk is there, and I understand the engine must be prepared for anything and fulfill the ACID principles.

With respect to that, I read in many places that an UPDATE is effectively a DELETE + INSERT. Does that mean in the rollback logs, there are 2 entries for each row updated as a result?
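
If I understand that correctly, something like this little scratch-table experiment (names made up, and I may be misreading how ctid/xmin behave) should show the old row version being left behind:

    -- Scratch table just to watch what an UPDATE does to the physical row.
    CREATE TABLE upd_demo (id int PRIMARY KEY, val text);
    INSERT INTO upd_demo VALUES (1, 'before');

    -- ctid is the row's physical location, xmin the inserting transaction id.
    SELECT ctid, xmin, * FROM upd_demo;

    UPDATE upd_demo SET val = 'after' WHERE id = 1;

    -- The row should now show a new ctid/xmin; the old version stays behind
    -- as a dead tuple until VACUUM reclaims the space.
    SELECT ctid, xmin, * FROM upd_demo;

    -- And the dead-tuple count for the real table can be watched with:
    SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
    FROM pg_stat_user_tables
    WHERE relname = 'visits';

    DROP TABLE upd_demo;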

Thank you,
Laurent.
