Re: Updating large tables without dead tuples

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: "ldh(at)laurent-hasson(dot)com" <ldh(at)laurent-hasson(dot)com>
Cc: "pgsql-performance(at)lists(dot)postgresql(dot)org" <pgsql-performance(at)lists(dot)postgresql(dot)org>
Subject: Re: Updating large tables without dead tuples
Date: 2018-02-24 00:09:40
Message-ID: 20180224000940.GP2416@tamriel.snowman.net
Lists: pgsql-performance

Greetings,

* ldh(at)laurent-hasson(dot)com (ldh(at)laurent-hasson(dot)com) wrote:
> This was done during a maintenance window, and that table is read-only except when we ETL data to it on a weekly basis, so I was just wondering why I should pay the "bloat" penalty for this type of transaction. Is there a trick that could be used here?

Yes, create a new table and INSERT the data into that table, then swap
the new table into place as the old table. Another option, if you don't
mind the exclusive lock taken on the table, is to dump the data to
another table, then TRUNCATE the current one and then INSERT into it.
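
Both approaches can be sketched roughly as below. Table and column names here ("big", "big_new", "big_copy") are placeholders, not from the original thread:

```sql
-- Variant 1: build a new table, then swap it into place.
-- The swap itself is fast; only the two RENAMEs need a brief exclusive lock.
BEGIN;
CREATE TABLE big_new (LIKE big INCLUDING ALL);
INSERT INTO big_new SELECT * FROM big;      -- apply your UPDATE logic here
ALTER TABLE big RENAME TO big_old;
ALTER TABLE big_new RENAME TO big;
COMMIT;
DROP TABLE big_old;

-- Variant 2: dump, TRUNCATE, and re-INSERT in place.
-- TRUNCATE takes an ACCESS EXCLUSIVE lock on "big" for the duration.
BEGIN;
CREATE TEMP TABLE big_copy AS SELECT * FROM big;  -- transformed rows
TRUNCATE big;
INSERT INTO big SELECT * FROM big_copy;
COMMIT;
```

Either way, the rewritten rows are freshly inserted tuples, so no dead tuples are left behind for VACUUM to clean up.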

There are other options too, involving triggers and such to capture
updates and other changes made while the copy is being built, avoiding
the need to lock the table, but that gets a bit complicated.
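
A minimal sketch of that trigger-based capture, with hypothetical names ("big", "big_changes", "capture_big_changes") not taken from the thread:

```sql
-- Record every row changed on "big" while the replacement table is
-- being populated, so the changes can be replayed before the swap.
CREATE TABLE big_changes (LIKE big INCLUDING ALL);

CREATE FUNCTION capture_big_changes() RETURNS trigger AS $$
BEGIN
  INSERT INTO big_changes SELECT (NEW).*;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER big_capture
  AFTER INSERT OR UPDATE ON big
  FOR EACH ROW EXECUTE PROCEDURE capture_big_changes();

-- Once the new table is ready: replay big_changes into it, then do the
-- rename swap inside a short lock window, and drop the trigger.
```

Handling DELETEs, conflicts, and the final cutover ordering is where the real complexity lies, which is why this route is usually reserved for tables that cannot tolerate a maintenance lock.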

> More generally, I suspect that the MVCC architecture is so deep that something like LOCK TABLE, which would guarantee that there won't be contention, couldn't be used as a heuristic to avoid creating dead tuples? That would make quite a performance improvement for this type of work, though.

I'm afraid it wouldn't be quite that simple; in particular, you have to
think about what happens when you issue a ROLLBACK...

Thanks!

Stephen
