Understanding WAL - large amount of activity from removing data

From: Isaac Morland <isaac(dot)morland(at)gmail(dot)com>
To: PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Understanding WAL - large amount of activity from removing data
Date: 2022-11-21 01:24:11
Message-ID: CAMsGm5eJ8BtMDSSLiD7Tf7wMDeZi_fRMjBdrmcetzqjTGVwXWw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I'm encountering some surprising (to me) behaviour related to WAL, and I'm
wondering if anybody can point me at an article that might help me
understand what is happening, or give a brief explanation.

I'm trying to make a slimmed down version of my database for testing
purposes. As part of this, I'm running a query something like this:

UPDATE table1
SET pdfcolumn = 'redacted'
WHERE pdfcolumn IS NOT NULL;

(literally 'redacted', not redacted here for your benefit)

The idea is to replace the actual contents of the column, which are PDF
documents totalling 70GB, with just a short placeholder value, without
affecting the other columns, which are a more ordinary collection - a few
integers and short strings.

The end result will be a database which is way easier to copy around but
which still has all the records of the original; the only change is that an
attempt to access one of the PDFs will not return the actual PDF but rather
a garbage value. For most testing this will make little to no difference.

What I'm finding is that the UPDATE is taking over an hour for 5000
records, and tons of WAL is being generated, several files per minute.
Selecting the non-PDF columns from the entire table takes a few
milliseconds, and the only thing I'm doing with the records is updating
them to much smaller values. Why so much activity just to remove data? The
new rows are tiny.

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David G. Johnston 2022-11-21 02:02:12 Re: Understanding WAL - large amount of activity from removing data
Previous Message Andres Freund 2022-11-21 00:38:15 Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)