Re: POC: Cleaning up orphaned files using undo logs

From: Antonin Houska <ah(at)cybertec(dot)at>
To: "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>
Cc: 'Amit Kapila' <amit(dot)kapila16(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: POC: Cleaning up orphaned files using undo logs
Date: 2021-06-30 07:04:51
Message-ID: 14578.1625036691@antos
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

tsunakawa(dot)takay(at)fujitsu(dot)com <tsunakawa(dot)takay(at)fujitsu(dot)com> wrote:

> I'm crawling like a snail to read the patch set. Below are my first set of review comments, which are all minor.


> (1)
> + <indexterm><primary>tablespace</primary><secondary>temporary</secondary></indexterm>
> temporary -> undo


> (2)
> <term><varname>undo_tablespaces</varname> (<type>string</type>)
> +
> ...
> + The value is a list of names of tablespaces. When there is more than
> + one name in the list, <productname>PostgreSQL</productname> chooses an
> + arbitrary one. If the name doesn't correspond to an existing
> + tablespace, the next name is tried, and so on until all names have
> + been tried. If no valid tablespace is specified, an error is raised.
> + The validation of the name doesn't happen until the first attempt to
> + write undo data.
> CREATE privilege needs to be mentioned like temp_tablespaces.


> (3)
> + The variable can only be changed before the first statement is
> + executed in a transaction.
> Does it include any first statement that doesn't emit undo?

Yes, it does. As soon as XID is assigned, the variable can no longer be set.

> (4)
> + <entry>One row for each undo log, showing current pointers,
> + transactions and backends.
> + See <xref linkend="pg-stat-undo-logs-view"/> for details.
> I think this can just be written like "showing usage information about the
> undo log" just like other statistics views. That way, we can avoid having
> to modify this sentence when we want to change the content of the view
> later.


> (5)
> + <entry><structfield>discard</structfield></entry>
> + <entry><type>text</type></entry>
> + <entry>Location of the oldest data in this undo log.</entry>
> The name does not match the description intuitively. Maybe "oldest"?

Discarding of the undo log is an important term used in the code.

> BTW, how does this information help users? (I don't mean to say we
> shouldn't output information that users cannot interpret; other DBMSs output
> such information probably for technical support staff.)

It's for DBA rather than a user. The value indicates whether discarding is
working well or if it's blocked for some reason. If the latter happens, the
undo log can pile up and consume too much disk space.

> (6)
> + <entry><structfield>insert</structfield></entry>
> + <entry><type>text</type></entry>
> + <entry>Location where the next data will be written in this undo
> + log.</entry>
> ...
> + <entry><structfield>end</structfield></entry>
> + <entry><type>text</type></entry>
> + <entry>Location one byte past the end of the allocated physical storage
> + backing this undo log.</entry>
> Again, how can these be used? If they are useful to calculate the amount of used space, shouldn't they be bigint?

bigint is signed, so it cannot express 64-bit number. I think this deserves a
new SQL type for the undo pointer, like pg_lsn for XLOG.

> (7)
> @@ -65,7 +65,7 @@
> <structfield>smgrid</structfield> <type>integer</type>
> </para>
> <para>
> - Block storage manager ID. 0 for regular relation data.</entry>
> + Block storage manager ID. 0 for regular relation data.
> </para></entry>
> </row>
> I guess this change is mistakenly included?


> (8)
> diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
> @@ -216,6 +216,7 @@ Complete list of usable sgml source files in this directory.
> <!ENTITY pgtesttiming SYSTEM "pgtesttiming.sgml">
> <!ENTITY pgupgrade SYSTEM "pgupgrade.sgml">
> <!ENTITY pgwaldump SYSTEM "pg_waldump.sgml">
> +<!ENTITY pgundodump SYSTEM "pg_undo_dump.sgml">
> <!ENTITY postgres SYSTEM "postgres-ref.sgml">
> @@ -286,6 +286,7 @@
> &pgtesttiming;
> &pgupgrade;
> &pgwaldump;
> + &pgundodump;
> &postgres;
> It looks like this list needs to be ordered alphabetically. So, the new line is better placed between pg_test_timing and pg_upgrade?


> (9)
> I don't want to be disliked because of being picky, but maybe pg_undo_dump should be pg_undodump. Existing commands don't use '_' to separate words after pg_, except for pg_test_fsync and pg_test_timing.


> (10)
> + This utility can only be run by the user who installed the server, because
> + it requires read-only access to the data directory.
> I guess you copied this from pg_waldump or pg_resetwal, but I'm afraid this should be as follows, which is an excerpt from pg_controldata's page. (The pages for pg_waldump and pg_resetwal should be fixed in a separate thread.)
> This utility can only be run by the user who initialized the cluster because it requires read access to the data directory.


> (11)
> + The <option>-m</option> option cannot be used if
> + either <option>-c</option> or <option>-l</option> is used.
> -l -> -r


> Or, why don't we align option characters with pg_waldump? pg_waldump uses -r to filter by rmgr. pg_undodump can output record contents by default like pg_waldump. Considering pg_dump and pg_dumpall also output all data by default, that seems how PostgreSQL commands behave.

I've made the -r value (print out the undo records) the default, will consider
using -r for filtering by rmgr.

> (12)
> + <arg choice="opt"><option>startseg</option><arg choice="opt"><option>endseg</option></arg></arg>
> startseg and endseg are not described.

Fixed. (Of course, this is an evidence that I used pg_waldump as a skeleton

> (13)
> +Undo files backing undo logs in the default tablespace are stored under
> ...
> +Undo log files contain standard page headers as described in the next section,
> Fluctuations in expressions can be seen: undo file and undo log file. I think the following "undo data file" fits best. What do you think?
> + <entry><literal>UndoFileRead</literal></entry>
> + <entry>Waiting for a read from an undo data file.</entry>

"Undo files backing undo logs ..."

My feeling is that "data files" would be distracting here. I think the point
of this sentence is simply that something resides in a file.

"Undo log files contain standard page headers as described in the next section"

I'm not opposed to "data files" here as there are also other kinds of files
written by undo (at least the metadata written during checkpoint). Changed.

> (14)
> +Undo data exists in a 64-bit address space divided into 2^34 undo
> +logs, each with a theoretical capacity of 1TB. The first time a
> +backend writes undo, it attaches to an existing undo log whose
> +capacity is not yet exhausted and which is not currently being used by
> +any other backend; or if no suitable undo log already exists, it
> +creates a new one. To avoid wasting space, each undo log is further
> +divided into 1MB segment files, so that segments which are no longer
> +needed can be removed (possibly recycling the underlying file by
> +renaming it) and segments which are not yet needed do not need to be
> +physically created on disk. An undo segment file has a name like
> +<filename>000004.0001200000</filename>, where
> +<filename>000004</filename> is the undo log number and
> +<filename>0001200000</filename> is the offset of the first byte
> +held in the file.
> The number of undo logs is not 2^34 but 2^24 (2^64 - 2^40 (1 TB)).


> (15) src/backend/access/undo/README
> \ No newline at end of file
> Let's add a newline.


Antonin Houska

Attachment Content-Type Size
undo-20210630.tgz application/gzip 196.3 KB

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message Fabien COELHO 2021-06-30 07:45:47 Re: pgbench logging broken by time logic changes
Previous Message Dilip Kumar 2021-06-30 06:33:55 Re: Teach pg_receivewal to use lz4 compression