RE: POC: Cleaning up orphaned files using undo logs

From: "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>
To: Antonin Houska <ah(at)cybertec(dot)at>
Cc: 'Amit Kapila' <amit(dot)kapila16(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: RE: POC: Cleaning up orphaned files using undo logs
Date: 2021-03-09 09:19:47
Message-ID: TYAPR01MB2990D438596AF4428ABA1888FE929@TYAPR01MB2990.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I'm crawling like a snail to read the patch set. Below are my first set of review comments, which are all minor.

(1)
+ <indexterm><primary>tablespace</primary><secondary>temporary</secondary></indexterm>

temporary -> undo

(2)
<term><varname>undo_tablespaces</varname> (<type>string</type>)
+
...
+ The value is a list of names of tablespaces. When there is more than
+ one name in the list, <productname>PostgreSQL</productname> chooses an
+ arbitrary one. If the name doesn't correspond to an existing
+ tablespace, the next name is tried, and so on until all names have
+ been tried. If no valid tablespace is specified, an error is raised.
+ The validation of the name doesn't happen until the first attempt to
+ write undo data.

CREATE privilege needs to be mentioned like temp_tablespaces.

(3)
+ The variable can only be changed before the first statement is
+ executed in a transaction.

Does it include any first statement that doesn't emit undo?

(4)
+ <entry>One row for each undo log, showing current pointers,
+ transactions and backends.
+ See <xref linkend="pg-stat-undo-logs-view"/> for details.

I think this can just be written like "showing usage information about the undo log" just like other statistics views. That way, we can avoid having to modify this sentence when we want to change the content of the view later.

(5)
+ <entry><structfield>discard</structfield></entry>
+ <entry><type>text</type></entry>
+ <entry>Location of the oldest data in this undo log.</entry>

The name does not match the description intuitively. Maybe "oldest"?

BTW, how does this information help users? (I don't mean to say we shouldn't output information that users cannot interpret; other DBMSs output such information probably for technical support staff.)

(6)
+ <entry><structfield>insert</structfield></entry>
+ <entry><type>text</type></entry>
+ <entry>Location where the next data will be written in this undo
+ log.</entry>
...
+ <entry><structfield>end</structfield></entry>
+ <entry><type>text</type></entry>
+ <entry>Location one byte past the end of the allocated physical storage
+ backing this undo log.</entry>

Again, how can these be used? If they are useful to calculate the amount of used space, shouldn't they be bigint?

(7)
@@ -65,7 +65,7 @@
<structfield>smgrid</structfield> <type>integer</type>
</para>
<para>
- Block storage manager ID. 0 for regular relation data.</entry>
+ Block storage manager ID. 0 for regular relation data.
</para></entry>
</row>

I guess this change is mistakenly included?

(8)
diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
@@ -216,6 +216,7 @@ Complete list of usable sgml source files in this directory.
<!ENTITY pgtesttiming SYSTEM "pgtesttiming.sgml">
<!ENTITY pgupgrade SYSTEM "pgupgrade.sgml">
<!ENTITY pgwaldump SYSTEM "pg_waldump.sgml">
+<!ENTITY pgundodump SYSTEM "pg_undo_dump.sgml">
<!ENTITY postgres SYSTEM "postgres-ref.sgml">

@@ -286,6 +286,7 @@
&pgtesttiming;
&pgupgrade;
&pgwaldump;
+ &pgundodump;
&postgres;

It looks like this list needs to be ordered alphabetically. So, the new line is better placed between pg_test_timing and pg_upgrade?

(9)
I don't want to be disliked because of being picky, but maybe pg_undo_dump should be pg_undodump. Existing commands don't use '_' to separate words after pg_, except for pg_test_fsync and pg_test_timing.

(10)
+ This utility can only be run by the user who installed the server, because
+ it requires read-only access to the data directory.

I guess you copied this from pg_waldump or pg_resetwal, but I'm afraid this should be as follows, which is an excerpt from pg_controldata's page. (The pages for pg_waldump and pg_resetwal should be fixed in a separate thread.)

This utility can only be run by the user who initialized the cluster because it requires read access to the data directory.

(11)
+ The <option>-m</option> option cannot be used if
+ either <option>-c</option> or <option>-l</option> is used.

-l -> -r
Or, why don't we align option characters with pg_waldump? pg_waldump uses -r to filter by rmgr. pg_undodump can output record contents by default like pg_waldump. Considering pg_dump and pg_dumpall also output all data by default, that seems how PostgreSQL commands behave.

(12)
+ <arg choice="opt"><option>startseg</option><arg choice="opt"><option>endseg</option></arg></arg>

startseg and endseg are not described.

(13)
+Undo files backing undo logs in the default tablespace are stored under
...
+Undo log files contain standard page headers as described in the next section,

Fluctuations in expressions can be seen: undo file and undo log file. I think the following "undo data file" fits best. What do you think?

+ <entry><literal>UndoFileRead</literal></entry>
+ <entry>Waiting for a read from an undo data file.</entry>

(14)
+Undo data exists in a 64-bit address space divided into 2^34 undo
+logs, each with a theoretical capacity of 1TB. The first time a
+backend writes undo, it attaches to an existing undo log whose
+capacity is not yet exhausted and which is not currently being used by
+any other backend; or if no suitable undo log already exists, it
+creates a new one. To avoid wasting space, each undo log is further
+divided into 1MB segment files, so that segments which are no longer
+needed can be removed (possibly recycling the underlying file by
+renaming it) and segments which are not yet needed do not need to be
+physically created on disk. An undo segment file has a name like
+<filename>000004.0001200000</filename>, where
+<filename>000004</filename> is the undo log number and
+<filename>0001200000</filename> is the offset of the first byte
+held in the file.

The number of undo logs is not 2^34 but 2^24 (2^64 - 2^40 (1 TB)).

(15) src/backend/access/undo/README
\ No newline at end of file

Let's add a newline.

Regards
Takayuki Tsunakawa

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro Horiguchi 2021-03-09 09:29:34 Re: shared-memory based stats collector
Previous Message Pavel Stehule 2021-03-09 09:18:01 Re: [PATCH] regexp_positions ( string text, pattern text, flags text ) → setof int4range[]