RE: Disable WAL logging to speed up data loading

From: "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>
To: "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com>
Cc: 'Fujii Masao' <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, "ashutosh(dot)bapat(dot)oss(at)gmail(dot)com" <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: RE: Disable WAL logging to speed up data loading
Date: 2020-10-28 08:46:02
Message-ID: TYAPR01MB2990CB7F6AF30EA037B8A19EFE170@TYAPR01MB2990.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

From: Osumi, Takamichi/大墨 昂道 <osumi(dot)takamichi(at)fujitsu(dot)com>
> I wrote and attached the first patch to disable WAL logging.
> This patch passes the regression test of check-world already and is formatted

I think make check-world uses the default setting for wal_level. You need to set wal_level = none and do make installcheck-world.

> I compared two wal_levels both 'minimal' and 'none'.
> For both levels, I measured
> (1) cluster's restore from pg_dumpall output,
> (2) COPY initial data to a defined table as initial data loading, and
> (3) COPY data to a defined table with tuples, 3 times each for all cases.
> After that, calculated the average and the ratio of the loading speed.
> The conclusion is that wal_level=none cuts about 20% of the loading speed
> compared to 'minimal' in the three cases above.

Hmm. I wonder why pg_dumpall's performance increases by as much as 20%. On the contrary to my previous hope, pg_dumpall uses COPY to restore data, so it doesn't emit WAL when wal_level = minimal. (Is it brought by the difference of whether DDL's WAL is emitted or not?)

> Sharing other scenario to measure is welcome.

How about running multiple concurrent data loading sessions when adding data to existing tables with data, so that WAL is the bottleneck? That's the use case of the target customer, isn't it?

> The input data was generated from pgbench with 1000 scale factor.
> It's about 9.3GB. For the table definition or the initial data for appended data

IIRC, I thought the scale factor of 1,000 is 1.5 GB. What displayed 9.3 GB? SELECT pg_database_size() or something?

Below are review comments:

(1)
@@ -449,6 +449,13 @@ XLogInsert(RmgrId rmid, uint8 info)
return EndPos;
}

+ /* Issues WAL only for transaction end and check point */
+ if (wal_level == WAL_LEVEL_NONE && rmid != RM_XLOG_ID)
+ {
+ XLogResetInsertion();
+ return GetLatestCheckPointLSN();
+ }
+
do

This does not emit transaction completion WAL records. Their RM ID is RM_XACT_ID. Also, RM_XLOG_ID includes other kinds of WAL records than the checkpoint WAL record. Correct the comment accordingly. I don't have a good idea on how to represent the RM_XLOG_ID,, but the following might help:

[rmgrlist.h]
/* symbol name, textual name, redo, desc, identify, startup, cleanup */
PG_RMGR(RM_XLOG_ID, "XLOG", xlog_redo, xlog_desc, xlog_identify, NULL, NULL, NULL)
PG_RMGR(RM_XACT_ID, "Transaction", xact_redo, xact_desc, xact_identify, NULL, NULL, NULL)

(2)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
@@ -2591,10 +2591,10 @@ include_dir 'conf.d'
data to support WAL archiving and replication, including running
read-only queries on a standby server. <literal>minimal</literal> removes all
logging except the information required to recover from a crash or
- immediate shutdown. Finally,
+ immediate shutdown. <literal>none</literal> generates no WAL in any case. Finally,

According to the previous code, "none" emits some kinds of WAL records. So I think we need to come up with a good name and/or description.

(3)
<literal>logical</literal> adds information necessary to support logical
- decoding. Each level includes the information logged at all lower
- levels. This parameter can only be set at server start.
+ decoding. Each level except for <literal>none</literal> includes the
+ information logged at all lower levels. This parameter can only be set at server start.
</para>

Why is this change necessary?

(4)
+ On the other hand, an unexpected crash of the server makes the database cluster
+ inconsistent. For that reason, before utilizing this level, get a full backup of the cluster and
+ backup of the entire operations that are done under the condition that
+ <varname>wal_level</varname> is <literal>none</literal>.

This gives the impression that the user can start the database server and see inconsistent data. The reality is that the database server does not start, isn't it?

(5)
@@ -1751,7 +1752,8 @@ SELECT * FROM x, y, a, b, c WHERE something AND somethingelse;
Aside from avoiding the time for the archiver or WAL sender to process the
WAL data, doing this will actually make certain commands faster, because
they do not to write WAL at all if <varname>wal_level</varname>
- is <literal>minimal</literal> and the current subtransaction (or top-level
+ is either <literal>minimal</literal> or <literal>minimal</literal>
+ and the current subtransaction (or top-level
transaction) created or truncated the table or index they change. (They
can guarantee crash safety more cheaply by doing
an <function>fsync</function> at the end than by writing WAL.)

This is not correct. In minimal, some conditions need to hold true for WAL to not be generated as described above. OTOH, wal_level = none does not generate WAL unconditionally.

(6)
14.4.9. Some Notes about pg_dump
...
If using WAL archiving or streaming replication, consider disabling them during the restore. To do that, set archive_mode to off, wal_level to minimal, and max_wal_senders to zero before loading the dump. Afterwards, set them back to the right values and take a fresh base backup.

Why don't you refer to wal_level = none here as well?

(7)
@@ -918,10 +918,13 @@ PostmasterMain(int argc, char *argv[])
ReservedBackends, MaxConnections);
ExitPostmaster(1);
}
- if (XLogArchiveMode > ARCHIVE_MODE_OFF && wal_level == WAL_LEVEL_MINIMAL)
+ if ((XLogArchiveMode > ARCHIVE_MODE_OFF && wal_level == WAL_LEVEL_NONE) ||
+ (XLogArchiveMode > ARCHIVE_MODE_OFF && wal_level == WAL_LEVEL_MINIMAL))
ereport(ERROR,
- (errmsg("WAL archival cannot be enabled when wal_level is \"minimal\"")));
- if (max_wal_senders > 0 && wal_level == WAL_LEVEL_MINIMAL)
+ (errmsg("WAL archival cannot be enabled when wal_level is \"%s\"",
+ wal_level == WAL_LEVEL_MINIMAL ? "minimal" : "none")));
+ if ((max_wal_senders > 0 && wal_level == WAL_LEVEL_NONE) ||
+ (max_wal_senders > 0 && wal_level == WAL_LEVEL_MINIMAL))
ereport(ERROR,
(errmsg("WAL streaming (max_wal_senders > 0) requires wal_level \"replica\" or \"logical\"")));

This style of writing conitions is redundant. You can just change the code to wal_level <= WAL_LEVEL_MINIMAL.
Also, the first message can be '"minimal" or "none"' like the second one.

(8)
@@ -989,6 +992,12 @@ PostmasterMain(int argc, char *argv[])
LocalProcessControlFile(false);

/*
+ * Check some conditions specific to wal_level='none' and ensures the
+ * database isn't inconsistent.
+ */
+ SafelyFinishedNoneLevel();
+
+ /*

This check should be moved to around the beginning of StartupXLOG(). PostmasterMain() is called in multi-user mode. It's not called in single-user mode (postgres --single).
The new function is not necessary. StartupXLOG() can see the control file contents directly.

(9)
+ /*
+ * Detect if we previously crashed under wal_level='none' or not.
+ */
+ unexpected_shutdown = ControlFile->state != DB_SHUTDOWNED &&
+ ControlFile->state != DB_SHUTDOWNED_IN_RECOVERY;
+ if ((ControlFile->wal_level == WAL_LEVEL_NONE && unexpected_shutdown))
+ {
+ ereport(ERROR,
+ (errmsg("Server was unexpectedly shut down when WAL logging was disabled"),
+ errhint("It looks like you need to deploy a new cluster from your full backup again.")));
+ }
+}

Refine the message according to the following message guideline. For example, the primary message has to start with a lower case letter.

https://www.postgresql.org/docs/devel/source.html

(10)
You need to edit this to add none:
src/backend/utils/misc/postgresql.conf.sample

(11)
src/include/access/xlogdefs.h
src/backend/access/transam/varsup.c

Consider modifying the comments in these files that refer to wal_level. Maybe wal_level <= minimal is enough?

(12)
src/include/utils/rel.h

Modify the RelationNeedsWAL() so that it returns false when wal_level = none. Adding wal_level != WAL_LEVEL_NONE is would probably be okay.

(13)
@@ -161,7 +161,8 @@ extern int XLogArchiveMode;
/* WAL levels */
typedef enum WalLevel
{
- WAL_LEVEL_MINIMAL = 0,
+ WAL_LEVEL_NONE = 0,
+ WAL_LEVEL_MINIMAL,
WAL_LEVEL_REPLICA,
WAL_LEVEL_LOGICAL
} WalLevel;

I'm a bit concerned about if we can change the values of existing symbols, because wal_level setting is stored in pg_control file. Having a quick look at source code, there seems to be no problem. However, I'm not sure about pg_upgrade. Can you try, for example,

1. Create the database cluster with an older version, say, PG 13.
2. Start and stop the database server.
3. Run pg_controldata and see that it prints replica for the wal_level value.
4. Upgrade the database cluster with pg_upgrade.
5. Run pg_controldata and see the wal_level value.

If you change the values of existing symbols like your patch, you need to bump PG_CONTROL_VERSION.
If you find we can't tchange the existing values, you can probably set WAL_LEVEL_NONE to -1.

Regards
Takayuki Tsunakawa

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Anastasia Lubennikova 2020-10-28 09:01:58 Re: Implementing Incremental View Maintenance
Previous Message seinoyu 2020-10-28 08:31:14 Re: [PATCH] Add features to pg_stat_statements