Re: Disable WAL logging to speed up data loading

From: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, "ashutosh(dot)bapat(dot)oss(at)gmail(dot)com" <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Disable WAL logging to speed up data loading
Date: 2020-11-09 17:05:21
Message-ID: CAKFQuwZEEJ8o5E4aMvOkMhGR6oAqvD=QVigDZz19SXcn2Ncu_w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Nov 9, 2020 at 8:18 AM Stephen Frost <sfrost(at)snowman(dot)net> wrote:

> Presently, my feeling is that we could address this use-case without
>
having to introduce a new cluster-wide WAL level, and that's the
> direction I'd want to see this going. Perhaps I'm missing something
> about why the approach I've set forth above wouldn't work, and
> wal_level=none would, but I've not seen it yet.
>
>
+1

We are trying to address a performance optimization for an insert-only
scenario on a limited set of tables by placing the entire cluster in a
dangerous state. The "copy table unlogged" solution is definitely closer
to what we want - this is
demonstrably worse.

For this case the fundamental feature that would seem to be required is an
ability for a transaction commit to return only after the system has
ensured that all of the new pages added to the relation during the scope of
the transaction have made it to disk. Something like:

BEGIN UNLOGGED TRANSACTION FOR table1, table2;
-- locking probably allows reads, definitely disallows concurrent writes,
to the named tables
-- Disallow updates and deletes, do not use dead tuple space, for the
tables named. Should be able to do normal stuff for other tables?
-- Always create new pages
COPY TO table1;
COPY TO table2;
COMMIT; -- wait here until data files for table1 and table2 are completely
written and the transaction alive flag is committed to the WAL.

I suppose the above could be written "BEGIN UNLOGGED TRANSACTION FOR ALL
TABLES" and you'd get the initial database population optimization
capability.

If the commit doesn't complete all of the newly created pages are junk.
Otherwise, you have a crash-recoverable state for those tables as regards
those specific pages.

Conceptually, we need an ability to perform a partial CHECKPOINT that names
specific tables, and make sure the crash-recovery works for those tables
while figuring out what amount of effort to expend on informing the dba and
alerting/preventing features that require wal from using those tables.

David J.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2020-11-09 17:20:50 Re: Deleting older versions in unique indexes to avoid page splits
Previous Message Alexey Kondratov 2020-11-09 16:45:18 Misuse of TimestampDifference() in the autoprewarm feature of pg_prewarm