Re: Replication identifiers, take 4

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Simon Riggs <simon(dot)riggs(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>
Cc: Petr Jelinek <petr(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Steve Singer <steve(at)ssinger(dot)info>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Replication identifiers, take 4
Date: 2015-04-17 17:12:32
Message-ID: 55313F00.7010609@iki.fi
Lists: pgsql-hackers

On 04/17/2015 12:04 PM, Simon Riggs wrote:
> On 17 April 2015 at 09:54, Andres Freund <andres(at)anarazel(dot)de> wrote:
>
>> Hrmpf. Says the person that used a lot of padding, without much
>> discussion, for the WAL level infrastructure making pg_rewind more
>> maintainable.
>
> Sounds bad. What padding are we talking about?

In the new WAL format, the data chunks are stored unaligned, without
padding, to save space. The new format is quite different to the old
one, so it's not straightforward to compare how much that saved. The
fixed-size XLogRecord header is 8 bytes shorter in the new format,
because it doesn't have the xl_len field anymore. But the same
information is stored elsewhere in the record, where it takes 2 or 5
bytes (XLogRecordDataHeaderShort/Long).
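
As a rough illustration of the net effect on header overhead, here is a
minimal sketch. The byte counts come from the paragraph above; the 255-byte
cutoff between the short and long form, the "no header when there is no main
data" case, and the helper names are assumptions made for illustration, not
quotations from the source tree.

```python
# Sizes in bytes, as described in the paragraph above.
FIXED_HEADER_SAVING = 8   # XLogRecord is this much shorter without xl_len
DATA_HEADER_SHORT = 2     # XLogRecordDataHeaderShort
DATA_HEADER_LONG = 5      # XLogRecordDataHeaderLong

# Assumption: the short form stores the length in a single byte.
SHORT_FORM_MAX_LEN = 255


def data_header_size(main_data_len: int) -> int:
    """Bytes spent on the main-data length header for one record."""
    if main_data_len == 0:
        return 0  # assumption: no header is written when there is no main data
    if main_data_len <= SHORT_FORM_MAX_LEN:
        return DATA_HEADER_SHORT
    return DATA_HEADER_LONG


def net_header_saving(main_data_len: int) -> int:
    """Header bytes saved per record relative to the old fixed-size header."""
    return FIXED_HEADER_SAVING - data_header_size(main_data_len)


if __name__ == "__main__":
    for length in (40, 300):
        print(length, data_header_size(length), net_header_saving(length))
```

Under these assumptions a record with a small main-data payload nets 6 bytes
of header savings, and one with a large payload nets 3.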

But it's a fair point that we could've just made small adjustments to
the old format, without revamping every record type and the way the
block information is stored, and that the space saving of the new format
should be compared with that instead, for a fair comparison.

As an example, one simple thing we could've done with the old format:
remove xl_len, and store the length in place of the two unused padding
bytes instead, as long as it fits in 16 bits. For longer records, set a
flag and store the length right after the XLogRecord header. For
practically all WAL records, that would've shrunk XLogRecord from 32 to
24 bytes, and made each record 8 bytes shorter.
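
The 32-to-24 figure can be checked with a small layout calculator. This is
only a sketch: the field list for the old header is written from memory of
the 9.4 struct, 8-byte maximum alignment is assumed, and the "adjusted"
layout (including the field name xl_len16 and the field order, chosen so the
8-byte xl_prev stays aligned) is hypothetical.

```python
MAXALIGN = 8  # assumption: 64-bit platform with 8-byte maximum alignment


def layout_size(fields, maxalign=MAXALIGN):
    """Size of a C-style struct whose fields are aligned to their own size."""
    offset = 0
    for _name, size in fields:
        # Round the offset up to the field's natural alignment, then place it.
        offset = (offset + size - 1) // size * size
        offset += size
    # The whole struct is padded out to the maximum alignment.
    return (offset + maxalign - 1) // maxalign * maxalign


# Old header as recalled from 9.4 (approximate; sizes in bytes).
OLD_XLOGRECORD = [
    ("xl_tot_len", 4),
    ("xl_xid", 4),
    ("xl_len", 4),
    ("xl_info", 1),
    ("xl_rmid", 1),
    ("padding", 2),
    ("xl_prev", 8),
    ("xl_crc", 4),
]

# Hypothetical adjusted header: xl_len dropped, a 16-bit length stored in the
# former padding bytes, fields reordered to avoid alignment holes.
ADJUSTED_XLOGRECORD = [
    ("xl_tot_len", 4),
    ("xl_xid", 4),
    ("xl_prev", 8),
    ("xl_info", 1),
    ("xl_rmid", 1),
    ("xl_len16", 2),
    ("xl_crc", 4),
]

if __name__ == "__main__":
    print("old:", layout_size(OLD_XLOGRECORD))
    print("adjusted:", layout_size(ADJUSTED_XLOGRECORD))
```

Note that with this model, simply swapping xl_len for a 16-bit field without
reordering would still round up to 32 bytes, so the saving depends on where
the 8-byte xl_prev ends up.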

I ran the same pgbench test Andres used, with scale 10, and 50000
transactions, and compared the WAL size between master and 9.4:

master: 20738352 bytes
9.4:    23915800 bytes

According to pg_xlogdump, there were 301153 WAL records. If you take the
9.4 figure, and imagine that we had saved those 8 bytes on each WAL
record, 9.4 would've been 21506576 bytes instead. So yeah, we could've
achieved much of the WAL savings with that much smaller change. That's a
useful thing to compare with.
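
For anyone who wants to redo the arithmetic, a short sketch follows. The
input numbers are the ones quoted above; the variable names are just for
illustration.

```python
# Figures quoted above (bytes of WAL, wal_level=minimal).
WAL_BYTES_MASTER = 20738352
WAL_BYTES_94 = 23915800
WAL_RECORDS = 301153          # record count reported by pg_xlogdump
SAVED_PER_RECORD = 8          # bytes saved by dropping xl_len from the header

# What 9.4 would have produced with the smaller header change alone.
hypothetical_94 = WAL_BYTES_94 - WAL_RECORDS * SAVED_PER_RECORD

# Compare that saving with what the new format actually achieved.
actual_saving = WAL_BYTES_94 - WAL_BYTES_MASTER
hypothetical_saving = WAL_BYTES_94 - hypothetical_94
fraction = hypothetical_saving / actual_saving

if __name__ == "__main__":
    print("hypothetical 9.4 size:", hypothetical_94)
    print("share of the actual saving: %.1f%%" % (100 * fraction))
```

On these numbers, the header-only change accounts for roughly three quarters
of the difference between 9.4 and master.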

BTW, those numbers are with wal_level=minimal. With wal_level=logical,
the WAL size from the same test on master was 26503520 bytes. That's
quite a bump. Looking at pg_xlogdump output, it seems that it's all
because the commit records are wider.

- Heikki
