Re: [REVIEW] Re: Compression of full-page-writes

From: "Syed, Rahila" <Rahila(dot)Syed(at)nttdata(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [REVIEW] Re: Compression of full-page-writes
Date: 2015-02-18 14:23:10
Message-ID: C3C878A2070C994B9AE61077D46C3846589AF06E@MAIL703.KDS.KEANE.COM
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello,

>I think we should change the xlog format so that the block_id (which currently is XLR_BLOCK_ID_DATA_SHORT/LONG or a actual block id) isn't the block id but something like XLR_CHUNK_ID. Which is used as is for XLR_CHUNK_ID_DATA_SHORT/LONG, but for backup blocks can be set to to >XLR_CHUNK_BKP_WITH_HOLE, XLR_CHUNK_BKP_COMPRESSED, XLR_CHUNK_BKP_REFERENCE... The BKP blocks will then follow, storing the block id following the chunk id.

>Yes, that'll increase the amount of data for a backup block by 1 byte, but I think that's worth it. I'm pretty sure we will be happy about the added extensibility pretty soon.

To clarify my understanding of the above change,

Instead of a block id to reference different fragments of an xlog record , a single byte field "chunk_id" should be used. chunk_id will be same as XLR_BLOCK_ID_DATA_SHORT/LONG for main data fragments.
But for block references, it will take store following values in order to store information about the backup blocks.
#define XLR_CHUNK_BKP_COMPRESSED 0x01
#define XLR_CHUNK_BKP_WITH_HOLE 0x02
...

The new xlog format should look like follows,

Fixed-size header (XLogRecord struct)
Chunk_id(add a field before id field in XLogRecordBlockHeader struct)
XLogRecordBlockHeader
Chunk_id
XLogRecordBlockHeader
...
...
Chunk_id ( rename id field of the XLogRecordDataHeader struct)
XLogRecordDataHeader[Short|Long]
block data
block data
...
main data

I will post a patch based on this.

Thank you,
Rahila Syed

-----Original Message-----
From: Andres Freund [mailto:andres(at)2ndquadrant(dot)com]
Sent: Monday, February 16, 2015 5:26 PM
To: Syed, Rahila
Cc: Michael Paquier; Fujii Masao; PostgreSQL mailing lists
Subject: Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

On 2015-02-16 11:30:20 +0000, Syed, Rahila wrote:
> - * As a trivial form of data compression, the XLOG code is aware that
> - * PG data pages usually contain an unused "hole" in the middle,
> which
> - * contains only zero bytes. If hole_length > 0 then we have removed
> - * such a "hole" from the stored data (and it's not counted in the
> - * XLOG record's CRC, either). Hence, the amount of block data
> actually
> - * present is BLCKSZ - hole_length bytes.
> + * Block images are able to do several types of compression:
> + * - When wal_compression is off, as a trivial form of compression,
> + the
> + * XLOG code is aware that PG data pages usually contain an unused "hole"
> + * in the middle, which contains only zero bytes. If length < BLCKSZ
> + * then we have removed such a "hole" from the stored data (and it is
> + * not counted in the XLOG record's CRC, either). Hence, the amount
> + * of block data actually present is "length" bytes. The hole "offset"
> + * on page is defined using "hole_offset".
> + * - When wal_compression is on, block images are compressed using a
> + * compression algorithm without their hole to improve compression
> + * process of the page. "length" corresponds in this case to the
> + length
> + * of the compressed block. "hole_offset" is the hole offset of the
> + page,
> + * and the length of the uncompressed block is defined by
> + "raw_length",
> + * whose data is included in the record only when compression is
> + enabled
> + * and "with_hole" is set to true, see below.
> + *
> + * "is_compressed" is used to identify if a given block image is
> + compressed
> + * or not. Maximum page size allowed on the system being 32k, the
> + hole
> + * offset cannot be more than 15-bit long so the last free bit is
> + used to
> + * store the compression state of block image. If the maximum page
> + size
> + * allowed is increased to a value higher than that, we should
> + consider
> + * increasing this structure size as well, but this would increase
> + the
> + * length of block header in WAL records with alignment.
> + *
> + * "with_hole" is used to identify the presence of a hole in a block image.
> + * As the length of a block cannot be more than 15-bit long, the
> + extra bit in
> + * the length field is used for this identification purpose. If the
> + block image
> + * has no hole, it is ensured that the raw size of a compressed block
> + image is
> + * equal to BLCKSZ, hence the contents of
> + XLogRecordBlockImageCompressionInfo
> + * are not necessary.
> */
> typedef struct XLogRecordBlockImageHeader {
> - uint16 hole_offset; /* number of bytes before "hole" */
> - uint16 hole_length; /* number of bytes in "hole" */
> + uint16 length:15, /* length of block data in record */
> + with_hole:1; /* status of hole in the block */
> +
> + uint16 hole_offset:15, /* number of bytes before "hole" */
> + is_compressed:1; /* compression status of image */
> +
> + /* Followed by the data related to compression if block is
> +compressed */
> } XLogRecordBlockImageHeader;

Yikes, this is ugly.

I think we should change the xlog format so that the block_id (which currently is XLR_BLOCK_ID_DATA_SHORT/LONG or a actual block id) isn't the block id but something like XLR_CHUNK_ID. Which is used as is for XLR_CHUNK_ID_DATA_SHORT/LONG, but for backup blocks can be set to to XLR_CHUNK_BKP_WITH_HOLE, XLR_CHUNK_BKP_COMPRESSED, XLR_CHUNK_BKP_REFERENCE... The BKP blocks will then follow, storing the block id following the chunk id.

Yes, that'll increase the amount of data for a backup block by 1 byte, but I think that's worth it. I'm pretty sure we will be happy about the added extensibility pretty soon.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

______________________________________________________________________
Disclaimer: This email and any attachments are sent in strictest confidence
for the sole use of the addressee and may contain legally privileged,
confidential, and proprietary data. If you are not the intended recipient,
please advise the sender by replying promptly to this email and then delete
and destroy this email and any attachments without any further use, copying
or forwarding.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2015-02-18 14:28:49 Re: pgaudit - an auditing extension for PostgreSQL
Previous Message Stephen Frost 2015-02-18 13:52:16 Re: pgaudit - an auditing extension for PostgreSQL