Re: [REVIEW] Re: Compression of full-page-writes

From: Rahila Syed <rahilasyed90(at)gmail(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Subject: Re: [REVIEW] Re: Compression of full-page-writes
Date: 2015-02-23 08:28:04
Message-ID: CAH2L28shN4m65HYR9Khz=cGj0OC1O95gqRU_bi3WxxHJMHM6bA@mail.gmail.com
Lists: pgsql-hackers

Hello,

Attached is a patch with the following changes.

As suggested above, the block ID in the xlog structs has been replaced by a
chunk ID. The chunk ID is used to distinguish between the different types of
xlog record fragments, for example:
XLR_CHUNK_ID_DATA_SHORT
XLR_CHUNK_ID_DATA_LONG
XLR_CHUNK_BKP_COMPRESSED
XLR_CHUNK_BKP_WITH_HOLE

In block references, the block ID follows the chunk ID and retains its
existing function.
This approach adds 1 byte to each block reference in an xlog record. It
separates the ID that identifies the type of an xlog record fragment from the
actual block ID, which identifies a block reference within the xlog record.
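
To illustrate the layout described above, here is a minimal sketch of reading
the leading ID bytes of one record fragment. It is not code from the attached
patch. The XLR_CHUNK_BKP_* values are the ones floated earlier in this thread;
the values for the DATA chunk IDs, the ChunkInfo type, and read_chunk_ids()
are assumptions made up for the example. The sketch also assumes that any
chunk ID other than the two DATA ones introduces a block reference.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values proposed earlier in the thread for backup-block chunks. */
#define XLR_CHUNK_BKP_COMPRESSED 0x01
#define XLR_CHUNK_BKP_WITH_HOLE  0x02
/* Assumed placeholder values, chosen only for this sketch. */
#define XLR_CHUNK_ID_DATA_SHORT  0xFF
#define XLR_CHUNK_ID_DATA_LONG   0xFE

/* Hypothetical helper type: what the leading ID bytes tell us. */
typedef struct ChunkInfo
{
    uint8_t chunk_id;      /* type of this record fragment */
    int     has_block_id;  /* 1 if a block ID byte follows the chunk ID */
    uint8_t block_id;      /* valid only when has_block_id is 1 */
    size_t  header_bytes;  /* ID bytes consumed: 1 or 2 */
} ChunkInfo;

/*
 * Read the chunk ID and, for block references, the block ID after it.
 * Returns 0 on success, -1 if the buffer is too short.
 */
int
read_chunk_ids(const uint8_t *buf, size_t len, ChunkInfo *out)
{
    assert(buf != NULL && out != NULL);

    if (len < 1)
        return -1;

    out->chunk_id = buf[0];

    /* Main data fragments carry no block ID. */
    if (out->chunk_id == XLR_CHUNK_ID_DATA_SHORT ||
        out->chunk_id == XLR_CHUNK_ID_DATA_LONG)
    {
        out->has_block_id = 0;
        out->block_id = 0;
        out->header_bytes = 1;
        return 0;
    }

    /* Block reference: the block ID is the extra byte per reference. */
    if (len < 2)
        return -1;

    out->has_block_id = 1;
    out->block_id = buf[1];
    out->header_bytes = 2;
    return 0;
}
```

In this sketch a block reference consumes two ID bytes where the old format
used one, which is the 1-byte increase per block reference mentioned above.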

Following are the WAL numbers for each scenario:

                         WAL
FPW compression on       121.652 MB
FPW compression off      148.998 MB
HEAD                     148.764 MB

Compression remains nearly the same as before. There is a small difference in
WAL volume between HEAD and HEAD + patch with compression off. This difference
corresponds to the 1-byte increase for each block reference in an xlog record.
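
The arithmetic behind these numbers can be checked with a small sketch. The
function names are hypothetical, and the conversion assumes that 1 MB means
1024 * 1024 bytes, which the figures above do not state.

```c
#include <assert.h>

/* Fraction of WAL saved relative to a baseline volume (both in MB). */
double
wal_savings(double baseline_mb, double with_mb)
{
    assert(baseline_mb > 0.0);
    return (baseline_mb - with_mb) / baseline_mb;
}

/*
 * Number of block references implied by an overhead in MB, at 1 extra
 * byte per block reference. Assumes 1 MB = 1024 * 1024 bytes.
 */
double
implied_block_refs(double overhead_mb)
{
    return overhead_mb * 1024.0 * 1024.0;
}
```

With the figures above, wal_savings(148.998, 121.652) is about 0.18, so
compression saves roughly 18% of WAL relative to the patched build with
compression off. The overhead against HEAD is 148.998 - 148.764 = 0.234 MB,
which under the stated assumption corresponds to roughly 245,000 block
references in the test run.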

Thank you,
Rahila Syed

On Wed, Feb 18, 2015 at 7:53 PM, Syed, Rahila <Rahila(dot)Syed(at)nttdata(dot)com>
wrote:

> Hello,
>
> > I think we should change the xlog format so that the block_id (which
> > currently is XLR_BLOCK_ID_DATA_SHORT/LONG or an actual block id) isn't the
> > block id but something like XLR_CHUNK_ID. Which is used as is for
> > XLR_CHUNK_ID_DATA_SHORT/LONG, but for backup blocks can be set to
> > XLR_CHUNK_BKP_WITH_HOLE, XLR_CHUNK_BKP_COMPRESSED,
> > XLR_CHUNK_BKP_REFERENCE... The BKP blocks will then follow, storing the
> > block id following the chunk id.
> >
> > Yes, that'll increase the amount of data for a backup block by 1 byte,
> > but I think that's worth it. I'm pretty sure we will be happy about the
> > added extensibility pretty soon.
>
> To clarify my understanding of the above change:
>
> Instead of a block id to reference different fragments of an xlog record,
> a single-byte field "chunk_id" should be used. chunk_id will be the same as
> XLR_BLOCK_ID_DATA_SHORT/LONG for main data fragments.
> But for block references, it will store the following values in order to
> carry information about the backup blocks.
> #define XLR_CHUNK_BKP_COMPRESSED 0x01
> #define XLR_CHUNK_BKP_WITH_HOLE 0x02
> ...
>
> The new xlog format should look as follows:
>
> Fixed-size header (XLogRecord struct)
> Chunk_id (add a field before the id field in the XLogRecordBlockHeader struct)
> XLogRecordBlockHeader
> Chunk_id
> XLogRecordBlockHeader
> ...
> ...
> Chunk_id (rename the id field of the XLogRecordDataHeader struct)
> XLogRecordDataHeader[Short|Long]
> block data
> block data
> ...
> main data
>
> I will post a patch based on this.
>
> Thank you,
> Rahila Syed
>
> -----Original Message-----
> From: Andres Freund [mailto:andres(at)2ndquadrant(dot)com]
> Sent: Monday, February 16, 2015 5:26 PM
> To: Syed, Rahila
> Cc: Michael Paquier; Fujii Masao; PostgreSQL mailing lists
> Subject: Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
>
> On 2015-02-16 11:30:20 +0000, Syed, Rahila wrote:
> > - * As a trivial form of data compression, the XLOG code is aware that
> > - * PG data pages usually contain an unused "hole" in the middle, which
> > - * contains only zero bytes. If hole_length > 0 then we have removed
> > - * such a "hole" from the stored data (and it's not counted in the
> > - * XLOG record's CRC, either). Hence, the amount of block data actually
> > - * present is BLCKSZ - hole_length bytes.
> > + * Block images are able to do several types of compression:
> > + * - When wal_compression is off, as a trivial form of compression, the
> > + * XLOG code is aware that PG data pages usually contain an unused "hole"
> > + * in the middle, which contains only zero bytes. If length < BLCKSZ
> > + * then we have removed such a "hole" from the stored data (and it is
> > + * not counted in the XLOG record's CRC, either). Hence, the amount
> > + * of block data actually present is "length" bytes. The hole "offset"
> > + * on page is defined using "hole_offset".
> > + * - When wal_compression is on, block images are compressed using a
> > + * compression algorithm without their hole to improve compression
> > + * process of the page. "length" corresponds in this case to the length
> > + * of the compressed block. "hole_offset" is the hole offset of the page,
> > + * and the length of the uncompressed block is defined by "raw_length",
> > + * whose data is included in the record only when compression is enabled
> > + * and "with_hole" is set to true, see below.
> > + *
> > + * "is_compressed" is used to identify if a given block image is compressed
> > + * or not. Maximum page size allowed on the system being 32k, the hole
> > + * offset cannot be more than 15-bit long so the last free bit is used to
> > + * store the compression state of block image. If the maximum page size
> > + * allowed is increased to a value higher than that, we should consider
> > + * increasing this structure size as well, but this would increase the
> > + * length of block header in WAL records with alignment.
> > + *
> > + * "with_hole" is used to identify the presence of a hole in a block image.
> > + * As the length of a block cannot be more than 15-bit long, the extra bit in
> > + * the length field is used for this identification purpose. If the block image
> > + * has no hole, it is ensured that the raw size of a compressed block image is
> > + * equal to BLCKSZ, hence the contents of XLogRecordBlockImageCompressionInfo
> > + * are not necessary.
> > */
> > typedef struct XLogRecordBlockImageHeader {
> > - uint16 hole_offset; /* number of bytes before "hole" */
> > - uint16 hole_length; /* number of bytes in "hole" */
> > + uint16 length:15, /* length of block data in record */
> > + with_hole:1; /* status of hole in the block */
> > +
> > + uint16 hole_offset:15, /* number of bytes before "hole" */
> > + is_compressed:1; /* compression status of image */
> > +
> > + /* Followed by the data related to compression if block is compressed */
> > } XLogRecordBlockImageHeader;
>
> Yikes, this is ugly.
>
> I think we should change the xlog format so that the block_id (which
> currently is XLR_BLOCK_ID_DATA_SHORT/LONG or an actual block id) isn't the
> block id but something like XLR_CHUNK_ID. Which is used as is for
> XLR_CHUNK_ID_DATA_SHORT/LONG, but for backup blocks can be set to
> XLR_CHUNK_BKP_WITH_HOLE, XLR_CHUNK_BKP_COMPRESSED,
> XLR_CHUNK_BKP_REFERENCE... The BKP blocks will then follow, storing the
> block id following the chunk id.
>
> Yes, that'll increase the amount of data for a backup block by 1 byte, but
> I think that's worth it. I'm pretty sure we will be happy about the added
> extensibility pretty soon.
>
> Greetings,
>
> Andres Freund
>
> --
> Andres Freund http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
>

Attachment Content-Type Size
Support-compression-for-full-page-writes-in-WAL_v20.patch application/octet-stream 30.8 KB
