Re: [Proposal] Page Compression for OLTP

From: chenhj <chjischj(at)163(dot)com>
To: "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [Proposal] Page Compression for OLTP
Date: 2020-12-09 21:44:18
Message-ID: 4b3ebc9b.118.17649764c77.Coremail.chjischj@163.com
Lists: pgsql-hackers

Hi hackers,

I have further improved this patch, adjusted some of the design, and added related modifications
(pg_rewind, replication, checksum, backup) and basic tests. Any suggestions are welcome.

The patch can also be obtained from here:
https://github.com/ChenHuajun/postgres/tree/page_compress_14

# 1. Page storage

Each compressed data block is stored in one or more chunks of the compressed data file,
and the size of each chunk is 1/8, 1/4, or 1/2 of the block size.
The storage location of each compressed data block is represented by an array of chunk
numbers and stored in the compressed address file.

## 1.1 page compressed address file (_pca)

blk0 1 2 3
+=======+=======+=======+=======+=======+
| head | 1 | 2 | 3,4 | 5 |
+=======+=======+=======+=======+=======+

## 1.2 page compressed data file (_pcd)

chunk1 2 3 4 5
+=========+=========+==========+=========+=========+
| blk0 | blk2 | blk2_1 | blk2_2 | blk3 |
+=========+=========+==========+=========+=========+
| 4K |

# 2. Usage

## 2.1 Set whether to use compression through storage parameters of tables and indexes

- compresstype
Sets whether to compress and which compression algorithm to use. Supported values: none, pglz, zstd

- compresslevel
Sets the compression level (only supported by zstd)

- compress_chunk_size

A chunk is the smallest unit of storage space allocated for compressed pages.
The chunk size can only be 1/2, 1/4, or 1/8 of BLCKSZ

- compress_prealloc_chunks

The number of chunks pre-allocated for each page. The maximum allowed value is BLCKSZ/compress_chunk_size - 1.
If the number of chunks required for a compressed page is less than `compress_prealloc_chunks`,
`compress_prealloc_chunks` chunks are allocated anyway, to avoid storage fragmentation when the page later needs more space.

example:
CREATE TABLE tbl_pc(id int, c1 text) WITH(compresstype=zstd, compresslevel=0, compress_chunk_size=1024, compress_prealloc_chunks=2);
CREATE INDEX tbl_pc_idx1 on tbl_pc(c1) WITH(compresstype=zstd, compresslevel=1, compress_chunk_size=4096, compress_prealloc_chunks=0);

## 2.2 Set default compression options when creating a table in a specified tablespace

- default_compresstype
- default_compresslevel
- default_compress_chunk_size
- default_compress_prealloc_chunks

Note: temporary and unlogged tables are not affected by the above four parameters

example:
ALTER TABLESPACE pg_default SET(default_compresstype=zstd, default_compresslevel=2, default_compress_chunk_size=1024, default_compress_prealloc_chunks=2);

## 2.3 View the storage location of each block of the compressed table

Some functions are added to pageinspect to inspect compressed relations:

- get_compress_address_header(relname text, segno integer)
- get_compress_address_items(relname text, segno integer)

example:
SELECT nblocks, allocated_chunks, chunk_size, algorithm FROM get_compress_address_header('test_compressed',0);
nblocks | allocated_chunks | chunk_size | algorithm
---------+------------------+------------+-----------
1 | 20 | 1024 | 1
(1 row)

SELECT * FROM get_compress_address_items('test_compressed',0);
blkno | nchunks | allocated_chunks | chunknos
-------+---------+------------------+---------------
0 | 0 | 4 | {1,2,3,4}
1 | 0 | 4 | {5,6,7,8}
2 | 0 | 4 | {9,10,11,12}
3 | 0 | 4 | {13,14,15,16}
4 | 0 | 4 | {17,18,19,20}
(5 rows)

## 2.4 Compare the compression ratio of different compression algorithms and compression levels

A new function in pageinspect can compare the compression ratio of different compression algorithms and levels.
This helps determine which compression parameters to use.

- page_compress(page bytea, algorithm text, level integer)

example:
postgres=# SELECT blk,octet_length(page_compress(get_raw_page('test_compressed', 'main', blk), 'pglz', 0)) compressed_size from generate_series(0,4) blk;
blk | compressed_size
-----+-----------------
0 | 3234
1 | 3516
2 | 3515
3 | 3515
4 | 1571
(5 rows)

postgres=# SELECT blk,octet_length(page_compress(get_raw_page('test_compressed', 'main', blk), 'zstd', 0)) compressed_size from generate_series(0,4) blk;
blk | compressed_size
-----+-----------------
0 | 1640
1 | 1771
2 | 1801
3 | 1813
4 | 806
(5 rows)

# 3. How to ensure crash safety

For implementation convenience, no WAL is written when chunk space is allocated in the
compressed address file. Therefore, if postgres crashes during space allocation,
incomplete data may remain in the compressed address file.

To ensure data consistency of the compressed address file, the following measures are taken:

1. Divide the compressed address file into 512-byte areas. The address data of each data block is stored entirely within one area
and never crosses an area boundary, so that a crash cannot leave half of an address persisted and the other half not.
2. When allocating chunk space, write address information to the address file in a fixed order, so that no intermediate state is inconsistent. The order is as follows:

- Accumulate the total number of allocated chunks in the header (PageCompressHeader.allocated_chunks)
- Write the chunkno array in the address entry of the data block (PageCompressAddr.chunknos)
- Write the number of allocated chunks in the address entry of the data block (PageCompressAddr.nchunks)
- Update the global number of blocks in the header (PageCompressHeader.nblocks)

typedef struct PageCompressHeader
{
	pg_atomic_uint32	nblocks;			/* number of total blocks in this segment */
	pg_atomic_uint32	allocated_chunks;	/* number of total allocated chunks in data area */
	uint16				chunk_size;			/* size of each chunk, must be 1/2 1/4 or 1/8 of BLCKSZ */
	uint8				algorithm;			/* compress algorithm, 1=pglz, 2=lz4 */
	pg_atomic_uint32	last_synced_nblocks;	/* last synced nblocks */
	pg_atomic_uint32	last_synced_allocated_chunks;	/* last synced allocated_chunks */
	TimestampTz			last_recovery_start_time;	/* postmaster start time of last recovery */
} PageCompressHeader;

typedef struct PageCompressAddr
{
	volatile uint8		nchunks;			/* number of chunks for this block */
	volatile uint8		allocated_chunks;	/* number of allocated chunks for this block */

	/* variable-length field, 1-based chunk no array for this block,
	 * size of the array must be 2, 4 or 8 */
	pc_chunk_number_t	chunknos[FLEXIBLE_ARRAY_MEMBER];
} PageCompressAddr;

3. Once a chunk is allocated, it belongs to a specific data block until the relation is truncated (or the tail blocks are vacuumed),
which avoids frequent changes to address information.
4. When replaying WAL in the recovery phase after a crash, check the address file of every compressed relation the first time it is opened,
and repair it if the data is inconsistent (see the check_and_repair_compress_address function).

# 4. Problems

- When compress_chunk_size=1024, about 4MB of space is needed to store the addresses,
which can make small relations larger after compression.
Compression should therefore be avoided for small tables.
- The zstd library needs to be installed separately. Could the zstd source code be copied into postgres?

# 5. TODO list

1. docs
2. optimize code style, error messages, and so on
3. more tests

BTW:
If anyone finds this patch valuable, I hope we can improve it together.

Best Regards
Chen Huajun

Attachment Content-Type Size
page_compress_14_v1.patch application/octet-stream 389.2 KB
