Re: [REVIEW] Re: Compression of full-page-writes

From: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To: Rahila Syed <rahilasyed90(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Rahila Syed <rahilasyed(dot)90(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [REVIEW] Re: Compression of full-page-writes
Date: 2014-07-23 08:21:06
Message-ID: CABOikdOc7J3t5DX+oGOxo6YNhGOFcpx_jMe36Meeqv0bxH6xTw@mail.gmail.com
Lists: pgsql-hackers

I'm trying to understand what it would take to get this patch into an
acceptable form before the next commitfest. Both Abhijit and Andres have
done extensive reviews of the patch and have given many useful suggestions
to Rahila. While she has incorporated most of them, I feel we are still
some distance away from having something that can be committed. Here are
my observations based on the discussion on this thread so far.

1. Need for compressing full page backups:
A good number of benchmarks have been run by various people on this list
which clearly show the need for the feature. Many people have already
voiced their agreement on having this in core, even as a configurable
parameter. There have also been requests for more benchmarks, such as
response times immediately after a checkpoint or CPU consumption, which
I'm not entirely sure have been done yet.

2. Need for different compression algorithms:
There were requests for comparing different compression algorithms such as
LZ4 and snappy. Based on the numbers that Rahila has posted, I can see that
LZ4 has the best compression ratio, at least for the TPC-C benchmarks she
tried. Having said that, I was hoping to see more numbers on CPU
utilization which would demonstrate the trade-off, if any. Anyway, there
were also apprehensions about whether to have a pluggable algorithm in the
final patch that gets committed. If we do decide to support more
compression algorithms, I like what Andres had done before, i.e. store the
compression algorithm information in the varlena header. So basically, we
should have an abstract API which takes a buffer and the desired algorithm
and returns the compressed data, along with a varlena header carrying the
encoded information. ISTM that the patch Andres had posted earlier was
focused primarily on toast data, but I think we can make it more generic so
that both toast and FPW can use it.
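
Just to illustrate the kind of interface I'm thinking of (all the names
below are made up, and the explicit header struct merely stands in for the
varlena-header-bit encoding discussed above):

#include "postgres.h"

/* Hypothetical algorithm identifiers; only pglz is in core today. */
typedef enum WalCompressionAlgo
{
    WAL_COMPRESS_PGLZ = 0,
    WAL_COMPRESS_LZ4,
    WAL_COMPRESS_SNAPPY
} WalCompressionAlgo;

/*
 * Hypothetical header prepended to each compressed chunk so that the
 * decompression side knows which algorithm was used and how large the
 * original data was.
 */
typedef struct WalCompressionHeader
{
    uint8       algo;           /* WalCompressionAlgo */
    uint32      rawsize;        /* uncompressed length in bytes */
} WalCompressionHeader;

/*
 * Compress slen bytes of source into dest using the requested algorithm,
 * writing a WalCompressionHeader first.  Returns the total number of
 * bytes written to dest, or -1 if the data is incompressible.
 */
extern int32 wal_compress_buffer(const char *source, int32 slen,
                                 char *dest, WalCompressionAlgo algo);

/* Reverse operation: reads the header written above and fills dest. */
extern int32 wal_decompress_buffer(const char *source, int32 slen,
                                   char *dest);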

Having said that, IMHO we should go one step at a time. We have been using
pglz to compress toast data for a long time, so we can continue to use the
same for compressing full page images. We can simultaneously work on adding
more algorithms to core and choose the right candidate for different
scenarios such as toast or FPW based on test evidence. But that work can
happen independently of this patch.
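
For reference, compressing a single page image with the existing pglz
interface would look roughly like this (I'm going from memory of
utils/pg_lzcompress.h here, so treat the exact signatures as an
assumption):

#include "postgres.h"
#include "utils/pg_lzcompress.h"

/*
 * Rough sketch only: compress one BLCKSZ-sized page image with pglz.
 * workbuf must be at least PGLZ_MAX_OUTPUT(BLCKSZ) bytes.  Returns the
 * compressed size (including the PGLZ_Header), or -1 if pglz could not
 * shrink the page.
 */
static int32
compress_one_fpw(const char *page, char *workbuf)
{
    PGLZ_Header *dest = (PGLZ_Header *) workbuf;

    if (!pglz_compress(page, BLCKSZ, dest, PGLZ_strategy_default))
        return -1;              /* incompressible */

    return (int32) VARSIZE(dest);
}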

3. Compressing one block vs all blocks:
Andres suggested that compressing all backup blocks in one go may give us a
better compression ratio. This is worth trying. I'm wondering what would be
the best way to do so with minimal changes to the xlog insertion code.
Today, we add more rdata items for the backup block header(s) and the
backup blocks themselves (2 per backup block if there is a "hole") beyond
what the caller has supplied. If we have to compress all the backup blocks
together, then one approach is to copy the backup block headers and the
blocks to a temp buffer, compress that, and replace the rdata entries added
previously with a single rdata entry. Is there a better way to handle
multiple blocks in one go?
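
Something along these lines is what I have in mind, just to make it
concrete (compress_backup_blocks and the wal_compress_buffer helper from
the sketch under (2) are made-up names; the real patch would need to handle
buffer sizing and alignment more carefully):

#include "postgres.h"
#include "access/xlog.h"
#include "access/xlog_internal.h"

/*
 * Illustrative sketch: flatten the rdata entries that XLogInsert added
 * for the backup block headers and block images into one scratch buffer,
 * compress them in one go, and point a single rdata entry at the result.
 * Returns false if the data did not compress, in which case the original
 * chain is kept.
 */
static bool
compress_backup_blocks(XLogRecData *bkp_rdata, XLogRecData *result)
{
    static char uncompressed[XLR_MAX_BKP_BLOCKS * (sizeof(BkpBlock) + BLCKSZ)];
    static char compressed[XLR_MAX_BKP_BLOCKS * (sizeof(BkpBlock) + BLCKSZ)];
    uint32      total = 0;
    int32       clen;
    XLogRecData *rdt;

    /* Copy the headers and block images into one contiguous buffer. */
    for (rdt = bkp_rdata; rdt != NULL; rdt = rdt->next)
    {
        memcpy(uncompressed + total, rdt->data, rdt->len);
        total += rdt->len;
    }

    /* Compress all backup blocks together; algorithm choice as per (2). */
    clen = wal_compress_buffer(uncompressed, total, compressed,
                               WAL_COMPRESS_PGLZ);
    if (clen < 0 || clen >= (int32) total)
        return false;           /* not worth it, keep the original chain */

    /* Replace the previously added entries with this single one. */
    result->data = compressed;
    result->len = (uint32) clen;
    result->buffer = InvalidBuffer;
    result->buffer_std = false;
    result->next = NULL;
    return true;
}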

We still need a way to tell the restore path that the WAL data is
compressed. One way is to always add a varlena header, irrespective of
whether the blocks are compressed or not, but that looks like overkill.
Another way is to add a new field to XLogRecord to record this information.
Looks like we can do this without increasing the size of the header, since
there are 2 bytes of padding after the xl_rmid field.
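
For reference, the current record header looks roughly like this (see
access/xlog.h); the hypothetical xl_compress byte would simply take over
one of the two padding bytes:

typedef struct XLogRecord
{
    uint32      xl_tot_len;     /* total len of entire record */
    TransactionId xl_xid;       /* xact id */
    uint32      xl_len;         /* total len of rmgr data */
    uint8       xl_info;        /* flag bits */
    RmgrId      xl_rmid;        /* resource manager for this record */
    uint8       xl_compress;    /* hypothetical: backup block compression info */
    /* 1 byte of padding here, must be zero */
    pg_crc32    xl_crc;         /* CRC for this record */
} XLogRecord;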

4. Handling holes in backup blocks:
I think if we address (3) then this can be easily done. Alternatively, we
can also memzero the "hole" and then compress the entire page. The
compression algorithm should handle that well.
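
A minimal sketch of that alternative, assuming we operate on the
BLCKSZ-sized copy of the page taken for the backup block (the sanity
checks mirror what XLogInsert already does when computing the hole):

#include "postgres.h"
#include "storage/bufpage.h"

/*
 * Zero the unused space between pd_lower and pd_upper so that the whole
 * page image can be compressed as-is; the compressor then collapses the
 * long run of zeroes instead of us cutting the hole out ourselves.
 */
static void
zero_page_hole(char *page)
{
    uint16      lower = ((PageHeader) page)->pd_lower;
    uint16      upper = ((PageHeader) page)->pd_upper;

    if (lower >= SizeOfPageHeaderData && upper > lower && upper <= BLCKSZ)
        memset(page + lower, 0, upper - lower);
}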

Thoughts/comments?

Thanks,
Pavan
