Re: Performance Improvement by reducing WAL for Update Operation

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Mike Blackwell <mike(dot)blackwell(at)rrd(dot)com>
Subject: Re: Performance Improvement by reducing WAL for Update Operation
Date: 2014-02-15 14:51:09
Message-ID: 20140215145109.GC19470@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

Some quick review comments:

On 2014-02-13 18:14:54 +0530, Amit Kapila wrote:
> + /*
> + * EWT can be generated for all new tuple versions created by Update
> + * operation. Currently we do it when both the old and new tuple versions
> + * are on same page, because during recovery if the page containing old
> + * tuple is corrupt, it should not cascade that corruption to other pages.
> + * Under the general assumption that for long runs most updates tend to
> + * create new tuple version on same page, there should not be significant
> + * impact on WAL reduction or performance.
> + *
> + * We should not generate EWT when we need to backup the whole block in
> + * WAL as in that case there is no saving by reduced WAL size.
> + */
> +
> + if (RelationIsEnabledForWalCompression(reln) &&
> + (oldbuf == newbuf) &&
> + !XLogCheckBufferNeedsBackup(newbuf))
> + {
> + uint32 enclen;

You should note that the check for RelationIsEnabledForWalCompression()
here is racy, and that that's ok because the worst that can happen is a
uselessly generated delta.
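E.g. something along these lines (just a sketch of the kind of comment I mean):

    /*
     * Note that RelationIsEnabledForWalCompression() is checked without
     * holding any lock, so the setting can change concurrently.  That's
     * acceptable: the worst that can happen is that we generate a delta
     * nobody needs, or miss one opportunity to generate one.
     */
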
> xlrec.target.node = reln->rd_node;
> xlrec.target.tid = oldtup->t_self;
> xlrec.old_xmax = HeapTupleHeaderGetRawXmax(oldtup->t_data);
> @@ -6619,6 +6657,8 @@ log_heap_update(Relation reln, Buffer oldbuf,
> xlrec.newtid = newtup->t_self;
> if (new_all_visible_cleared)
> xlrec.flags |= XLOG_HEAP_NEW_ALL_VISIBLE_CLEARED;
> + if (compressed)
> + xlrec.flags |= XLOG_HEAP_DELTA_ENCODED;

I think this also needs to unset XLOG_HEAP_CONTAINS_NEW_TUPLE, and should
be conditional on !need_tuple_data.
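Roughly (untested, and assuming I'm reading the flag semantics correctly):

    if (compressed)
    {
        /*
         * The record no longer contains the full new tuple, so don't
         * claim otherwise.
         */
        xlrec.flags |= XLOG_HEAP_DELTA_ENCODED;
        xlrec.flags &= ~XLOG_HEAP_CONTAINS_NEW_TUPLE;
    }

and the delta encoding itself presumably shouldn't be attempted in the
first place when need_tuple_data is set.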

> /*
> + * Determine whether the buffer referenced has to be backed up. Since we don't
> + * yet have the insert lock, fullPageWrites and forcePageWrites could change
> + * later, but will not cause any problem because this function is used only to
> + * identify whether EWT is required for update.
> + */
> +bool
> +XLogCheckBufferNeedsBackup(Buffer buffer)
> +{

Should note very, very boldly that this can only be used in contexts
where a race is acceptable.
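I.e. something like:

    /*
     * Determine whether the buffer referenced would currently have to be
     * backed up by a full-page image.
     *
     * NB: we do not hold the WAL insert lock here, so fullPageWrites and
     * forcePageWrites can still change before the record is inserted.
     * Only use this in contexts where acting on a stale answer is
     * acceptable - here it merely decides whether delta encoding of the
     * new tuple is attempted.
     */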

> diff --git a/src/backend/utils/adt/pg_rbcompress.c b/src/backend/utils/adt/pg_rbcompress.c
> new file mode 100644
> index 0000000..877ccd7
> --- /dev/null
> +++ b/src/backend/utils/adt/pg_rbcompress.c
> @@ -0,0 +1,355 @@
> +/* ----------
> + * pg_rbcompress.c -
> + *
> + * This is a delta encoding scheme specific to PostgreSQL and designed
> + * to compress similar tuples. It can be used as it is or extended for
> + * other purpose in PostgrSQL if required.
> + *
> + * Currently, this just checks for a common prefix and/or suffix, but
> + * the output format is similar to the LZ format used in pg_lzcompress.c.
> + *
> + * Copyright (c) 1999-2014, PostgreSQL Global Development Group
> + *
> + * src/backend/utils/adt/pg_rbcompress.c
> + * ----------
> + */

This needs significantly more explanation of the algorithm and the
reasoning behind it.
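Just to illustrate the level of detail I'd expect (the actual reasoning
obviously has to come from you):

    /* ----------
     * pg_rbcompress.c -
     *
     *  Delta encoding of a new tuple version against the old one.
     *
     *  The old tuple is passed in as the "history".  Currently we only
     *  look for a common prefix and a common suffix between the two
     *  versions; those are emitted as match tags (offset/length into the
     *  history), everything in between as literals.  The output follows
     *  the control-byte/tag layout used in pg_lzcompress.c.
     *
     *  Rationale: why prefix/suffix only, what compression rates we
     *  expect for typical updates, why this is cheap enough to do in the
     *  update path, ...
     * ----------
     */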

> +static const PGRB_Strategy strategy_default_data = {
> + 32, /* Data chunks less than 32 bytes are not
> + * compressed */
> + INT_MAX, /* No upper limit on what we'll try to
> + * compress */
> + 35, /* Require 25% compression rate, or not worth
> + * it */
> +};

The compression rate looks like a mismatch between comment and code
(35 in the code vs. 25% in the comment).
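I.e. either the comment or the value needs to change; presumably something
like:

    static const PGRB_Strategy strategy_default_data = {
        32,        /* Data chunks less than 32 bytes are not compressed */
        INT_MAX,   /* No upper limit on what we'll try to compress */
        25,        /* Require 25% compression rate, or not worth it */
    };

(assuming 25% is what was actually intended).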

> +/* ----------
> + * pgrb_out_ctrl -
> + *
> + * Outputs the last and allocates a new control byte if needed.
> + * ----------
> + */
> +#define pgrb_out_ctrl(__ctrlp,__ctrlb,__ctrl,__buf) \
> +do { \
> + if ((__ctrl & 0xff) == 0) \
> + { \
> + *(__ctrlp) = __ctrlb; \
> + __ctrlp = (__buf)++; \
> + __ctrlb = 0; \
> + __ctrl = 1; \
> + } \
> +} while (0)
> +

Double-underscore identifiers are reserved for the compiler and the OS;
don't use them here.

> +/* ----------
> + * pgrb_out_literal -
> + *
> + * Outputs a literal byte to the destination buffer including the
> + * appropriate control bit.
> + * ----------
> + */
> +#define pgrb_out_literal(_ctrlp,_ctrlb,_ctrl,_buf,_byte) \
> +do { \
> + pgrb_out_ctrl(_ctrlp,_ctrlb,_ctrl,_buf); \
> + *(_buf)++ = (unsigned char)(_byte); \
> + _ctrl <<= 1; \
> +} while (0)
> +
> +
> +/* ----------
> + * pgrb_out_tag -
> + *
> + * Outputs a backward reference tag of 2-4 bytes (depending on
> + * offset and length) to the destination buffer including the
> + * appropriate control bit.
> + * ----------
> + */
> +#define pgrb_out_tag(_ctrlp,_ctrlb,_ctrl,_buf,_len,_off) \
> +do { \
> + pgrb_out_ctrl(_ctrlp,_ctrlb,_ctrl,_buf); \
> + _ctrlb |= _ctrl; \
> + _ctrl <<= 1; \
> + if (_len > 17) \
> + { \
> + (_buf)[0] = (unsigned char)((((_off) & 0xf00) >> 4) | 0x0f); \
> + (_buf)[1] = (unsigned char)(((_off) & 0xff)); \
> + (_buf)[2] = (unsigned char)((_len) - 18); \
> + (_buf) += 3; \
> + } else { \
> + (_buf)[0] = (unsigned char)((((_off) & 0xf00) >> 4) | ((_len) - 3)); \
> + (_buf)[1] = (unsigned char)((_off) & 0xff); \
> + (_buf) += 2; \
> + } \
> +} while (0)
> +

What's the reason to use macros here? Just use inline functions when
dealing with file-local stuff.
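E.g. roughly (untested, and PGRB_OutputState is just a name I made up):

    typedef struct PGRB_OutputState
    {
        unsigned char *ctrlp;   /* where the pending control byte goes */
        unsigned char  ctrlb;   /* control bits collected so far */
        unsigned char  ctrl;    /* bit position for the next entry */
        unsigned char *buf;     /* current output position */
    } PGRB_OutputState;

    /* flush the control byte and start a new one once all bits are used */
    static inline void
    pgrb_out_ctrl(PGRB_OutputState *state)
    {
        if ((state->ctrl & 0xff) == 0)
        {
            *state->ctrlp = state->ctrlb;
            state->ctrlp = state->buf++;
            state->ctrlb = 0;
            state->ctrl = 1;
        }
    }

    /* emit one literal byte, with its (zero) control bit */
    static inline void
    pgrb_out_literal(PGRB_OutputState *state, unsigned char byte)
    {
        pgrb_out_ctrl(state);
        *state->buf++ = byte;
        state->ctrl <<= 1;
    }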

> +/* ----------
> + * pgrb_delta_encode - find common prefix/suffix between inputs and encode.
> + *
> + * source is the input data to be compressed
> + * slen is the length of source data
> + * history is the data which is used as reference for compression
> + * hlen is the length of history data
> + * The encoded result is written to dest, and its length is returned in
> + * finallen.
> + * The return value is TRUE if compression succeeded,
> + * FALSE if not; in the latter case the contents of dest
> + * are undefined.
> + * ----------
> + */
> +bool
> +pgrb_delta_encode(const char *source, int32 slen,
> + const char *history, int32 hlen,
> + char *dest, uint32 *finallen,
> + const PGRB_Strategy *strategy)
> +{
> + unsigned char *bp = ((unsigned char *) dest);
> + unsigned char *bstart = bp;
> + const char *dp = source;
> + const char *dend = source + slen;
> + const char *hp = history;
> + unsigned char ctrl_dummy = 0;
> + unsigned char *ctrlp = &ctrl_dummy;
> + unsigned char ctrlb = 0;
> + unsigned char ctrl = 0;
> + int32 result_size;
> + int32 result_max;
> + int32 need_rate;
> + int prefixlen;
> + int suffixlen;
> +
> + /*
> + * Tuples of length greater than PGRB_HISTORY_SIZE are not allowed for
> + * delta encode as this is the maximum size of history offset.
> + * XXX: still true?
> + */

Why didn't you define a maximum tuple size in the strategy definition
above then?
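I.e. something like an additional member in the strategy (max_history_size
is a made-up name), so the limit lives next to the other thresholds instead
of being hardcoded here:

    typedef struct PGRB_Strategy
    {
        int32   min_input_size;     /* don't try below this */
        int32   max_input_size;     /* don't try above this */
        int32   min_comp_rate;      /* required savings in percent */
        int32   max_history_size;   /* largest usable history, bounded by
                                     * the offset encoding */
    } PGRB_Strategy;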

> + if (hlen >= PGRB_HISTORY_SIZE || hlen < PGRB_MIN_MATCH)
> + return false;
> +
> + /*
> + * Our fallback strategy is the default.
> + */
> + if (strategy == NULL)
> + strategy = PGRB_strategy_default;
>
> + /*
> + * If the strategy forbids compression (at all or if source chunk size out
> + * of range), fail.
> + */
> + if (slen < strategy->min_input_size ||
> + slen > strategy->max_input_size)
> + return false;
> +
> + need_rate = strategy->min_comp_rate;
> + if (need_rate < 0)
> + need_rate = 0;
> + else if (need_rate > 99)
> + need_rate = 99;

Is there really a need for all this stuff here? This is so specific to the
use case that I have significant doubts that all the pglz boilerplate
makes much sense.
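If the only caller is the heap update path, the tunables could just as well
be constants, e.g.:

    /* hypothetical replacement for the strategy machinery */
    #define PGRB_MIN_INPUT_SIZE   32   /* don't bother with tiny tuples */
    #define PGRB_MIN_COMP_RATE    25   /* require at least 25% savings */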

> + /*
> + * Compute the maximum result size allowed by the strategy, namely the
> + * input size minus the minimum wanted compression rate. This had better
> + * be <= slen, else we might overrun the provided output buffer.
> + */
> + /*if (slen > (INT_MAX / 100))
> + {
> + /* Approximate to avoid overflow */
> + /*result_max = (slen / 100) * (100 - need_rate);
> + }
> + else
> + {
> + result_max = (slen * (100 - need_rate)) / 100;
> + }*/

Err? Why is this left in as a block of commented-out code?

> +--
> +-- Test to update continuos and non continuos columns
> +--

*continuous

I have to admit, I have serious doubts about this approach. I have a
very hard time believing this won't cause performance regressions in many
common cases... More importantly, I don't think doing the compression at
this level is that interesting. I know Heikki argued for it, but I think
extending the bitmap that's computed for HOT to cover all columns and
doing this at the column level sounds much more sensible to me.

Greetings,

Andres Freund
