Re: Patch: Write Amplification Reduction Method (WARM)

From: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Jaime Casanova <jaime(dot)casanova(at)2ndquadrant(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Patch: Write Amplification Reduction Method (WARM)
Date: 2017-02-28 06:49:50
Message-ID: CABOikdNnFon4cJiL=h1mZH3bgUeU+sWHuU4Yr8AB=j3A2p1GiA@mail.gmail.com
Lists: pgsql-hackers

On Sun, Feb 26, 2017 at 2:14 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

>
>
> Fair point, but I've already said why I think the stakes for this
> particular feature are pretty high.
>
>
I understand your concerns and am not trying to downplay them. I'm doing my
best to test the patch in different ways to ensure we can catch most of the
bugs before the patch is committed. Hopefully with additional reviews and
tests we can plug remaining holes, if any, and be in a comfortable state.

> >
> > (I have mentioned the idea of overloading ip_posid bits a few times now and
> > haven't heard any objection so far. Well, that could either mean that nobody
> > has read those emails seriously or there is general acceptance to that
> > idea.. I am assuming latter :-))
>
> I'm not sure about that. I'm not really sure I have an opinion on
> that yet, without seeing the patch. The discussion upthread was a bit
> vague:
>

Attached is a complete set of rebased and finished patches. Patches 0002
and 0003 do what I have in mind as far as the OffsetNumber bits are concerned.

AFAICS this version is a fully functional implementation of WARM, ready for
serious review/test. The chain conversion is now fully functional and
tested with btrees. I've also added support for chain conversion in hash
indexes by overloading ip_posid high order bits. Even though there is a
free bit available in btree index tuple, the patch now uses the same
ip_posid bit even for btree indexes.

A short summary of all attached patches.

0000_interesting_attrs_v15.patch:

This is Alvaro's patch to refactor HeapSatisfiesHOTandKeyUpdate. We now
return a set of modified attributes and let the caller consume that
information in a way it wants. The main WARM patch uses this refactored API.

0001_track_root_lp_v15.patch:

This implements the logic to store the root offset of the HOT chain in the
t_ctid.ip_posid field. We use a free bit in heap tuple header to mark that
a particular tuple is at the end of the chain and store the root offset in
the ip_posid. For pg_upgraded clusters, this information could be missing,
in which case we do the hard work of scanning the page's tuples to find the
root offset.
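
As a rough illustration of the idea in 0001 (not the code in the patch), the
sketch below stores the root offset in the offset field of the last tuple in
a chain and reports when the information is missing. The struct, the flag
value and all function names are hypothetical stand-ins, not PostgreSQL
definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bit; the bit and name used by the patch may differ. */
#define SKETCH_LATEST_TUPLE 0x0001

/* Simplified stand-in for the relevant parts of a heap tuple header. */
typedef struct SketchTuple
{
    uint16_t flags;     /* stand-in for the header flag bits */
    uint16_t ctid_off;  /* stand-in for t_ctid.ip_posid */
} SketchTuple;

/* Mark the tuple as the end of its chain and remember the root offset. */
static void
sketch_set_root_offset(SketchTuple *tup, uint16_t root_off)
{
    assert(root_off != 0);  /* offset numbers are 1-based; 0 is invalid */
    tup->flags |= SKETCH_LATEST_TUPLE;
    tup->ctid_off = root_off;
}

/*
 * Return true and fill *root_off if the tuple carries the root offset.
 * Return false if the flag is not set (for example after pg_upgrade);
 * the caller would then have to scan the page to find the root.
 */
static bool
sketch_get_root_offset(const SketchTuple *tup, uint16_t *root_off)
{
    if ((tup->flags & SKETCH_LATEST_TUPLE) == 0)
        return false;
    *root_off = tup->ctid_off;
    return true;
}
```

The false return models the pg_upgraded case described above, where the
stored offset cannot be trusted and the page has to be walked instead.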

0002_clear_ip_posid_blkid_refs_v15.patch:

This is mostly a cleanup patch which removes direct references to ip_posid
and ip_blkid from various places and replaces them with the appropriate
ItemPointer[Get|Set][Offset|Block]Number macros.

0003_freeup_3bits_ip_posid_v15.patch:

This patch frees up the high order 3 bits from ip_posid and makes them
available for other uses. As noted, we only need 13 bits to represent
OffsetNumber and hence the high order bits are unused. This patch should
only be applied along with 0002_clear_ip_posid_blkid_refs_v15.patch.
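
To make the bit layout concrete, here is a minimal sketch of splitting a
16-bit offset field into a 13-bit offset and 3 flag bits. The macro and
function names are made up for this example and are not the ones used in the
patch or in PostgreSQL's headers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names: low 13 bits hold the offset, high 3 bits hold flags. */
#define SKETCH_OFFSET_BITS 13
#define SKETCH_OFFSET_MASK 0x1FFF
#define SKETCH_MAX_FLAGS   0x7

/* Extract the offset number, ignoring the flag bits. */
static uint16_t
sketch_get_offset(uint16_t raw)
{
    return (uint16_t) (raw & SKETCH_OFFSET_MASK);
}

/* Extract the 3 flag bits as a value in the range 0..7. */
static uint16_t
sketch_get_flags(uint16_t raw)
{
    return (uint16_t) (raw >> SKETCH_OFFSET_BITS);
}

/* Replace the flag bits while leaving the offset untouched. */
static uint16_t
sketch_set_flags(uint16_t raw, uint16_t flags)
{
    assert(flags <= SKETCH_MAX_FLAGS);
    return (uint16_t) ((raw & SKETCH_OFFSET_MASK) |
                       (flags << SKETCH_OFFSET_BITS));
}
```

Once the upper bits carry flags, any code that reads the field directly
would see a wrong offset. Routing every access through accessors like these
avoids that, which is presumably why 0003 depends on the cleanup in 0002.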

0004_warm_updates_v15.patch:

This implements the main WARM logic, except for chain conversion (which is
implemented in the last patch of the series). It uses another free bit in
the heap tuple header to identify the WARM tuples. When the first WARM
update happens, the old and new versions of the tuple are marked with this
flag. All subsequent HOT tuples in the chain are also marked with this flag
so we never lose information about WARM updates, irrespective of whether the
update commits or aborts. We then implement recheck logic to decide which
index pointer should return a tuple from the HOT chain.

WARM is currently supported for hash and btree indexes. If a table has an
index of any other type, WARM is disabled.

0005_warm_chain_conversion_v15.patch:

This patch implements the WARM chain conversion as discussed upthread and
also noted in the README.WARM. This patch requires yet another bit in the
heap tuple header. But since the bit is only set along with the
HEAP_WARM_TUPLE bit, we can safely reuse HEAP_MOVED_OFF bit for this
purpose. We also need a bit to distinguish two copies of index pointers to
know which pointer points to the pre-WARM-update HOT chain (Blue chain) and
which pointer points to post-WARM-update HOT chain (Red chain). We steal
this bit from t_tid.ip_posid field in the index tuple headers. As part of
this patch, I moved XLOG_HEAP2_MULTI_INSERT to RM_HEAP_ID (and renamed it
to XLOG_HEAP_MULTI_INSERT). While it's not strictly necessary, I thought it
would allow us to restrict XLOG_HEAP_INIT_PAGE to RM_HEAP_ID and make that
bit available to define additional opcodes in RM_HEAP2_ID.
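
A small sketch of how an index pointer could be tagged as Red or Blue using
one stolen bit of the offset field. Which bit is used, and every name below,
is an assumption made for illustration only. The patch itself is the
authority on the actual layout.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the top bit of the 16-bit offset field marks a Red pointer. */
#define SKETCH_INDEX_RED_POINTER 0x8000
#define SKETCH_OFFSET_MASK       0x1FFF

typedef enum SketchChainColor
{
    SKETCH_BLUE,    /* points to the pre-WARM-update part of the chain */
    SKETCH_RED      /* points to the post-WARM-update part of the chain */
} SketchChainColor;

/* Return the offset field with the Red bit set or cleared as requested. */
static uint16_t
sketch_mark_pointer(uint16_t posid, SketchChainColor color)
{
    assert((posid & SKETCH_OFFSET_MASK) != 0);  /* offset must be valid */
    if (color == SKETCH_RED)
        return (uint16_t) (posid | SKETCH_INDEX_RED_POINTER);
    return (uint16_t) (posid & ~SKETCH_INDEX_RED_POINTER);
}

/* Tell which part of the chain an index pointer refers to. */
static SketchChainColor
sketch_pointer_color(uint16_t posid)
{
    return (posid & SKETCH_INDEX_RED_POINTER) ? SKETCH_RED : SKETCH_BLUE;
}
```

Because the color lives in the pointer itself, the same scheme works for any
index type whose tuples carry a heap pointer, which fits the choice to use
the ip_posid bit for both btree and hash indexes.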

I've done some elaborate tests with these patches applied. I've primarily
used make-world, pgbench with additional indexes and the WARM stress test
(which was useful in catching the CIC bug) to test the feature. While this
does not mean there are no additional bugs, all bugs known to me are fixed
in this version. I'll continue to run more tests, especially around crash
recovery and around indexes being dropped and recreated, and I'll also do
more performance tests.

Thanks,
Pavan

--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment                                  Content-Type               Size
0000_interesting_attrs_v15.patch            application/octet-stream   11.6 KB
0005_warm_chain_conversion_v15.patch        application/octet-stream   94.9 KB
0004_warm_updates_v15.patch                 application/octet-stream   120.0 KB
0003_freeup_3bits_ip_posid_v15.patch        application/octet-stream   6.8 KB
0002_clear_ip_posid_blkid_refs_v15.patch    application/octet-stream   11.2 KB
0001_track_root_lp_v15.patch                application/octet-stream   38.7 KB
