Quick Links

Re: Patch: Write Amplification Reduction Method (WARM)

From:	Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc:	Jaime Casanova <jaime(dot)casanova(at)2ndquadrant(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Patch: Write Amplification Reduction Method (WARM)
Date:	2017-01-31 11:22:39
Message-ID:	CABOikdOcFYSZ4vA2gYfs=M2cdXzXX4qGHeEiW3fu9PCfkHLa2A@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Thu, Jan 26, 2017 at 2:38 AM, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
wrote:

> Looking at your 0002 patch now. It no longer applies, but the conflicts
> are trivial to fix. Please rebase and resubmit.
>
>
Please see rebased and updated patches attached.

> I think having the "recheck" index methods create an ExecutorState looks
> out of place. How difficult is it to pass the estate from the calling
> code?
>
>
I couldn't find a good way to pass estate from the calling code. It would
require changes to many other APIs. I saw all other callers who need to
form index keys do that too. But please suggest if there are better ways.

> OffsetNumber
> heap_get_root_tuple(Page page, OffsetNumber target_offnum)
> {
> OffsetNumber off;
> heap_get_root_tuples_internal(page, target_offnum, &off);
> return off;
> }
>
>
Ok. Changed this way. Definitely looks better.

>
> The simple_heap_update + CatalogUpdateIndexes pattern is getting
> obnoxious. How about creating something like catalog_heap_update which
> does both things at once, and stop bothering each callsite with the WARM
> stuff?
>

What I realised that there are really 2 patterns:
1. simple_heap_insert, CatalogUpdateIndexes
2. simple_heap_update, CatalogUpdateIndexes

There are only couple of places where we already have indexes open or have
more than one tuple to update, so we call CatalogIndexInsert directly. What
I ended up doing in the attached patch is add two new APIs which combines
the two steps of each of these patterns. It seems much cleaner to me and
also less buggy for future users. I hope I am not missing a reason not to
do combine these steps.

> I'm not real sure about the interface between index AM and executor,
> namely IndexScanDesc->xs_tuple_recheck. For example, this pattern:
> if (!scan->xs_recheck)
> scan->xs_tuple_recheck = false;
> else
> scan->xs_tuple_recheck = true;
> can become simply
> scan->xs_tuple_recheck = scan->xs_recheck;
>

Fixed.

> which looks odd. I can't pinpoint exactly what's the problem, though.
> I'll continue looking at this one.
>

What we do is if the index scan is marked to do recheck, we do it for each
tuple anyways. Otherwise recheck is required only if a tuple comes from a
WARM chain.

>
> I wonder if heap_hot_search_buffer() and heap_hot_search() should return
> a tri-valued enum instead of boolean; that idea looks reasonable in
> theory but callers have to do more work afterwards, so maybe not.
>

I did not do anything with this yet. But I agree with you that we need to
make it better/simpler. Will continue to work on that.

I've addressed other review comments on the 0001 patch, except this one.

> +/*
> + * Get TID of next tuple in the update chain. Caller should have checked
that
> + * we are not already at the end of the chain because in that case
t_ctid may
> + * actually store the root line pointer of the HOT chain whose member
this
> + * tuple is.
> + */
> +#define HeapTupleHeaderGetNextTid(tup, next_ctid) \
> +do { \
> + AssertMacro(!((tup)->t_infomask2 & HEAP_LATEST_TUPLE)); \
> + ItemPointerCopy(&(tup)->t_ctid, (next_ctid)); \
> +} while (0)

> Actually, I think this macro could just return the TID so that it can be
> used as struct assignment, just like ItemPointerCopy does internally --
> callers can do
> ctid = HeapTupleHeaderGetNextTid(tup);

While I agree with your proposal, I wonder why we have ItemPointerCopy() in
the first place because we freely copy TIDs as struct assignment. Is there
a reason for that? And if there is, does it impact this specific case?

Other than the review comments, there were couple of bugs that I discovered
while running the stress test notably around visibility map handling. The
patch has those fixes. I also ripped out the kludge to record WARM-ness in
the line pointer because that is no longer needed after I reworked the code
a few versions back.

The other critical bug I found, which unfortunately exists in the master
too, is the index corruption during CIC. The patch includes the same fix
that I've proposed on the other thread. With these changes, WARM stress is
running fine for last 24 hours on a decently powerful box. Multiple
CREATE/DROP INDEX cycles and updates via different indexed columns, with a
mix of FOR SHARE/UPDATE and rollbacks did not produce any consistency
issues. A side note: while performance measurement wasn't a goal of stress
tests, WARM has done about 67% more transaction than master in 24 hour
period (95M in master vs 156M in WARM to be precise on a 30GB table
including indexes). I believe the numbers would be far better had the test
not dropping and recreating the indexes, thus effectively cleaning up all
index bloats. Also the table is small enough to fit in the shared buffers.
I'll rerun these tests with much larger scale factor and without dropping
indexes.

Of course, make check-world, including all TAP tests, passes too.

The CREATE INDEX CONCURRENTLY now works. The way we handle this is by
ensuring that no broken WARM chains are created while the initial index
build is happening. We check the list of attributes of indexes currently
in-progress (i.e. not ready for inserts) and if any of these attributes are
being modified, we don't do a WARM update. This is enough to address CIC
issue and all other mechanisms remain same as HOT. I've updated README to
include CIC algorithm.

There is one issue that bothers me. The current implementation lacks
ability to convert WARM chains into HOT chains. The README.WARM has some
proposal to do that. But it requires additional free bit in tuple header
(which we don't have) and of course, it needs to be vetted and implemented.
If the heap ends up with many WARM tuples, then index-only-scans will
become ineffective because index-only-scan can not skip a heap page, if it
contains a WARM tuple. Alternate ideas/suggestions and review of the design
are welcome!

Thanks,
Pavan

--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment	Content-Type	Size
0002_warm_updates_v10.patch	application/octet-stream	214.1 KB
0001_track_root_lp_v10.patch	application/octet-stream	38.4 KB
0000_interesting_attrs.patch	application/octet-stream	11.6 KB

In response to

Re: Patch: Write Amplification Reduction Method (WARM) at 2017-01-25 21:08:32 from Alvaro Herrera

Responses

Re: Patch: Write Amplification Reduction Method (WARM) at 2017-01-31 13:51:21 from Alvaro Herrera
Re: Patch: Write Amplification Reduction Method (WARM) at 2017-02-23 15:51:46 from Bruce Momjian

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Nikita Glukhov	2017-01-31 11:38:39	Re: [PATCH]: fix bug in SP-GiST box_ops
Previous Message	Etsuro Fujita	2017-01-31 11:15:28	Re: An issue in remote query optimization