Re: Patch: Write Amplification Reduction Method (WARM)

From: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Jaime Casanova <jaime(dot)casanova(at)2ndquadrant(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Patch: Write Amplification Reduction Method (WARM)
Date: 2017-03-14 19:15:23
Lists: pgsql-hackers

On Tue, Mar 14, 2017 at 5:17 PM, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> wrote:

> After looking at how index_fetch_heap and heap_hot_search_buffer
> interact, I can't say I'm in love with the idea. I started thinking
> that we should not have index_fetch_heap release the buffer lock only to
> re-acquire it five lines later, so it should keep the buffer lock, do
> the recheck and only release it afterwards (I realize that this means
> there'd be need for two additional "else release buffer lock" branches);

Yes, it makes sense.

> but then this got me thinking that perhaps it would be better to have
> another routine that does both call heap_hot_search_buffer and then call
> recheck -- it occurs to me that what we're doing here is essentially
> heap_warm_search_buffer.
> Does that make sense?

We can do that, but it's not clear to me whether that would be a huge
improvement. Also, I think we first need to decide how to model the
recheck logic, since that might affect this function significantly. For
example, if we decide to do the recheck at a higher level then we will most
likely end up releasing and reacquiring the lock anyway.
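
To make the two shapes under discussion concrete, here is a toy model in
plain C. It is only a sketch and not PostgreSQL code: ToyBuffer and every
toy_* function are made-up names for illustration, the "search" always
succeeds, and the "recheck" is a bare key comparison. It shows nothing
more than how many lock cycles each shape costs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a buffer with a content lock. */
typedef struct ToyBuffer
{
    bool locked;             /* is the toy lock currently held? */
    int  lock_acquisitions;  /* how many times the lock was taken */
    int  stored_key;         /* key value held by the toy "heap tuple" */
} ToyBuffer;

static void
toy_lock(ToyBuffer *buf)
{
    assert(!buf->locked);
    buf->locked = true;
    buf->lock_acquisitions++;
}

static void
toy_unlock(ToyBuffer *buf)
{
    assert(buf->locked);
    buf->locked = false;
}

/* Stand-in for the chain search; must run with the lock held. */
static bool
toy_search(const ToyBuffer *buf)
{
    assert(buf->locked);
    return true;
}

/* Stand-in for the recheck; must run with the lock held. */
static bool
toy_recheck(const ToyBuffer *buf, int index_key)
{
    assert(buf->locked);
    return buf->stored_key == index_key;
}

/* Shape 1: search, release, re-acquire, recheck (two lock cycles). */
bool
toy_fetch_two_phase(ToyBuffer *buf, int index_key)
{
    bool found;

    toy_lock(buf);
    found = toy_search(buf);
    toy_unlock(buf);
    if (!found)
        return false;

    toy_lock(buf);
    found = toy_recheck(buf, index_key);
    toy_unlock(buf);
    return found;
}

/* Shape 2: one routine does search and recheck under a single lock hold. */
bool
toy_warm_search(ToyBuffer *buf, int index_key)
{
    bool found;

    toy_lock(buf);
    found = toy_search(buf) && toy_recheck(buf, index_key);
    toy_unlock(buf);
    return found;
}
```

If the recheck ends up living at a higher level than the search, only the
first shape is available, which is why the modelling decision comes first.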

> Another thing is BuildIndexInfo being called over and over for each
> recheck(). Surely we need to cache the indexinfo for each indexscan.

Good point. Where should the cache live, though? Can we just cache them in
the relcache and maintain them along with the list of indexes? Looking at
the current callers, ExecOpenIndices() usually caches them in the
ResultRelInfo, which is sufficient because the INSERT/UPDATE/DELETE code
paths are the ones where caching definitely helps. The only other place
where it may get called once per tuple is unique_key_recheck(), which is
used for deferred unique key tests and hence is probably not very common.
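
Whichever place we pick, the pattern itself is simple: build once, reuse
for every recheck, free when the scan ends. Below is a minimal sketch in
plain C. It is not PostgreSQL code: ToyIndexInfo, ToyScan and the toy_*
functions are hypothetical names, and the "build" step is a trivial
allocation standing in for the real, more expensive work.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-index information being cached. */
typedef struct ToyIndexInfo
{
    int natts;                  /* number of indexed attributes */
} ToyIndexInfo;

/* Hypothetical stand-in for a scan descriptor that owns the cache. */
typedef struct ToyScan
{
    int           index_natts;  /* input needed to build the info */
    ToyIndexInfo *cached_info;  /* NULL until the first recheck */
    int           build_calls;  /* counts how often we really built it */
} ToyScan;

/* The expensive step we want to run only once per scan. */
static ToyIndexInfo *
toy_build_index_info(ToyScan *scan)
{
    ToyIndexInfo *info = malloc(sizeof(*info));

    assert(info != NULL);
    info->natts = scan->index_natts;
    scan->build_calls++;
    return info;
}

/* Called from every recheck: builds lazily, then returns the cached copy. */
ToyIndexInfo *
toy_get_index_info(ToyScan *scan)
{
    assert(scan != NULL);
    if (scan->cached_info == NULL)
        scan->cached_info = toy_build_index_info(scan);
    return scan->cached_info;
}

/* Called once at end of scan: releases the cached copy. */
void
toy_end_scan(ToyScan *scan)
{
    assert(scan != NULL);
    free(scan->cached_info);
    scan->cached_info = NULL;
}
```

With this shape, a scan that rechecks a thousand tuples pays for the build
once instead of a thousand times.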

BTW I wanted to share some more numbers from a recent performance test. I
think that's important because the latest patch has fully functional chain
conversion code, and all the WAL-logging related pieces are in place too.
I ran these tests on a box borrowed from Tomas (thanks!). It has 64GB RAM
and a 350GB SSD with 1GB of on-board RAM. I used the same test setup that
I used for the first test results reported on this thread, i.e. a modified
pgbench_accounts table with additional columns and additional indexes (one
index on abalance so that every UPDATE is a potential WARM update).
In a test where table + indexes exceed RAM, running for 8hrs with the
auto-vacuum parameters set such that we get 2-3 autovacuums on the table
during the test, we see WARM delivering more than 100% TPS as compared to
master. In the attached graph, I've plotted a moving average of TPS, and
the spikes that we see coincide with the checkpoints (checkpoint_timeout is
set to 20mins and max_wal_size is large enough to avoid any xlog-based
checkpoints). The spikes are more prominent on WARM, but I guess that's
purely because it delivers much higher TPS. I haven't shown it here, but I
see WARM updates accounting for close to 65-70% of the total updates. There
is also a significant reduction in WAL generated per txn.


Pavan Deolasee
PostgreSQL Development, 24x7 Support, Training & Services

Attachment: Moderate_AV_4Indexes_100FF_SF1200_Duration28800s_Run2.pdf (application/pdf, 225.5 KB)
