Re: "Write amplification" is made worse by "getting tired" while inserting into nbtree secondary indexes (Was: Why B-Tree suffix truncation matters)

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: "Write amplification" is made worse by "getting tired" while inserting into nbtree secondary indexes (Was: Why B-Tree suffix truncation matters)
Date: 2018-07-18 05:42:02
Message-ID: CANP8+jL-x_2MbUv49ofdw9YOrYT+z_cDG5KboSkjtk4vx_c5Cw@mail.gmail.com
Lists: pgsql-hackers

On 17 July 2018 at 23:10, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Peter Geoghegan <pg(at)bowt(dot)ie> writes:
>> I've done plenty of research into the history of this hack. It was
>> your work, but it does actually make sense in the context of today's
>> nbtree code. It is essential with scankey-wise duplicates, since
>> groveling through hundreds or even thousands of pages full of
>> duplicates to find free space (and avoid a page split) is going to
>> have a very serious downside for latency.
>
> Well, the actual problem was O(N^2) behavior:
>
> https://www.postgresql.org/message-id/2378.967216388%40sss.pgh.pa.us
>
> https://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=40549e9cb5abd2986603883e4ab567dab34723c6
>
> I certainly have no objection to improving matters, but let's be sure
> we don't re-introduce any two-decade-old problems.

Looking at this in terms of CPU cost for a single insert: insertion at
the end of the list of duplicates was O(N), which the "getting tired"
behavior much improved to O(k). Keeping the index entries in order is
O(log N).
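The difference can be sketched in a few lines. This is purely illustrative (not PostgreSQL's actual nbtree code): index entries are modeled as (key, TID) pairs, and the two insertion strategies are contrasted on a run of duplicates.

```python
import bisect

# Illustrative sketch only: index entries as (key, tid) pairs,
# with a long run of duplicates on key=42.
entries = [(42, tid) for tid in range(1000)]

def insert_linear(entries, key, tid):
    """Old behavior: scan rightward through all equal keys for a
    free slot before inserting - O(N) in the number of duplicates."""
    i = 0
    while i < len(entries) and entries[i][0] <= key:
        i += 1
    entries.insert(i, (key, tid))
    return i

def insert_sorted_by_tid(entries, key, tid):
    """Proposed behavior: treat the TID as a tiebreaker column, so
    the unique insertion point is found by binary search - O(log N)."""
    i = bisect.bisect_left(entries, (key, tid))
    entries.insert(i, (key, tid))
    return i
```

With sorted-by-TID entries the binary search lands on one deterministic slot, whereas the linear scan must walk past every duplicate.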

So there is a higher cost to keeping the entries in order, though it
becomes a win because of the reduced bloat in cases where we have a
high number of deletes. It is a win in insertion time as well, since
by packing the index more tightly we cause fewer costly splits and
less I/O.

If we knew that we were never going to do deletes/non-HOT updates on
the table, we could continue to use the existing mechanism; otherwise
we would be better off using sorted index entries. Moreover, keeping
entries in sorted order appears to be a win for concurrency, thanks to
reduced block contention on the first few blocks of each index key, so
it may also help in cases with heavy concurrent inserts but no
deletes.

Bulk loading will improve because adding a new data block via COPY
causes all of the resulting index entries to be added with greater
adjacency, so a block's worth of duplicate entries can be sorted and
then merged efficiently into the index.
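A hypothetical sketch of that bulk-load point (again not PostgreSQL code): once entries carry a TID tiebreaker, a batch of new entries from one heap block can be sorted once and merged into the already-sorted index in a single linear pass, rather than descending the tree once per entry.

```python
import heapq

# Existing index: (key, tid) pairs kept in sorted order.
index = [(k, t) for k in (1, 2, 3) for t in range(3)]

# Entries produced by loading one new heap block via COPY.
new_batch = [(2, 10), (2, 5), (1, 7)]
new_batch.sort()  # sort the batch once: O(k log k)

# Merge two sorted sequences in one pass: O(N + k), and the
# result stays fully sorted by (key, tid).
index = list(heapq.merge(index, new_batch))
```

The design point is that sortedness makes the index mergeable: batched inserts become a sequential merge instead of many random descents.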

I hope we can see a patch that adds just the sorting-by-TID property,
so we can evaluate that aspect before we try to add other, more
advanced index ideas.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
