Re: PG 12 draft release notes

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, Emre Hasegeli <emre(at)hasegeli(dot)com>, Tomas Vondra <tv(at)fuzzy(dot)cz>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Surafel Temesgen <surafel3000(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PG 12 draft release notes
Date: 2019-05-21 21:22:53
Message-ID: CAH2-WzkPBoDL_HrT6gvG4uVS2H-zXP1pW0h5M=JYCibJGY_F5w@mail.gmail.com
Lists: pgsql-hackers

On Tue, May 21, 2019 at 1:51 PM Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> > My concern here (which I believe Alexander shares) is that it doesn't
> > make sense to group these two items together. They're two totally
> > unrelated pieces of work. Alexander's work does more or less help with
> > lock contention with writes, whereas the feature that was merged
> > with is about preventing index bloat, which is mostly helpful for
> > reads (it helps writes to the extent that writes are also reads).
> >
> > The release notes go on to say that this item "gives better
> > performance for UPDATEs and DELETEs on indexes with many duplicates",
> > which is wrong. That is something that should have been listed below,
> > under the "duplicate index entries in heap-storage order" item.
>
> OK, I understand how the lock stuff improves things, but I have
> forgotten how indexes are made smaller. Is it because of better page
> split logic?

That is clearly the main reason, though suffix truncation (which
treats trailing/suffix columns in index tuples on branch pages as
having "negative infinity" sentinel values) also contributes to
making indexes smaller.
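
To make the suffix truncation idea concrete, here is a rough Python
sketch. It is not the nbtree code: the function name and the tuple
layout are hypothetical, and it ignores the case where heap TID has to
be kept as a tiebreaker. It picks a separator for a branch page by
keeping only as many leading columns as are needed to tell the last
tuple on the left page from the first tuple on the right page; the
dropped trailing columns are implicitly "negative infinity".

```python
def truncate_separator(last_left, first_right):
    """Return the shortest prefix of first_right that sorts after last_left.

    Assumes last_left < first_right, both tuples of column values.
    Omitted trailing columns are read as "negative infinity".
    """
    for i, (left_val, right_val) in enumerate(zip(last_left, first_right)):
        if left_val != right_val:
            # The first differing column is enough to separate the pages.
            return first_right[: i + 1]
    # All columns equal: nothing can be truncated in this simplified model.
    return first_right


# Hypothetical three-column index tuples on either side of a split point.
sep_short = truncate_separator(("alice", 7, 100), ("bob", 1, 5))
sep_two = truncate_separator(("alice", 1, 5), ("alice", 2, 0))
print(sep_short, sep_two)
```

The shorter the separator, the more of them fit on each branch page,
which is where the space saving comes from.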

The page split stuff was mostly added by commit fab250243 ("Consider
secondary factors during nbtree splits"), but commit f21668f32 ("Add
"split after new tuple" nbtree optimization") added to that in a way
that really helped the TPC-C indexes. The TPC-C indexes are about 40%
smaller now.

> > > Author: Peter Geoghegan <pg(at)bowt(dot)ie>
> > > 2019-03-20 [dd299df81] Make heap TID a tiebreaker nbtree index column.

> As I remember the benefit currently is that you can find updated and
> deleted rows faster, right?

Yes, that's true when writing to the index. But more importantly, it
really helps VACUUM when there are lots of duplicates, which is fairly
common in the real world (imagine an index where 20% of the rows are
NULL, for example). In effect, there are no duplicates anymore,
because all index tuples are unique internally.

Indexes with lots of duplicates group older rows together, and new
rows together, because treating heap TID as a tiebreaker naturally has
that effect. VACUUM will generally dirty far fewer pages, because bulk
deletions tend to be correlated with heap TID. And, VACUUM has a much
better chance of deleting entire leaf pages, because dead tuples end
up getting grouped together.
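
A toy Python model of that effect, under loudly simplified assumptions:
every index tuple has the same key, a leaf page holds a fixed
(hypothetical) number of tuples, the bulk delete removes the 200 oldest
rows, and "no tiebreaker" is modelled as duplicates landing in
arbitrary order. None of this is the real nbtree or VACUUM code; it
only counts how many leaf pages contain a dead tuple and how many
become entirely dead.

```python
import random

PAGE_SIZE = 50  # hypothetical number of index tuples per leaf page


def vacuum_cost(leaf_order, dead_tids, page_size=PAGE_SIZE):
    """Return (pages with at least one dead tuple, pages that are all dead)."""
    dirtied = emptied = 0
    for start in range(0, len(leaf_order), page_size):
        page = leaf_order[start:start + page_size]
        n_dead = sum(1 for _key, tid in page if tid in dead_tids)
        dirtied += n_dead > 0
        emptied += n_dead == len(page)
    return dirtied, emptied


# 1000 duplicates of one key; the heap TID is modelled as a plain integer.
tuples = [("dup", tid) for tid in range(1000)]
dead = set(range(200))  # bulk delete of the oldest rows

# Heap TID as tiebreaker: duplicates are stored in (key, TID) order.
with_tiebreaker = sorted(tuples)

# Assumed stand-in for the old behaviour: arbitrary order among duplicates.
without_tiebreaker = tuples[:]
random.Random(0).shuffle(without_tiebreaker)

ordered_cost = vacuum_cost(with_tiebreaker, dead)
scattered_cost = vacuum_cost(without_tiebreaker, dead)
print("with tiebreaker:   ", ordered_cost)
print("without tiebreaker:", scattered_cost)
```

With the tiebreaker, the 200 dead tuples sit on exactly four pages, all
of which can be deleted outright; in the scattered ordering they can
touch up to all twenty pages.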

--
Peter Geoghegan
