Re: Making all nbtree entries unique by having heap TIDs participate in comparisons

From: Andres Freund <andres(at)anarazel(dot)de>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: "Andrey V(dot) Lepikhov" <a(dot)lepikhov(at)postgrespro(dot)ru>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Claudio Freire <klaussfreire(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>
Subject: Re: Making all nbtree entries unique by having heap TIDs participate in comparisons
Date: 2018-10-18 20:44:08
Message-ID: 20181018204408.tk3km7zwusbbt5gd@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2018-10-18 12:54:27 -0700, Peter Geoghegan wrote:
> I can show a nice improvement in latency on a slightly-rate-limited
> TPC-C workload when backend_flush_after=0 (something like a 40%
> reduction on average), but that doesn't hold up when oltpbench isn't
> rate-limited and/or has backend_flush_after set. Usually, there is a
> 1% - 2% regression, despite the big improvements in index size, and
> despite the big reduction in the amount of buffers that backends must
> write out themselves.

What kind of backend_flush_after values were you trying?
backend_flush_after=0 obviously is the default, so I'm not clear on
what you used when it was set. How large is the database here, and how
high is shared_buffers?

> The obvious explanation is that throughput is decreased due to our
> doing extra work (truncation) while under an exclusive buffer lock.
> However, I've worked hard on that, and, as I said, I can sometimes
> observe a nice improvement in latency. This makes me doubt the obvious
> explanation. My working theory is that this has something to do with
> shared_buffers eviction. Maybe we're making worse decisions about
> which buffer to evict, or maybe the scalability of eviction is hurt.
> Perhaps both.

Is it possible that there are new / prolonged cases where a buffer is read
from disk after the patch? Because that might require doing *write* IO
when evicting the previous contents of the victim buffer, and obviously
that can take longer if you're running with backend_flush_after > 0.

I wonder if it'd make sense to hack up a patch that logs when evicting a
buffer while already holding another lwlock. That shouldn't be too hard.
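Very roughly something like this - note that LWLockHeldCount() is made
up here, it would just expose lwlock.c's existing num_held_lwlocks
counter, and the exact placement is approximate:

    /*
     * Hypothetical helper in lwlock.c -- doesn't exist today, it would
     * just return the existing num_held_lwlocks counter.
     */
    int
    LWLockHeldCount(void)
    {
        return num_held_lwlocks;
    }

    /*
     * Then, just before the FlushBuffer() call in BufferAlloc()'s
     * dirty-victim branch.  We already hold the victim's content lock
     * at that point, hence the > 1 rather than > 0.
     */
    if (LWLockHeldCount() > 1)
        elog(LOG, "flushing victim buffer %d while holding %d lwlocks",
             BufferDescriptorGetBuffer(buf), LWLockHeldCount());

Grepping the server log for that message during both benchmark runs
should show whether the patch changes how often that case is hit.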

> You can download results from a recent benchmark to get some sense of
> this. It includes latency and throughput graphs, plus detailed
> statistics collector stats:
>
> https://drive.google.com/file/d/1oIjJ3YpSPiyRV_KF6cAfAi4gSm7JdPK1/view?usp=sharing

I'm unclear on which runs are which here. I assume "public" is your
patchset, and master is master? Do you reset the stats in between runs?

Greetings,

Andres Freund
