Re: B-tree cache prefetches

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: B-tree cache prefetches
Date: 2018-10-14 21:38:04
Message-ID: CAEepm=3+V-kKt5JjuMQARb6N66B3Dgu6Djj7yf+ndrMrzbRTBw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Oct 15, 2018 at 12:03 AM Andrey Borodin <x4mmm(at)yandex-team(dot)ru> wrote:
> 31 авг. 2018 г., в 2:40, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> написал(а):
> A related topic is the cache-unfriendliness of traditional binary
> searches of sorted data. Check out "Array Layouts for
> Comparison-Based Searching"[1] if you haven't already. It says that
> if your array fits in L2 cache, as our btree pages do, then binary
> search still wins against Eytzinger et al, which I was a bit
> disappointed
>
> I've re-read that paper. Their results are not about our case, here's a quote:
> > For arrays small enough _to be kept_ in L2 cache, the branch-free binary search code listed in Listing 2 is the fastest algorithm
>
> Surely, we cannot keep all pages in L2. Eytzinger layout could be realistic possibility, except one simple problem: I do not see how can we insert items into this layout.

I don't know. Aren't we talking about binary search within one 8KB
page here, anyway?

Thinking about some other ideas for cache prefetching in btree code,
here's an idea to improve pipelining of index scans (a bit like the
two-step pipeline idea from the hash join prefetch thread). We know
the TIDs of heap tuples we'll be looking up in the near future, so can
we prefetch all the way to the tuple? There are three random
accesses that follow soon after reading an index tuple:
BufTableLookup(), then page header, then the tuple itself. At the
logical extreme, we could anticipate the probe of SharedBufHash for
the TID from index tuple i + 3 (perhaps dynahash could expose a
hash_prefetch_for_hash() facility), and anticipate the read of the
page header for the tuple pointed to by index tuple i + 2 (perhaps a
dirty read of SharedBufHash that is only prepared to look at the first
entry in the bucket would be OK for this), and anticipate the read of
the tuple for the TID from index tuple i + 1 (perhaps a dirty read of
the page item table). Or some less ambitious subset, if that pipeline
turns out to be too deep; it seems like the simplest experiment would
be to try prefetching just SharedBufHash lookups. This may all be
completely crazy, I haven't tried any of it.

> Also, that research is quite distant from B-tree binsearch: thier data items are small and do not represent reference to actual data. Eytzinger exploits the fact that two posiible future keys share same cache line. But it is hard to provide, given we place items at quite random place of the page.

Based on the the above quotes, I'm not suggesting we should use
Eytzinger for search-within-page. But out of curiosity, I checked,
and saw that you can actually get a pair of index tuples holding a
six-character text key or an integer key into a 64 byte cache line.

--
Thomas Munro
http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2018-10-14 22:04:12 Re: "SELECT ... FROM DUAL" is not quite as silly as it appears
Previous Message Matheus de Oliveira 2018-10-14 18:30:15 Re: [PATCH] Add support for ON UPDATE/DELETE actions on ALTER CONSTRAINT