Re: B-tree cache prefetches

From: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: B-tree cache prefetches
Date: 2018-10-15 04:57:29
Message-ID: 74574D24-15C2-4B41-B484-5D84252BBF0B@yandex-team.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> 15 окт. 2018 г., в 2:38, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> написал(а):
>
> On Mon, Oct 15, 2018 at 12:03 AM Andrey Borodin <x4mmm(at)yandex-team(dot)ru> wrote:
>> 31 авг. 2018 г., в 2:40, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> написал(а):
>> A related topic is the cache-unfriendliness of traditional binary
>> searches of sorted data. Check out "Array Layouts for
>> Comparison-Based Searching"[1] if you haven't already. It says that
>> if your array fits in L2 cache, as our btree pages do, then binary
>> search still wins against Eytzinger et al, which I was a bit
>> disappointed
>>
>> I've re-read that paper. Their results are not about our case, here's a quote:
>>> For arrays small enough _to be kept_ in L2 cache, the branch-free binary search code listed in Listing 2 is the fastest algorithm
>>
>> Surely, we cannot keep all pages in L2. Eytzinger layout could be realistic possibility, except one simple problem: I do not see how can we insert items into this layout.
>
> I don't know. Aren't we talking about binary search within one 8KB
> page here, anyway?
The paper discuss multiple searches inside one 8Kb region, while we are doing single search in all-uncached page and move away. That's the difference.

Binserach is optimal, if the page is in L1 before we start search.

>
> Thinking about some other ideas for cache prefetching in btree code,
> here's an idea to improve pipelining of index scans (a bit like the
> two-step pipeline idea from the hash join prefetch thread). We know
> the TIDs of heap tuples we'll be looking up in the near future, so can
> we prefetch all the way to the tuple? There are three random
> accesses that follow soon after reading an index tuple:
> BufTableLookup(), then page header, then the tuple itself. At the
> logical extreme, we could anticipate the probe of SharedBufHash for
> the TID from index tuple i + 3 (perhaps dynahash could expose a
> hash_prefetch_for_hash() facility), and anticipate the read of the
> page header for the tuple pointed to by index tuple i + 2 (perhaps a
> dirty read of SharedBufHash that is only prepared to look at the first
> entry in the bucket would be OK for this), and anticipate the read of
> the tuple for the TID from index tuple i + 1 (perhaps a dirty read of
> the page item table). Or some less ambitious subset, if that pipeline
> turns out to be too deep; it seems like the simplest experiment would
> be to try prefetching just SharedBufHash lookups. This may all be
> completely crazy, I haven't tried any of it.
This idea also looks good to me. One thing bothers me. if we do __builtin_prefetch(&a[b[c[d]]],0,1), and a, b, c and d all are not in cache - will we incur CPU wait time for fetching a,b and c?
This [0] guys are expoiting C++ coroutines for prefetching this kind of stuff, but it seems like too much of hassle.
>
>> Also, that research is quite distant from B-tree binsearch: thier data items are small and do not represent reference to actual data. Eytzinger exploits the fact that two posiible future keys share same cache line. But it is hard to provide, given we place items at quite random place of the page.
>
> Based on the the above quotes, I'm not suggesting we should use
> Eytzinger for search-within-page. But out of curiosity, I checked,
> and saw that you can actually get a pair of index tuples holding a
> six-character text key or an integer key into a 64 byte cache line.

Well, this proves that in theory Eytzinger is an opportunity.

Best regards, Andrey Borodin.

[0] http://www.vldb.org/pvldb/vol11/p1702-jonathan.pdf

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Sandeep Thakkar 2018-10-15 05:46:53 Re: PostgreSQL 11 RC1 + GA Dates
Previous Message Amit Kapila 2018-10-15 03:40:49 Re: WIP: Avoid creation of the free space map for small tables