Re: Memory prefetching while sequentially fetching from SortTuple array, tuplestore

From: Andres Freund <andres(at)anarazel(dot)de>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Memory prefetching while sequentially fetching from SortTuple array, tuplestore
Date: 2015-09-02 23:12:29
Message-ID: 20150902231229.GF8555@awork2.anarazel.de
Lists: pgsql-hackers

On 2015-09-02 16:02:00 -0700, Peter Geoghegan wrote:
> On Wed, Sep 2, 2015 at 3:13 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> > That's just a question of how to formulate this though?
> >
> > pg_rfetch(((char *) state->memtuples ) + 3 * sizeof(SortTuple) + offsetof(SortTuple, tuple))?
> >
> > For something heavily platform dependent like this that seems ok.
>
> Well, still needs to work for tuplestore, which does not have a SortTuple.

Isn't it even more trivial there? It's just an array of void*'s? So
prefetch(state->memtuples + 3 + readptr->current)?
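To spell out what I mean, here is a rough sketch, not code from the patch. It assumes GCC/Clang's __builtin_prefetch is available; sketch_rfetch, PREFETCH_DISTANCE and sketch_sum_first_bytes are names made up for illustration, and the loop only stands in for whatever the caller really does with each tuple.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the patch's pg_rfetch(); this is NOT the
 * patch's definition, just a read-prefetch hint where the compiler has one.
 */
#if defined(__GNUC__) || defined(__clang__)
#define sketch_rfetch(addr) __builtin_prefetch((addr), 0, 3)
#else
#define sketch_rfetch(addr) ((void) (addr))
#endif

/* Assumed distance, taken from the "+ 3" in the expression above. */
#define PREFETCH_DISTANCE 3

/*
 * Walk an array of void* sequentially, hinting the slot three entries
 * ahead of the current position. As a placeholder for real work, sum
 * the first byte of every tuple.
 */
static unsigned long
sketch_sum_first_bytes(void **memtuples, size_t ntuples)
{
    unsigned long sum = 0;

    assert(memtuples != NULL || ntuples == 0);

    for (size_t current = 0; current < ntuples; current++)
    {
        /* Stay inside the array so the pointer arithmetic is well defined. */
        if (current + PREFETCH_DISTANCE < ntuples)
            sketch_rfetch(memtuples + PREFETCH_DISTANCE + current);

        sum += *(const unsigned char *) memtuples[current];
    }

    return sum;
}
```

The hint is purely advisory, so the result is the same with or without it; only the cache behaviour can differ, which is why the measurements matter.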

> Because of the way tuples are fetched across translation unit
> boundaries in the cases addressed by the patch, it isn't hard to see
> why the compiler does not do this automatically (prefetch instructions
> added by the compiler are not common anyway, IIRC).

Hardware prefetchers have simply gotten rather good, and have obliterated
most of the cases where explicit prefetching is beneficial.

I'd be interested to see a perf stat -ddd comparison of the patch
with and without prefetches. It'll be interesting to see how the number
of cache hits/misses and prefetches changes.

Which microarchitecture did you test this on?
