Re: New expression evaluator and indirect jumps

From: Andres Freund <andres(at)anarazel(dot)de>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: New expression evaluator and indirect jumps
Date: 2017-04-02 00:54:45
Message-ID: 20170402005445.ps5amvl3s343gbz5@alap3.anarazel.de
Lists: pgsql-hackers

Hi Jeff,

On 2017-04-01 17:36:42 -0700, Jeff Davis wrote:
> Thank you for your great work on the expression evaluator:
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=b8d7f053c5c2bf2a7e8734fe3327f6a8bc711755
>
> I was looking at the dispatch code, and it goes to significant effort
> (using computed goto) to generate many indirect jumps instead of just
> one. The reasoning is that CPUs can do better branch prediction when
> they can predict separately for each of the indirect jumps.

Right.

> But the paper here: https://hal.inria.fr/hal-01100647/document claims
> that it's not really needed on newer CPUs because they are better at
> branch prediction. I skimmed it, and if I understand correctly, modern
> branch predictors use some history, so it can predict based on the
> instructions executed before it got to the indirect jump.

Yeah, it's true that the benefits on modern CPUs are smaller than they
used to be. But, for one, the branch history buffers are of very limited
size, which in many cases will make prediction an issue again. For
another, switch-based dispatch still performs a bounds check on the
opcode before every jump, which has some performance cost.
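
To make the difference concrete, here is a minimal sketch of the two
dispatch styles. It is not the actual PostgreSQL evaluator code: the
opcodes, the Step struct, and the function names are all hypothetical,
and the computed-goto variant assumes the GCC/Clang "labels as values"
extension.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes for a tiny accumulator machine (not PostgreSQL's). */
typedef enum { OP_ADD, OP_MUL, OP_DONE } OpCode;

typedef struct {
    OpCode op;
    long   arg;
} Step;

/* Switch dispatch: every step funnels through one shared indirect jump,
 * which compilers typically guard with a range check on the opcode. */
long eval_switch(const Step *steps, long acc)
{
    assert(steps != NULL);
    for (const Step *s = steps;; s++) {
        switch (s->op) {
        case OP_ADD:  acc += s->arg; break;
        case OP_MUL:  acc *= s->arg; break;
        case OP_DONE: return acc;
        }
    }
}

#if defined(__GNUC__)
/* Computed-goto dispatch: each opcode body ends with its own indirect
 * jump, so the predictor can track each jump site separately, and no
 * range check is emitted. Requires the GCC/Clang extension. */
long eval_goto(const Step *steps, long acc)
{
    /* Table order must match the OpCode enum order. */
    static void *const targets[] = { &&do_add, &&do_mul, &&do_done };
    const Step *s = steps;

    assert(steps != NULL);

#define DISPATCH() goto *targets[s->op]
    DISPATCH();

do_add:
    acc += s->arg;
    s++;
    DISPATCH();
do_mul:
    acc *= s->arg;
    s++;
    DISPATCH();
do_done:
    return acc;
#undef DISPATCH
}
#else
/* Fallback for compilers without labels-as-values. */
long eval_goto(const Step *steps, long acc)
{
    return eval_switch(steps, acc);
}
#endif
```

Both functions compute the same result for the same step array; only
the shape of the generated jumps differs. Keeping the switch version
around as a fallback is what keeps the maintenance cost of this
approach low.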

> I tried looking through the discussion on this list, but most seemed
> to revolve around which compilers generated the assembly we wanted
> rather than how much it actually improved performance. Can someone
> please point me to the numbers? Do they refute the conclusions in the
> paper, or are we concerned about a wider range of processors?

I ran a lot of benchmarks during development, and either there was no
performance difference between computed gotos and switch-based
threading, or computed gotos came out ahead. In expression-heavy cases,
e.g. TPC-H Q01, there's a considerable advantage (~3.5% total, making it
something like ~15% expression evaluation speedup). I primarily
evaluated performance on a Skylake (i.e. newer than Haswell), rather
than on my older Nehalem workstation, to avoid optimizing for the wrong
thing.

I am not particularly concerned about !x86 processors, but a lot of them
indeed seem to have much less elaborate branch predictors (especially
ARM). Also, Nehalem and Sandy Bridge are still quite common out there,
especially in servers.

Since the cost of maintaining the computed goto stuff isn't that high,
I'm not really concerned here.

Greetings,

Andres Freund
