Re: asynchronous and vectorized execution

From: Greg Stark <stark(at)mit(dot)edu>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: asynchronous and vectorized execution
Date: 2016-05-10 04:34:36
Message-ID: CAM-w4HMH9i+2Hub9VBRDzc+Eh2g8c_FSN2K7DGiOF=wWBz8dbw@mail.gmail.com
Lists: pgsql-hackers

On 9 May 2016 8:34 pm, "David Rowley" <david(dot)rowley(at)2ndquadrant(dot)com> wrote:
>
> This project does appear to require that we bloat the code with 100's
> of vector versions of each function. I'm not quite sure if there's a
> better way to handle this. The problem is that the fmgr is pretty much
> a barrier to SIMD operations, and this was the only idea that I've had
> so far about breaking through that barrier. So further ideas here are
> very welcome.

Well, yes and no. In practice I think you only need to worry about
vectorised versions of integer and possibly float. For other data types
there either aren't vectorised operators or there's little that would use them.

And I'll make a bold claim here that the only operator I think really
matters is =

The reason is that using SIMD instructions is a minor win if you have any
further work to do per tuple. The only time it's a big win is if you're
eliminating entire tuples from consideration efficiently. = is going to do
that often, other btree operator classes might be somewhat useful, but
things like + really only would come up in odd examples.
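
To make the tuple-elimination point concrete, here is a minimal sketch of
a batch = filter over an int32 column. It is not PostgreSQL code:
filter_eq_int32 and the selection-vector layout are hypothetical names
chosen for illustration. The loop is scalar but branch-free, which is
roughly the shape a SIMD version would take (compare several values per
instruction, then compact the matching positions).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of PostgreSQL.
 *
 * Compare each of the n values in col with key and write the positions of
 * the matching rows into sel, which must have room for n entries.
 * Returns the number of matches. Rows that do not match are never looked
 * at again by later per-tuple work.
 */
size_t
filter_eq_int32(const int32_t *col, size_t n, int32_t key, uint32_t *sel)
{
    size_t nmatch = 0;

    assert(col != NULL && sel != NULL);

    for (size_t i = 0; i < n; i++)
    {
        /* Always store the position, but only advance on a match. */
        sel[nmatch] = (uint32_t) i;
        nmatch += (col[i] == key);
    }

    return nmatch;
}
```

Everything downstream then only touches the positions listed in sel, so
the win comes from the rows that were dropped rather than from the
comparison itself.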

But even that understates things. If you have column oriented storage then
= becomes even more important since every scan has a series of implied
equijoins to reconstruct the tuple. And the coup de grace is that in
column oriented storage you try to store variable length data as integer
indexes into a dictionary of common values so *everything* is an integer =
operation.
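
As a rough illustration of the dictionary idea, the sketch below encodes
a text column as integer codes and then answers an equality predicate
using integer comparison only. Dict, dict_lookup, count_eq_encoded and
the fixed DICT_MAX capacity are all assumptions made for this example;
a real column store would use a hash table and own its strings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DICT_MAX 16             /* arbitrary capacity for the sketch */

/* Hypothetical dictionary: the code of a value is its array index. */
typedef struct Dict
{
    const char *values[DICT_MAX];   /* borrowed pointers, not copied */
    int32_t     nvalues;
} Dict;

/*
 * Return the integer code for s. If s is absent and add is nonzero, append
 * it and return the new code. Returns -1 if s is absent and either add is
 * zero or the dictionary is full.
 */
int32_t
dict_lookup(Dict *d, const char *s, int add)
{
    assert(d != NULL && s != NULL);

    for (int32_t i = 0; i < d->nvalues; i++)
        if (strcmp(d->values[i], s) == 0)
            return i;

    if (!add || d->nvalues >= DICT_MAX)
        return -1;

    d->values[d->nvalues] = s;
    return d->nvalues++;
}

/* Count rows equal to key_code; no string comparison happens here. */
size_t
count_eq_encoded(const int32_t *codes, size_t n, int32_t key_code)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++)
        count += (codes[i] == key_code);

    return count;
}
```

The string comparison happens once, when the constant is looked up in the
dictionary; the scan over the column is then the same integer = loop
regardless of the original data type.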

How to do this without punching right through the executor as an
abstraction and still supporting extensible data types and operators was
puzzling me already. I do think it involves having these vector operators
in the catalogue and also some kind of compression mapping to integer
indexes. But I'm not sure that's all that would be needed.
