Re: Parallel query execution

From: Claudio Freire <klaussfreire(at)gmail(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel query execution
Date: 2013-01-16 04:47:21
Message-ID: CAGTBQpbRYp9GNe2JjbXXCAO1-OsYXdh2fUyZVka+GXY22dj+iQ@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jan 16, 2013 at 12:55 AM, Stephen Frost <sfrost(at)snowman(dot)net> wrote:
>> If memory serves me correctly (and it does, I suffered it a lot), the
>> performance hit is quite considerable. Enough to make it "a lot worse"
>> rather than "not as good".
>
> I feel like we must not be communicating very well.
>
> If the CPU is pegged at 100% and the I/O system is at 20%, adding
> another CPU at 100% will bring the I/O load up to 40% and you're now
> processing data twice as fast overall

Well, there's the fault in your logic. It won't be that linear. Adding
another sequential scan will decrease the effective bandwidth of the
I/O system: if it was doing, say, 10MB/s at 20% load, now it will be
doing 20MB/s at 80% load (maybe even worse). Quite suddenly you'll hit
diminishing returns, and the I/O subsystem, which wasn't the
bottleneck, will become it, with bandwidth being the key. You might end
up with less bandwidth than you started with if you go far enough past
that knee.
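
To make the arithmetic above concrete, here is a toy model, offered only as a rough sketch. Everything in it is an assumption chosen to reproduce the "10MB/s at 20%" and "20MB/s at 80%" figures: the function name scan_throughput, the 50MB/s sequential rate, the 10ms seek, the 0.5MB chunk read per turn, and the idea that interleaved scans pay one seek per chunk. Real hardware, readahead and caching will behave differently.

```python
def scan_throughput(workers, per_worker_mb_s=10.0, seq_mb_s=50.0,
                    seek_s=0.010, chunk_mb=0.5):
    """Toy model: return (delivered MB/s, disk utilization in 0..1).

    All parameters are made-up illustrative values, not measurements.
    """
    # Disk time spent per MB: pure transfer time, plus one seek per chunk
    # as soon as more than one scan is interleaved on the same device.
    secs_per_mb = 1.0 / seq_mb_s
    if workers > 1:
        secs_per_mb += seek_s / chunk_mb

    capacity = 1.0 / secs_per_mb          # most the disk can deliver
    demand = workers * per_worker_mb_s    # what CPU-bound workers could consume
    delivered = min(demand, capacity)

    # Utilization is the fraction of each second the disk is busy.
    return delivered, delivered * secs_per_mb


if __name__ == "__main__":
    for n in range(1, 5):
        mb_s, util = scan_throughput(n)
        print(f"{n} worker(s): {mb_s:.1f} MB/s at {util:.0%} I/O load")
```

With these made-up parameters, one worker gets about 10MB/s at about 20% load, two get about 20MB/s at about 80%, and from three onward throughput is capped near 25MB/s with the disk saturated. The model only shows the knee; it does not capture throughput actually dropping below the starting point, which would need the per-chunk cost to grow with the number of scans.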

Add some concurrent operations (connections) to the mix and it just gets worse.

Figuring out where the knee is may be the hardest problem you'll face.
I don't think it'll be predictable enough to make I/O parallelization
in that case worth the effort.

If you instead think of parallelizing random I/O (say index scans
within nested loops), that might work (or it might not). Again it
depends a helluva lot on what else is contending with the I/O
resources and how far past the optimum you push it. I've faced this
problem when trying to prefetch on index scans. If you try to prefetch
too much, you induce extra delays and it's a bad tradeoff.
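
As an illustration of what "prefetching too much" means in practice, here is a minimal sketch of bounded prefetching for random block reads. It is not the prefetch patch mentioned above. The function name read_blocks, the prefetch_distance parameter and the 8192-byte block size (PostgreSQL's default page size) are assumptions made for the example. It uses the POSIX posix_fadvise hint where the platform provides it and simply reads without hints otherwise.

```python
import os

BLOCK = 8192  # assumed block size, matching PostgreSQL's default page size


def read_blocks(fd, block_numbers, prefetch_distance):
    """Read blocks in the given order, hinting at most prefetch_distance ahead."""
    can_hint = hasattr(os, "posix_fadvise")
    hinted = 0  # index of the next entry that has not been hinted yet
    out = []
    for i, blk in enumerate(block_numbers):
        # Never hint further than prefetch_distance entries past the current read.
        limit = min(len(block_numbers), i + 1 + prefetch_distance)
        hinted = max(hinted, i + 1)
        while can_hint and hinted < limit:
            os.posix_fadvise(fd, block_numbers[hinted] * BLOCK, BLOCK,
                             os.POSIX_FADV_WILLNEED)
            hinted += 1
        # The actual read; ideally the kernel has already started fetching it.
        out.append(os.pread(fd, BLOCK, blk * BLOCK))
    return out
```

The only tunable is prefetch_distance: at 0 no hints are issued, and larger values queue more outstanding requests against the device. Where the useful range ends depends on the hardware and on whatever else is contending for it, which is why it has to be measured rather than guessed.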

Feel free to do your own testing.
