Re: Parallel Seq Scan

From: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, John Gorman <johngorman2(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Seq Scan
Date: 2015-01-26 21:48:39
Message-ID: 54C6B637.9050408@BlueTreble.com
Lists: pgsql-hackers

On 1/23/15 10:16 PM, Amit Kapila wrote:
> Further, if we want to just get the benefit of parallel I/O, then
> I think we can get that by parallelising partition scan where different
> table partitions reside on different disk partitions, however that is
> a matter of separate patch.

I don't think we even have to go that far.

My experience with Postgres is that it is *very* sensitive to IO latency (not bandwidth). I believe this is because complex queries tend to interleave CPU-intensive code between IO requests. So we see this pattern:

Wait 5ms on IO
Compute for a few ms
Wait 5ms on IO
Compute for a few ms
...

We blindly assume that the kernel will magically do read-ahead for us, but I've never seen that work very well. It certainly falls apart on something like an index scan.

If we could instead do this:

Wait for first IO, issue second IO request
Compute
Second IO has already completed, issue third IO request
...

We'd be a lot less sensitive to IO latency.
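
To make the pattern above concrete, here is a rough sketch in Python rather than anything resembling actual Postgres code. It keeps exactly one read in flight while the current block is being processed. The names scan_with_lookahead, read_block and compute are made up for illustration, and a thread pool stands in for whatever asynchronous IO or worker mechanism would really be used.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_with_lookahead(read_block, n_blocks, compute):
    """Process blocks 0..n_blocks-1 with one read always in flight.

    read_block(i) and compute(block) are hypothetical callables supplied
    by the caller; they are not part of any Postgres API.
    """
    results = []
    if n_blocks <= 0:
        return results
    # A single IO thread is enough for one-block lookahead.
    with ThreadPoolExecutor(max_workers=1) as io:
        pending = io.submit(read_block, 0)  # issue the first IO
        for i in range(n_blocks):
            block = pending.result()  # wait for IO number i
            if i + 1 < n_blocks:
                # Issue the next IO *before* computing on this block,
                # so the read overlaps with the CPU work below.
                pending = io.submit(read_block, i + 1)
            results.append(compute(block))
    return results
```

With a 5ms read and a few ms of compute per block, the per-block cost in this sketch should approach max(read, compute) instead of read + compute, assuming the reads really do proceed independently of the compute step.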

I wonder what kind of gains we would see if every SeqScan in a query spawned a worker just to read tuples and shove them in a queue (or shove a pointer to a buffer in the queue). Similarly, an IndexScan could have one worker reading the index and another worker taking index tuples and reading heap tuples...
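
As a toy illustration of the reader-worker idea (again Python, and only a sketch under assumptions): a thread plays the role of the worker, a bounded queue.Queue plays the role of the tuple queue, and read_tuples is a hypothetical callable returning an iterable of tuples. A real implementation would use background workers and shared memory, none of which is modeled here.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the scan


def reader_worker(read_tuples, out_q):
    """Read tuples and push them into the queue until the source is exhausted."""
    try:
        for tup in read_tuples():
            out_q.put(tup)  # blocks while the queue is full
    finally:
        out_q.put(_DONE)  # always signal completion, even on error


def scan_via_queue(read_tuples, queue_depth=16):
    """Yield tuples produced by a separate reader thread.

    queue_depth bounds how far the reader may run ahead of the consumer.
    """
    q = queue.Queue(maxsize=queue_depth)
    worker = threading.Thread(
        target=reader_worker, args=(read_tuples, q), daemon=True
    )
    worker.start()
    while True:
        item = q.get()
        if item is _DONE:
            break
        yield item
    worker.join()
```

The consumer just iterates, e.g. `for tup in scan_via_queue(my_reader): ...`, while the reader stays up to queue_depth tuples ahead. The bounded queue is the design choice that matters: it lets the reader hide IO latency without letting it run arbitrarily far ahead of the executor.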
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
