Re: Parallel Seq Scan

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, John Gorman <johngorman2(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Seq Scan
Date: 2015-01-28 14:03:01
Message-ID: CA+Tgmoa-XCu2Bkkieo=6X3sQNuOfzBoo_osMdeK9gR4NuObj7w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jan 28, 2015 at 2:08 AM, Heikki Linnakangas
<hlinnakangas(at)vmware(dot)com> wrote:
> OTOH, spreading the I/O across multiple files is not a good thing, if you
> don't have a RAID setup like that. With a single spindle, you'll just induce
> more seeks.
>
> Perhaps the OS is smart enough to read in large-enough chunks that the
> occasional seek doesn't hurt much. But then again, why isn't the OS smart
> enough to read in large-enough chunks to take advantage of the RAID even
> when you read just a single file?

Suppose we have N spindles and N worker processes and it just so
happens that the amount of computation is such that a each spindle can
keep one CPU busy. Let's suppose the chunk size is 4MB. If you read
from the relation at N staggered offsets, you might be lucky enough
that each one of them keeps a spindle busy, and you might be lucky
enough to have that stay true as the scans advance. You don't need
any particularly large amount of read-ahead; you just need to stay at
least one block ahead of the CPU. But if you read the relation in one
pass from beginning to end, you need at least N*4MB of read-ahead to
have data in cache for all N spindles, and the read-ahead will
certainly fail you at the end of every 1GB segment.

The problem here, as I see it, is that we're flying blind. If there's
just one spindle, I think it's got to be right to read the relation
sequentially. But if there are multiple spindles, it might not be,
but it seems hard to predict what we should do. We don't know what
the RAID chunk size is or how many spindles there are, so any guess as
to how to chunk up the relation and divide up the work between workers
is just a shot in the dark.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Merlin Moncure 2015-01-28 14:05:36 Re: hung backends stuck in spinlock heavy endless loop
Previous Message Stephen Frost 2015-01-28 13:32:39 Re: pg_upgrade and rsync