Re: Parallel Seq Scan

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, John Gorman <johngorman2(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Seq Scan
Date: 2015-01-28 15:40:47
Message-ID: 30549.1422459647@sss.pgh.pa.us
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> The problem here, as I see it, is that we're flying blind. If there's
> just one spindle, I think it's got to be right to read the relation
> sequentially. But if there are multiple spindles, it might not be,
> but it seems hard to predict what we should do. We don't know what
> the RAID chunk size is or how many spindles there are, so any guess as
> to how to chunk up the relation and divide up the work between workers
> is just a shot in the dark.

I thought the proposal to chunk on the basis of "each worker processes
one 1GB-sized segment" should work all right. The kernel should see that
as sequential reads of different files, issued by different processes;
and if it can't figure out how to process that efficiently then it's a
very sad excuse for a kernel.
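
For concreteness, here is a minimal sketch of that chunking, assuming the
default build settings (8 kB blocks, 131072 blocks per segment file) and
a simple round-robin handout of whole segments to workers. The function
names are hypothetical, for illustration only, and are not taken from any
patch in this thread.

```python
BLCKSZ = 8192          # default PostgreSQL block size, in bytes
RELSEG_SIZE = 131072   # default blocks per segment file (1 GB / 8 kB)


def segment_ranges(nblocks):
    """Split a relation of nblocks blocks into per-segment block ranges.

    Returns a list of (segno, first_block, end_block) tuples, where
    end_block is exclusive. The last segment may be shorter than 1 GB.
    """
    ranges = []
    for start in range(0, nblocks, RELSEG_SIZE):
        end = min(start + RELSEG_SIZE, nblocks)
        ranges.append((start // RELSEG_SIZE, start, end))
    return ranges


def assign_segments(nblocks, nworkers):
    """Hand out whole segments to workers round-robin (an assumption;
    a shared "next segment" counter would work just as well).

    Each worker then reads its segments front to back, so the kernel
    sees one sequential stream per process per file.
    """
    plan = {worker: [] for worker in range(nworkers)}
    for segno, start, end in segment_ranges(nblocks):
        plan[segno % nworkers].append((segno, start, end))
    return plan
```

With 300000 blocks and two workers, worker 0 would get segments 0 and 2
and worker 1 would get segment 1; no I/O scheduling beyond that is
attempted.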

You are right that trying to do any detailed I/O scheduling by ourselves
is a doomed exercise. For better or worse, we have kept ourselves at
sufficient remove from the hardware that we can't possibly do that
successfully.

regards, tom lane
