Re: Benchmark Data requested --- pgloader CE design ideas

From: Greg Smith <gsmith(at)gregsmith(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Benchmark Data requested --- pgloader CE design ideas
Date: 2008-02-06 15:56:03
Message-ID: Pine.GSO.4.64.0802061041230.15780@westnet.com
Lists: pgsql-performance

On Wed, 6 Feb 2008, Simon Riggs wrote:

> For me, it would be good to see a --parallel=n parameter that would
> allow pg_loader to distribute rows in "round-robin" manner to "n"
> different concurrent COPY statements. i.e. a non-routing version.

Let me expand on this. In many of these giant COPY situations the
bottleneck is plain old sequential I/O to a single process. You can
almost predict how fast the rows will load using dd. Having a process
that pulls rows in and distributes them round-robin is good, but it won't
crack that bottleneck. The useful approaches I've seen for other
databases all presume that the data files involved are large enough that
on big hardware, you can start multiple processes running at different
points in the file and beat anything possible with a single reader.

If I'm loading a TB file, odds are good I can split that into 4 or more
pieces by row range (say rows 1-25%, 25-50%, 50-75%, 75-100%), start 4
loaders at once, and get far more aggregate read throughput than a single
reader could manage. You have to experiment with the exact number, because
if you push the split too far you introduce seek slowdown instead of
improvement, but that's the basic design I'd like to see one day. For the
cases I'm thinking about, parallel loading isn't really useful until
something like this comes along.
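The split itself is mostly boundary arithmetic. Here's a hypothetical
sketch (Python and the function name are my own illustration, not anything
pg_loader does today): compute N byte ranges over the input file, nudging
each interior boundary forward to the next newline so every range holds
whole rows. Each range would then feed its own COPY connection.

```python
# Illustrative sketch only: compute N byte ranges over a COPY input file,
# aligning each interior boundary to the next newline so that every range
# contains complete rows. One concurrent loader per range.
import os

def chunk_offsets(path, n):
    """Return [(start, end)] byte ranges covering the file, each range
    beginning at byte 0 or immediately after a newline."""
    size = os.path.getsize(path)
    bounds = [0]
    with open(path, "rb") as f:
        for i in range(1, n):
            f.seek(i * size // n)   # jump to an approximate split point
            f.readline()            # skip ahead to the next row boundary
            bounds.append(min(f.tell(), size))
    bounds.append(size)
    # drop degenerate ranges (file has fewer rows than workers)
    return [(a, b) for a, b in zip(bounds, bounds[1:]) if a < b]
```

Each worker would then seek to its start offset and stream bytes start..end
into its own connection's COPY FROM STDIN; the newline alignment is what
lets the workers run independently without ever splitting a row.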

--
* Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD
