Re: [PERFORM] multi-threaded pgloader makes it in version 2.3.0

From: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: [PERFORM] multi-threaded pgloader makes it in version 2.3.0
Date: 2008-03-10 17:57:38
Message-ID: 200803101857.38489.dfontaine@hi-media.com
Lists: pgsql-general pgsql-performance

On Monday 10 March 2008, Simon Riggs wrote:
> Not sure when or why I would want an rrqueue_size larger than
> copy_every, and less sounds very strange. Can we get away with it being
> the same thing in all cases?

In fact, it's just that you asked for a reader which reads one line at a
time and feeds the workers in a round-robin fashion, and I wanted to feed them
more than one line at a time, hence this parameter. Of course it may well turn
out not to be needed, in which case I'll deprecate it in the next version.
Please note that it defaults to what you want it to be, so you can just forget
about it...
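Here is a minimal Python sketch of the reader described above. It assumes each worker thread owns its own queue. The reader takes up to rrqueue_size lines at a time and hands each batch to the next worker in turn, so rrqueue_size = 1 gives the one-line-at-a-time behaviour. The function names round_robin_feed and worker are hypothetical, and this is not pgloader's actual code. The COPY into PostgreSQL is only indicated by a comment.

```python
import queue
import threading
from itertools import islice


def round_robin_feed(lines, worker_queues, rrqueue_size):
    """Hypothetical reader: hand out batches of up to rrqueue_size lines,
    one batch per worker in turn (rrqueue_size=1 means one line per turn)."""
    it = iter(lines)
    n = len(worker_queues)
    turn = 0
    while True:
        batch = list(islice(it, rrqueue_size))
        if not batch:
            break
        worker_queues[turn % n].put(batch)
        turn += 1
    # One sentinel per worker signals that no more input is coming.
    for q in worker_queues:
        q.put(None)


def worker(q, results, lock):
    """Hypothetical worker: consume batches until the sentinel arrives."""
    while True:
        batch = q.get()
        if batch is None:
            break
        # A real loader would COPY this batch into PostgreSQL here;
        # the sketch just records the lines under a lock.
        with lock:
            results.extend(batch)


if __name__ == "__main__":
    lines = [f"row {i}" for i in range(10)]
    queues = [queue.Queue() for _ in range(3)]
    results, lock = [], threading.Lock()
    threads = [threading.Thread(target=worker, args=(q, results, lock))
               for q in queues]
    for t in threads:
        t.start()
    round_robin_feed(lines, queues, rrqueue_size=2)
    for t in threads:
        t.join()
    print(sorted(results) == sorted(lines))  # True
```

With two workers and rrqueue_size = 2, five input lines go out as [a, b] to the first worker, [c, d] to the second, then [e] back to the first. A larger batch size means fewer queue hand-offs per line. Setting it equal to copy_every, as suggested in the quoted question, is one plausible default.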

I'm beginning to think you asked for one line at a time so that the first
version would be easier to implement... :)

> Do you have some basic performance numbers? It would be good to
> understand the overhead of the parallelism on a large file with 1, 2 and
> 4 threads. Would be good to see if synchronous_commit = off helped speed
> things up as well.

I didn't have the time to test this performance-wise; that's why I asked
for testing last time. I've planned some performance tests, if only to have
the opportunity to write up a presentation article, but haven't found the
time to run them yet.

> Presumably -V and -T still work when we go parallel, but just issue one
> query?

They still work, of course: the 'controller' thread issues them before
parallelizing the work or starting to read the input file. Rejecting works
the same way too: the threads share a reject object which is protected by a
lock (mutex), so lines don't get mixed up in the reject file.
I've tried not to compromise any existing feature when adding the parallel
ones, and in the end I didn't have to.
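Here is a minimal sketch of a reject object shared by worker threads. It assumes the rejected line and its error message go to two separate outputs. In the sketch these are io.StringIO stand-ins for the reject files. The class name SharedReject and the helper run_demo are hypothetical and are not pgloader's API. The point is that holding one threading.Lock across both writes keeps each rejected line intact and paired with its reason, even when several threads reject at once.

```python
import io
import threading


class SharedReject:
    """Hypothetical reject sink shared by all worker threads.

    A single mutex serialises writes, so each rejected line and its
    error message are written as one unbroken, correctly paired record."""

    def __init__(self, data_out, log_out):
        self._data = data_out
        self._log = log_out
        self._lock = threading.Lock()

    def reject(self, line, reason):
        with self._lock:
            self._data.write(line.rstrip("\n") + "\n")
            self._log.write(f"{reason}\n")


def run_demo(n_threads=4, per_thread=100):
    """Start several threads that all reject lines through one SharedReject."""
    data, log = io.StringIO(), io.StringIO()
    rej = SharedReject(data, log)

    def work(tid):
        for i in range(per_thread):
            rej.reject(f"t{tid}-{i}\n", f"bad t{tid}-{i}")

    threads = [threading.Thread(target=work, args=(t,)) for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return data.getvalue().splitlines(), log.getvalue().splitlines()


if __name__ == "__main__":
    rows, reasons = run_demo()
    print(len(rows), len(reasons))  # 400 400
```

Without the lock, one thread could write its data line between another thread's data line and log line, so the two reject outputs would no longer line up.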

Regards,
--
dim
