Re: Benchmark Data requested --- pgloader CE design ideas

From: Kenneth Marshall <ktm(at)rice(dot)edu>
To: Greg Smith <gsmith(at)gregsmith(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Benchmark Data requested --- pgloader CE design ideas
Date: 2008-02-07 17:15:44
Message-ID: 20080207171544.GU4201@it.is.rice.edu
Lists: pgsql-performance

On Thu, Feb 07, 2008 at 12:06:42PM -0500, Greg Smith wrote:
> On Thu, 7 Feb 2008, Dimitri Fontaine wrote:
>
>> I was thinking of not even reading the file content from the controller
>> thread, just deciding the split points in bytes (0..ST_SIZE/4,
>> ST_SIZE/4+1..2*ST_SIZE/4, etc.) and letting each reading thread fine-tune
>> by beginning to process input only after having read the first newline, etc.
>
> The problem I was pointing out is that if chunk#2 moved forward a few bytes
> before it started reading in search of a newline, how will chunk#1 know
> that it's supposed to read up to that further point? You have to stop #1
> from reading further when it catches up with where #2 started. Since the
> start of #2 is fuzzy until some reading is done, what you're describing
> will need #2 to send some feedback to #1 after they've both started, and
> that sounds bad to me. I like designs where the boundaries between threads
> are clearly defined before any of them start and none of them ever talk to
> the others.
>

As long as both processes understand the start condition, there
is no problem. p1 starts at the beginning and keeps processing past
the chunk2 offset until it reaches the start condition. p2 starts loading
from the chunk2 offset plus the amount needed to reach the start condition, ...

DBfile|---------------|--x--------------|x----------------|-x--|
      x chunk1----------->
                         x chunk2-------->
                                         x chunk3----------->...

As long as both pieces use the same test, they will each process
non-overlapping segments of the file and still process 100% of the
file.
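The "same test" can be stated as: a line belongs to the chunk in which its first byte falls. Below is a minimal Python sketch of that rule, assuming newline-terminated records with no newlines embedded inside quoted fields (real CSV input can violate this). The names split_offsets, read_chunk and load_in_chunks are hypothetical, not pgloader functions, and the chunks are read one after another here only for illustration; in a loader each call would run in its own worker.

```python
import os
import tempfile


def split_offsets(size, n):
    """Byte offsets that cut a file of `size` bytes into `n` near-equal ranges."""
    return [size * i // n for i in range(n + 1)]


def read_chunk(path, start, end):
    """Return the lines whose first byte lies in [start, end).

    Every worker applies this same test, so each line is read exactly once
    and no worker needs to talk to any other.
    """
    lines = []
    with open(path, "rb") as f:
        if start > 0:
            # Back up one byte and discard through the next newline. If the
            # byte before `start` is itself a newline, only that byte is
            # consumed, so a line beginning exactly at `start` is kept here.
            f.seek(start - 1)
            f.readline()
        # Take lines while the *next* line starts before `end`; the last one
        # taken may run past `end`, and later workers will skip it.
        while f.tell() < end:
            line = f.readline()
            if not line:
                break  # end of file
            lines.append(line)
    return lines


def load_in_chunks(data, n):
    """Write `data` to a temporary file and read it back as `n` chunks."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(data)
        path = tmp.name
    try:
        offsets = split_offsets(os.stat(path).st_size, n)
        return [read_chunk(path, offsets[i], offsets[i + 1]) for i in range(n)]
    finally:
        os.remove(path)


if __name__ == "__main__":
    rows = b"".join(b"row %d,%s\n" % (i, b"x" * (i % 7)) for i in range(100))
    chunks = load_in_chunks(rows, 4)
    print([len(c) for c in chunks])
    # Concatenating the chunks in order reproduces the file exactly.
    assert b"".join(b"".join(c) for c in chunks) == rows
```

Because the boundaries are fixed before any worker starts and each worker only looks at its own byte range plus the tail of one line, this matches Greg's preference that threads never need to exchange feedback.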

Ken

>> In both cases, maybe pgloader would also need a separate thread for
>> COPYing the buffer to the server, allowing it to continue preparing the
>> next buffer in the meantime?
>
> That sounds like a V2.0 design to me. I'd only chase after that level of
> complexity if profiling suggests that's where the bottleneck really is.
>
> --
> * Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD
>
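The separate COPY thread Dimitri asks about is an ordinary producer/consumer pipeline. Here is a minimal Python sketch, assuming a bounded queue between the thread that formats buffers and a thread that sends them. pipelined_load and send_buffer are hypothetical names, and send_buffer is only a stand-in for a real COPY call. As Greg says, this extra moving part is only worth adding if profiling shows the send is the bottleneck.

```python
import queue
import threading


def pipelined_load(rows, batch_size, send_buffer, max_pending=2):
    """Build COPY-style text buffers in this thread while another sends them.

    `send_buffer` is a hypothetical stand-in for whatever pushes one buffer
    to the server. The bounded queue caps memory: if sending falls behind,
    the producer blocks once `max_pending` buffers are waiting.
    """
    pending = queue.Queue(maxsize=max_pending)
    errors = []

    def sender():
        while True:
            buf = pending.get()
            if buf is None:  # sentinel: producer is done
                return
            if errors:
                continue  # an earlier send failed; keep draining so put() cannot block forever
            try:
                send_buffer(buf)
            except Exception as exc:
                errors.append(exc)

    worker = threading.Thread(target=sender)
    worker.start()
    try:
        batch = []
        for row in rows:
            # Tab-separated COPY text format; assumes values need no escaping.
            batch.append("\t".join(row) + "\n")
            if len(batch) == batch_size:
                pending.put("".join(batch))
                batch = []
        if batch:
            pending.put("".join(batch))
    finally:
        pending.put(None)
        worker.join()
    if errors:
        raise errors[0]
```

With a real connection, send_buffer might wrap a driver's COPY entry point (for example psycopg2's `cursor.copy_from` over an `io.StringIO`); that wiring is an assumption and is not shown here. Only the sender thread touches the connection, so the connection is never used from two threads at once.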
