From: | Dimitri Fontaine <dfontaine(at)hi-media(dot)com> |
---|---|
To: | pgsql-performance(at)postgresql(dot)org |
Cc: | Greg Smith <gsmith(at)gregsmith(dot)com> |
Subject: | Re: Benchmark Data requested --- pgloader CE design ideas |
Date: | 2008-02-06 19:59:04 |
Message-ID: | 200802062059.07174.dfontaine@hi-media.com |
Lists: | pgsql-performance |
On Wednesday 06 February 2008 18:37:41, Dimitri Fontaine wrote:
> On Wednesday 06 February 2008, Greg Smith wrote:
> > If I'm loading a TB file, odds are good I can split that into 4 or more
> > vertical pieces (say rows 1-25%, 25-50%, 50-75%, 75-100%), start 4
> > loaders at once, and get way more than 1 disk worth of throughput
> > reading.
>
> pgloader already supports starting at any input file line number, and limiting
> itself to any number of reads:
In fact, the -F option works by having pgloader read the given number of lines
but skip processing them, which I think is not at all what Greg is talking
about here.
Plus, I think it would be easier for me to code a stat(), lseek(), read()
sequence into the pgloader reader machinery than to change the code
architecture to support a separate file-reader thread.
Greg, what would you think of a pgloader that splits file reading based on the
file size, as given by stat (os.stat(file)[ST_SIZE]), and the number of
threads? We would split the file into as many pieces as the section_threads
section configuration value.
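A minimal Python sketch of that split, assuming byte offsets and one piece per
configured thread. The helper names file_size and split_ranges are
hypothetical, not part of pgloader; only the os.stat(file)[ST_SIZE] call and
the section_threads idea come from the message above.

```python
import os
from stat import ST_SIZE


def file_size(path):
    """Return the file size in bytes, as in os.stat(file)[ST_SIZE]."""
    return os.stat(path)[ST_SIZE]


def split_ranges(size, section_threads):
    """Cut [0, size) into contiguous byte ranges, one per reader thread.

    Hypothetical helper. It never produces more ranges than there are
    bytes, so no two readers share the same start offset.
    """
    n = max(1, min(section_threads, size))
    step = size // n
    ranges = []
    for i in range(n):
        start = i * step
        # The last range also absorbs the remainder of the integer division.
        end = size if i == n - 1 else (i + 1) * step
        ranges.append((start, end))
    return ranges


# Hypothetical usage, one range per configured thread:
# ranges = split_ranges(file_size("data.csv"), 4)
```

Clamping the piece count to the file size is a choice made for this sketch, so
that a tiny file with many threads still yields distinct start offsets.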
This behaviour won't be available for sections where type = text and
field_count(*) is given, because in that case I don't see how pgloader could
reliably recognize the beginning of a new logical line and start processing
there.
In other cases, a logical line is a physical line, so each reader starts after
the first newline it meets from its lseek start position, and keeps reading
past its last lseek position until it reaches a newline.
(*) http://pgloader.projects.postgresql.org/#_text_format_configuration_parameters
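The newline boundary rule for physical lines could be sketched like this,
assuming binary-mode reads so that tell() returns a byte offset comparable to
the stat() size. read_range is a hypothetical name and io.BytesIO stands in for
a real file in the demo; this illustrates the idea, not pgloader's code.

```python
import io


def read_range(f, start, end):
    """Yield the physical lines of binary file `f` owned by one reader.

    The first reader (start == 0) keeps its first line. Every other reader
    skips to just after the first newline met from `start`, because the
    previous reader finishes that line. Each reader then keeps going past
    `end` until it reaches a newline, so no line is lost or read twice.
    """
    f.seek(start)
    if start > 0:
        # Discard up to and including the first newline at or after `start`;
        # the previous reader owns that line.
        f.readline()
    # A line is ours if it begins at or before `end`; read it to its newline.
    while f.tell() <= end:
        line = f.readline()
        if not line:  # end of file
            break
        yield line


if __name__ == "__main__":
    # Each thread would open its own handle, e.g. open(path, "rb").
    data = b"aa\nbb\ncc\n"
    lines = []
    for start, end in [(0, 4), (4, len(data))]:
        lines.extend(read_range(io.BytesIO(data), start, end))
    print(lines)  # [b'aa\n', b'bb\n', b'cc\n']
```

Each line is read by exactly one reader: the one whose range contains the
offset of its first byte (with the first range also owning offset 0).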
Comments?
--
dim