Re: optimising data load

From: Ron Johnson <ron(dot)l(dot)johnson(at)cox(dot)net>
To: PgSQL Novice ML <pgsql-novice(at)postgresql(dot)org>
Subject: Re: optimising data load
Date: 2002-05-23 02:40:07
Message-ID: 1022121607.19121.81.camel@rebel
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-novice

On Wed, 2002-05-22 at 16:05, John Taylor wrote:
> On Wednesday 22 May 2002 18:40, Ron Johnson wrote:
> > On Wed, 2002-05-22 at 08:45, John Taylor wrote:
> > >
> > > Hi all,
> > >
> > > I'm (still) working on an application to regularly populate my database with some data provided from
> > > a third party DB.
> > >
> > > I'm still having really bad performance problems.
> > > There are 300,000 records to be inserted, but I'm only getting 10,000/hour.
> > > 30 hours to populate a single table is just not on. There must be something seriously bad going on.
> >
> > From what I see of your INSERT statement, you are joining orderheader
> > to orderlines and then inserting those records into orderlines.
> > Is that correct?
>
> Yes, unfortunately I get incomplete data, so I need to select some details before inserting.
>
> >
> > I think I'd make a view from the SELECT portion. Then, nightly:
> > 1. COPY out the view.
> > 2. Drop the index(es) on orderlines
> > 3. COPY into orderlines
> > 4. recreate indexes on orderlines.
>
> So, how would I merge the data from the view with the data to copy in ?

??? Unless I am misunderstanding you:

0. create view v_new_orderlines (theorder, type, stock, line,
ordercurrent, sellingquant, price, discount, vatrate, comment)
as SELECT oh.theorder, 'P', '0310', coalesce(ol.line+1,1), 5, 0,
.52, 0, 0, ''
FROM orderheader oh
LEFT OUTER JOIN orderlines ol
ON oh.theorder = ol.theorder
WHERE oh.account=' MILN1'
AND oh.delivery=1
AND oh.thedate='2002-06-01'
AND oh.ordertype='O'
ORDER BY ol.line ;
1. COPY v_new_orderlines to '/tmp/new_orderlines.tsv';
2. DROP each index from orderlines
3. COPY orderlines from '/tmp/new_orderlines.tsv';
4. CREATE each index on orderlines

> I was thinking of generating the copy file by performing a select for each row,
> but a selecting each row into a file, and then copy in might be more expensive than
> just inserting it like I do now.
>
> >
> > Do the same for orderlineupdates. The thing is is that
> > modifying indexes is a _very_ expensive operation, no matter
> > what RDBMS or hardware configuration you have...
> >
>
> Yes, I was thinking of that for my next step.
> Once I figure how to get copy working, I'll also try dropping indexes.

--
+---------------------------------------------------------+
| Ron Johnson, Jr. Home: ron(dot)l(dot)johnson(at)cox(dot)net |
| Jefferson, LA USA http://ronandheather.dhs.org:81 |
| |
| "I have created a government of whirled peas..." |
| Maharishi Mahesh Yogi, 12-May-2002, |
! CNN, Larry King Live |
+---------------------------------------------------------+

In response to

Responses

Browse pgsql-novice by date

  From Date Subject
Next Message Thomas A. Lowery 2002-05-23 05:26:57 Re: pl/perl Documentation
Previous Message Josh Berkus 2002-05-23 00:36:31 Re: Moving data to different hard drive /or/ incremental backup