Re: [GENERAL] performance issue using DBI

From: "Joshua b(dot) Jore" <josh(at)greentechnologist(dot)org>
To: Nicolas Nolst <nnolst(at)hotmail(dot)com>
Cc: pgsql-admin(at)postgresql(dot)org, <pgsql-general(at)postgresql(dot)org>
Subject: Re: [GENERAL] performance issue using DBI
Date: 2002-06-06 13:55:40
Message-ID: Pine.BSO.4.44.0206060832210.2703-100000@kitten.greentechnologist.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin pgsql-general

How much data is 20,000 lines? 20KB? 100MB? You might do well to just
process this in memory and then just COPY the right data to the table.
This gets away from doing the SELECT/INSERT/SELECT/INSERT thing which is
going to be painful for batch jobs.

See, the thing is that on every insert the indexes have to be updated.
They won't be used until after you VACUUM the tables so it does no good
inside your transaction. Drop the indexes.

Also, leave off with your use of currval/nextval. The point of a serial
type is that the column increments each time automagically. For what you
are doing you could turn that serial into an integer, create your sequence
separately, exec nextval('yourseq') and then just use the value you
retrieved. Don't use currval since some other process might alter the
sequence between the time you call nextval and currval. Just store the
value.

I think this problem is better solved on the perl side than on the
PostgreSQL side. Consider using data structures like so. You can probably
create a better structure since you know your data and I don't.

sessions
{ remote_ip => { phone_type => { phone_number => session_id,
phone_number => session_id
},
phone_type => { phone_number => session_id,
phone_number => session_id
}
},
remote_ip => { phone_type => { phone_number => session_id,
phone_number => session_id
},
phone_type => { phone_number => session_id,
phone_number => session_id
}
}
}

actions - now session_id is the array offset.
[ [ url, timestamp ],
[ url, timestamp ],
[ url, timestamp ] ],
[ [ url, timestamp ],
[ url, timestamp ],
[ url, timestamp ] ],
[ [ url, timestamp ],
[ url, timestamp ],
[ url, timestamp ] ],

> If the session already exists I add a line in the table actions with a INSERT
>
> If the session doesn't exist or if the criteria is true, I add a line in the
> table sessions with an INSERT and then add a line with a INSERT in the table actions (I use nextval and currval).
>
> I have put indexes on sessions(session_id), sessions(msisdn),
> actions(session_id) and actions(timestamp). I process one log file of about 20000 lines every day. All the lines are
> processed in one transaction (autocommit set to 0).
>
> My problem is that populating my database is slower when the data gets bigger
> and the performance falls dramatically. I thought that is would be improve with
> my indexes but the problem still persists.
>
> Could you please give me some clues that could solve this issue.
>
> Thanks.
>
>
>
> Nicolas Nolst
> [belgium_gs.gif]
>
> ______________________________________________________________________________________________________________________________
> MSN Photos is the easiest way to share and print your photos: Click Here
>
>

In response to

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message Ferdinand Smit 2002-06-06 15:38:51 About a PL/pgSQL function
Previous Message Giorgio Ponza 2002-06-06 10:47:39 Re: [ADMIN] performance issue using DBI

Browse pgsql-general by date

  From Date Subject
Next Message Yutaka tanida 2002-06-06 14:43:09 Re: [HACKERS] PostgreSQL and Windows2000 and defunct processes
Previous Message Gianfranco Masia - Eprom s.r.l. 2002-06-06 12:44:12 Version 7.2.1 is the last up to the new 7.3?