Skip site navigation (1) Skip section navigation (2)

Re: Please comment on pgsql speed at handling 550,000 records

From: Srinivas Iyyer <srini_iyyer_bio(at)yahoo(dot)com>
To: Michael Swierczek <mike(dot)swierczek(at)gmail(dot)com>
Cc: pgsql-novice(at)postgresql(dot)org
Subject: Re: Please comment on pgsql speed at handling 550,000 records
Date: 2006-02-07 18:33:28
Message-ID: 20060207183328.45953.qmail@web34507.mail.mud.yahoo.com (view raw or flat)
Thread:
Lists: pgsql-novice
Hi Mike, 
 I am using a Dell Precission 670, 
RAM: 2 GB
Processor: two IntelĀ® XeonTM  3.00GHz, 2MB L2 Cache
OS : Red HatĀ® Enterprise Linux WS v4,
Chipset : .Intel E7525 chipset
Disks :  1.5TB SATA disk space (3x500); If I am not
wrong they are  7200RPM  disks 

thanks
Srini


--- Michael Swierczek <mike(dot)swierczek(at)gmail(dot)com>
wrote:

> Dear Srinivas,
>       It may be helpful to know what operating
> system you are using, and
> also the amount of system RAM, the processor type,
> and the hard drive sizes
> and speeds.
> 
> -Mike
> 
> On 2/7/06, Srinivas Iyyer
> <srini_iyyer_bio(at)yahoo(dot)com> wrote:
> >
> >
> > dear group,
> > I am running a file with 550,000 lines.  This data
> is
> > not so complex. The scheme is
> > Table A
> >
> > seq_id | seq_name  | seq_identifier
> > 1      | ABC       | NM_0000023
> > ....
> >
> > (23k lines)
> >
> >
> > Table B:
> >
> > gos_id  |  go_id  | go_cat  | go_name
> > 1        GO:00003 | P       | death
> > ......
> > (7k)
> >
> >
> > Now table C
> >
> > seq_id   | gos_id
> > 1           1
> > 1        |  2
> > 1        |  200
> >
> > ...
> > (550K lines)
> >
> >
> >
> >
> > Table Temp_C
> > seq_name  |  go_id
> > ABC       | GO:9993934
> > ABC       | GO:3489343
> > ....
> > (550,500 lines)
> >
> >
> > question:
> >
> > TAble A and B populated easily without question.
> >
> > Table C, is completely a relationship table.
> > I wrote a python script that writes all 500K lines
> > with select statements inside.
> >
> > insert into tablec (seq_id,gos_id) values (
> > (select seq_id from table a where seq_name
> ='xxxx'),
> > (select gos_id from table b where go_id =
> 'xxxx'));
> >
> >
> > Such 500K + insert statements took a long time and
> I
> > had to abort it.
> >
> > 3.5 hrs of time, 300K lines got inserted
> >
> >
> > I wrote a plpgsql function:
> >
> > create function gosup() returns integer AS '
> > DECLARE
> >     referrer_key RECORD;
> > BEIGN
> >     FOR referrer_key IN SELECT * from TEMP_C LOOP
> >        INSERT INTO tableC(seq_id,gos_id) values(
> > (select seq_id from table a where seq_name
> > =referrer_key.seq_name),
> > (select gos_id from table b where go_id =
> > referrer_key.go_id))
> >     END LOOP
> > return 0;
> > END;
> > ' LANGUAGE plpgsql;
> >
> > so this function is written to make the inserting
> > faster.
> >
> > I used \copy command to create temp_c.
> >
> >
> > The ironical point is, even now it is taking over
> an
> > hour (still running as of now).
> >
> > So experts,
> >
> > do you really thing inserting 550K lines is really
> a
> > long time consuming step no matter what method you
> > choose (such as regular insert statements or
> writing a
> > procedure to automate this step on server side).
> >
> > Could any one throw some light on DB performance
> (no
> > matter wheter it is postgres, oracle, SyBase or
> > MySQL).
> >
> > In specific what is the state of efficiency in
> > postgres.
> >
> > Thank you.
> >
> > __________________________________________________
> > Do You Yahoo!?
> > Tired of spam?  Yahoo! Mail has the best spam
> protection around
> > http://mail.yahoo.com
> >
> > ---------------------------(end of
> broadcast)---------------------------
> > TIP 1: if posting/reading through Usenet, please
> send an appropriate
> >        subscribe-nomail command to
> majordomo(at)postgresql(dot)org so that your
> >        message can get through to the mailing list
> cleanly
> >
> 


__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around 
http://mail.yahoo.com 

In response to

pgsql-novice by date

Next:From: Srinivas IyyerDate: 2006-02-07 18:36:04
Subject: Re: Please comment on pgsql speed at handling 550,000 records
Previous:From: Tom LaneDate: 2006-02-07 18:33:17
Subject: Re: Please comment on pgsql speed at handling 550,000 records

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group