Re: Please comment on pgsql speed at handling 550,000 records

From: Michael Swierczek <mike(dot)swierczek(at)gmail(dot)com>
To: Srinivas Iyyer <srini_iyyer_bio(at)yahoo(dot)com>
Cc: pgsql-novice(at)postgresql(dot)org
Subject: Re: Please comment on pgsql speed at handling 550,000 records
Date: 2006-02-07 18:14:04
Message-ID: 68b5b5880602071014t1daa829aj3bd6cc7d7eb659d8@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-novice

Dear Srinivas,
It may be helpful to know what operating system you are using, and
also the amount of system RAM, the processor type, and the hard drive sizes
and speeds.

-Mike

On 2/7/06, Srinivas Iyyer <srini_iyyer_bio(at)yahoo(dot)com> wrote:
>
>
> dear group,
> I am running a file with 550,000 lines. This data is
> not so complex. The scheme is
> Table A
>
> seq_id | seq_name | seq_identifier
> 1 | ABC | NM_0000023
> ....
>
> (23k lines)
>
>
> Table B:
>
> gos_id | go_id | go_cat | go_name
> 1 GO:00003 | P | death
> ......
> (7k)
>
>
> Now table C
>
> seq_id | gos_id
> 1 1
> 1 | 2
> 1 | 200
>
> ...
> (550K lines)
>
>
>
>
> Table Temp_C
> seq_name | go_id
> ABC | GO:9993934
> ABC | GO:3489343
> ....
> (550,500 lines)
>
>
> question:
>
> TAble A and B populated easily without question.
>
> Table C, is completely a relationship table.
> I wrote a python script that writes all 500K lines
> with select statements inside.
>
> insert into tablec (seq_id,gos_id) values (
> (select seq_id from table a where seq_name ='xxxx'),
> (select gos_id from table b where go_id = 'xxxx'));
>
>
> Such 500K + insert statements took a long time and I
> had to abort it.
>
> 3.5 hrs of time, 300K lines got inserted
>
>
> I wrote a plpgsql function:
>
> create function gosup() returns integer AS '
> DECLARE
> referrer_key RECORD;
> BEIGN
> FOR referrer_key IN SELECT * from TEMP_C LOOP
> INSERT INTO tableC(seq_id,gos_id) values(
> (select seq_id from table a where seq_name
> =referrer_key.seq_name),
> (select gos_id from table b where go_id =
> referrer_key.go_id))
> END LOOP
> return 0;
> END;
> ' LANGUAGE plpgsql;
>
> so this function is written to make the inserting
> faster.
>
> I used \copy command to create temp_c.
>
>
> The ironical point is, even now it is taking over an
> hour (still running as of now).
>
> So experts,
>
> do you really thing inserting 550K lines is really a
> long time consuming step no matter what method you
> choose (such as regular insert statements or writing a
> procedure to automate this step on server side).
>
> Could any one throw some light on DB performance (no
> matter wheter it is postgres, oracle, SyBase or
> MySQL).
>
> In specific what is the state of efficiency in
> postgres.
>
> Thank you.
>
> __________________________________________________
> Do You Yahoo!?
> Tired of spam? Yahoo! Mail has the best spam protection around
> http://mail.yahoo.com
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo(at)postgresql(dot)org so that your
> message can get through to the mailing list cleanly
>

In response to

Responses

Browse pgsql-novice by date

  From Date Subject
Next Message Tom Lane 2006-02-07 18:33:17 Re: Please comment on pgsql speed at handling 550,000 records
Previous Message Srinivas Iyyer 2006-02-07 17:56:30 Please comment on pgsql speed at handling 550,000 records