Re: Time consuming process ...3 million records please help

From: Sean Davis <sdavis2(at)mail(dot)nih(dot)gov>
To: Srinivas Iyyer <srini_iyyer_bio(at)yahoo(dot)com>
Cc: pgsql-novice(at)postgresql(dot)org
Subject: Re: Time consuming process ...3 million records please help
Date: 2006-03-28 01:35:34
Message-ID: 442892E6.7010408@mail.nih.gov
Lists: pgsql-novice

Srinivas Iyyer wrote:
> Dear group,
> I have two tables and a temp table where I uploaded
> data using \copy. This temp table has "3,348,107"
> lines of data.
>
> I wrote a plpgsql function to read each record in the
> temp table, take the first column's data (temp_refseq_id),
> get the corresponding 'seq_id' from table B, and insert
> it into table A.
>
> I started this process 8 hrs ago and it has been running
> ever since without finishing. The reason I did this temp
> table thing was to speed up the process by writing a
> server-side function, but I still have not won the time
> battle here. If this is a recurring problem I will have
> to look for another db system, since my files from now
> on are huge and have over a million records. I am losing
> a lot of time in this case.
>
> Could anyone help me write a faster function?
>
> thanks
> looking forward to hearing from people.
>
> Temp_table:
>
> temp_refseq_id | temp_imageid | temp_genbankacc
> ----------------+----------------+-----------------
> NM_003604 | IMAGE:1099538 | AA594716
> NM_003604 | IMAGE:853047 | AA668250
> NM_001008860 | IMAGE:3640970 | BC011775
> NM_001008860 | IMAGE:3640970 | BE737509
> NM_001008860 | IMAGE:6040319 | BU079001
> NM_001008860 | IMAGE:6040319 | BU078725
> NM_001008860 | IMAGE:3451448 | BC000957
> NM_001008860 | IMAGE:3451448 | BE539334
> NM_001008860 | IMAGE:4794135 | BG708105
> NM_001008860 | IMAGE:5214087 | BI911674
>
> Table A : (I want to upload data from temp to here)
>
>  spota_id | seq_id | spota_imageid | spota_genbacc
> ----------+--------+---------------+---------------
>        23 |  54525 | IMAGE:1099538 | AA594716
>
>
>
>
> Table B : This is the seqdump table; its seq_id is
> referenced as a FK in Table A.
>
>  seq_id |   seq_acc    | seq_name
> --------+--------------+-----------
>   54519 | NM_152918    | EMR2
>   54520 | NM_001008860 | CGGBP1
>   54521 | NM_020040    | TUBB4Q
>   54522 | NM_017525    | CDC42BPG
>   54523 | NM_032158    | WBSCR20C
>   54524 | NM_004047    | ATP6V0B
>   54525 | NM_003604    | PLCB3
>
>
> Function:
>
> CREATE FUNCTION tab_update() RETURNS integer AS '
> DECLARE
>     referrer_keys RECORD;
> BEGIN
>     FOR referrer_keys IN SELECT * FROM temp_spotanno LOOP
>         INSERT INTO spotanno (seq_id, spotanno_imageid, spotanno_genbankacc)
>         VALUES ((SELECT seq_id FROM seqdump
>                  WHERE seq_acc = referrer_keys.temp_refseq_id),
>                 referrer_keys.temp_imageid,
>                 referrer_keys.temp_genbankacc);
>     END LOOP;
>     RETURN 0;
> END;
> ' LANGUAGE plpgsql;
>
>
> Thanks
> Sri
>
Hi, Sri.

I would do this differently. I don't think you need the function at
all. I would do something like this (untested):

insert into spotanno (seq_id, spotanno_imageid, spotanno_genbankacc)
select seqdump.seq_id, temp_spotanno.temp_imageid, temp_spotanno.temp_genbankacc
from temp_spotanno, seqdump
where temp_spotanno.temp_refseq_id = seqdump.seq_acc;
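
A single set-based insert ... select like this lets the server join the
two tables once, instead of doing 3 million separate lookups and inserts,
which is where the loop was spending its time.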

Make sure there is an index on seq_acc and on temp_refseq_id. This
should complete in seconds to minutes, not hours.
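
For example, something like this would create the indexes and refresh the
planner statistics (again untested; the index names are just placeholders,
and I am assuming the table and column names shown above):

-- indexes to support the join; the names are arbitrary
create index seqdump_seq_acc_idx on seqdump (seq_acc);
create index temp_spotanno_refseq_idx on temp_spotanno (temp_refseq_id);

-- update the planner statistics after the bulk \copy load
analyze seqdump;
analyze temp_spotanno;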

Sean
