Re: Time consuming process ...3 million records please help

From: Sean Davis <sdavis2(at)mail(dot)nih(dot)gov>
To: Srinivas Iyyer <srini_iyyer_bio(at)yahoo(dot)com>
Cc: pgsql-novice(at)postgresql(dot)org
Subject: Re: Time consuming process ...3 million records please help
Date: 2006-03-28 02:31:07
Message-ID: 44289FEB.207@mail.nih.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-novice

Srinivas Iyyer wrote:
> Hi Sean,
> thanks for your reply. in my temp table,
> temp_refseq_id column cannot be indexed due to
> duplicate values. will that effect drastically.
> I havent tested it yet. on top of that i killed 8 hrs
> process running previously on tab_update() function.
> :-)

Sri,

Columns with duplicate values can be indexed; only if you declare an
index to be "unique" will you then have a problem with inserting
duplicates.

You just need to do:

1. Create your temp table structure.
2. Copy in your data.
3. Create an index for all fields that show up in a "where" or join
clause (THIS IS NECESSARY FOR ANY RDBMS TO WORK EFFICIENTLY!!!)

create index temp_refseq_id_index on TABLENAME(temp_refseq_id);

4. Vacuum analyze the table(s) after copying and creating the index
(thereby telling postgres what is in the table)

vacuum analyze TABLENAME

5. Do something along the lines of:

insert into ....
select from ....

That should do it, approximately. If you still have problems, then you
will need to be more specific about what the problems are, I think.

As an aside, I have similar data stored in a postgres database, but in
some places on the order of several 10's of millions of records;
postgres is not limiting in this respect.

Sean

> thanks
> sri
>
>
>> Hi, Sri.
>>
>> I would do this differently. You don't need the
>> function I don't
>> think. I would do something like this (untested):
>>
>> insert into spontanno (seq_id, spontanno_imageid,
>> spontanno_genbankacc)
>> select seqdump.seq_id, referrer_keys.temp_imageid,
>> referrer_keys.temp.genbankacc from
>> referrer_keys, seqdump
>> where
>> referrer_keys.temp_refseq_id=seqdump.seq_acc;
>>
>> Make sure there is an index on seq_acc and
>> temp_refseq_id. This should
>> be done in seconds to minutes, not hours.
>>
>> Sean
>
> -
>
>
>
> <snip>
>
>
>
> __________________________________________________
> Do You Yahoo!?
> Tired of spam? Yahoo! Mail has the best spam protection around
> http://mail.yahoo.com

In response to

Responses

Browse pgsql-novice by date

  From Date Subject
Next Message Adam Witney 2006-03-28 09:44:38 Re: Transfer from MySQL to PostgreSQL
Previous Message Srinivas Iyyer 2006-03-28 02:17:16 Re: Time consuming process ...3 million records please help