Re: Performance problem with query

From: "Christian Rengstl" <Christian(dot)Rengstl(at)klinik(dot)uni-regensburg(dot)de>
To: "Merlin Moncure" <mmoncure(at)gmail(dot)com>, <pgsql-general(at)postgresql(dot)org>
Subject: Re: Performance problem with query
Date: 2006-07-19 08:29:00
Message-ID: 44BE096C.0AD0.0080.0@klinik.uni-regensburg.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

The thing is that in the text file there is a column that is something like xyz_12 and in the table i just need the integer part of it that's what the query is used for. The problem though is not really in the select part, because running the select part on 8 million lines takes about 3 minutes, but i don't know why the insert is taking so long.

"Merlin Moncure" <mmoncure(at)gmail(dot)com> wrote on 07/18/06 9:39 pm:
> On 7/18/06, Christian Rengstl
> <Christian(dot)Rengstl(at)klinik(dot)uni-regensburg(dot)de> wrote:
>> now finally after a long time i have the query plan for the whole filled table. I hope somebody can explain me why it takes so much longer...
>> QUERY PLAN
>>
>> --------------------------------------------------------------------------------
>> --------------------------------------------------------------------------------
>> -----
>> Result (cost=0.09..254643.09 rows=8044000 width=39) (actual time=0.043..0.043
>> rows=0 loops=1)
>> One-Time Filter: ((((split_part(($1)::text, '_'::text, 2))::smallint = 1) IS
>> NOT TRUE) AND (((split_part(($2)::text, '_'::text, 2))::smallint = 22) IS NOT TR
>> UE))
>> InitPlan
>> -> Limit (cost=0.00..0.02 rows=1 width=10) (never executed)
>> -> Seq Scan on temp_table (cost=0.00..194313.00 rows=8044000 width=
>> 10) (never executed)
>> -> Limit (cost=0.00..0.02 rows=1 width=10) (actual time=0.006..0.006 rows
>> =1 loops=1)
>> -> Seq Scan on temp_table (cost=0.00..194313.00 rows=8044000 width=
>> 10) (actual time=0.004..0.004 rows=1 loops=1)
>> -> Limit (cost=0.00..0.02 rows=1 width=10) (never executed)
>> -> Seq Scan on temp_table (cost=0.00..194313.00 rows=8044000 width=
>> 10) (never executed)
>> -> Seq Scan on temp_table (cost=0.00..194313.00 rows=8044000 width=39) (nev
>> er executed)
>> Total runtime: 0.238 ms
>>
>> Result (cost=0.06..254643.06 rows=8044000 width=39) (actual time=0.056..655772
>> .273 rows=8044000 loops=1)
>> One-Time Filter: ((split_part(($1)::text, '_'::text, 2))::smallint = 1)
>> InitPlan
>> -> Limit (cost=0.00..0.02 rows=1 width=10) (actual time=0.003..0.003 rows
>> =1 loops=1)
>> -> Seq Scan on temp_table (cost=0.00..194313.00 rows=8044000 width=
>> 10) (actual time=0.002..0.002 rows=1 loops=1)
>> -> Limit (cost=0.00..0.02 rows=1 width=10) (actual time=0.006..0.007 rows
>> =1 loops=1)
>> -> Seq Scan on temp_table (cost=0.00..194313.00 rows=8044000 width=
>> 10) (actual time=0.004..0.004 rows=1 loops=1)
>> -> Seq Scan on temp_table (cost=0.00..194313.00 rows=8044000 width=39) (act
>> ual time=0.002..191672.344 rows=8044000 loops=1)
>> Total runtime: 62259544.896 ms
>>
>> Result (cost=0.06..254643.06 rows=8044000 width=39) (actual time=1.245..1.245
>> rows=0 loops=1)
>> One-Time Filter: ((split_part(($1)::text, '_'::text, 2))::smallint = 22)
>> InitPlan
>> -> Limit (cost=0.00..0.02 rows=1 width=10) (never executed)
>> -> Seq Scan on temp_table (cost=0.00..194313.00 rows=8044000 width=
>> 10) (never executed)
>> -> Limit (cost=0.00..0.02 rows=1 width=10) (actual time=0.013..0.014 rows
>> =1 loops=1)
>> -> Seq Scan on temp_table (cost=0.00..194313.00 rows=8044000 width=
>> 10) (actual time=0.009..0.009 rows=1 loops=1)
>> -> Seq Scan on temp_table (cost=0.00..194313.00 rows=8044000 width=39) (nev
>> er executed)
>> Total runtime: 22.270 ms
>> (31 Zeilen)
>>
>>
>>
>> >>> "Christian Rengstl" <Christian(dot)Rengstl(at)klinik(dot)uni-regensburg(dot)de> 13.07.06 8.37 Uhr >>>
>> Good morning list,
>>
>> the following query takes about 15 to 20 minutes for around 2 million lines in the file myfile.txt, but with 8 million lines it takes around 5 hours and i just don't understand why there is such a huge discrepancy in performance.
>>
>> COPY public.temp_table FROM 'myfile.txt' WITH DELIMITER '\t';
>>
>> INSERT INTO public.master(pid,smid, val1, val2, chr)
>> SELECT pid, smid, val1, val12, CAST(split_part((SELECT chr from public.temp_table LIMIT 1), '_', 2) as int2)
>> FROM public.temp_table;
>>
>> INSERT INTO public.values(smid, pos, chr)
>> SELECT DISTINCT smid, pos, CAST(split_part((SELECT chr from public.temp_table LIMIT 1), '_', 2) as int2)
>> FROM public.temp_table;
>>
>
> what is this phrase doing exactly?
> CAST(split_part((SELECT chr from public.temp_table LIMIT 1), '_', 2)
>
> it looks fishy.
> merlin
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
> choose an index scan if your joining column's datatypes do not
> match

--
Christian Rengstl M.A.
Klinik und Poliklinik für Innere Medizin II
Kardiologie - Forschung
Universitätsklinikum Regensburg
B3 1.388
Franz-Josef-Strauss-Allee 11
93053 Regensburg
Tel.: +49-941-944-7230

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Christian Rengstl 2006-07-19 08:32:22 Re: Performance problem with query
Previous Message John DeSoi 2006-07-19 05:31:18 Re: is there any dataware housing tools for postgresql