Re: Performance problem with query

From: Q <qdolan(at)gmail(dot)com>
To: Christian Rengstl <Christian(dot)Rengstl(at)klinik(dot)uni-regensburg(dot)de>
Cc: <pgsql-general(at)postgresql(dot)org>
Subject: Re: Performance problem with query
Date: 2006-07-19 09:54:27
Message-ID: 5EC6C9D9-BA6C-41BE-B18B-696FB90BDB4B@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On 19/07/2006, at 6:32 PM, Christian Rengstl wrote:

> The analyze is from the exact query and i dropped the indexes
> before the insert as well without imrpvement. The target table is
> as well completely empty and the insert is supposed to write, in
> this case, more or less 8 million lines in the table. There is a
> rule though, because i have inheritance table structure with one
> master table and around 20 child tables.

I would say the problem is in the rule. Try doing the insert into a
duplicate table with no rules or inheritance and see how long it takes.

Perhaps you should provide the actual schema of tables and rules that
are involved in the query in question.

> Q <qdolan(at)gmail(dot)com> wrote on 07/19/06 4:37 am:
>>
>> On 19/07/2006, at 4:24 AM, Christian Rengstl wrote:
>>
>>> now finally after a long time i have the query plan for the whole
>>> filled table. I hope somebody can explain me why it takes so much
>>> longer...
>>
>>
>> These explain analyze results don't appear to be from the queries you
>> posted previously. For these results to mean anything you need to
>> include the EXACT queries you used to generate them.
>>
>>> Result (cost=0.06..254643.06 rows=8044000 width=39) (actual
>>> time=0.056..655772
>>> .273 rows=8044000 loops=1)
>>> One-Time Filter: ((split_part(($1)::text, '_'::text,
>>> 2))::smallint = 1)
>>> InitPlan
>>> -> Limit (cost=0.00..0.02 rows=1 width=10) (actual
>>> time=0.003..0.003 rows
>>> =1 loops=1)
>>> -> Seq Scan on temp_table (cost=0.00..194313.00
>>> rows=8044000 width=
>>> 10) (actual time=0.002..0.002 rows=1 loops=1)
>>> -> Limit (cost=0.00..0.02 rows=1 width=10) (actual
>>> time=0.006..0.007 rows
>>> =1 loops=1)
>>> -> Seq Scan on temp_table (cost=0.00..194313.00
>>> rows=8044000 width=
>>> 10) (actual time=0.004..0.004 rows=1 loops=1)
>>> -> Seq Scan on temp_table (cost=0.00..194313.00 rows=8044000
>>> width=39) (act
>>> ual time=0.002..191672.344 rows=8044000 loops=1)
>>> Total runtime: 62259544.896 ms
>>
>>
>> This is the query you want to be interested in, the others took no
>> time at all.
>>
>> As a guess I would say the query is an INSERT INTO ... FROM
>> SELECT ... WHERE (split_part(???, '_', 2))::smallint = 1 statement.
>> The majority of the time appears to be taken up on the actual INSERT
>> and not the SELECT part.
>>
>> How many rows are in the target table and what indexes does it have?
>> Does it have any triggers, check constraints, or rules applied to it?
>> All these things can make the insert take longer as the number of
>> rows you have already in the table increases.
>>
>> More than likely you have a target table with a LOT of rows and a
>> bunch of indexes on it and your disks are being thrashed because the
>> indexes are not able to stay cached in RAM. At this point you should
>> ensure your machine is not swapping do disk, and at the very least
>> you should go through one of the many tuning guidelines available and
>> ensure you have allocated the appropriate amount of memory to
>> postgresql for your needs.
>>
>> You may also want to consider dropping the indexes before you do the
>> INSERT and recreate them afterwards.
>>
>>
>>
>>>>>> "Christian Rengstl" <Christian(dot)Rengstl(at)klinik(dot)uni-regensburg(dot)de>
>>>>>> 13.07.06 8.37 Uhr >>>
>>> Good morning list,
>>>
>>> the following query takes about 15 to 20 minutes for around 2
>>> million lines in the file myfile.txt, but with 8 million lines it
>>> takes around 5 hours and i just don't understand why there is such
>>> a huge discrepancy in performance.
>>>
>>> COPY public.temp_table FROM 'myfile.txt' WITH DELIMITER '\t';
>>>
>>> INSERT INTO public.master(pid,smid, val1, val2, chr)
>>> SELECT pid, smid, val1, val12, CAST(split_part((SELECT
>>> chr from public.temp_table LIMIT 1), '_', 2) as int2)
>>> FROM public.temp_table;
>>>
>>> INSERT INTO public.values(smid, pos, chr)
>>> SELECT DISTINCT smid, pos, CAST(split_part((SELECT chr from
>>> public.temp_table LIMIT 1), '_', 2) as int2)
>>> FROM public.temp_table;
>>>
>>> I came up with this query, because i wanted to use the COPY command
>>> to load huge files into the db, but i don't want to load all the
>>> columns contained in the file in only one table but copy some of
>>> them into one table and some in a second table. As i found out with
>>> logging, the data is loaded into temp_table within 15 minutes, but
>>> to transfer it from the temp_table toagain only something like 10
>>> minutes. Can it be that the cast takes up so much more time than
>>> when reading and transferring 2 million lines?
>>
>>
>> --
>> Seeya...Q
>>
>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>
>> _____ / Quinton Dolan - qdolan(at)gmail(dot)com
>> __ __/ / / __/ / /
>> / __ / _/ / / Gold Coast, QLD, Australia
>> __/ __/ __/ ____/ / - / Ph: +61 419 729 806
>> _______ /
>> _\
>>
>>
>>
>>
>> ---------------------------(end of
>> broadcast)---------------------------
>> TIP 2: Don't 'kill -9' the postmaster
>
>
> --
> Christian Rengstl M.A.
> Klinik und Poliklinik für Innere Medizin II
> Kardiologie - Forschung
> Universitätsklinikum Regensburg
> B3 1.388
> Franz-Josef-Strauss-Allee 11
> 93053 Regensburg
> Tel.: +49-941-944-7230

--
Seeya...Q

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

_____ / Quinton Dolan - qdolan(at)gmail(dot)com
__ __/ / / __/ / /
/ __ / _/ / / Gold Coast, QLD, Australia
__/ __/ __/ ____/ / - / Ph: +61 419 729 806
_______ /
_\

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Christian Rengstl 2006-07-19 09:58:30 Re: Performance problem with query
Previous Message Christian Rengstl 2006-07-19 08:32:22 Re: Performance problem with query