From: | Filip Rembiałkowski <plk(dot)zuber(at)gmail(dot)com> |
---|---|
To: | Lutz Fischer <lfischer(at)staffmail(dot)ed(dot)ac(dot)uk> |
Cc: | pgsql-performance(at)postgresql(dot)org |
Subject: | Re: problem with large inserts |
Date: | 2012-12-13 16:44:01 |
Message-ID: | CAP_rwwmS-BxfX0v6Asq2AMCLyQN=0=_ackrcp-cZkKqLE6W+3A@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
Hmm, so it is some kind of file / table locking issue, not general IO
system malfunction.
It would be interesting and useful to run this use case on other
postgres instance (or several instances), including non-Windows ones.
OTOH Pg on Windows housekeeping was always "fun" - I advise all my
clients to avoid it for production purposes.
On Thu, Dec 13, 2012 at 5:33 PM, Lutz Fischer
<lfischer(at)staffmail(dot)ed(dot)ac(dot)uk> wrote:
> Thanks a lot you saved my day
>
> create temp table foo AS SELECT DISTINCT ...
> did take a mere 77464.744 ms
> And an additional
> Insert into LPP select * from foo;
> Just 576.909 ms
>
> I don't really understand why it's working via a temp table but not
> directly (or in any reasonable amount of time) - but at least I have a
> solution I can work with.
>
>
> On 13/12/12 16:09, Filip Rembiałkowski wrote:
>> Just an idea - how long does it take to run _only_
>> CREATE TEMP TABLE foo AS <your SELECT here>
>>
>>
>>
>>
>> On Thu, Dec 13, 2012 at 4:37 PM, Lutz Fischer
>> <lfischer(at)staffmail(dot)ed(dot)ac(dot)uk> wrote:
>>> Hi
>>>
>>> I have currently some trouble with inserts into a table
>>>
>>> INSERT INTO LPP (PPID, LID)
>>> SELECT DISTINCT PPid, LID FROM
>>> (SELECT * FROM PP WHERE s_id = sid) pp
>>> INNER JOIN
>>> has_protein hp1
>>> ON pp.p1id = hp1.pid
>>> INNER JOIN
>>> has_protein hp2
>>> ON pp.p2_id = hp2.pid
>>> INNER JOIN
>>> (SELECT * FROM L WHERE s_id = sid) l
>>> ON (hp1.pid = l.p1id AND hp2.pid = l.p2id AND hp1.ppos +
>>> pp.s1 = l.s1 AND hp2.ppos + pp.s2 = l.s2)
>>> OR (hp1.pid = l.p2id AND hp2.pid = l.p1id AND hp1.ppos +
>>> pp.s1 = l.s2 AND hp2.ppos + pp.s2 = l.s1)
>>> ;
>>>
>>> If I run only
>>>
>>> SELECT DISTINCT PPid, LID FROM
>>> (SELECT * FROM PP WHERE s_id = 708) pp
>>> INNER JOIN
>>> has_protein hp1
>>> ON pp.p1id = hp1.pid
>>> INNER JOIN
>>> has_protein hp2
>>> ON pp.p2_id = hp2.pid
>>> INNER JOIN
>>> (SELECT * FROM L WHERE s_id = 708) l
>>> ON (hp1.pid = l.p1id AND hp2.pid = l.p2id AND hp1.ppos +
>>> pp.s1 = l.s1 AND hp2.ppos + pp.s2 = l.s2)
>>> OR (hp1.pid = l.p2id AND hp2.pid = l.p1id AND hp1.ppos +
>>> pp.s1 = l.s2 AND hp2.ppos + pp.s2 = l.s1)
>>> ;
>>>
>>> it returns 200620 rows in 170649 ms ( thats just under 3 minutes). I
>>> stopped the actual insert after about 8h.
>>>
>>> The table that the insert happens to, is following:
>>> CREATE TABLE LPP
>>> (
>>> ppid bigint NOT NULL,
>>> lid bigint NOT NULL,
>>> CONSTRAINT pk_lpp PRIMARY KEY (ppid,lid)
>>> )
>>>
>>> I also tried without the primary key but that one is still running for
>>> more that a day.
>>>
>>> Currently the table LPP holds 471139 rows. Its linking the PP and the L
>>> table.
>>>
>>> There are no foreign keys referring to that table nor are there any
>>> other constraints on it.
>>> Previously I had foreign keys on lid and ppid refering to the L and PP
>>> table. But in a desperate try to get some speed up I deleted these. -
>>> But still...
>>>
>>> I am running postgresql 9.2 on a windows 2008 R2 server with 256 GB and
>>> the database is on something like a raid 1+0 (actually a raid1e)
>>> consisting of 3x4TB disks (limit of what could easily be fitted into the
>>> server).
>>>
>>> At the given time there were no concurrent access to any of the
>>> involved tables.
>>>
>>> Has anybody some idea why the insert takes so long and/or how to speed
>>> things up a bit? I could live with something like half an hour - better
>>> would be in minutes.
>>>
>>>
>>> Thanks for any responds,
>>>
>>> Lutz Fischer
>>>
>>>
>>> --
>>> The University of Edinburgh is a charitable body, registered in
>>> Scotland, with registration number SC005336.
>>>
>>>
>>>
>>> --
>>> Sent via pgsql-performance mailing list (pgsql-performance(at)postgresql(dot)org)
>>> To make changes to your subscription:
>>> http://www.postgresql.org/mailpref/pgsql-performance
>
>
> --
> Lutz Fischer
> lfischer(at)staffmail(dot)ed(dot)ac(dot)uk
> +44 131 6517057
>
>
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
>
From | Date | Subject | |
---|---|---|---|
Next Message | Kevin Grittner | 2012-12-13 17:07:00 | Re: Limit & offset effect on query plans |
Previous Message | Lutz Fischer | 2012-12-13 16:33:28 | Re: problem with large inserts |