Re: problem with large inserts

From: Filip Rembiałkowski <plk(dot)zuber(at)gmail(dot)com>
To: Lutz Fischer <lfischer(at)staffmail(dot)ed(dot)ac(dot)uk>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: problem with large inserts
Date: 2012-12-13 16:44:01
Message-ID: CAP_rwwmS-BxfX0v6Asq2AMCLyQN=0=_ackrcp-cZkKqLE6W+3A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hmm, so it is some kind of file / table locking issue, not general IO
system malfunction.

It would be interesting and useful to run this use case on other
postgres instance (or several instances), including non-Windows ones.

OTOH Pg on Windows housekeeping was always "fun" - I advise all my
clients to avoid it for production purposes.

On Thu, Dec 13, 2012 at 5:33 PM, Lutz Fischer
<lfischer(at)staffmail(dot)ed(dot)ac(dot)uk> wrote:
> Thanks a lot you saved my day
>
> create temp table foo AS SELECT DISTINCT ...
> did take a mere 77464.744 ms
> And an additional
> Insert into LPP select * from foo;
> Just 576.909 ms
>
> I don't really understand why it's working via a temp table but not
> directly (or in any reasonable amount of time) - but at least I have a
> solution I can work with.
>
>
> On 13/12/12 16:09, Filip Rembiałkowski wrote:
>> Just an idea - how long does it take to run _only_
>> CREATE TEMP TABLE foo AS <your SELECT here>
>>
>>
>>
>>
>> On Thu, Dec 13, 2012 at 4:37 PM, Lutz Fischer
>> <lfischer(at)staffmail(dot)ed(dot)ac(dot)uk> wrote:
>>> Hi
>>>
>>> I have currently some trouble with inserts into a table
>>>
>>> INSERT INTO LPP (PPID, LID)
>>> SELECT DISTINCT PPid, LID FROM
>>> (SELECT * FROM PP WHERE s_id = sid) pp
>>> INNER JOIN
>>> has_protein hp1
>>> ON pp.p1id = hp1.pid
>>> INNER JOIN
>>> has_protein hp2
>>> ON pp.p2_id = hp2.pid
>>> INNER JOIN
>>> (SELECT * FROM L WHERE s_id = sid) l
>>> ON (hp1.pid = l.p1id AND hp2.pid = l.p2id AND hp1.ppos +
>>> pp.s1 = l.s1 AND hp2.ppos + pp.s2 = l.s2)
>>> OR (hp1.pid = l.p2id AND hp2.pid = l.p1id AND hp1.ppos +
>>> pp.s1 = l.s2 AND hp2.ppos + pp.s2 = l.s1)
>>> ;
>>>
>>> If I run only
>>>
>>> SELECT DISTINCT PPid, LID FROM
>>> (SELECT * FROM PP WHERE s_id = 708) pp
>>> INNER JOIN
>>> has_protein hp1
>>> ON pp.p1id = hp1.pid
>>> INNER JOIN
>>> has_protein hp2
>>> ON pp.p2_id = hp2.pid
>>> INNER JOIN
>>> (SELECT * FROM L WHERE s_id = 708) l
>>> ON (hp1.pid = l.p1id AND hp2.pid = l.p2id AND hp1.ppos +
>>> pp.s1 = l.s1 AND hp2.ppos + pp.s2 = l.s2)
>>> OR (hp1.pid = l.p2id AND hp2.pid = l.p1id AND hp1.ppos +
>>> pp.s1 = l.s2 AND hp2.ppos + pp.s2 = l.s1)
>>> ;
>>>
>>> it returns 200620 rows in 170649 ms ( thats just under 3 minutes). I
>>> stopped the actual insert after about 8h.
>>>
>>> The table that the insert happens to, is following:
>>> CREATE TABLE LPP
>>> (
>>> ppid bigint NOT NULL,
>>> lid bigint NOT NULL,
>>> CONSTRAINT pk_lpp PRIMARY KEY (ppid,lid)
>>> )
>>>
>>> I also tried without the primary key but that one is still running for
>>> more that a day.
>>>
>>> Currently the table LPP holds 471139 rows. Its linking the PP and the L
>>> table.
>>>
>>> There are no foreign keys referring to that table nor are there any
>>> other constraints on it.
>>> Previously I had foreign keys on lid and ppid refering to the L and PP
>>> table. But in a desperate try to get some speed up I deleted these. -
>>> But still...
>>>
>>> I am running postgresql 9.2 on a windows 2008 R2 server with 256 GB and
>>> the database is on something like a raid 1+0 (actually a raid1e)
>>> consisting of 3x4TB disks (limit of what could easily be fitted into the
>>> server).
>>>
>>> At the given time there were no concurrent access to any of the
>>> involved tables.
>>>
>>> Has anybody some idea why the insert takes so long and/or how to speed
>>> things up a bit? I could live with something like half an hour - better
>>> would be in minutes.
>>>
>>>
>>> Thanks for any responds,
>>>
>>> Lutz Fischer
>>>
>>>
>>> --
>>> The University of Edinburgh is a charitable body, registered in
>>> Scotland, with registration number SC005336.
>>>
>>>
>>>
>>> --
>>> Sent via pgsql-performance mailing list (pgsql-performance(at)postgresql(dot)org)
>>> To make changes to your subscription:
>>> http://www.postgresql.org/mailpref/pgsql-performance
>
>
> --
> Lutz Fischer
> lfischer(at)staffmail(dot)ed(dot)ac(dot)uk
> +44 131 6517057
>
>
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
>

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Kevin Grittner 2012-12-13 17:07:00 Re: Limit & offset effect on query plans
Previous Message Lutz Fischer 2012-12-13 16:33:28 Re: problem with large inserts