problem with large inserts

From: Lutz Fischer <lfischer(at)staffmail(dot)ed(dot)ac(dot)uk>
To: pgsql-performance(at)postgresql(dot)org
Subject: problem with large inserts
Date: 2012-12-13 15:37:33
Message-ID: 50C9F63D.5040400@staffmail.ed.ac.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi

I have currently some trouble with inserts into a table

INSERT INTO LPP (PPID, LID)
SELECT DISTINCT PPid, LID FROM
(SELECT * FROM PP WHERE s_id = sid) pp
INNER JOIN
has_protein hp1
ON pp.p1id = hp1.pid
INNER JOIN
has_protein hp2
ON pp.p2_id = hp2.pid
INNER JOIN
(SELECT * FROM L WHERE s_id = sid) l
ON (hp1.pid = l.p1id AND hp2.pid = l.p2id AND hp1.ppos +
pp.s1 = l.s1 AND hp2.ppos + pp.s2 = l.s2)
OR (hp1.pid = l.p2id AND hp2.pid = l.p1id AND hp1.ppos +
pp.s1 = l.s2 AND hp2.ppos + pp.s2 = l.s1)
;

If I run only

SELECT DISTINCT PPid, LID FROM
(SELECT * FROM PP WHERE s_id = 708) pp
INNER JOIN
has_protein hp1
ON pp.p1id = hp1.pid
INNER JOIN
has_protein hp2
ON pp.p2_id = hp2.pid
INNER JOIN
(SELECT * FROM L WHERE s_id = 708) l
ON (hp1.pid = l.p1id AND hp2.pid = l.p2id AND hp1.ppos +
pp.s1 = l.s1 AND hp2.ppos + pp.s2 = l.s2)
OR (hp1.pid = l.p2id AND hp2.pid = l.p1id AND hp1.ppos +
pp.s1 = l.s2 AND hp2.ppos + pp.s2 = l.s1)
;

it returns 200620 rows in 170649 ms ( thats just under 3 minutes). I
stopped the actual insert after about 8h.

The table that the insert happens to, is following:
CREATE TABLE LPP
(
ppid bigint NOT NULL,
lid bigint NOT NULL,
CONSTRAINT pk_lpp PRIMARY KEY (ppid,lid)
)

I also tried without the primary key but that one is still running for
more that a day.

Currently the table LPP holds 471139 rows. Its linking the PP and the L
table.

There are no foreign keys referring to that table nor are there any
other constraints on it.
Previously I had foreign keys on lid and ppid refering to the L and PP
table. But in a desperate try to get some speed up I deleted these. -
But still...

I am running postgresql 9.2 on a windows 2008 R2 server with 256 GB and
the database is on something like a raid 1+0 (actually a raid1e)
consisting of 3x4TB disks (limit of what could easily be fitted into the
server).

At the given time there were no concurrent access to any of the
involved tables.

Has anybody some idea why the insert takes so long and/or how to speed
things up a bit? I could live with something like half an hour - better
would be in minutes.

Thanks for any responds,

Lutz Fischer

--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Kevin Grittner 2012-12-13 15:47:14 Re: Limit & offset effect on query plans
Previous Message Peter Geoghegan 2012-12-13 15:31:06 Re: [PERFORM] encouraging index-only scans