stored proc and inserting hundreds of thousands of rows

From: Joel Reymont <joelr1(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: stored proc and inserting hundreds of thousands of rows
Date: 2011-04-30 16:56:50
Message-ID: D5DE36B9-17CF-4A55-9119-CDC0BDC4E48D@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

I have a stored proc that potentially inserts hundreds of thousands, potentially millions, of rows (below).

This stored proc is part of the the sequence of creating an ad campaign and links an ad to documents it should be displayed with.

A few of these stored procs can run concurrently as users create ad campaigns.

We have 2 million documents now and linking an ad to all of them takes 5 minutes on my top-of-the-line SSD MacBook Pro.

Last but not least, the system has to quickly serve ads while documents are being linked which is a problem at the moment.

What can I do to make linking documents to ads faster or have less impact on the system. I would like the system to be as responsive with serving ads while the linking itself is allowed to take a few minutes.

One thing I'm concerned with, for example, is the whole multi-million row insert running within the stored proc transaction. I think inserting rows one by one or in small batches may be an improvement. I don't know how to accomplish this, though.

Thanks, Joel

---

CREATE DOMAIN doc_id AS varchar(64);
CREATE DOMAIN id AS int;

CREATE TABLE doc_ads
(
doc_id id NOT NULL REFERENCES docs,
ad_id id NOT NULL REFERENCES ads,
distance float NOT NULL
);

CREATE INDEX doc_ads_idx ON doc_ads(doc_id);

CREATE OR REPLACE FUNCTION link_doc_to_ads(doc id, threshold float)
RETURNS void AS $$
BEGIN
INSERT INTO doc_ads (doc_id, ad_id, distance)
SELECT doc, (t).ad_id, (t).distance
FROM (SELECT ads_within_distance(topics, threshold) AS t
FROM docs
WHERE id = doc) AS x;
ANALYZE doc_ads;
END;
$$ LANGUAGE plpgsql;

--------------------------------------------------------------------------
- for hire: mac osx device driver ninja, kernel extensions and usb drivers
---------------------+------------+---------------------------------------
http://wagerlabs.com | @wagerlabs | http://www.linkedin.com/in/joelreymont
---------------------+------------+---------------------------------------

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Kevin Grittner 2011-04-30 17:27:55 Re: stored proc and inserting hundreds of thousands of rows
Previous Message Kenneth Marshall 2011-04-30 14:34:21 Re: REINDEX takes half a day (and still not complete!)