Re: Why query takes soo much time

From: Denis de Bernardy <ddebernardy(at)yahoo(dot)com>
To: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Why query takes soo much time
Date: 2011-05-16 11:16:44
Message-ID: 630076.87313.qm@web112408.mail.gq1.yahoo.com
Lists: pgsql-performance

[big nestloop with a huge number of rows]

You're in an edge case, and I doubt you'll get things to run much faster: you want the last 1k rows out of an 18M-row result set. It will be slow no matter what you do.

What the plan is currently doing is stepping through those 18M rows with a nested loop until it reaches the 1k requested rows. Without the offset, the plan is absolutely correct (and quite fast, I take it). With the enormous offset, it's a different story, as you've noted.

An alternative plan would be to hash join the tables together, sort the result set, and apply the limit/offset to the sorted set. You can probably force the planner to do so by rewriting your statement using a WITH clause (a CTE):

EXPLAIN ANALYZE
WITH rows AS (
    SELECT c.clause, s.subject, s.object, s.verb, s.subject_type,
           s.object_type, s.doc_id, s.svo_id
    FROM clause2 c
    INNER JOIN svo2 s
        ON (c.clause_id = s.clause_id
        AND c.source_id = s.doc_id
        AND c.sentence_id = s.sentence_id)
    INNER JOIN page_content p
        ON (s.doc_id = p.crawled_page_id)
)
SELECT *
FROM rows
ORDER BY svo_id LIMIT 1000 OFFSET 17929000;

I have my doubts that it'll make much of a difference, though: you'll still be extracting the last 1k rows out of 18M.
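Another option worth considering, if the client can remember the last svo_id it received: a keyset-style rewrite that skips the offset entirely. This is only a sketch against the column names from the query above; it assumes svo_id is unique, monotonically assigned, and covered by an index on svo2:

```sql
-- Sketch: resume from the last svo_id seen on the previous page
-- instead of recomputing and discarding 17.9M rows.
-- Assumes svo_id is unique and indexed; :last_seen_svo_id is the
-- largest svo_id from the previous batch.
SELECT c.clause, s.subject, s.object, s.verb, s.subject_type,
       s.object_type, s.doc_id, s.svo_id
FROM clause2 c
INNER JOIN svo2 s
    ON (c.clause_id = s.clause_id
    AND c.source_id = s.doc_id
    AND c.sentence_id = s.sentence_id)
INNER JOIN page_content p
    ON (s.doc_id = p.crawled_page_id)
WHERE s.svo_id > :last_seen_svo_id
ORDER BY s.svo_id
LIMIT 1000;
```

Each page then costs roughly an index range scan over 1k rows rather than a scan over offset + 1k rows, but it only works when you page through the set in order rather than jumping straight to an arbitrary offset.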

D
