From: | Robert Klemme <shortcutter(at)googlemail(dot)com> |
---|---|
To: | Thomas Kappler <tkappler(at)googlemail(dot)com> |
Cc: | pgsql-performance(at)postgresql(dot)org |
Subject: | Re: Slow query with self-join, group by, 100m rows |
Date: | 2011-09-21 14:51:26 |
Message-ID: | CAM9pMnPf+TMSpstyQXydVNShtA95Ajd=GWufja4K0KDQe__axg@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
On Tue, Sep 20, 2011 at 7:43 PM, Thomas Kappler <tkappler(at)googlemail(dot)com> wrote:
> [please CC, I'm not on the list]
>
> Hi all,
>
> we have one table that basically uses Postgres as a key-value store.
>
> Table "public.termindex"
> Column | Type | Modifiers
> -------------+---------+-----------
> subject_id | integer |
> indextype | integer |
> cid | integer |
>
> This is with Postgres 9.0.
>
> The table has 96 million rows and an index on each column. It contains
> no NULLs and has no triggers.
>
> subject_id has about 2m distinct values, cid about 200k, and indextype only six.
>
> The table is *read-only* after the initial load.
>
> The query we want to do is (with example values):
>
> select t.cid, count(distinct t1.subject_id)
> from termindex t1, termindex t2
> where t1.cid=20642 and t1.indextype=636 and t2.indextype=636 and
> t2.subject_id=t1.subject_id
> group by t2.cid;
Do you have any multi column indexes? From the text of your query it
seems it could benefit from these two indexes:
(cid, indextype)
(subject_id, indextype)
I do not know whether PostgreSQL can avoid the table if you make the first index
(cid, indextype, subject_id)
in other words: append all the columns needed for the join. In theory
the query could then be satisfied from the indexes.
> Pasting the explain analyze output into
> http://explain.depesz.com/s/Yr4 we see that Postgres is doing an
> external sort using about 150MB of data.
>
> Now, we're not Postgres experts, or even great at relational design.
> Are there better ways of doing that query, or designing the table? For
> the latter we do have a number of constraints, though, that I don't
> want to outline now because this mail is already long enough.
Those are probably important to know.
Kind regards
robert
--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/
From | Date | Subject | |
---|---|---|---|
Next Message | Shaun Thomas | 2011-09-21 14:55:38 | Re: REINDEX not working for wastedspace |
Previous Message | Grzegorz Jaśkiewicz | 2011-09-21 14:49:54 | Re: Prepared statements and suboptimal plans |