Re: How best to index foreign key for inner join

From: Thom Brown <thombrown(at)gmail(dot)com>
To: Nathaniel Trellice <naptrel(at)yahoo(dot)co(dot)uk>
Cc: "pgsql-novice(at)postgresql(dot)org" <pgsql-novice(at)postgresql(dot)org>
Subject: Re: How best to index foreign key for inner join
Date: 2009-11-27 15:16:23
Message-ID: bddc86150911270716lc2993efuac35a30d93d53821@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-novice

2009/11/27 Nathaniel Trellice <naptrel(at)yahoo(dot)co(dot)uk>

> On 27 Nov 2009, at 11:12, Thom Brown <thombrown(at)gmail(dot)com> wrote:
>
> Since for each row of table1 you'll be looking for multiple records on
> table2, I imagine you'd want to index table2.table1_id, and table2.t since
> that's what you're filtering on. If table1 is particularly big, you might
> benefit from also indexing <http://table1.name>table1.name.
>
> I personally prefer to use INNER JOIN syntax as it's clearer, although the
> query planner will probably be identical as it's clever that way:
>
> SELECT table2.x, table2.y, table2.t
> FROM table1
> INNER JOIN table2 ON table1.id = table2.table1_id
> AND <http://table1.name>table1.name = 'some_name'
> WHERE table2.t BETWEEN some_t AND some_other_t;
>
>
> Thanks for the help Thom.
>
> Do you think it's possible to phrase this query in such a way that the
> planner would be able to exploit any benefits from a mutli-column index on
> table2 on either (table1_id, t) or (t, table1_t)?
>
> If you imagine setting up a view that encapsulates the inner join, creating
> a 'virtual' table with columns:
>
> name text
> x real
> y real
> t timestampz
>
> Then the manual (11.3) suggests that ANDed queries on 'name' and 't' will
> be improved by using a multi-column index (name, t).
>
> But the join confuses me. When the rule system breaks down both the query
> and the view's join, will the benefits of the multi-column index still be
> realised?
>
>
> Nathaniel
>
>
Again, it really depends on the size of your tables. If, for example,
table2 contains 5 million rows and 99% of them have the same t value, and
index probably won't be used by the planner since it's value distribution in
the statistics data for that table (gathered by ANALYZE) would suggest a
sequential scan is just as efficient. If you have a lot of varying
combinations between table2.table1_id and table2.t, a multi-column index
will probably benefit you.

A good way to tell how much work is involved in running a query is to put
EXPLAIN ANALYZE before it to see where the more expensive parts of the query
are.

There are many instances where indexes won't be used by the planner purely
because it's statistics tell it that's it's more or equally efficient to a
sequential scan. You need to know your data well and take the size of the
table into account.

I'm hoping someone with better query-planner knowledge can chime in here
with more solid recommendations though.

Thom

In response to

Browse pgsql-novice by date

  From Date Subject
Next Message Jean-Yves F. Barbier 2009-11-27 15:18:55 views
Previous Message Jean-Yves F. Barbier 2009-11-27 15:04:44 Re: index speed-up and automatic tables/procedures creation