Re: Subselect query enhancement

From: Richard Huxton <dev(at)archonet(dot)com>
To: Michael Artz <mlartz(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Subselect query enhancement
Date: 2007-02-01 19:02:40
Message-ID: 45C23950.4070803@archonet.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Michael Artz wrote:
>> > I have primary table that holds ip information
>> > and two other tables that hold event data for the specific IP in with
>> > a one-to-many mapping between them, ie:
>> [snip]
>> > There is quite a bit of commonality between the network_events and
>> > host_events schemas, but they do not currently share an ancestor.
>> > ip_info has about 13 million rows, the network_events table has about
>> > 30 million rows, and the host_events table has about 7 million rows.
>> > There are indexes on all the rows.
>>
>> What indexes though. Do you have (name,ip) on the two event tables?
>
> All the columns are indexed individually. The tables are completely
> static, as I reload the whole DB with new data every day.

The point of a (name,ip) index would be to let you read off ip numbers
in order easily.

>> How selective is "name" - are there many different values or just a few?
>> If lots, it might be worth increasing the statistics gathered on that
>> column (ALTER COLUMN ... SET STATISTICS).
>> http://www.postgresql.org/docs/8.2/static/sql-altertable.html
>
> I guess that is the heart of my question. "name" is not very
> selective (there are only 20 or so choices) however other columns are
> fairly selective for certain cases, such as 'port'. When querying on
> and unusual port, the query is very fast, and the single UNIONed
> subselect returns quickly. When 'port' is not very selective (like
> port = '80', which is roughly 1/2 of the rows in the DB), the dual
> subselect query wins, hands-down.
>
> And I have altered the statistics via the config file:
> default_statistics_target = 100
> Perhaps this should be even higher for certain columns?

You're probably better off leaving it at 10 and upping it for the vital
columns. 25 for names should be a good choice.

You could try partial indexes for those cases where you have
particularly common values of name/port:

CREATE INDEX idx1 ON host_events (ip) WHERE port=80;

--
Richard Huxton
Archonet Ltd

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Michael Artz 2007-02-01 19:06:23 Re: Subselect query enhancement
Previous Message Michael Artz 2007-02-01 18:27:32 Re: Subselect query enhancement