Re: same plan, add 1 condition, 1900x slower

From: Mitchell Skinner <mitch(at)egcrc(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Richard Huxton <dev(at)archonet(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: same plan, add 1 condition, 1900x slower
Date: 2005-11-11 16:57:35
Message-ID: 1131728255.10481.51.camel@firebolt
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Fri, 2005-11-11 at 10:53 -0500, Tom Lane wrote:
> After re-reading your explanation of what you're doing with the data,
> I thought of a possible explanation. Is the "source" value exactly
> correlated with the external_id_map primary key?

Sort of. In this case, at the beginning of external_id_map, yes, though
further down the table they're not. For example, if we got new subjects
from 'SCH' at this point, they'd get assigned external_id_map.target_id
(the primary key) values that are totally unrelated to what the current
set are (the values in the external_id_map primary key just come off of
a sequence that we use for everything).

Right now though, since the 'SCH' data came in a contiguous chunk right
at the beginning and hasn't changed or grown since then, the correlation
is pretty exact, I think. It's true that there are no 'SCH' rows in the
table after the first contiguous set (when I get back to work I'll check
exactly what row that is). It's interesting that there are these
correlations in the the data that didn't exist at all in my mental
model.

> what you need to do is incorporate the "source" value into the
> external_id_map index key somehow. Then the index scan would be able to
> realize that there is no possibility of finding another row with source
> = 'SCH'. The simplest way is just to make a 2-column index

I thought that's what I had done with the
external_id_map_source_target_id index:

statgen=> \d util.external_id_map
Table "util.external_id_map"
Column | Type | Modifiers
-----------+-----------------------+-----------
source_id | character varying(32) | not null
source | character(3) | not null
target_id | bigint | not null
Indexes:
"external_id_map_primary_key" PRIMARY KEY, btree (target_id)
"external_id_map_source_source_id_unique" UNIQUE, btree (source,
source_id)
"external_id_map_source" btree (source)
"external_id_map_source_target_id" btree (source, target_id)
Foreign-key constraints:
"external_id_map_source" FOREIGN KEY (source) REFERENCES
util.source(id)

So if I understand your suggestion correctly, we're back to the "why
isn't this query using index foo" FAQ. For the external_id_map table,
the statistics target for "source" is 200; the other two columns are at
the default level because I didn't think of them as being very
interesting statistics-wise. I suppose I should probably go ahead and
raise the targets for every column of that table; I expect the planning
time is negligible, and our queries tend to be large data-wise. Beyond
that, I'm not sure how else to encourage the use of that index. If I
changed that index to be (target_id, source) would it make a difference?

Thanks for your help,
Mitch

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2005-11-11 17:16:33 Re: same plan, add 1 condition, 1900x slower
Previous Message Tom Lane 2005-11-11 15:53:14 Re: same plan, add 1 condition, 1900x slower