Re: Performance query about large tables, lots of concurrent access

From: Karl Wright <kwright(at)metacarta(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Performance query about large tables, lots of concurrent access
Date: 2007-06-19 14:48:09
Message-ID: 4677ECA9.3000402@metacarta.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Tom Lane wrote:
> Karl Wright <kwright(at)metacarta(dot)com> writes:
>> Also, as I said before, I have done extensive query analysis and found
>> that the plans for the queries that are taking a long time are in fact
>> very reasonable. Here's an example from the application log of a query
>> that took way more time than its plan would seem to indicate it should:
>
>> [2007-06-18 09:39:49,797]ERROR Plan: Index Scan using i1181764142395 on
>> intrinsiclink (cost=0.00..14177.29 rows=5 width=253)
>> [2007-06-18 09:39:49,797]ERROR Plan: Index Cond: ((jobid = $2) AND
>> ((childidhash)::text = ($3)::text))
>> [2007-06-18 09:39:49,797]ERROR Plan: Filter: ((childid = ($4)::text)
>> AND ((isnew = ($5)::bpchar) OR (isnew = ($6)::bpchar)))
>
> I see the discussion thread has moved on to consider lack-of-vacuuming
> as the main problem, but I didn't want to let this pass without
> comment. The above plan is not necessarily good at all --- it depends
> on how many rows are selected by the index condition alone (ie, jobid
> and childidhash) versus how many are selected by the index and filter
> conditions. If the index retrieves many rows, most of which are
> eliminated by the filter condition, it's still gonna take a long time.
>
> In this case it looks like the planner is afraid that that's exactly
> what will happen --- a cost of 14177 suggests that several thousand row
> fetches are expected to happen, and yet it's only predicting 5 rows out
> after the filter. It's using this plan anyway because it has no better
> alternative, but you should think about whether a different index
> definition would help.
>
> regards, tom lane
>

Well, that's odd, because the hash in question that it is using is the
SHA-1 hash of a URL. There's essentially one row per URL in this table.
Even with a large table I would not expect more than a couple of
collisions at most.

How does it arrive at that estimate of 14,000?

Karl

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Joshua_Kramer 2007-06-19 14:54:06 Re: [pgsql-advocacy] [PERFORM] Postgres VS Oracle
Previous Message Robert Treat 2007-06-19 14:41:29 Re: [GENERAL] [PERFORM] [ADMIN] Postgres VS Oracle