Re: hashjoin chosen over 1000x faster plan

From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
Cc: <pgsql-performance(at)postgresql(dot)org>
Subject: Re: hashjoin chosen over 1000x faster plan
Date: 2007-10-10 14:15:02
Message-ID: 470C9813.EE98.0025.0@wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

>>> On Wed, Oct 10, 2007 at 1:31 AM, in message
<1191997904(dot)4233(dot)125(dot)camel(at)ebony(dot)site>, Simon Riggs <simon(at)2ndquadrant(dot)com>
wrote:
> On Tue, 2007-10-09 at 15:09 -0500, Kevin Grittner wrote:
>
>> I have a situation where a query is running much slower than I would
>> expect. The ANALYZE showed that it is hashing some information which
>> is rarely needed. When I set enable_hashjoin = off for the
>> connection the query run in 1/1000 the time.
>
> Can you confirm the two queries give identical outputs?

I checked; the output is identical.

> It isn't clear
> to me why the second sort is (never executed) in your second plan, which
> I would only expect to see for an inner merge join.

I assume that is because there were no rows to sort. The
CaseTypeHistEvent view is only needed if there is a link to an event
which reopens the charge after it is disposed. This only happens for
about 1% of the Charge records.

The view is a bit weird, but evolved this way. Originally, there
was a table by that name which was maintained statewide by our
organization (the Consolidated Court Automation Programs, or CCAP).
Then there was a decision to allow counties to override certain
columns with their own values. Modifying the central copy and
merging the changes into 72 modified copies was nasty, so we split
the state-maintained portion and the county overrides into two tables
with a B (for Base) and D (for Distributed) suffix, and provided a
view to present the merged form to the existing queries. That way
only the software to maintain the data needed to be modified, rather
than all the references to it.

There aren't a lot of rows in the distributed table; most counties
take the defaults. The ControlRecord table is joined to the base to
show one row of the base data per county in the database. This
performance problem is on the central, consolidated copy of all 72
counties.

> Can you show the details for ControlRecord also.

bigbird=> \d "ControlRecord"
Table "public.ControlRecord"
Column | Type | Modifiers
--------------------+------------------------+-----------
countyNo | "CountyNoT" | not null
dispEventTime | boolean | not null
exportDeletes | boolean | not null
standAloneMode | boolean | not null
sysMailData | character(1) | not null
chargeClsEvent | "EventTypeT" |
checkPrinterDriver | character varying(50) |
cofcCtofcNo | "CtofcNoT" |
ctofcNo | "CtofcNoT" |
defaultDaAttyNo | "AttyNoT" |
districtNo | "DistrictNoT" |
dktFee | "MoneyT" |
dotCourtNo | character(8) |
initialTrafCal | "ActivityTypeT" |
maxToPrint | smallint |
postJdgmtStatus | "StatusCodeT" |
rcptPrinterDriver | character varying(50) |
savedTxtFilePath | character varying(120) |
scffAmt | "MoneyT" |
scsfAmt | "MoneyT" |
taxWarrantNo | "CountyNoT" |
dorAgencyNo | character(10) |
jurorMailerPrntDrv | character varying(50) |
calKioskMessage | "TextT" |
autoAssgnCaseEqual | boolean | not null
sectionLimit | integer | not null
sectionBufferLimit | integer | not null
calKioskKeyboard | character(1) |
saveCFRdoc | boolean |
showAudioRecTab | boolean |
weekdayStartTime | "TimeT" |
weekdayEndTime | "TimeT" |
saturdayStartTime | "TimeT" |
saturdayEndTime | "TimeT" |
sundayStartTime | "TimeT" |
sundayEndTime | "TimeT" |
reportStorageDays | integer |
Indexes:
"ControlRecord_pkey" PRIMARY KEY, btree ("countyNo")

-Kevin

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Theo Kramer 2007-10-10 14:45:40 Performance problems with prepared statements
Previous Message Tomáš Vondra 2007-10-10 10:41:49 Re: SQL Monitoring