Re: Poor performance when joining against inherited tables

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Lucas Madar <madar(at)samsix(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Poor performance when joining against inherited tables
Date: 2011-06-30 19:07:12
Message-ID: BANLkTikkxy322Ws60X+nrLN5+su+DAsS7Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Wed, May 11, 2011 at 4:47 PM, Lucas Madar <madar(at)samsix(dot)com> wrote:
> On 05/11/2011 09:38 AM, Robert Haas wrote:
>>>
>>> However, if I disable seqscan (set enable_seqscan=false), I get the
>>> following plan:
>>>
>>>  QUERY PLAN
>>> ------------
>>>  Hash Join  (cost=10001298843.53..290002337961.71 rows=8643757 width=506)
>>>   Hash Cond: (f.id = objects.id)
>>>   ->    Append  (cost=10000000000.00..290000536334.43 rows=8643757
>>> width=20)
>>>         ->    Seq Scan on item f  (cost=10000000000.00..10000000026.30
>>> rows=1630 width=20)
>>>         ->    Index Scan using xxx_pkey on itemXX f  (cost=0.00..10.60
>>> rows=90
>>> width=20)
>>>         ->    Index Scan using yyy_pkey on itemYY f  (cost=0.00..25.24
>>> rows=266 width=20)
>>>         ->    Index Scan using zzz_pkey on itemZZ f  (cost=0.00..9.28
>>> rows=2
>>> width=20)
>>>         ...
>>>   ->    Hash  (cost=999347.17..999347.17 rows=3941949 width=490)
>>>         ->    Index Scan using objects_pkey on objects
>>> (cost=0.00..999347.17
>>> rows=3941949 width=490)
>>>
>>> This seems like a much more sensible query plan.
>>
>> I don't think so.  Scanning the index to extract all the rows in a
>> table is typically going to be a lot slower than a sequential scan.
>>
>
> Compared to the previous query plan (omitted in this e-mail, in which the
> planner was scanning all the item tables sequentially), the second query is
> much more desirable. It takes about 12 seconds to complete, versus the other
> query which I canceled after six hours. However, what you propose seems to
> make even more sense.

I was just looking at this email again, and had another thought:
perhaps the tables in question are badly bloated. In your situation,
it seems that the plan didn't change much when you set
enable_seqscan=off: it just replaced full-table seq-scans with
full-table index-scans, which should be slower. But if you have a
giant table that's mostly empty space, then following the index
pointers to the limited number of blocks that contain any useful data
might be faster than scanning all the empty space. If you still have
these tables around somewhere, you could test this hypothesis by
running CLUSTER on all the tables and see whether the seq-scan gets
faster.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Samuel Gendler 2011-06-30 20:34:52 Re: near identical queries have vastly different plans
Previous Message Greg Spiegelberg 2011-06-30 12:37:24 Re: is parallel union all possible over dblink?