Re: wildcard text filter switched to boolean column, performance is way worse

From: Mike Broers <mbroers(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: wildcard text filter switched to boolean column, performance is way worse
Date: 2015-07-07 16:28:17
Message-ID: CAB9893ihUBFE05hu7ic3TJSo2GMr0erHeAEGBj6DuHzHZ+-b3w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

After bumping up work_mem from 12MB to 25MB that last materialize is indeed
hashing and this cut the query time by about 60%. Thanks, this was very
helpful and gives me something else to look for when troubleshooting
explains.

On Tue, Jul 7, 2015 at 11:10 AM, Mike Broers <mbroers(at)gmail(dot)com> wrote:

> Thanks, very informative! I'll experiment with work_mem settings and
> report back.
>
> On Tue, Jul 7, 2015 at 11:02 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
>> Mike Broers <mbroers(at)gmail(dot)com> writes:
>> > I had a query that was filtering with a wildcard search of a text field
>> for
>> > %SUCCESS%. The query took about 5 seconds and was running often so I
>> wanted
>> > to improve it. I suggested that the engineers include a new boolean
>> column
>> > for successful status. They implemented the requested field, but the
>> query
>> > that filters on that new column runs very long (i kill it after letting
>> it
>> > run for about an hour). Can someone help me understand why that is the
>> > case and how to resolve it?
>>
>> It's hashing the subplan output in the first case and not the second:
>>
>> > Seq Scan on lead (cost=130951.81..158059.21 rows=139957 width=369)
>> (actual
>> > time=4699.619..4699.869 rows=1 loops=1)
>> > Filter: ((NOT (hashed SubPlan 1)) AND (("ReferenceNumber")::text <>
>> > ''::text) AND ((NOT (hashed SubPlan 2)) OR (NOT (hashed SubPlan 3))))
>> ^^^^^^^^^^^^^^^^
>> vs
>>
>> > Seq Scan on lead (cost=85775.78..9005687281.12 rows=139957
>> width=369)
>> > Filter: ((NOT (hashed SubPlan 1)) AND (("ReferenceNumber")::text <>
>> > ''::text) AND ((NOT (hashed SubPlan 2)) OR (NOT (SubPlan 3))))
>> ^^^^^^^^^
>>
>> Presumably, the new more-accurate rows count causes the planner to realize
>> that the hash table will exceed work_mem so it doesn't choose to hash ...
>> but for your situation, you'd rather it did, because what you're getting
>> instead is a Materialize node that spills to disk (again, because the data
>> involved exceeds work_mem) and that's a killer for this query. You should
>> be able to get back the old behavior if you raise work_mem enough.
>>
>> Another idea you might think about is changing the OR'd IN conditions
>> to a single IN over a UNION ALL of the subselects. I'm not really sure if
>> that would produce a better plan, but it's worth trying if it wouldn't
>> require too much app-side contortion.
>>
>> regards, tom lane
>>
>
>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Merlin Moncure 2015-07-07 16:35:38 Re: New server: SSD/RAID recommendations?
Previous Message Vitalii Tymchyshyn 2015-07-07 16:27:43 Re: New server: SSD/RAID recommendations?