Re: Proposal: Improve bitmap costing for lossy pages

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Alexander Kuzmenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal: Improve bitmap costing for lossy pages
Date: 2017-10-12 04:28:25
Message-ID: CAFiTN-sXB6RZWhaYySpFrPJe3-9f0KBu+zAu4OOGDM5_q+pxqA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Oct 6, 2017 at 9:21 PM, Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> On Fri, Oct 6, 2017 at 7:24 PM, Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>> On Fri, Oct 6, 2017 at 6:08 PM, Alexander Kuzmenkov
>> <a(dot)kuzmenkov(at)postgrespro(dot)ru> wrote:
>>>
>>>> Analysis: The estimated value of the lossy_pages is way higher than
>>>> its actual value and reason is that the total_pages calculated by the
>>>> "Mackert and Lohman formula" is not correct.
>>>
>>>
>>> I think the problem might be that the total_pages includes cache effects and
>>> rescans. For bitmap entries we should use something like relation pages *
>>> selectivity.
>>
>> I have noticed that for the TPCH case if I use "pages * selectivity"
>> it give me better results, but IMHO directly multiplying the pages
>> with selectivity may not be the correct way to calculate the number of
>> heap pages it can only give the correct result when all the TID being
>> fetched are clustered. But on the other hand "Mackert and Lohman
>> formula" formulae consider that all the TID's are evenly distributed
>> across the heap pages which can also give the wrong estimation like we
>> are seeing in our TPCH case.
>
> I agree with the point that the total_pages included the cache effects
> and rescan when loop_count > 1, that can be avoided if we always
> calculate heap_pages as it is calculated in the else part
> (loop_count=0). Fortunately, in all the TPCH query plan what I posted
> up thread bitmap scan was never at the inner side of the NLJ so
> loop_count was always 0. I will fix this.

I have fixed the issue. Now, for calculating the lossy pages it will
not consider the rescan. As mentioned above it will not affect the
TPCH plan so haven't measured the performance again.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
improve_bitmap_cost_v6.patch application/octet-stream 4.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2017-10-12 04:42:54 Re: BUG #14849: jsonb_build_object doesn't like VARIADIC calls very much
Previous Message Tatsuo Ishii 2017-10-12 04:11:45 Re: [PATCH] Lockable views