Re: Proposal: Improve bitmap costing for lossy pages

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Alexander Kumenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal: Improve bitmap costing for lossy pages
Date: 2017-10-06 06:12:35
Message-ID: CAFiTN-vreUZ9oALKjH76-UNBBqk-ahEg43iAGQTomKBt51aPNg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Oct 5, 2017 at 8:15 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Sun, Sep 17, 2017 at 7:04 AM, Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>> I used lossy_pages = max(0, total_pages - maxentries / 2). as
>> suggesed by Alexander.
>
> Does that formula accurately estimate the number of lossy pages?

I have printed the total_pages, exact_pages and lossy_pages during
planning time, and for testing purpose, I tweak the code a bit so that
it doesn't consider lossy_pages in cost calculation (same as base
code).

I have tested TPCH scale factor 20. at different work_mem(4MB, 20MB,
64MB) and noted down the estimated pages vs actual pages.

Analysis: The estimated value of the lossy_pages is way higher than
its actual value and reason is that the total_pages calculated by the
"Mackert and Lohman formula" is not correct.

work_mem=4 MB

query:4
estimated: total_pages=552472.000000 exact_pages=32768.000000
lossy_pages=519704.000000
actual: exact=18548 lossy=146141

query:6
estimated: total_pages=1541449.000000 exact_pages=32768.000000
lossy_pages=1508681.000000
actual: exact=13417 lossy=430385

query:8
estimated: total_pages=552472.000000 exact_pages=32768.000000
lossy_pages=519704.000000
actual: exact=56869 lossy=495603

query:14
estimated: total_pages=1149603.000000 exact_pages=32768.000000
lossy_pages=1116835.000000
actual: exact=17115 lossy=280949

work_mem: 20 MB
query:4
estimated: total_pages=552472.000000 exact_pages=163840.000000
lossy_pages=388632.000000
actual: exact=109856 lossy=57761

query:6
estimated: total_pages=1541449.000000 exact_pages=163840.000000
lossy_pages=1377609.000000
actual: exact=59771 lossy=397956

query:8
estimated: total_pages=552472.000000 exact_pages=163840.000000
lossy_pages=388632.000000
actual: Heap Blocks: exact=221777 lossy=330695

query:14
estimated: total_pages=1149603.000000 exact_pages=163840.000000
lossy_pages=985763.000000
actual: exact=63381 lossy=235513

work_mem:64 MB
query:4
estimated: total_pages=552472.000000 exact_pages=552472.000000
lossy_pages=0.000000
actual: exact=166005 lossy=0

query:6
estimated: total_pages=1541449.000000 exact_pages=524288.000000
lossy_pages=1017161.000000
actual: exact=277717 lossy=185919

query:8
estimated: total_pages=552472.000000 exact_pages=552472.000000
lossy_pages=0.000000
actual: exact=552472 lossy=0

query:14
estimated: total_pages=1149603.000000 exact_pages=524288.000000
lossy_pages=625315.000000
actual: exact=309091 lossy=0

>
> The performance results look good, but that's a slightly different
> thing from whether the estimate is accurate.
>
> + nbuckets = tbm_calculate_entires(maxbytes);
>
> entires?

changed to
+ tbm->maxentries = (int) tbm_calculate_entires(maxbytes);

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
improve_bitmap_cost_v4.patch application/octet-stream 3.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Khandekar 2017-10-06 06:33:59 Re: Parallel Append implementation
Previous Message Noah Misch 2017-10-06 06:03:53 Re: Still another race condition in recovery TAP tests