Re: Proposal: Improve bitmap costing for lossy pages

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc: Alexander Kumenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal: Improve bitmap costing for lossy pages
Date: 2017-08-29 20:30:20
Message-ID: CA+TgmoboNGVJxxea8wfpWhsfxQ1-qPWJ-5eOhZaf9y_GeJoC2A@mail.gmail.com
Lists: pgsql-hackers

On Tue, Aug 29, 2017 at 1:08 AM, Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> (Time in ms)
> Query    head      patch
>
> 6        23819     14571
> 14       13514     11183
> 15       49980     32400
> 20       204441    188978

These are cool results, but this patch is obviously not ready for
prime time as-is, since there are other places that still refer to the
maxentries/2 target and will need to be updated:

     * Since we are called as soon as nentries exceeds maxentries, we should
     * push nentries down to significantly less than maxentries, or else we'll
     * just end up doing this again very soon.  We shoot for maxentries/2.

    /*
     * With a big bitmap and small work_mem, it's possible that we cannot get
     * under maxentries.  Again, if that happens, we'd end up uselessly
     * calling tbm_lossify over and over.  To prevent this from becoming a
     * performance sink, force maxentries up to at least double the current
     * number of entries.  (In essence, we're admitting inability to fit
     * within work_mem when we do this.)  Note that this test will not fire if
     * we broke out of the loop early; and if we didn't, the current number of
     * entries is simply not reducible any further.
     */
    if (tbm->nentries > tbm->maxentries / 2)
        tbm->maxentries = Min(tbm->nentries, (INT_MAX - 1) / 2) * 2;

I suggest defining a TBM_FILLFACTOR constant instead of repeating the
value 0.9 in a bunch of places. I think it would also be good to try
to find the sweet spot for that constant. Making it bigger reduces
the number of lossy entries created, but making it smaller reduces
the number of times we have to walk the bitmap. So if, for example,
0.75 is sufficient to produce almost all of the gain, then I think we
would want to prefer 0.75 to 0.9. But if 0.9 is better, then we can
stick with that.
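
Roughly what I have in mind, as a minimal sketch rather than a patch.
TBM_FILLFACTOR is the name suggested above and 0.9 is the value the
patch currently repeats; the helper tbm_lossify_target() is
hypothetical and does not exist in tidbitmap.c.

```c
#include <assert.h>

/* Single definition of the fill factor; 0.9 is the patch's current value. */
#define TBM_FILLFACTOR 0.9

/*
 * Hypothetical helper (not part of tidbitmap.c): the number of entries
 * lossification would aim to get down to for a given maxentries.
 * The product is truncated toward zero.
 */
static int
tbm_lossify_target(int maxentries)
{
    assert(maxentries > 0);
    return (int) (maxentries * TBM_FILLFACTOR);
}
```

With the value in one place, trying 0.75 against 0.9 is a one-line
change instead of an edit to every site that hard-codes the number.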

Note that a value higher than 0.9375 wouldn't be sane without some
additional safety precautions because maxentries could be as low as
16.
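
To spell out the arithmetic: 0.9375 is 15/16, so with maxentries at 16
the target works out to 15 entries, exactly one below the limit. The
sketch below only illustrates that calculation; lossify_target() is a
hypothetical helper, and truncating the product toward zero is an
assumption about how the target would be computed.

```c
#include <assert.h>

/*
 * Hypothetical helper: target entry count after lossification for a
 * given maxentries and fill factor, truncated toward zero.
 */
static int
lossify_target(int maxentries, double fillfactor)
{
    assert(maxentries > 0);
    assert(fillfactor > 0.0 && fillfactor <= 1.0);
    return (int) (maxentries * fillfactor);
}
```

For maxentries = 16 this gives 15 at a fill factor of 0.9375 and 14 at
0.9, while at 1.0 the target equals maxentries and lossifying would
free nothing.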

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
