Re: Strange Bitmapset manipulation in DiscreteKnapsack()

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Strange Bitmapset manipulation in DiscreteKnapsack()
Date: 2024-01-18 00:34:46
Message-ID: CAApHDvrhnHxqfW6L8MHYgvxfmJUky=9dYacW7y2BFbZ17rWFcw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, 16 Jan 2024 at 16:32, David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
>
> While working on [1], I noticed some strange code in
> DiscreteKnapsack() which seems to be aiming to copy the Bitmapset.
>
> It's not that obvious at a casual glance, but:
>
> sets[j] = bms_del_members(sets[j], sets[j]);
>
> this is aiming to zero all the words in the set by passing the same
> set in both parameters.
>
> Now that 00b41463c changed Bitmapset to have NULL be the only valid
> representation of an empty set, this code no longer makes sense. We
> may as well just bms_free() the original set and bms_copy() in the new
> set as the bms_del_members() call will always pfree the set anyway.
>
> I've done that in the attached.
>
> I did consider if we might want bms_merge_members() or
> bms_exchange_members() or some other function suitably named function
> to perform a del/add operation, but given the lack of complaints about
> any performance regressions here, I think it's not worthwhile.

After looking at this again and reading more code, I see that
DiscreteKnapsack() goes to some efforts to minimise memory
allocations.

The functions's header comment mentions "The bitmapsets are all
pre-initialized with an unused high bit so that memory allocation is
done only once.".

I tried adding some debugging output to track how many additional
allocations we're now causing as a result of 00b41463c. Previously
there'd have just been max_weight allocations, but now there's those
plus the number that's mentioned for "frees" below.

NOTICE: DiscreteKnapsack: frees = 373, max_weight = 140, extra = 266.43%
NOTICE: DiscreteKnapsack: frees = 373, max_weight = 140, extra = 266.43%
NOTICE: DiscreteKnapsack: frees = 267, max_weight = 100, extra = 267.00%
NOTICE: DiscreteKnapsack: frees = 267, max_weight = 100, extra = 267.00%
NOTICE: DiscreteKnapsack: frees = 200, max_weight = 140, extra = 142.86%
NOTICE: DiscreteKnapsack: frees = 200, max_weight = 140, extra = 142.86%
NOTICE: DiscreteKnapsack: frees = 30, max_weight = 40, extra = 75.00%
NOTICE: DiscreteKnapsack: frees = 110, max_weight = 60, extra = 183.33%
NOTICE: DiscreteKnapsack: frees = 110, max_weight = 60, extra = 183.33%
NOTICE: DiscreteKnapsack: frees = 110, max_weight = 60, extra = 183.33%
NOTICE: DiscreteKnapsack: frees = 110, max_weight = 60, extra = 183.33%

and by the looks of the code, the worst case is much worse.

Given that the code original code was written in a very deliberate way
to avoid reallocations, I now think that we should maintain that.

I've attached a patch which adds bms_replace_members(). It's basically
like bms_copy() but attempts to reuse the member from another set. I
considered if the new function should be called bms_copy_inplace(),
but left it as bms_replace_members() for now.

Now I wonder if this should be backpatched to PG16.

David

Attachment Content-Type Size
fix_DiscreteKnapsack_v2.patch text/plain 2.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2024-01-18 00:56:39 Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)
Previous Message James Coleman 2024-01-18 00:34:36 Re: Teach predtest about IS [NOT] <boolean> proofs