Re: Optimize planner memory consumption for huge arrays

From: "Lepikhov Andrei" <a(dot)lepikhov(at)postgrespro(dot)ru>
To: "Ashutosh Bapat" <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org, Евгений Бредня <e(dot)brednya(at)postgrespro(dot)ru>
Subject: Re: Optimize planner memory consumption for huge arrays
Date: 2023-09-08 05:11:31
Message-ID: 1108a71a-e65e-41a5-81ab-beccc80c3628@app.fastmail.com
Lists: pgsql-hackers

On Wed, Sep 6, 2023, at 8:09 PM, Ashutosh Bapat wrote:
> Hi Lepikhov,
>
> Thanks for using my patch and I am glad that you found it useful.
>
> On Mon, Sep 4, 2023 at 10:56 AM Lepikhov Andrei
> <a(dot)lepikhov(at)postgrespro(dot)ru> wrote:
>>
>> Hi, hackers,
>>
>> Looking at the planner's behaviour with the memory consumption patch [1], I figured out that arrays increase the optimizer's memory consumption significantly. See init.sql in the attachment.
>> The point here is that the planner makes a small memory allocation for each element during estimation. As a result, it looks like the planner consumes about 250 bytes for each integer element.
>
> I guess the numbers you mentioned in init.sql are total memory used by
> the planner (as reported by the patch in the thread) when planning
> that query and not memory consumed by Const nodes themselves. Am I
> right? I think the measurements need to be explained better, along
> with the realistic scenario you are trying to optimize.

Yes, it is the total memory consumed by the planner - I used the numbers generated by your patch [1]. I increased the number of elements in the array to factor out the memory the planner consumes for other purposes. As you can see, a query with a 1-element array consumes 12 kB of planner memory, while 1E4 elements consume 2.6 MB. All of that increment comes from enlarging the array alone: (2,600,000 - 12,000) bytes / 10,000 elements ≈ 260 bytes. So I conclude that each 4-byte element costs the planner about 260 bytes of memory.
This scenario comes from a user complaint: they had strict restrictions on memory usage and got stuck on this unusual memory consumption case.
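
For context, those allocations happen during array selectivity estimation: the loop in scalararraysel() (src/backend/utils/adt/selfuncs.c) builds a fresh Const node, and a List around it, for every array element. Paraphrased and heavily trimmed, it looks roughly like this:

    for (i = 0; i < num_elems; i++)
    {
        List       *args;
        Selectivity s2;

        /*
         * A Const node (plus the two-element List wrapping it) is
         * palloc'd for each element; nothing is freed until the
         * planner's memory context is destroyed.
         */
        args = list_make2(leftop,
                          makeConst(nominal_element_type,
                                    -1,
                                    nominal_element_collation,
                                    elmlen,
                                    elem_values[i],
                                    elem_nulls[i],
                                    elmbyval));
        s2 = DatumGetFloat8(FunctionCall4Coll(&oprselproc,
                                              clausecollid,
                                              PointerGetDatum(root),
                                              ObjectIdGetDatum(operator),
                                              PointerGetDatum(args),
                                              Int32GetDatum(varRelid)));
        /* ... fold s2 into the running selectivity ... */
    }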

> I guess the reason you think that partitioning will increase the
> memory consumed is that each partition will have the clause
> translated for it. Selectivity estimation for each partition will
> create that many Const nodes and hence consume memory. Am I right?

Yes.

> Can you please measure the memory consumed with and without your
> patch?

Done. See the test case and results in 'init_parts.sql' in the attachment; a short summary is below. I varied the number of elements from 1 to 10000 and the number of partitions from 1 to 100. As you can see, partitioning adds a lot of memory consumption by itself, but the effect of the patch is also visible.

master:
parts \ elems      1      1E1      1E2      1E3      1E4
    1           28kB     50kB    0.3MB    2.5MB     25MB
   10           45kB    143kB    0.6MB    4.8MB     47MB
  100          208kB    125kB    3.3MB     27MB    274MB

patched:
parts \ elems      1      1E1      1E2      1E3      1E4
    1           28kB     48kB   0.25MB    2.2MB   22.8MB
   10           44kB    100kB    313kB    2.4MB   23.7MB
  100          208kB    101kB    0.9MB    3.7MB   32.4MB

Just for comparison, without partitioning:
elems              1      1E1      1E2      1E3      1E4
master          12kB     14kB     37kB    266kB    2.5MB
patched         12kB   11.5kB     13kB     24kB    141kB

>> It may not be a problem most of the time. However, with partitioning, the consumption is multiplied by the number of partitions. Such a corner case looks weird, but the fix is simple. So, why not?
>
> With vectorized operations becoming a norm these days, it's possible
> to have thousands of elements in the array of an ANY or IN clause. It
> will also be common to have thousands of partitions. But I think what
> we need to do here is to write a selectivity estimation function that
> takes a Const array and returns a selectivity without requiring a
> Const node to be created for each element.

Maybe you're right. Could you show an example of vectorized usage of Postgres, so I can understand your idea more clearly?
Here I propose only a quick, simple solution; I don't think it would stand in the way of that development.
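
For readers who skip the attachment: the PoC just builds one reusable Const before the loop and overwrites its value in place for each element, freeing it afterwards. A sketch of the idea (not the exact attached diff):

    /* Build a single Const up front and recycle it for every element. */
    Const      *c = makeConst(nominal_element_type,
                              -1,
                              nominal_element_collation,
                              elmlen,
                              (Datum) 0,
                              true,        /* constisnull, overwritten below */
                              elmbyval);
    List       *args = list_make2(leftop, c);

    for (i = 0; i < num_elems; i++)
    {
        /* Overwrite the Const in place instead of palloc'ing a new one. */
        c->constvalue = elem_values[i];
        c->constisnull = elem_nulls[i];
        s2 = DatumGetFloat8(FunctionCall4Coll(&oprselproc,
                                              clausecollid,
                                              PointerGetDatum(root),
                                              ObjectIdGetDatum(operator),
                                              PointerGetDatum(args),
                                              Int32GetDatum(varRelid)));
        /* ... combine s2 into the running selectivity as before ... */
    }

    list_free(args);
    pfree(c);

The join-clause branch (FunctionCall5Coll()) can reuse the same node in the same way.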

>> The diff in the attachment is a proof of concept showing how to reduce the memory waste. Having benchmarked it a bit, I didn't find any overhead.
>>
>
> You might want to include your benchmarking results as well.

There is nothing interesting to show: for the cases above, pgbench TPS and planning time do not change with the patch applied.

[1] Report planning memory in EXPLAIN ANALYZE

--
Regards,
Andrei Lepikhov

Attachment: init_parts.sql (application/sql, 1.2 KB)
