Re: A potential memory leak on Merge Join when Sort node is not below Materialize node

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io>, Önder Kalacı <onderkalaci(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: A potential memory leak on Merge Join when Sort node is not below Materialize node
Date: 2022-09-29 01:12:44
Message-ID: CAApHDvpxQsTX0rYceTQYeSqUZzpyFQkFVKOV=UUB68=NLcJAFg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, 29 Sept 2022 at 12:07, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> Also potentially relevant: the 2017 commit fa117ee4 anticipated adding
> a "copy" argument to tuplesort_getdatum() (the same commit added such
> a "copy" argument to tuplesort_gettupleslot()). I see that that still
> hasn't happened to tuplesort_getdatum() all these years later. Might
> be a good idea to do it in the next year or two, though.
>
> If David is interested in pursuing this now then I certainly won't object.

Just while this is fresh in my head, I wrote some code to make this
happen. My preference would be not to add the "copy" param to the
existing function and instead just add a new function to prevent
additional branching.

The attached puts back the datum sort in nodeSort.c for byref types
and adjusts process_ordered_aggregate_single() to make use of this
function.

I did a quick benchmark to see if this help DISTINCT aggregate any:

create table t1 (a varchar(32) not null, b varchar(32) not null);
insert into t1 select md5((x%10)::text),md5((x%10)::text) from
generate_Series(1,1000000)x;
vacuum freeze t1;
create index on t1(a);

With a work_mem of 256MBs I get:

query = select max(distinct a), max(distinct b) from t1;

Master:
latency average = 313.197 ms

Patched:
latency average = 304.335 ms

So not a very impressive speedup there (about 3%)

Some excerpts from perf top show:

Master:
1.40% postgres [.] palloc
1.13% postgres [.] tuplesort_getdatum
0.77% postgres [.] datumCopy

Patched:
0.91% postgres [.] tuplesort_getdatum_nocopy
0.65% postgres [.] palloc

I stared for a while at the mode_final() function and thought maybe we
could use the nocopy variant there. I just didn't quite pluck up the
motivation to write any code to see if it could be made faster.

David

Attachment Content-Type Size
add_tuplesort_getdatum_nocopy_function.patch text/plain 5.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2022-09-29 01:31:42 Re: A potential memory leak on Merge Join when Sort node is not below Materialize node
Previous Message Zhang Mingli 2022-09-29 01:05:18 Re: Refactor UnpinBuffer()