Re: Support loser tree for k-way merge

From: Sami Imseih <samimseih(at)gmail(dot)com>
To: cca5507 <cca5507(at)qq(dot)com>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Support loser tree for k-way merge
Date: 2025-12-03 18:14:25
Message-ID: CAA5RZ0u=WDa-v+8Y_yE4WeRS8VtmcmQGuWVjZFCqZcXx-hEP3w@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

Thanks for raising this.

> Now we use 'heap' during the k-way merge, it's O(n log k). The 'loser tree' is also O(n log k), but
> it's usually has fewer comparisons than the 'heap'. When the tuple comparator is complex, the
> 'loser tree' can significantly speed up the k-way merge.

I was playing around with this and I do also see the performance
benefits that you
mention with loser tree.

```
DROP TABLE IF EXISTS t;
CREATE TABLE t (
col1 INT,
col2 INT
);

INSERT INTO t (col1, col2)
SELECT
(i % 10000000),
(i % 100)
FROM generate_series(1, 10000000) AS s(i);
```

Using something like the above, testing an "external merge" on the
high cardinality
column there is a bit of an improvement with loser tree, but there is
a bit of a loss on
low-cardinality.

In general though, I see a bit of an issue with allowing a GUC to
change strategies. What happens if there are multiple "external merge"
nodes and they can each benefit from different strategies? Maybe that's
not an issue since all the other enable_ GUCs have that problem, but
it is something to consider.

Can we drive the decision for what to do based on optimizer
stats, i.e. n_distinct and row counts? Not sure what the calculation would
be specifically, but something else to consider.

We can still provide the GUC to override the optimizer decisions,
but at least the optimizer, given up-to-date stats, may get it right most
of the time.

--
Sami Imseih
Amazon Web Services (AWS)

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Kirill Reshke 2025-12-03 18:17:12 Re: [PATCH] Add enable_copy_program GUC to control COPY PROGRAM
Previous Message Zsolt Parragi 2025-12-03 18:06:51 Proposal: Add a callback data parameter to GetNamedDSMSegment