Re: [PATCH] Use optimized single-datum tuplesort in ExecSort

From: Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>
To: James Coleman <jtc331(at)gmail(dot)com>
Cc: Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, David Rowley <dgrowleyml(at)gmail(dot)com>
Subject: Re: [PATCH] Use optimized single-datum tuplesort in ExecSort
Date: 2021-07-06 13:36:10
Message-ID: CAEudQAp7rvXHU=-OATqVe9U+tMXNhdvBzV50EMcKgMFW9agvCQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Em ter., 6 de jul. de 2021 às 10:19, James Coleman <jtc331(at)gmail(dot)com>
escreveu:

> Adding David since this patch is likely a precondition for [1].
>
> On Tue, Jul 6, 2021 at 2:15 AM Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io>
> wrote:
> >
> > Hello,
> >
> > While testing the patch "Add proper planner support for ORDER BY /
> DISTINCT
> > aggregates" [0] I discovered the performance penalty from adding a sort
> node
> > essentially came from not using the single-datum tuplesort optimization
> in
> > ExecSort (contrary to the sorting done in ExecAgg).
> >
> > I originally proposed this patch as a companion in the same thread [1],
> but
> > following James suggestion I'm making a separate thread just for this as
> the
> > optimization is worthwhile independently of David's patch: it looks like
> we
> > can expect a 2x speedup on a "select a single ordered column" case.
> >
> > The patch aimed to be as simple as possible: we only turn this
> optimization on
> > when the tuple being sorted has only one attribute, it is "byval" (so as
> not
> > to incur copies which would be hard to track in the execution tree) and
> > unbound (again, not having to deal with copying borrowed datum anywhere).
>
> Thanks again for finding this and working up a patch.
>
> I've taken a look, and while I haven't dug into testing it yet, I have
> a few comments.
>
> First, the changes are lacking any explanatory comments. Probably we
> should follow how nodeAgg does this and add both comments to the
> ExecSort function header as well as specific comments above the "if"
> around the new tuplesort_begin_datum explaining the specific
> conditions that are required for the optimization to be useful and
> safe.
>
> That leads to a question I had: I don't follow why bounded mode (when
> using byval) needs to be excluded. Comments should be added if there's
> a good reason (as noted above), but maybe it's a case we can handle
> safely?
>
> A second question: at first glance it's intuitively the case we might
> not be able to handle byref values. But nodeAgg doesn't seem to have
> that restriction. What's the difference here?
>
> A few small code observations:
> - In my view the addition of unlikely() in ExecSort is unlikely to be
> of benefit because it's a single call for the entire node's execution
> (not in the tuple loop).
>
No objection. And I agree that testing is complex and needs to remain as it
is.

- It seems clearer to change the "if (!node->is_single_val)" to flip
> the true/false cases so we don't need the negation.
>
I think yes, it can be better.

regards,
Ranier Vilela

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Dilip Kumar 2021-07-06 13:39:38 Re: [PATCH] Use optimized single-datum tuplesort in ExecSort
Previous Message Stephen Frost 2021-07-06 13:34:58 Re: .ready and .done files considered harmful