Re: [PATCH] Use optimized single-datum tuplesort in ExecSort

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: James Coleman <jtc331(at)gmail(dot)com>
Cc: Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>
Subject: Re: [PATCH] Use optimized single-datum tuplesort in ExecSort
Date: 2021-07-06 13:39:38
Message-ID: CAFiTN-v=sBvLz6r_RWUC942vTMj4758hzUuR+WKGrvpmWniU8g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jul 6, 2021 at 6:49 PM James Coleman <jtc331(at)gmail(dot)com> wrote:
>
> Adding David since this patch is likely a precondition for [1].
>
> On Tue, Jul 6, 2021 at 2:15 AM Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io> wrote:
> >
> > Hello,
> >
> > While testing the patch "Add proper planner support for ORDER BY / DISTINCT
> > aggregates" [0] I discovered the performance penalty from adding a sort node
> > essentially came from not using the single-datum tuplesort optimization in
> > ExecSort (contrary to the sorting done in ExecAgg).
> >
> > I originally proposed this patch as a companion in the same thread [1], but
> > following James suggestion I'm making a separate thread just for this as the
> > optimization is worthwhile independently of David's patch: it looks like we
> > can expect a 2x speedup on a "select a single ordered column" case.
> >
> > The patch aimed to be as simple as possible: we only turn this optimization on
> > when the tuple being sorted has only one attribute, it is "byval" (so as not
> > to incur copies which would be hard to track in the execution tree) and
> > unbound (again, not having to deal with copying borrowed datum anywhere).
>
> Thanks again for finding this and working up a patch.
>
> I've taken a look, and while I haven't dug into testing it yet, I have
> a few comments.
>
> First, the changes are lacking any explanatory comments. Probably we
> should follow how nodeAgg does this and add both comments to the
> ExecSort function header as well as specific comments above the "if"
> around the new tuplesort_begin_datum explaining the specific
> conditions that are required for the optimization to be useful and
> safe.
>
> That leads to a question I had: I don't follow why bounded mode (when
> using byval) needs to be excluded. Comments should be added if there's
> a good reason (as noted above), but maybe it's a case we can handle
> safely?
>
> A second question: at first glance it's intuitively the case we might
> not be able to handle byref values. But nodeAgg doesn't seem to have
> that restriction. What's the difference here?
>

I think tuplesort_begin_datum, doesn't have any such limitation, it
can handle any type of Datum so I think we don't need to consider the
only attbyval, we can consider any type of attribute for this
optimization.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Daniel Gustafsson 2021-07-06 14:07:33 Re: proposal - log_full_scan
Previous Message Ranier Vilela 2021-07-06 13:36:10 Re: [PATCH] Use optimized single-datum tuplesort in ExecSort