Re: [PATCH] Use optimized single-datum tuplesort in ExecSort

From: James Coleman <jtc331(at)gmail(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io>, Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Subject: Re: [PATCH] Use optimized single-datum tuplesort in ExecSort
Date: 2021-07-19 13:10:46
Message-ID: CAAaqYe9SrGK3kWE7O4kX0Q68vaDk=YwcgvcaF+5QjhAYfsn-MA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Jul 17, 2021 at 4:36 AM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
>
> On Sat, 17 Jul 2021 at 01:14, James Coleman <jtc331(at)gmail(dot)com> wrote:
> > The only remaining question I have is whether or not costing needs to
> > change, given the very significant speedup for datum sort.
>
> I'm looking at cost_tuplesort and the only thing that I think might
> make sense would be to adjust how the input_bytes value is calculated.
> For now, that's done with the following function that's used in quite
> a number of places.
>
> static double
> relation_byte_size(double tuples, int width)
> {
> return tuples * (MAXALIGN(width) + MAXALIGN(SizeofHeapTupleHeader));
> }
>
> It seems, at least in the case of Sort, that using SizeofHeapTupleHead
> is just always wrong as it should be SizeofMinimalTupleHeader. I know
> that's also the case for Memoize too. I've not checked the other
> locations.
>
> The only thing I can really see that we might do would be not add the
> MAXALIGN(SizeofHeapTupleHeader) when there's just a single column.
> We'd need to pass down the number of attributes from
> create_sort_path() so we'd know when and when not to add that. I'm not
> saying that we should do this. I'm just saying that I don't really see
> what else we might do.
>
> I can imagine another patch might just want to do a complete overhaul
> of all locations that use relation_byte_size(). There are various
> things that function just does not account for. e.g, the fact that we
> allocate chunks in powers of 2 and that there's a chunk header added
> on. Of course, "width" is just an estimate, so maybe trying to
> calculate something too precisely wouldn't be too wise. However,
> there's a bit of a chicken and the egg problem there as there'd be
> little incentive to improve "width" unless we started making more
> accurate use of the value.
>
> Anyway, none of the above take into account that the Datum sort is
> just a little faster, The only thing that exists in the existing cost
> modal that we could use to adjust the cost of an in memory sort is the
> comparison_cost. The problem there is that the comparison is exactly
> the same in both Datum and Tuple sorts. The only thing that really
> changes between Datum and Tuple sort is the fact that we don't make a
> MinimalTuple when doing a Datum sort. The cost modal, unfortunately,
> does not account for that. That kinda makes me think that we should
> do nothing as if we start to account for making MemoryTuples then
> we'll just penalise Tuple sorts and that might cause someone to be
> upset.

To be clear up front: I'm in favor of the patch, and I don't want to
put unnecessary stumbling blocks up for it getting committed. So if we
decide to proceed as is, that's fine with me.

But I'm not sure that the "cost model, unfortunately, does not account
for that" is entirely accurate. The end of cost_tuplesort contains a
cost "per extracted tuple". It does, however, note that it doesn't
charge cpu_tuple_cost, which maybe is what you'd want to fully
incorporate this into the model. But given this run_cost isn't about
accounting for comparison cost (that's been done earlier) which is the
part that'd be the same between tuple and datum sort, it seems to me
that we could lower the cpu_operator_cost here by something like 10%
if it's byref and 30% if it's byval?

James

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Greg Nancarrow 2021-07-19 13:25:06 Re: Re[3]: On login trigger: take three
Previous Message tushar 2021-07-19 12:32:43 Re: refactoring basebackup.c