Re: POC: converting Lists into arrays

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: POC: converting Lists into arrays
Date: 2019-07-03 06:56:25
Message-ID: CAKJS1f9RUaLrujzpqf15gg_7Wo9-ai7b3-MR=bMH7a-pOWdeZA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, 2 Jul 2019 at 11:27, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> My previous patch would have had you replace this with a loop using
> an integer list-position index. You can still do that if you like,
> but it's less change to convert the loop to a foreach(), drop the
> prev/next variables, and replace the list_delete_cell call with
> foreach_delete_current:
>
> foreach(cell, state->enterKeys)
> {
> TrgmStateKey *existingKey = (TrgmStateKey *) lfirst(cell);
>
> if (need to delete)
> state->enterKeys = foreach_delete_current(state->enterKeys,
> cell);
> }
>
> So I think this is a win, and attached is v7.

It's pretty nice to get rid of those. I also like you've handled the
changes in SRFs.

I've now read over the entire patch and have noted down the following:

1. MergeAttributes() could do with a comment mention why you're not
using foreach() on the outer loop. I almost missed the
list_delete_nth_cell() call that's a few branches deep in the outer
loop.

2. In expandTupleDesc(), couldn't you just change the following:

int i;
for (i = 0; i < offset; i++)
{
if (aliascell)
aliascell = lnext(eref->colnames, aliascell);
}

to:

aliascell = offset < list_length(eref->colnames) ?
list_nth_cell(eref->colnames, offset) : NULL;

3. Worth Assert(list != NIL); in new_head_cell() and new_tail_cell() ?

4. Do you think it would be a good idea to document that the 'pos' arg
in list_insert_nth and co must be <= list_length(). I know you've
mentioned that in insert_new_cell, but that's static and
list_insert_nth is not. I think it would be better not to have to
chase down comments of static functions to find out how to use an
external function.

5. Why does enlarge_list() return List *? Can't it just return void?
I noticed this after looking at add_new_cell_after() and reading your
cautionary comment and then looking at lappend_cell(). At first, it
seemed that lappend_cell() could end up reallocating List to make way
for the new cell, but from looking at enlarge_list() it seems we
always maintain the original allocation of the header. So why bother
returning List * in that function?

6. Is there a reason to use memmove() in list_concat() rather than
just memcpy()? I don't quite believe the comment you've written. As
far as I can see you're not overwriting any useful memory so the order
of the copy should not matter.

7. The last part of the following comment might not age well.

/*
* Note: unlike the individual-list-cell deletion functions, we never make
* any effort to move the surviving list cells to new storage. This is
* because none of them can move in this operation, so it's the same as
* the old implementation in terms of what callers may assume.
*/

The old comment about extending the list seems more fitting.

9. I see your XXX comment in list_qsort(), but wouldn't it be better
to just invent list_qsort_internal() as a static function and just
have it qsort the list in-place, then make list_qsort just return
list_qsort_internal(list_copy(list)); and keep the XXX comment so that
the fixup would just remove the list_copy()? That way, when it comes
to fixing that inefficiency we can just cut and paste the internal
implementation into list_qsort(). It'll be much less churn, especially
if you put the internal version directly below the external one.

10. I wonder if we can reduce a bit of pain for extension authors by
back patching a macro that wraps up a lnext() call adding a dummy
argument for the List. That way they don't have to deal with their
own pre-processor version dependent code. Downsides are we'd need to
keep the macro into the future, however, it's just 1 line of code...

I also did some more benchmarking of the patch. Basically, I patched
with the attached file (converted to .txt not to upset the CFbot) then
ran make installcheck. This was done on an AWS m5d.large instance.
The patch runs the planner 10 times then LOGs the average time of
those 10 runs. Taking the sum of those averages I see:

Master: 0.482479 seconds
Patched: 0.471949 seconds

Which makes the patched version 2.2% faster than master on that run.
I've resisted attaching the spreadsheet since there are almost 22k
data points per run.

Apart from the 10 points above, I think the patch is good to go.

I also agree with keeping the further improvements like getting rid of
the needless list_copy() in list_concat() calls as a followup patch. I
also agree with Tom about moving quickly with this one. Reviewing it
in detail took me a long time, I'd really rather not do it again after
leaving it to rot for a while.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
benchmark_list.txt text/plain 918 bytes

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fabien COELHO 2019-07-03 07:05:28 Re: [PATCH v4] Add \warn to psql
Previous Message Etsuro Fujita 2019-07-03 06:44:16 Re: [HACKERS] advanced partition matching algorithm for partition-wise join