Re: POC: converting Lists into arrays

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: POC: converting Lists into arrays
Date: 2019-07-16 18:52:22
Message-ID: 10784.1563303142@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I wrote:
> * Rationalize places that are using combinations of list_copy and
> list_concat, probably by inventing an additional list-concatenation
> primitive that modifies neither input.

I poked around to see what we have in this department. There seem to
be several identifiable use-cases:

* Concat two Lists that are freshly built, or at least not otherwise
referenced. In the old code, list_concat serves fine, leaking the
second List's header but not any of its cons cells. As of 1cff1b95a,
the second List's storage is leaked completely. We could imagine
inventing a list_concat variant that list_free's its second input,
but I'm unconvinced that that's worth the trouble. Few if any
callers can't stand to leak any storage, and if there are any where
it seems worth the trouble, adding an explicit list_free seems about
as good as calling a variant of list_concat. (If we do want such a
variant, we need a name for it. list_join, maybe, by analogy to
bms_join?)

* Concat two lists where there exist other pointers to the second list,
but it's okay if the lists share cons cells afterwards. As of the
new code, they don't actually share any storage, which seems strictly
better. I don't think we need to do anything here, except go around
and adjust the comments that explain that that's what's happening.

* Concat two lists where there exist other pointers to the second list,
and it's not okay to share storage. This is currently implemented by
list_copy'ing the second argument, but we can just drop that (and
adjust comments where needed).

* Concat two lists where we mustn't modify either input list.
This is currently implemented by list_copy'ing both arguments.
I'm inclined to replace this pattern with a function like
"list_concat_copy(const List *, const List *)", although settling
on a suitable name might be difficult.

* There's a small number of places that list_copy the first argument
but not the second. I believe that all of these are either of the form
"x = list_concat(list_copy(y), x)", ie replacing the only reference to
the second argument, or are relying on the "it's okay to share storage"
assumption to not copy a second argument that has other references.
I think we can just replace these with list_concat_copy. We'll leak
the second argument's storage in the cases where another list is being
prepended onto a working list, but I doubt it's worth fussing over.
(But, if this is repeated a lot of times, maybe it is worth fussing
over? Conceivably you could leak O(N^2) storage while building a
long working list, if you prepend many shorter lists onto it.)

* Note that some places are applying copyObject() not list_copy().
In these places the idea is to make duplicates of pointed-to structures
not just the list proper. These should be left alone, I think.
When the copyObject is applied to the second argument, we're leaking
the top-level List in the copy result, but again it's not worth
fussing over.

Comments?

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2019-07-16 19:16:29 Re: rebased background worker reimplementation prototype
Previous Message Daniel Gustafsson 2019-07-16 18:48:20 Re: A little report on informal commit tag usage