Re: POC: converting Lists into arrays

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: POC: converting Lists into arrays
Date: 2019-02-25 23:19:43
Message-ID: 8358.1551136783@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Andres Freund <andres(at)anarazel(dot)de> writes:
> Btw, should we remove the ENABLE_LIST_COMPAT stuff independent of this
> discussion? Seems like we could even just do that for 12.

+1. I took it out in the POC patch, but I see no very good reason
not to do it sooner than that.

>> The most critical thing that we lose by doing this is that when a
>> List is modified, all of its cells may need to move, which breaks
>> a lot of code that assumes it can insert or delete a cell while
>> hanging onto a pointer to a nearby cell.

> We could probably "fix" both this, and the cost of making such
> modifications, by having more of an list-of-arrays style
> representation. When adding/removing middle-of-the-"list" elements, we
> could chop that array into two (by modifying metadata, not freeing), and
> shove the new data into a new array inbetween. But I don't think that'd
> overall be a win, even if it'd get us out of the silent API breakage
> business.

Yeah, I'm afraid that would still leave us with pretty expensive
primitives.

>> 1. This still involves at least two palloc's for every nonempty List,
>> because I kept the header and the data array separate. Perhaps it's
>> worth allocating those in one palloc.

> Hm, I think if we force external code to audit their code, we better
> also do this. This is a significant number of allocations, and I don't
> think it'd be good to spread this out over two releases.

If we choose to do it, I'd agree with doing both in the same major release
cycle, so that extensions see just one breakage. But I think it'd still
best be developed as a follow-on patch.

I had an idea that perhaps is worth considering --- upthread I rejected
the idea of deleting lnext() entirely, but what if we did so? We could
redefine foreach() to not use it:

#define foreach(cell, l) \
for (int cell##__index = 0; \
(cell = list_nth_cell(l, cell##__index)) != NULL; \
cell##__index++)

We'd need to fix list_nth_cell() to return NULL not choke on an index
equal to (or past?) the array end, but that's easy.

I think this would go a very long way towards eliminating the hazards
associated with iterating around a list-modification operation.
On the downside, it's hard to see how to merge it with the other idea
for evaluating the List reference only once, so we'd still have the
hazard that the list ref had better be a stable expression. But that's
likely to be much easier to audit for than what the POC patch asks
people to do (maybe there's a way to check it mechanically, even?).

Also, any code that does contain explicit use of lnext() is likely
in need of rethinking anyhow, so taking it away would help answer
the objection about making problems easy to identify.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2019-02-25 23:19:57 Re: NOT IN subquery optimization
Previous Message Tom Lane 2019-02-25 23:01:14 Re: Allowing extensions to supply operator-/function-specific info