Re: POC: converting Lists into arrays

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: POC: converting Lists into arrays
Date: 2019-02-26 23:51:19
Message-ID: 16812.1551225079@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I wrote:
> I had an idea that perhaps is worth considering --- upthread I rejected
> the idea of deleting lnext() entirely, but what if we did so? We could
> redefine foreach() to not use it:

> #define foreach(cell, l) \
> for (int cell##__index = 0; \
> (cell = list_nth_cell(l, cell##__index)) != NULL; \
> cell##__index++)

> I think this would go a very long way towards eliminating the hazards
> associated with iterating around a list-modification operation.

I spent some time trying to modify my patch to work that way, but
abandoned the idea before completing it, because it became pretty
clear that it is a bad idea. There are at least two issues:

1. In many places, the patch as I had it only required adding an
additional parameter to lnext() calls. Removing lnext() altogether is
far more invasive, requiring restructuring loop logic in many places
where we otherwise wouldn't need to. Since most of the point of this
proposal is to not have a larger patch footprint than we have to, that
seemed like a step in the wrong direction.

2. While making foreach() work this way might possibly help in avoiding
writing bad code in the first place, a loop of this form is really
just about as vulnerable to being broken by list insertions/deletions
as what I had before. If you don't make suitable adjustments to the
integer index after an insertion/deletion then you're liable to skip
over, or double-visit, some list entries; and there's nothing here to
help you notice that you need to do that. Worse, doing things like
this utterly destroys our chance of detecting such errors, because
there's no state being kept that's in any way checkable.

I was driven to realize point 2 by noticing, while trying to get rid
of some lnext calls, that I'd mostly failed in the v1 patch to fix
loops that contain embedded list_delete() calls other than
list_delete_cell(). This is because the debug support I'd added failed
to force relocation of lists after a deletion (unlike the insertion
cases). It won't take much to add such relocation, and I'll go do that;
but with an integer-index-based loop implementation we've got no chance
of having debug support that could catch failure to update the loop index.

So I think that idea is a failure, and going forward with the v1
approach has better chances.

I did find a number of places where getting rid of explicit lnext()
calls led to just plain cleaner code. Most of these were places that
could be using forboth() or forthree() and just weren't. There's
also several places that are crying out for a forfour() macro, so
I'm not sure why we've stubbornly refused to provide it. I'm a bit
inclined to just fix those things in the name of code readability,
independent of this patch.

I also noticed that there's quite a lot of places that are testing
lnext(cell) for being NULL or not. What that really means is whether
this cell is last in the list or not, so maybe readability would be
improved by defining a macro along the lines of list_cell_is_last().
Any thoughts about that?

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2019-02-26 23:55:29 Re: Segfault when restoring -Fd dump on current HEAD
Previous Message Alvaro Herrera 2019-02-26 23:42:29 Re: Segfault when restoring -Fd dump on current HEAD