linked list rewrite

From: Neil Conway <neilc(at)samurai(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: linked list rewrite
Date: 2004-03-23 10:00:14
Message-ID: E1332BE6-7CB0-11D8-8804-000A95AB279E@samurai.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

I'd like to wrap up the linked list rewrite, and get the code into CVS
reasonably soon. If you're unfamiliar with the motivations for
redesigning the linked list code, check the archives for previous
discussions, such as:


http://www.mail-archive.com/pgsql-patches(at)postgresql(dot)org/msg02204.html

http://www.mail-archive.com/pgsql-patches(at)postgresql(dot)org/msg01320.html

The basic idea is to ditch the Lisp-style linked list representation
(in which a "list" is merely a pointer to the head node), and adopt a
new List struct like so:

typedef struct List
{
NodeTag type; /* T_List, T_IntList, or T_OidList */
int length;
ListCell *head;
ListCell *tail;
} List;

This allows us to implement most of the important list operations
(length, prepend, append and concatenation) with constant time
complexity, and speed up some other operations (such as equality).

The known remaining issues that need to be addressed are:

(1) API naming

In the latest prototype implementation, I've adopted a completely new
naming convention for the functions in the list API. The new API looks
like:

List *list_append(List *list, void *datum);
List *list_prepend(List *list, void *datum);
List *list_conc(List *list1, List *list2);
bool list_member(List *list, void *datum);
List *list_remove(List *list, void *datum);
etc.

Function variants which operate on integer lists (type = T_IntList)
have an "_int" suffix, whereas OID list functions use an "_oid" suffix.
By default, list operations use structural equality (i.e. equal()) --
if a list function uses pointer equality to determine if two list
elements are the same, it has the "_simple" suffix. And so on.

In contrast, the existing List API is a lot more inconsistent (IMHO). I
elaborated on some of the deficiencies of the existing API naming
scheme here:

http://archives.postgresql.org/pgsql-patches/2004-01/msg00188.php

Tom objected to changing the names:

http://archives.postgresql.org/pgsql-patches/2004-01/msg00186.php
http://archives.postgresql.org/pgsql-patches/2004-01/msg00191.php

(I won't summarize Tom's cogent points here so as not to put words in
his mouth; however, please read those posts.)

I think changing the list API names is a good idea for a few reasons.
First and foremost, it improves the readability of the code and the
consistency of the list API. Second, one objection to changing the API
names is that it requires more changes to the rest of the tree. While
that's indisputable, I think we'll realistically need to visit the vast
majority of the List API call sites in order to implement the list
rewrite anyway, so in the long run I don't think changing the API names
will be that much extra work. Third, it has the benefit of cleanly and
totally breaking existing code that uses the List API, so that we don't
miss any List API call sites whose semantics have subtly changed (e.g.
a switch of a particular function from structural equality to pointer
equality).

So, would people prefer that we keep the old API names, or adopt the
new naming conventions?

(2) Remaining implementation work

The code has drifted a fair bit underneath the latest list rewrite
patch (which was incomplete anyway). It's unfortunately been a little
while since I've looked at the patch, but I don't recall there being
any showstopping problems that needed to be resolved --- it's just a
matter of wrapping up remaining loose ends. I'm going to get started on
this on today.

(3) Apply the work to CVS, and update the rest of the tree for the new
API

The amount of integration work that needs to be done partially depends
on the resolution to #1, but either way the list rewrite will require a
lot of (relatively minor) changes scattered throughout the tree. What
is the best way to land these changes in HEAD with minimal disturbance
to other developers?

One possibility is to create a private CVS branch, implement any
necessary changes throughout the source tree and test all those
changes, and then merge the branch into HEAD once it is ready. This
would still break outstanding patches, as Tom has pointed out, but it
might reduce the disturbance to other developers. It doesn't really
matter to me whether this is done or not (I can do my own revision
control locally if there's a need for it), so speak up if you think
this would be worth doing.

Another possibility is to create some wrapper macros / functions that
would insulate the changes in the list API, so that we could make the
changes incrementally. Not sure how viable this is.

That's about it. Does anyone know of anything else that will need to be
tackled before we can merge the list rewrite?

Cheers,

Neil

Responses

Browse pgsql-general by date

  From Date Subject
Next Message lnd 2004-03-23 10:40:36 Re: PHP or JSP? That is the question.
Previous Message Paul Thomas 2004-03-23 09:48:14 Re: Problems with pg_dumpall

Browse pgsql-hackers by date

  From Date Subject
Next Message Matthew T. O'Connor 2004-03-23 13:11:25 Re: pg_autovacuum
Previous Message Fabien COELHO 2004-03-23 08:52:45 LIKE ANY and the like?