Re: ArrayLists instead of List (for some things)

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: ArrayLists instead of List (for some things)
Date: 2017-11-02 14:58:57
Message-ID: CAKJS1f-5K1phLzPQRpdbEZ45W-aWeYuC7V=uTaHC2aXK_zsq4A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 3 November 2017 at 03:27, Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> * David Rowley (david(dot)rowley(at)2ndquadrant(dot)com) wrote:
>> We could get around a few of these problems if Lists were implemented
>> internally as arrays, however, arrays are pretty bad if we want to
>> delete an item out the middle as we'd have to shuffle everything over
>> one spot. Linked lists are much better here... at least they are when
>> you already have the cell you want to remove... so we can't expect
>> that we can just rewrite List to use arrays instead.
>
> This actually depends on just how you delete the item out of the array,
> though there's a trade-off there. If you "delete" the item by marking
> it as "dead" then you don't need to re-shuffle everything, but, of
> course, then you have to make sure to 'skip' over those by running down
> the array for each list_nth() call- but here's the thing, with a small
> list that's all in a tighly packed array, that might not be too bad.
> Certainly far better than having to grab random memory on each step and
> I'm guessing you'd only actually store the "data" item for a given list
> member in the array if it's a integer/oid/etc list, not when it's
> actually a pointer being stored in the list, so you're always going to
> be working with pretty tightly packed arrays where scanning the list on
> the list_nth call really might be darn close to as fast as using an
> offset to the actual entry in the array, unless it's a pretty big list,
> but I don't think we've actually got lots of multi-hundred-deep lists.

I was hoping to have list_nth as always O(1) so that we'd never be
tempted again to use arrays directly like we are not for things like
simle_rel_array[].
Probably it could be made to work quickly in most cases by tracking if
there are any deleted items and just do the direct array lookups if
nothing is deleted, and if something is deleted then visit index n
minus bms_num_members() of some deleted bitmapset for the bits earlier
than the Nth element. bms_num_members() could be made faster with
64bit bitmap words and __builtin_popcountl() falling back on the
number_of_ones[] array when not available. Would also require a
bms_num_members_before() function.

In the implementation that I attached, I showed an example of how to
delete items out an array list, which I think is quite a bit cleaner
than how we generally do it today with List. (e.g in
reconsider_outer_join_clauses()) However, it'll still perform badly
when just a single known item is being removed.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Craig Ringer 2017-11-02 15:02:34 Re: Custom compression methods
Previous Message Alvaro Herrera 2017-11-02 14:55:25 Re: [HACKERS] pgsql: Fix freezing of a dead HOT-updated tuple