Re: Vectors instead of lists, specialised qsort(), binary_search(), unique()

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vectors instead of lists, specialised qsort(), binary_search(), unique()
Date: 2019-02-21 06:26:46
Message-ID: CAKJS1f_RfKq2XJ1i6CWOh-0PjSe-UEpbCfMG29DXF1cJAFL5_w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, 21 Feb 2019 at 18:02, Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> David Rowley posted some interesting stuff about an ArrayList[1] that
> would replace List in many places. (I probably would have replied to
> that thread if I hadn't changed email account, sorry.) I had similar
> thoughts when I needed to maintain a sorted vector for my refactoring
> proposal for the fsync queue, as part of my undo log work, except I
> came at the problem with some ideas from C++ in mind. However, now
> this stuff seems to be redundant for now, given some decisions we've
> arrived at in the thread about how to track the fsync queue, and I'm
> more focused on getting the undo work done.

Hi Thomas,

Glad to see further work being done in this area. I do agree that our
linked list implementation is very limited and of course, does perform
very poorly for Nth lookups and checking if the list contains
something. Seems what you have here solves (at least) these two
issues. I also think it's tragic in the many places where we take the
List and turn it into an array because we know list_nth() performs
terribly. (e.g ExecInitRangeTable(), setup_simple_rel_arrays() and
setup_append_rel_array(), it would be great to one day see those
fixed)

As for the ArrayList patch that I'd been working on, I was a bit
blocked on it due to Tom's comment in [1], after having quickly looked
over your patch I see there's no solution for that complaint.

(I'd been trying to think of a solution to this as a sort of idle
background task and so far I'd only come up with a sort of List
indexing system, where we could build additional data structures atop
of the list and have functions like list_nth() check for such an index
and use it if available. I don't particularly like the idea, but it
was the only way I could think of to work around Tom's complaint. I
felt there was just too much list API code that relies on the list
being a linked list. e.g checking for empty lists with list == NIL.
lnext() etc.)

So in short, I'm in favour of not just having braindead linked lists
all over the place to store things. I will rejoice the day we move
away from that, but also Tom's comment kinda stuck with me. He's
often right too. Likely backpatching pain that Tom talks about would
be much less if we used C++, but... we don't.

I'm happy to support the cause here, just not quite sure yet how I can
best do that.

[1] https://www.postgresql.org/message-id/21592.1509632225%40sss.pgh.pa.us

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tsunakawa, Takayuki 2019-02-21 06:37:45 RE: Protect syscache from bloating with negative cache entries
Previous Message Tsunakawa, Takayuki 2019-02-21 06:11:51 RE: Protect syscache from bloating with negative cache entries