Re: Vectors instead of lists, specialised qsort(), binary_search(), unique()

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vectors instead of lists, specialised qsort(), binary_search(), unique()
Date: 2019-02-21 22:45:11
Message-ID: CA+hUKGL=TSM=ei0QrM81FOYQq=+K5uVUDbsG-fuNzWAPvCCaJQ@mail.gmail.com
Lists: pgsql-hackers

On Thu, Feb 21, 2019 at 7:27 PM David Rowley
<david(dot)rowley(at)2ndquadrant(dot)com> wrote:
> As for the ArrayList patch that I'd been working on, I was a bit
> blocked on it due to Tom's comment in [1], after having quickly looked
> over your patch I see there's no solution for that complaint.
>
> (I'd been trying to think of a solution to this as a sort of idle
> background task and so far I'd only come up with a sort of List
> indexing system, where we could build additional data structures atop
> of the list and have functions like list_nth() check for such an index
> and use it if available. I don't particularly like the idea, but it
> was the only way I could think of to work around Tom's complaint. I
> felt there was just too much list API code that relies on the list
> being a linked list, e.g. checking for empty lists with list == NIL,
> lnext(), etc.)
>
> So in short, I'm in favour of not just having braindead linked lists
> all over the place to store things. I will rejoice the day we move
> away from that, but also Tom's comment kinda stuck with me. He's
> often right too. Likely backpatching pain that Tom talks about would
> be much less if we used C++, but... we don't.
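
To make that indexing idea concrete, I imagine something roughly like
the sketch below (purely illustrative; ListIndex, build_list_index()
and list_nth_indexed() are names I've just made up, not anything from
your ArrayList patch or from my patches): an optional array of cell
pointers built on demand, which list_nth() could consult before
falling back to walking the cells.

#include "postgres.h"
#include "nodes/pg_list.h"

typedef struct ListIndex
{
    int         length;         /* list_length() when the index was built */
    ListCell  **cells;          /* cell pointers in list order */
} ListIndex;

static ListIndex *
build_list_index(List *list)
{
    ListIndex  *idx = palloc(sizeof(ListIndex));
    ListCell   *lc;
    int         i = 0;

    idx->length = list_length(list);
    idx->cells = palloc(sizeof(ListCell *) * idx->length);
    foreach(lc, list)
        idx->cells[i++] = lc;
    return idx;
}

static void *
list_nth_indexed(List *list, ListIndex *idx, int n)
{
    /* Use the index only if the list hasn't grown or shrunk since. */
    if (idx != NULL && idx->length == list_length(list))
        return lfirst(idx->cells[n]);
    return list_nth(list, n);
}

Of course, knowing when the index is stale is the hard part, which is
roughly where I stopped thinking about it too.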

Right, in this patch-set I wasn't really focused on how to replace the
existing lists in a back-patch friendly style (a very legitimate
concern), I was focused on how to get the best possible machine code
for basic data structures and algorithms that are used all over the
place, by inlining as many known-at-compile-time parameters as
possible, and by making the right data structures easily available so
that they get used in the first place; I guess that's the bit where
your ideas and mine overlapped. I'm also interested in skipping
gratuitous allocation and pointer chasing, even in cases where e.g.
linked lists might not be "wrong" for the access pattern, just because
they're cache-destructive and a waste of cycles. As Alexander Stepanov
said: "Use vectors whenever you can. If you cannot use vectors,
redesign your solution so that you can use vectors", and I think there
is also something interesting about keeping objects inside their
containing object, instead of immediately using pointers to somewhere
else (obviously tricky with variable sized data, but that's what the
small vector optimisation idea is about; whether it's worth it after
the overhead of detecting the case, I don't know; std::string
implementors seem to think so). I was primarily interested in new
code, though I'm pretty sure there are places all over the tree where
microoptimisations are possible through specialisation (think
partitioning, sorting, searching, uniquifying in cases where the types
are fixed, or vary at runtime but there are some common cases you
might want to specialise for).
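
To show the kind of machine code I'm after, here's a toy example of
the specialisation idea (not the interface in the attached patches):
a binary search over a sorted Oid array, with the element type and
comparison fixed at compile time, so the compiler can inline the
comparison and keep everything in registers instead of calling
through a function pointer for every comparison the way bsearch()
must.

#include "postgres.h"

/*
 * Return the index of key in the sorted array elems, or -1 if absent.
 * Everything here is known at compile time except the data itself.
 */
static inline int
oid_binary_search(const Oid *elems, int nelems, Oid key)
{
    int         lo = 0;
    int         hi = nelems;

    while (lo < hi)
    {
        int         mid = lo + (hi - lo) / 2;

        if (elems[mid] < key)
            lo = mid + 1;
        else
            hi = mid;
    }
    return (lo < nelems && elems[lo] == key) ? lo : -1;
}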

For the cases you're interested in, maybe piecemeal replacement of the
planner lists that are measurably very hot is the only practical way
to do it, along with evidence that it's really worth the future
backpatching pain in each case. Or maybe there is some way to create
a set of macros that map to List operations in back branches, as you
were hinting at, to keep client code identical... I dunno.
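
Something like this is how I picture that last idea, purely as a rough
sketch (the ptrvec_* names are invented here and the version cut-off
is arbitrary): new code is written against a vector-ish API, and in
back branches the same spellings are mapped onto List operations so
the call sites stay textually identical.

#include "postgres.h"
#include "nodes/pg_list.h"

#if PG_VERSION_NUM >= 130000    /* hypothetical cut-over release */
#define PTRVEC_APPEND(v, x)     ptrvec_append((v), (x))
#define PTRVEC_NTH(v, n)        ptrvec_nth((v), (n))
#define PTRVEC_LENGTH(v)        ptrvec_length(v)
#else
#define PTRVEC_APPEND(v, x)     lappend((v), (x))
#define PTRVEC_NTH(v, n)        list_nth((v), (n))
#define PTRVEC_LENGTH(v)        list_length(v)
#endif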

--
Thomas Munro
https://enterprisedb.com
