Re: POC: converting Lists into arrays

From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: POC: converting Lists into arrays
Date: 2019-02-25 07:43:16
Message-ID: alpine.DEB.2.21.1902250652120.25064@lancre
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


Hello Tom,

> For quite some years now there's been dissatisfaction with our List
> data structure implementation. Because it separately palloc's each
> list cell, it chews up lots of memory, and it's none too cache-friendly
> because the cells aren't necessarily adjacent. Moreover, our typical
> usage is to just build a list by repeated lappend's and never modify it,
> so that the flexibility of having separately insertable/removable list
> cells is usually wasted.
>
> Every time this has come up, I've opined that the right fix is to jack
> up the List API and drive a new implementation underneath, as we did
> once before (cf commit d0b4399d81). I thought maybe it was about time
> to provide some evidence for that position, so attached is a POC patch
> that changes Lists into expansible arrays, while preserving most of
> their existing API.

My 0.02€ about this discussion (I assume that it is what you want): I had
the same issue in the past on a research project. I used a similar but
slightly different approach:

I did not touch the existing linked list implementation but provided
another data structure, which was a linked list of buckets (small arrays)
stack kept from the head, with buckets allocated on need but not freed
until the final deallocation. If pop was used extensively, a linked list
of freed bucket was kept, so that they could be reused. Some expensive
list-like functions were not provided, so the data structure could replace
lists in some but not all instances, which was fine. The dev had then to
choose which data structure was best for its use case, and critical
performance cases could be replaced.

Note that a "foreach", can be done reasonably cleanly with a
stack-allocated iterator & c99 struct initialization syntax, which is now
allowed in pg AFAICR, something like:

typedef struct { ... } stack_iterator;

#define foreach_stack(i, s) \
for (stack_iterator i = SITER_INIT(s); SITER_GO_ON(&i); SITER_NEXT(&i))

Used with a simple pattern:

foreach_stack(i, s)
{
item_type = GET_ITEM(i);
...
}

This approach is not as transparent as your approach, but changes are
somehow less extensive, and it provides choices instead of trying to do a
one solution must fit all use cases. Also, it allows to revisit the
pointer to reference choices on some functions with limited impact.
In particular the data structure is used for a "string buffer"
implementation (like the PQExpBuffer stuff).

--
Fabien.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message hubert depesz lubaczewski 2019-02-25 07:45:39 Segfault when restoring -Fd dump on current HEAD
Previous Message Kyotaro HORIGUCHI 2019-02-25 07:40:56 Re: Protect syscache from bloating with negative cache entries