From: | Jim Nasby <jnasby(at)upgrade(dot)com> |
---|---|
To: | Tomas Vondra <tomas(at)vondra(dot)me> |
Cc: | "Aya Iwata (Fujitsu)" <iwata(dot)aya(at)fujitsu(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: [WIP]Vertical Clustered Index (columnar store extension) - take2 |
Date: | 2025-06-04 22:19:35 |
Message-ID: | CAMFBP2qL5GmhJSneErnXj_JXfLoH+g4p6ijyO+vvWd+xtU0QWg@mail.gmail.com |
Lists: | pgsql-hackers |
On Wed, Jun 4, 2025 at 1:16 PM Tomas Vondra <tomas(at)vondra(dot)me> wrote:
> On 6/4/25 19:59, Jim Nasby wrote:
> >
> >
> > On Fri, May 23, 2025 at 4:29 PM Tomas Vondra <tomas(at)vondra(dot)me> wrote:
> >
> > Also, Alvaro seemed to think TAM is the way to go, and in order to keep
> > the OLTP performance he suggested to use both heap and VCI at the same
> > time, in different "forks". I'm not sure how would that work, or if we
> > can already do that - AFAIK we can't, because ForkNumber does not allow
> > adding custom forks. We'd have to relax that, or invent some sort of
> > federated TAM (that just multiplexes it to two TAMs). Maybe.
> >
> > But it's not like the IAM approach doesn't need to do this. The first
> > patch had to add stuff to a lot of random places to make this work. And
> > some of the places touch stuff that we don't expect indexes to worry
> > about, like ALTER TABLE, etc.
> >
> >
> > I suspect another option would be to handle this with table inheritance:
> > have one child that is heap-based, a second that's VCI, and a background
> > job to move data from heap to VCI (and vice-versa for updates and maybe
> > deletes).
> >
> > Note that you could actually implement all that in user-space.
> > Personally I'd much rather have a way to do pure VCI / column-store
> > sooner and manage it myself than have to wait another release (or more)
> > to get a complete solution...
>
> I don't see how could this ever work with the optimizer, which assumes
> scanning an inheritance hierarchy means scanning all parts. But this
> would require making planner "smarter" to know it should scan only one
> of the child relations. And I believe it's not possible to do that while
> constructing scans for the heap/VCI parts, those places are not aware of
> what other parts are being scanned etc.
>
Right; I was envisioning that one child would be a conventional heap that
stored very recent data and another child would be columnar in nature. So
you'd definitely want to always look at both children.

I am making an assumption (based on the comment about multiple forks) that
we'd have some way to handle VCI without having an actual heap.
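
To make that concrete, here's a toy model in Python. It is only a sketch of
the shape of the idea: nothing in it is PostgreSQL or VCI API, and
HeapChild, ColumnarChild, Parent and migrate() are names made up for
illustration. New rows land in the row-oriented child, a background step
moves older rows into the column-oriented child, and a scan of the parent
always reads both children.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterator, List

Row = Dict[str, Any]


@dataclass
class HeapChild:
    """Row-oriented child holding recent data (hypothetical stand-in for a heap)."""
    rows: List[Row] = field(default_factory=list)

    def insert(self, row: Row) -> None:
        self.rows.append(row)

    def scan(self) -> Iterator[Row]:
        # Iterate over a copy so a concurrent migrate() cannot disturb the scan.
        return iter(list(self.rows))


@dataclass
class ColumnarChild:
    """Column-oriented child holding older data (hypothetical stand-in for VCI)."""
    columns: Dict[str, List[Any]] = field(default_factory=dict)
    nrows: int = 0

    def append_batch(self, rows: List[Row]) -> None:
        for row in rows:
            # A column seen for the first time is back-filled with None.
            for name in row.keys() - self.columns.keys():
                self.columns[name] = [None] * self.nrows
            for name, values in self.columns.items():
                values.append(row.get(name))
            self.nrows += 1

    def scan(self) -> Iterator[Row]:
        for i in range(self.nrows):
            yield {name: values[i] for name, values in self.columns.items()}


@dataclass
class Parent:
    """Plays the role of the inheritance parent: scans cover every child."""
    heap: HeapChild = field(default_factory=HeapChild)
    columnar: ColumnarChild = field(default_factory=ColumnarChild)

    def insert(self, row: Row) -> None:
        self.heap.insert(row)  # all new data goes to the row store first

    def scan(self) -> Iterator[Row]:
        yield from self.columnar.scan()
        yield from self.heap.scan()

    def migrate(self, keep_recent: int = 0) -> int:
        """Background-job step: move all but the newest keep_recent rows."""
        cutoff = max(len(self.heap.rows) - keep_recent, 0)
        batch = self.heap.rows[:cutoff]
        self.columnar.append_batch(batch)
        del self.heap.rows[:cutoff]
        return len(batch)
```

The toy skips everything that makes this hard in practice (updates and
deletes of already-migrated rows, and doing the move transactionally), which
is exactly the DML handling you would have to code yourself.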
> Sure, you could do this in "user-space" by constructing queries that
> reference either the heap or VCI part. But then why put that into
> inheritance tree at all? It certainly does not help with moving data
> between the parts.
>
Right; I only brought it up because just having a working column-store
would be a big win, even if you had to code something to deal with any DML
that wasn't already batched up. Of course it would be better if it just did
the RightThing(TM) out of the box... but the perfect can be the enemy of
the good.
> What I can imagine is "VCI" as a "proxy" TAM on top of heap, keeping the
> columnar format in a separate fork. And using either that from custom
> scans, or the heap as a fallback for cases not supported by VCI.
>
Yeah, there'd definitely need to be some kind of proxy... I'm just
suggesting that we don't *have* to do that as a separate fork...

Of course I could also just be missing something :)