Re: ML-based indexing ("The Case for Learned Index Structures", a paper from Google)

From: Stefan Keller <sfkeller(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, Chapman Flack <chap(at)anastigmatix(dot)net>, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Oleg Ivanov <o(dot)ivanov(at)postgrespro(dot)ru>, Oleg Bartunov <obartunov(at)gmail(dot)com>, Nikolay Samokhvalov <samokhvalov(at)gmail(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: ML-based indexing ("The Case for Learned Index Structures", a paper from Google)
Date: 2021-04-20 21:29:43
Message-ID: CAFcOn29U7X14d9ZcZDcsNJe6GadRtFTmeh7JkJJSG+rLTMuG7g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Just for the records: A learned index as no more foreknowledge about
the dataset as other indices.

I'd give learned indexes at least a change to provide a
proof-of-concept. And I want to learn more about the requirements to
be accepted as a new index (before undergoing month's of code
sprints).

As you may have seen, the "Stonebraker paper" I cited [1] is also
sceptic requiring full parity on features (like "concurrency control,
recovery, non main memory,and multi-user settings")! Non main memory
code I understand.
=> But index read/write operations and multi-user settings are part of
a separate software (transaction manager), aren't they?

~Stefan

[1] https://www.cs.brandeis.edu/~olga/publications/ieee2021.pdf

Am Di., 20. Apr. 2021 um 22:22 Uhr schrieb Peter Geoghegan <pg(at)bowt(dot)ie>:
>
> On Tue, Apr 20, 2021 at 12:51 PM Jonah H. Harris <jonah(dot)harris(at)gmail(dot)com> wrote:
> >> Maybe I'll be wrong about learned indexes - who knows? But the burden
> >> of proof is not mine. I prefer to spend my time on things that I am
> >> reasonably confident will work out well ahead of time.
> >
> >
> > Agreed on all of your takes, Peter. In time, they will probably be more realistic.
>
> A big problem when critically evaluating any complicated top-down
> model in the abstract is that it's too easy for the designer to hide
> *risk* (perhaps inadvertently). If you are allowed to make what
> amounts to an assumption that you have perfect foreknowledge of the
> dataset, then sure, you can do a lot with that certainty. You can
> easily find a way to make things faster or more space efficient by
> some ridiculous multiple that way (like 10x, 100x, whatever).
>
> None of these papers ever get around to explaining why what they've
> come up with is not simply fool's gold. The assumption that you can
> have robust foreknowledge of the dataset seems incredibly fragile,
> even if your model is *almost* miraculously good. I have no idea how
> fair that is. But my job is to make Postgres better, not to judge
> papers. My mindset is very matter of fact and practical.
>
> --
> Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2021-04-20 21:49:48 Re: ML-based indexing ("The Case for Learned Index Structures", a paper from Google)
Previous Message Peter Geoghegan 2021-04-20 21:25:57 Re: RFE: Make statistics robust for unplanned events