Re: ML-based indexing ("The Case for Learned Index Structures", a paper from Google)

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>
Cc: Chapman Flack <chap(at)anastigmatix(dot)net>, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Stefan Keller <sfkeller(at)gmail(dot)com>, Oleg Ivanov <o(dot)ivanov(at)postgrespro(dot)ru>, Oleg Bartunov <obartunov(at)gmail(dot)com>, Nikolay Samokhvalov <samokhvalov(at)gmail(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: ML-based indexing ("The Case for Learned Index Structures", a paper from Google)
Date: 2021-04-20 20:22:27
Message-ID: CAH2-Wzkwopr028p+3d=kNjd6hoA3V+Zy6hbou-LcHaV8FymH9Q@mail.gmail.com
Lists: pgsql-hackers

On Tue, Apr 20, 2021 at 12:51 PM Jonah H. Harris <jonah(dot)harris(at)gmail(dot)com> wrote:
>> Maybe I'll be wrong about learned indexes - who knows? But the burden
>> of proof is not mine. I prefer to spend my time on things that I am
>> reasonably confident will work out well ahead of time.
>
>
> Agreed on all of your takes, Peter. In time, they will probably be more realistic.

A big problem when critically evaluating any complicated top-down
model in the abstract is that it's too easy for the designer to hide
*risk* (perhaps inadvertently). If you are allowed to make what
amounts to an assumption that you have perfect foreknowledge of the
dataset, then sure, you can do a lot with that certainty. You can
easily find a way to make things faster or more space efficient by
some ridiculous multiple that way (like 10x, 100x, whatever).

None of these papers ever get around to explaining why what they've
come up with is not simply fool's gold. The assumption that you can
have robust foreknowledge of the dataset seems incredibly fragile,
even if your model is *almost* miraculously good. I have no idea how
fair that is. But my job is to make Postgres better, not to judge
papers. My mindset is very matter-of-fact and practical.

--
Peter Geoghegan
