Re: ML-based indexing ("The Case for Learned Index Structures", a paper from Google)

From: "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Chapman Flack <chap(at)anastigmatix(dot)net>, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Stefan Keller <sfkeller(at)gmail(dot)com>, Oleg Ivanov <o(dot)ivanov(at)postgrespro(dot)ru>, Oleg Bartunov <obartunov(at)gmail(dot)com>, Nikolay Samokhvalov <samokhvalov(at)gmail(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: ML-based indexing ("The Case for Learned Index Structures", a paper from Google)
Date: 2021-04-20 19:51:30
Message-ID: CADUqk8VYXaA3_OC=pwvR90MwKVp2tWrEVN_o7L5AB57zedxyuA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Apr 20, 2021 at 3:45 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:

> On Tue, Apr 20, 2021 at 12:35 PM Chapman Flack <chap(at)anastigmatix(dot)net>
> wrote:
> > How would showing that to be true for data structure X be different from
> > making a case for data structure X?
>
> You don't have to understand the theoretical basis of B-Tree indexes
> to see that they work well. In fact, it took at least a decade for
> somebody to formalize how the space utilization works with B-Trees
> containing random data. Of course theory matters, but the fact is that
> B-Trees had been widely used for commercial and scientific
> applications that whole time.
>
> Maybe I'll be wrong about learned indexes - who knows? But the burden
> of proof is not mine. I prefer to spend my time on things that I am
> reasonably confident will work out well ahead of time.
>

Agreed on all of your takes, Peter. In time, they will probably be more
realistic. But, at present, I tend to see the research papers make
comparisons between learned vs. traditional pitching the benefits of the
former without any of the well-known optimizations of the latter - as if
time stood still since the original B-Tree. Similarly, where most academic
research starts to fall apart in practicality is the lack of addressing
realistic write volumes and related concurrency issues. I'm happy to be
disproven on this, though.

--
Jonah H. Harris

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2021-04-20 20:22:27 Re: ML-based indexing ("The Case for Learned Index Structures", a paper from Google)
Previous Message Peter Geoghegan 2021-04-20 19:45:19 Re: ML-based indexing ("The Case for Learned Index Structures", a paper from Google)