Re: Merge Join with an Index Optimization

From: Michael Malis <michaelmalis2(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Merge Join with an Index Optimization
Date: 2016-09-11 21:07:43
Message-ID: CAFQtOwraN_11L1XuaRrsJJhE-XkReX=WH+GHNyA5uoJaoe9Kvg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Sep 11, 2016 at 9:20 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Michael Malis <michaelmalis2(at)gmail(dot)com> writes:
> >> As I understand it, a merge join will currently read all tuples from
> both
> >> subqueries (besides early termination). I believe it should be possible
> to
> >> take advantages of the indexes on one or both of the tables being read
> from
> >> to skip a large number of tuples that would currently be read.
> >>
> > IIUC, what you're proposing is to replace "read past some tuples in the
> > index" with "re-descend the index btree". Yes, that would win if it
>

Roughly yes, although is it necessary to rescan the btree from the top? Is
it not possible to "resume" the scan from the previous location?

> > You might want to troll the archives for past discussions of index skip
> > scan, which is more or less the same idea (could possibly be exactly the
> > same idea, depending on how it's implemented).
>

After searching a bit, all I was able to find was this
<https://www.postgresql.org/message-id/CADLWmXXbTSBxP-MzJuPAYSsL_2f0iPm5VWPbCvDbVvfX93FKkw%40mail.gmail.com>.
It looks like someone had a rough patch of applying a similar idea to
DISTINCT. I can't tell what happened to it, but in the patch they mention
that it should be possible to do a "local traversal ie from the current
page".

A different, but similar optimization, would be to first check the join
condition with the data in the index. Then only fetch the full row only if
the data in the index passes the join condition. I imagine that this would
be easier to implement, although less efficent than what I proposed above
because it has to scan the entire index.

Thanks,
Michael

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2016-09-11 22:13:55 Re: Tuplesort merge pre-reading
Previous Message Kevin Grittner 2016-09-11 20:52:30 Re: [REVIEW] Tab Completion for CREATE DATABASE ... TEMPLATE ...