Re: Reducing planning time on tables with many indexes

From: David Geier <geidav(dot)pg(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Reducing planning time on tables with many indexes
Date: 2022-08-01 13:33:55
Message-ID: CAPsAnrmz8qeSEGNyfePqY3hy8L9q15idqaaM6dAs4kbXmNggug@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Tom,

On Wed, Jul 27, 2022 at 7:15 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> I wrote:
> > Unfortunately, as things stand today, the planner needs more than the
> > right to look at the indexes' schemas, because it makes physical accesses
> > to btree indexes to find out their tree height (and I think there are
> some
> > comparable behaviors in other AMs). I've never particularly cared for
> > that implementation, and would be glad to rip out that behavior if we can
> > find another way. Maybe we could persuade VACUUM or ANALYZE to store
> that
> > info in the index's pg_index row, or some such, and then the planner
> > could use it with no lock?
>
It seems like _bt_getrootheight() first checks if the height is cached and
only if it isn't it accesses index meta pages.
If the index locks are only taken for the sake of _bt_getrootheight()
accessing index meta pages in case they are not cached, maybe the index
locks could be taken conditionally.
However, postponing the call to where it is really needed sounds even
better.

>
> A first step here could just be to postpone fetching _bt_getrootheight()
> until we actually need it during cost estimation. That would avoid the
> need to do it at all for indexes that indxpath.c discards as irrelevant,
> which is a decision made on considerably more information than the
> proposed patch uses.
>
> Having done that, you could look into revising plancat.c to fill the
> IndexOptInfo structs from catcache entries instead of opening the
> index per se. (You'd have to also make sure that the appropriate
> index locks are acquired eventually, for indexes the query does use
> at runtime. I think that's the case, but I'm not sure if anything
> downstream has been optimized on the assumption the planner did it.)
>
> I can give this a try.
That way we would get rid of the scalability issues.
However, what about the runtime savings observed with a single query stream?
In that case there is no contention, so it seems like having less indexes
to look at further down the road, also yields substantial savings.
Any clue where exactly these savings might come from? Or is it actually
the calls to _bt_getrootheight()? I can also do a few perf runs to track
that down.

> This'd probably get us a large part of the way there. Further
> optimization of acquisition of tree height etc could be an
> optional follow-up.
>
> regards, tom lane
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Zhang Mingli 2022-08-01 13:56:14 [feature]COPY FROM enable FORCE_NULL/FORCE_NOT_NULL on all columns
Previous Message Euler Taveira 2022-08-01 13:15:25 Re: logical replication restrictions