Re: Speeding up parts of the planner using a binary search tree structure for nodes

From: Ashutosh Bapat <ashutosh(dot)bapat(at)2ndquadrant(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Speeding up parts of the planner using a binary search tree structure for nodes
Date: 2020-06-08 14:17:59
Message-ID: CAG-ACPWy93jG2GhYN71cctzYqbCe2QrpBSYoUcBtpHpSYt3miQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, 2 Jun 2020 at 03:13, David Rowley <dgrowleyml(at)gmail(dot)com> wrote:

>
>
> It does seem likely to me that hash tables would be a more efficient
> structure, but the gains might not be as big as you think. To perform
> a lookup in the table we need to hash the entire node to get the hash
> value, then perform at least one equal comparison. With the Binary
> Search Lists, we'd need more comparisons, but those can be partial
> comparisons as they can abort early when we find the first mismatching
> field in the Node. When the number of items in the list is fairly
> small that might actually be less work, especially when the Node type
> varies (since that's the first field we compare). Hash tables are
> likely to become more efficient when the number of items is larger.
> The general case is that we'll just have a small number of items. I'd
> like to improve the less common situation when the number of items is
> large with minimal impact for when the number of items is small.
>

Agree with that. I think we can again borrow from the way we search a join
- when small number of joins use list and for a larger number use hash
table. I am not suggesting that we use list in this case, but the idea is
to use two data structures. May be every eclass will use one of them
depending upon the number of members. Queries involving partitioned tables
with lots of partitions will have a large number of child eclass members.

>
> > > tlist_member()
> >
> > hash table by expression as key here as well?
>
> The order of the tlist does matter in many cases. I'm unsure how we
> could track the order that items were added to the hash table and then
> obtain the items back in that order in an efficient way. I imagine
> we'd need to store the item in some subsequent data structure, such as
> a List.

If we use hash table then we retain the list as well. But your idea below
looks better.

> There's obviously going to be some overhead to doing that.
> The idea with the Binary Search Lists was that items would be stored
> in an array, similar to List, but the cells of the list would contain
> the offset index to its parent and left and right child. New items
> would always take the next free slot in the list, just like List does
> so we'd easily be able to get the insert order by looping over the
> array of elements in order.
>
> That seems like a good idea. I am worried that the expression nodes do not
have any inherent ordering and we are proposing to use a structure which
relies on order.

--
Best Wishes,
Ashutosh

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2020-06-08 14:33:05 Re: snowball release
Previous Message Ashutosh Bapat 2020-06-08 14:15:56 Re: A wrong index choose issue because of inaccurate statistics