DISTINCT/Optimizer question

From: "Beth Jen" <raelys(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: DISTINCT/Optimizer question
Date: 2006-07-07 17:25:53
Message-ID: 79c97ea10607071025v62ca8f22w46b73d85bef2e66b@mail.gmail.com
Lists: pgsql-hackers

Hello,

My name is Beth Jen and I'm a Summer of Code student currently adding a
hash-based implementation of DISTINCT to PostgreSQL. My prototype is near
completion, and the current design is as follows:

I've created a new exec node that uses the hash table functions in execGrouping.
For each input value, the node checks the hash table: if the value is not yet
present, it is inserted and returned; otherwise it is discarded. The idea was to
then have the optimizer consider using this node instead of the current
sort -> unique combination.
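
To make the contrast between the two strategies concrete, here is a minimal
Python sketch. It is an illustration only, not the executor code: the function
names hash_distinct and sort_unique are hypothetical, and a Python set stands
in for the hash table.

```python
from typing import Hashable, Iterable, Iterator


def hash_distinct(rows: Iterable[Hashable]) -> Iterator[Hashable]:
    """Emit each row the first time it is seen; discard later duplicates."""
    seen = set()  # stand-in for the hash table (assumption for this sketch)
    for row in rows:
        if row not in seen:
            seen.add(row)
            yield row


def sort_unique(rows: Iterable[Hashable]) -> Iterator[Hashable]:
    """Sort the input, then drop every row equal to its predecessor."""
    have_previous = False
    previous = None
    for row in sorted(rows):
        if not have_previous or row != previous:
            yield row
        previous = row
        have_previous = True


rows = [(3, "c"), (1, "a"), (3, "c"), (2, "b"), (1, "a")]
hashed = list(hash_distinct(rows))        # first-seen order, no sort needed
sorted_unique = list(sort_unique(rows))   # sorted order as a side effect
```

Both functions return the same set of rows. Only the sort-based one also
guarantees an ordering, which is why the hash-based one can skip the sort.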

However, I've run into an issue, and Jonah and Bruce suggested I post the
question to the hackers list.

Right now, the distinct clause adds its targets to the sort clause list when
it is parsed. This causes an automatic insertion of the sort node into the
query plan before the application of the unique node. The hash-based
implementation, however, is meant to bypass the need to sort. I could just
remove this action, but the optimizer should only consider using the
hash-based implementation; it should still use the sort -> unique
implementation when necessary. (Not to mention, this sort -> unique
implementation is used in other cases where unique values are needed, such
as in unions.) Therefore, I need to be able to somehow either move this
component from the parser into the optimizer, or prevent the creation of a
sort node when the query planner chooses to use the hash-based
implementation.
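
The second option (creating the sort node only when the planner actually picks
sort -> unique) could look roughly like the Python sketch below. Everything in
it is hypothetical: PlanNode, plan_distinct, and the two boolean inputs are
made up for illustration and do not correspond to existing planner structures.
The conditions under which sorting is "necessary" are also only assumed here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PlanNode:
    """Toy plan node: a kind label plus at most one child (hypothetical)."""
    kind: str
    child: Optional["PlanNode"] = None


def plan_distinct(input_plan: PlanNode,
                  needs_sorted_output: bool,
                  hash_is_cheaper: bool) -> PlanNode:
    """Choose a DISTINCT strategy at planning time.

    The Sort node is built here, and only on the sort -> unique path,
    rather than being forced earlier by the parser.
    """
    if needs_sorted_output or not hash_is_cheaper:
        return PlanNode("Unique", PlanNode("Sort", input_plan))
    return PlanNode("HashDistinct", input_plan)


def describe(plan: PlanNode) -> List[str]:
    """List node kinds from the top of the plan down to the leaf."""
    names = []
    node: Optional[PlanNode] = plan
    while node is not None:
        names.append(node.kind)
        node = node.child
    return names


hash_plan = describe(plan_distinct(PlanNode("SeqScan"), False, True))
sort_plan = describe(plan_distinct(PlanNode("SeqScan"), True, True))
```

In this sketch the parser would only record which columns must be distinct,
and the decision to add a sort is deferred until the strategy is chosen.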

What are your suggestions for going about this? Are these approaches
feasible without a significant restructuring of the code? Are there any
other approaches I should consider?

Thank you for your assistance.

-Beth
