Re: [HACKERS] Surjective functional indexes

From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Stephen Frost <sfrost(at)snowman(dot)net>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Christoph Berg <myon(at)debian(dot)org>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Surjective functional indexes
Date: 2018-11-09 16:08:58
Message-ID: 188663b1-3392-2e2a-0f41-cf1e1e3200ae@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 09.11.2018 2:27, Tom Lane wrote:
> I wrote:
>> The bigger picture here, and the reason for my skepticism about having
>> any intelligence in the enabling logic, is that there is no scenario
>> in which this code can be smarter than the user about what to do.
>> We have no insight today, and are unlikely to have any in future, about
>> whether a specific index expression is many-to-one or not.
> Hmm ... actually, I take that back. Since we're only interested in this
> for expression indexes, we can expect that statistics will be available
> for the expression index, at least for tables that have been around
> long enough that UPDATE performance is really an exciting topic.
> So you could imagine pulling up the stadistinct numbers for the index
> column(s) and the underlying column(s), and enabling the optimization
> when their ratio is less than $something. Figuring out how to merge
> numbers for multiple columns might be tricky, but it's not going to be
> completely fact-free. (I still think that the cost-estimate logic is
> quite bogus, however.)
>
> Another issue in all this is the cost of doing this work over again
> after any relcache flush. Maybe we could move the responsibility
> into ANALYZE?
>
> BTW, the existing code appears to be prepared to enable this logic
> if *any* index column is an expression, but surely we should do so
> only if they are *all* expressions?
>
> regards, tom lane

From my point of view "auto" value should be default, otherwise it has
not so much sense.
If somebody decides to switch on this optimization for some particular
index, then it will set it to "on", not "auto".
So I agree with your previous opinion, that if this optimization is
disabled by default, then it is enough to have boolean parameter.

Concerning muticolumn indexes: why we should apply this optimization
only if *all* of index columns are expressions?
Assume very simple example: we have some kind of document storage
represented by the following table:

     create table document(owner integer, name text, last_updated
timestamp, description json);

So there are some static document attributes (name, date,...) and some
dynamic, stored in json field.
Consider that most frequently users will search among their own documents.
So we may create index like this:

     create index by_title on documents(owner,(description->>'title'));

Document description may include many attributes which are updated quite
frequently, like "comments", "keywords",...
But "title" is rarely changed, so this optimization will be very useful
for such index.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message David G. Johnston 2018-11-09 16:16:16 Re: Alternative to \copy in psql modelled after \g
Previous Message Alvaro Herrera 2018-11-09 15:59:43 Re: BUG #15212: Default values in partition tables don't work as expected and allow NOT NULL violation