Re: [Fwd: Index Advisor]

From: "Gurjeet Singh" <singh(dot)gurjeet(at)gmail(dot)com>
To: "Kai-Uwe Sattler" <kus(at)tu-ilmenau(dot)de>
Cc: "PGSQL Hackers" <pgsql-hackers(at)postgresql(dot)org>, "Gurjeet Singh" <gurjeet(dot)singh(at)enterprisedb(dot)com>
Subject: Re: [Fwd: Index Advisor]
Date: 2006-11-20 05:35:46
Message-ID: 65937bea0611192135q2f9e8a74he5b84b5b0880f5be@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 11/19/06, Kai-Uwe Sattler <kus(at)tu-ilmenau(dot)de> wrote:
>
> Hi,
> > .) The SELECTs in the pg_advise are returning wrong results, when
> > the same index is suggested twice, because of the SUM() aggregates.
> I don't think that this is a bug. If the same index is recommended
> for two different queries it will appear two times in
> pg_indexadvisor. So, if you want to calculate the overall benefit of
> this index, then you have to sum up the local benefits for each query.

If this is intended behaviour, then its okay.

> .) I doubt that on a table t(a,b), for a suggestion of idx(b,a),
> > pg_advise will
> > suggest idx(a,b);
> ?? Not sure, if I understand you right. idx(b,a) and idx(a,b) are
> completely different indexes. Why should pg_advise suggest idx(a,b).

I am referring to the way get_column_names() is coded. First, the SQL for
the portal does not guarantee any order of the result; secondly, the 'for'
loops that follow, will always output the columns in their increasing order
of attribute number. Here's a small way to reproduce the bug, that I cooked
up just now:

Change the SQL in read_advisor_output() to:

res = PQexec(conn, "DECLARE myportal CURSOR FOR "
"SELECT relname,"
"int2vector_to_string(index_attrs) AS colids,"
"MAX(index_pages) AS size_in_pages,"
"SUM(profit) AS benefit,"
"SUM(profit)/MAX(index_pages) AS gain "
"FROM pg_indexadvisor,"
"pg_class "
"WHERE backend_pid = 0 "
"AND rel_oid = pg_class.oid "
"GROUP BY relname, colids "
"ORDER BY gain DESC");

Notice the backend_pid = 0. Now insert the following into pg_indexadvisor:

insert into pg_indexadvisor values( 1259, '2 1', 2, 1000, 20,0,0 );

This should prompt the advisor to generate the statement:

create index advidx_1 on pg_class (relnamespace,relname);

But instead, it will output this:

create index advidx_1 on pg_class (relname,relnamespace);

Now run the advisor with any workload, and inspect the output.

$ pg_advise.exe -d postgres -h localhost -p 5432 -U gsk -o create_index.sql
workload.sql

We should tokenize the list of attribute numbers (column_ids variable) in
get_column_names() and query them individually.

But there is another bug: if there are recommendations like idx
> (a,b,c), idx(a,b) and idx(a) it would be a good idea to create just
> idx(a). I will add this to pg_advise as an optional feature.
>

I'd say it's a new feature request and not a bug :) But I don't understand
why would you want to not build idx(a,b,c) in such a situation? idx(a,b,c)
can be useful in places where idx(a,b) or idx(a) is required, but the same
can't be said about idx(a) or idx(a,b) being useful where idx(a,b,c) is
required!

Best regards,

--
gurjeet[(dot)singh](at)EnterpriseDB(dot)com
singh(dot)gurjeet(at){ gmail | hotmail | yahoo }.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Fuhr 2006-11-20 05:54:57 Statistics visibility in SERIALIZABLE transactions
Previous Message Alvaro Herrera 2006-11-19 22:37:53 Re: [GENERAL] Allowing SYSDATE to Work