Re: Ad Hoc Indexes

From: Justin <justin(at)emproshunts(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Stephen Denne <Stephen(dot)Denne(at)datamail(dot)co(dot)nz>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Ad Hoc Indexes
Date: 2008-02-19 03:56:13
Message-ID: 47BA535D.7010601@emproshunts.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

there are certainly problems with this idea. Figuring out the cost to
either create an index or just do table scan would be pain but not
impossible. The hit to index a table in memory i don't think would be
that bad compare to do 100's to thousands of loops with table scans.

I see more options for the query planner in config file to handle ad-hoc
indexes

Now to Toms points. The point of Ad Hoc index is they're only alive for
the period time the query is running and only live in memory. Once
the query completes they die horrible deaths

These temporay indexes will not do a table lock or row lock ever it only
needs to view the record to create this index. So the same problems
that affects 100 table scans would affect create temp index. I think
this would help reduce concurrence sense the table is not being scanned
thousands of times

The idea of creating one time use indexes has been around for long time
and has showed to be a benefit when dealing with large data sets where
queries is one time or rarely used and its to much labor and cost to
figure out how to make it faster. This would also reduce IO disk
activity allot if the table can't fit in memory but the index would fit
because it relativity small in comparison.

Tom Lane wrote:
> "Stephen Denne" <Stephen(dot)Denne(at)datamail(dot)co(dot)nz> writes:
>
>> The improvement wasn't to the part of the query that had the bad cost estimate, it was to the part that was being performed hundreds of times instead of the one time the planner estimated.
>>
>
>
>> The planner still thought it was only going to perform a sequential scan of your wooper table once. So even if there had been any Ad Hoc Index creation code that had been used to consider creating indexes as part of a plan cost estimate, it wouldn't have bothered creating any indexes on wooper.
>>
>
> Right. And even more to the point, if it had gotten the estimate right
> and known that the subquery would have been repeated, that would have
> (ideally) prompted it to shift to a different plan structure.
>
> As Peter pointed out upthread, the existing hash join logic seems to be
> a pretty decent facsimile of an "ad hoc index" --- in fact, the hash
> table *is* a hash index for all intents and purposes. If you tilt your
> head at the right angle, a merge join could be seen as comparable to
> constructing a couple of ad-hoc btree indexes. Plus the join code is
> not burdened by any overhead that actual index code would be likely to
> have, such as niggling worries about crash-safety or concurrent access
> to the index.
>
> So in my mind the issue here is not why don't we have ad hoc indexes,
> it's why the planner didn't choose a more appropriate join method.
> It's clear that faulty rowcount estimation was one part of the answer.
> Now it may also be that there are some outright structural limitations
> involved due to the "join" arising from a sub-select --- I'm not sure
> that the planner *can* generate a bulk join plan from a query expressed
> like this one. But the TODO items that this suggests to me do not
> include implementing anything I'd call an ad-hoc index.
>
> regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings
>

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message James Mansion 2008-02-19 06:10:47 Re: wishlist for 8.4
Previous Message Tom Lane 2008-02-19 02:52:17 Re: Ad Hoc Indexes