Re: [PATCH] Fix ScalarArrayOpExpr estimation for GIN indexes

From: Marti Raudsepp <marti(at)juffo(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Fix ScalarArrayOpExpr estimation for GIN indexes
Date: 2011-12-20 16:17:57
Message-ID: CABRT9RBP92ep4vzr3MHMOoMFP4GDmkTQ3fTmFnq-82X2o12rfA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Dec 20, 2011 at 07:08, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> it'd likely be better if this code ignored unrecognized qual expression
> types rather than Assert'ing they're not there.

The patch replaced that Assert with an elog(ERROR)

> Hmm.  I am reminded of how utterly unreadable "diff -u" format is for
> anything longer than single-line changes :-( ...

Sorry, the new patch is in context (-C) diff format proper. I also
moved around code a bit and removed an unused variable that was left
around from the refactoring.

> but I think I don't
> like this refactoring much. Will take a closer look tomorrow.

I was afraid you'd say that, especially for a change that should be
backpatched. But I couldn't think of alternative ways to do it that
give non-bogus estimates.

----

While writing this patch, the largest dilemma was where to account for
the multiple array scans. Given that this code is mostly a heuristic
and I lack a deep understanding of GIN indexes, it's likely that I got
this part wrong.

Currently I'm doing this:
partialEntriesInQuals *= array_scans;
exactEntriesInQuals *= array_scans;
searchEntriesInQuals *= array_scans;

Which seems to be the right thing as far as random disk accesses are
concerned (successive scans are more likely to hit the cache) and also
works well with queries that don't touch most of the index. But this
fails spectacularly when multiple full scans are performed e.g. LIKE
ANY ('{%,%,%}'). Because index_pages_fetched() ends up removing all of
the rescan costs.

Another approach is multiplying the total cost from the number of
scans. This overestimates random accesses from rescans, but fixes the
above case:
*indexTotalCost = (*indexStartupCost + dataPagesFetched *
spc_random_page_cost) * array_scans;

Regards,
Marti

Attachment Content-Type Size
gin-scalar-estimates-v2.patch text/x-patch 16.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Noah Misch 2011-12-20 16:23:02 Re: ALTER TABLE lock strength reduction patch is unsafe
Previous Message Peter Geoghegan 2011-12-20 15:53:32 Re: Replication timeout units