Abbreviated keys for text cost model fix

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Subject: Abbreviated keys for text cost model fix
Date: 2015-02-22 01:59:17
Message-ID: CAM3SWZQeCekUShZXgW8_nPuuF5s9eUVt+Fi7LkvBSHmXUUDcZg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Over in the abbreviated keys for numeric thread, Tomas Vondra reports
a case where the ad-hoc cost model of abbreviated keys/sortsupport for
text is far too conservative, and aborts a case where abbreviated keys
can still greatly help.

Previously, I proposed additions to the cost model that dealt with all
this, by passing down a reltuples hint from the optimizer or a
relcache entry to tuplesort. Robert seemed to think that this was a
little bit much, and that it could be discussed at a later point. It's
clear that we need to do something about cases like the one reported
by Tomas, though.

Attached patch adds a decay to the threshold that (estimated)
abbreviated key cardinality must cross as a proportion of the
(estimated) cardinality of the original set of strings to be sorted,
while also quadrupling the initial required proportional cardinality
to 20% of full key cardinality (but for just the first 10,000 strings,
before this threshold is decayed aggressively). This seems
significantly less invasive than what I previously proposed, while
still being effective. I suggest that this be treated as a bugfix,
since Tomas reports a 5% -7% regression with his case. OTOH, with this
patch, I saw the same case have a ~300% improvement in performance.

The adjusted cost model strongly prefers to decide to abort early,
which seems desirable. In one way this makes it a little more
vulnerable to skewed sets, since early data matters more, but in
another more important way it's less vulnerable, since perhaps that
skew indicates a general skew in the data, a skew that makes the idea
of measuring the cardinality of abbreviated keys as a proxy for full
key cardinality less useful.

Thoughts?
--
Peter Geoghegan

Attachment Content-Type Size
cost_model.patch text/x-patch 3.9 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Petr Jelinek 2015-02-22 02:00:54 Re: proposal: searching in array function - array_position
Previous Message Jeff Davis 2015-02-22 01:38:36 Re: PATCH: decreasing memory needlessly consumed by array_agg