From: | Peter Geoghegan <pg(at)heroku(dot)com> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org> |
Cc: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> |
Subject: | Abbreviated keys for text cost model fix |
Date: | 2015-02-22 01:59:17 |
Message-ID: | CAM3SWZQeCekUShZXgW8_nPuuF5s9eUVt+Fi7LkvBSHmXUUDcZg@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Over in the abbreviated keys for numeric thread, Tomas Vondra reports
a case where the ad-hoc cost model of abbreviated keys/sortsupport for
text is far too conservative, and aborts a case where abbreviated keys
can still greatly help.
Previously, I proposed additions to the cost model that dealt with all
this, by passing down a reltuples hint from the optimizer or a
relcache entry to tuplesort. Robert seemed to think that this was a
little bit much, and that it could be discussed at a later point. It's
clear that we need to do something about cases like the one reported
by Tomas, though.
Attached patch adds a decay to the threshold that (estimated)
abbreviated key cardinality must cross as a proportion of the
(estimated) cardinality of the original set of strings to be sorted,
while also quadrupling the initial required proportional cardinality
to 20% of full key cardinality (but for just the first 10,000 strings,
before this threshold is decayed aggressively). This seems
significantly less invasive than what I previously proposed, while
still being effective. I suggest that this be treated as a bugfix,
since Tomas reports a 5% -7% regression with his case. OTOH, with this
patch, I saw the same case have a ~300% improvement in performance.
The adjusted cost model strongly prefers to decide to abort early,
which seems desirable. In one way this makes it a little more
vulnerable to skewed sets, since early data matters more, but in
another more important way it's less vulnerable, since perhaps that
skew indicates a general skew in the data, a skew that makes the idea
of measuring the cardinality of abbreviated keys as a proxy for full
key cardinality less useful.
Thoughts?
--
Peter Geoghegan
Attachment | Content-Type | Size |
---|---|---|
cost_model.patch | text/x-patch | 3.9 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Petr Jelinek | 2015-02-22 02:00:54 | Re: proposal: searching in array function - array_position |
Previous Message | Jeff Davis | 2015-02-22 01:38:36 | Re: PATCH: decreasing memory needlessly consumed by array_agg |