From: Tomasz Ostrowski <tometzky+pg(at)ato(dot)waw(dot)pl>
To: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com>
Subject: Re: Multicolumn hash indexes
Date: 2017-09-27 21:52:19
Message-ID: 3d44a329-7ced-6d43-06c3-4a2d0bb62f38@ato.waw.pl
Lists: pgsql-hackers
On 09/27/2017 05:57 PM, Tom Lane wrote:
> If we follow GIST's lead that the leading column is "most important",
> the idea could be to require a search constraint on the first column,
> which produces the hash that determines the bucket assignment. Hashes
> for additional columns would just be payload data in the index entry.
> If you have search constraint(s) on low-order column(s), you can check
> for hash matches before visiting the heap, but they don't reduce how
> much of the index you have to search. Even btree works that way for
> many combinations of incomplete index constraints.
I feel that this would eliminate much of the potential gain from such an
index. It would be usable only when a sufficiently high-cardinality
leading column exists, in which case a simple hash index on that column
alone wouldn't be much worse.
But I have an idea. What if the search criteria were required to use a
full-tuple equality comparison:
where (a,b,c)=(?,?,?)
or
where (a,b,c) in ((?,?,?),(?,?,?),(?,?,?),(?,?,?))
Wouldn't that eliminate the problem of disappearing conditions?
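The restriction can be pictured with an ordinary hash map keyed by the
whole tuple (a Python sketch, purely as an analogy, not PostgreSQL
internals):

```python
# A multicolumn hash index restricted to full-tuple equality behaves
# like a dict keyed by the complete column tuple: every probe must
# supply values for all columns, so no per-column condition can
# "disappear" from the lookup.
rows = [(1, 'x', True), (2, 'y', False)]
index = {}
for row in rows:
    index.setdefault(row, []).append(row)

# WHERE (a,b,c) = (1,'x',True)  -- a single bucket probe:
hit = index.get((1, 'x', True), [])

# WHERE (a,b,c) IN (...)  -- one bucket probe per tuple:
hits = [r for t in [(1, 'x', True), (2, 'y', False)]
        for r in index.get(t, [])]
```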
Actually, maybe this could be implemented much like an expression
(functional) index. It would then be a sane version of something that
would otherwise be a terribly hackish and slow workaround like:
create or replace function hashhack(a bytea, b bytea)
returns bigint
language sql
immutable
as $$
  -- Joins the arguments with '\x1e' (record separator) so that
  -- hashhack('a','') != hashhack('','a'), then keeps the first
  -- 64 bits of the md5 digest as a bigint.
  select ('x' || substr(md5($1 || '\x1e'::bytea || $2), 1, 16))::bit(64)::bigint;
$$;

create index t_hashhack_a_b_idx
  on t (hashhack(a::bytea, b::bytea));

select *
  from t
 where a = 'a' and b = 'b'
   and hashhack(a::bytea, b::bytea) = hashhack('a'::bytea, 'b'::bytea);
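The hash construction itself can be checked outside SQL; here is a
Python sketch of the same idea (md5 of the two values joined by a 0x1e
record separator, truncated to 64 bits; unlike the SQL version, which
reinterprets the bits as a signed bigint, this yields an unsigned
integer):

```python
import hashlib

def hashhack(a: bytes, b: bytes) -> int:
    # Join with 0x1e (record separator) so the boundary between the
    # two values is unambiguous: hashhack(b'a', b'') != hashhack(b'', b'a').
    digest = hashlib.md5(a + b'\x1e' + b).hexdigest()
    # Keep the first 16 hex digits (64 bits), as the SQL version does.
    return int(digest[:16], 16)
```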
If this were automatic, one could avoid the overhead of converting data
to bytea/text, concatenating, truncating, converting back to bigint,
rechecking the condition, and so on, which makes this kind of hack not
very sane.
Even providing a specially crafted function or operator for queries
targeted specifically at the index would be quite sufficient:
where pg_equal( (a,b,c), (?,?,?) );
--
Tomasz "Tometzky" Ostrowski