|From:||Christoph Heiss <christoph(dot)heiss(at)cybertec(dot)at>|
|To:||PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>|
|Cc:||Hans-Jürgen Schönig <hs(at)cybertec(dot)at>|
|Subject:||[PATCH] Add sortsupport for range types and btree_gist|
|Views:||Raw Message | Whole Thread | Download mbox | Resend email|
The motivation behind this is that incrementally building up a GiST
index for certain input data can create a terrible tree structure.
Furthermore, exclusion constraints are commonly implemented using GiST
indices and for that use case, data is mostly orderable.
By sorting the data before inserting it into the index results in a much
better index structure, leading to significant performance improvements.
Testing was done using following setup, with about 50 million rows:
CREATE EXTENSION btree_gist;
CREATE TABLE t (id uuid, block_range int4range);
CREATE INDEX ON before USING GIST (id, block_range);
COPY t FROM '..' DELIMITER ',' CSV HEADER;
SELECT * FROM t WHERE id = '..' AND block_range && '..'
as test query, using a unpatched instance and one with the patch applied.
Some stats for fetching 10,000 random rows using the query above,
100 iterations to get good averages.
The benchmarking was done on a unpatched instance compiled using the
exact same options as with the patch applied.
[ Results are noted in a unpatched -> patched fashion. ]
First set of results are after the initial CREATE TABLE, CREATE INDEX
and a COPY to the table, thereby incrementally building the index.
Shared Hit Blocks (average): 110.97 -> 78.58
Shared Read Blocks (average): 58.90 -> 47.42
Execution Time (average): 1.10 -> 0.83 ms
I/O Read Time (average): 0.19 -> 0.15 ms
After a REINDEX on the table, the results improve even more:
Shared Hit Blocks (average): 84.24 -> 8.54
Shared Read Blocks (average): 49.89 -> 0.74
Execution Time (average): 0.84 -> 0.065 ms
I/O Read Time (average): 0.16 -> 0.004 ms
Additionally, the time a REINDEX takes also improves significantly:
672407.584 ms (11:12.408) -> 130670.232 ms (02:10.670)
Most of the sortsupport for btree_gist was implemented by re-using
already existing infrastructure. For the few remaining types (bit, bool,
cash, enum, interval, macaddress8 and time) I manually implemented them
directly in btree_gist.
It might make sense to move them into the backend for uniformity, but I
wanted to get other opinions on that first.
`make check-world` reports no regressions.
Attached below, besides the patch, are also two scripts for benchmarking.
`bench-gist.py` to benchmark the actual patch, example usage of this
would be e.g. `./bench-gist.py -o results.csv public.table`. This
expects a local instance with no authentication and default `postgres`
user. The port can be set using the `--port` option.
`plot.py` prints average values (as used above) and creates boxplots for
each statistic from the result files produced with `bench-gist.py`.
Depends on matplotlib and pandas.
Additionally, if needed, the sample dataset used to benchmark this is
available to independently verify the results .
|Next Message||Przemysław Sztoch||2022-06-15 11:01:37||Re: [PATCH] Completed unaccent dictionary with many missing characters|
|Previous Message||Peter Eisentraut||2022-06-15 09:23:06||Re: replacing role-level NOINHERIT with a grant-level option|