Re: Gist indexing performance with cidr types

From: Emre Hasegeli <emre(at)hasegeli(dot)com>
To: Henrik Thostrup Jensen <htj(at)nordu(dot)net>
Cc: pgsql-performance <pgsql-performance(at)postgresql(dot)org>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andreas Karlsson <andreas(at)proxel(dot)se>
Subject: Re: Gist indexing performance with cidr types
Date: 2015-08-27 16:05:06
Message-ID: CAE2gYzz_Fr6PqHEN4dan_E5v+BGVyz4VSHdthxqQ2=Ay+F4_mQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

> Nothing really interesting here though.

I think the slowdown is not related with the key your searched for,
but the organisation of the index. We have a simple structure for
the index keys. Basically, common bits of the child nodes are stored
on the parent node. It leads to not efficient indexes, where there
are too much values with the same prefix. I couldn't quite understand
why it performs so bad, though. You might have better luck with
ip4r extension [1] or creating an index using the range types like
this:

> # create type inetrange as range (subtype = inet);
> CREATE TYPE
>
> # create function cidr_to_range(cidr) returns inetrange language sql as 'select inetrange(set_masklen($1::inet, 0), set_masklen(broadcast($1), 0))';
> CREATE FUNCTION
>
> # create index on routes using gist ((cidr_to_range(route)));
> CREATE INDEX
>
> # explain analyze select * from routes where cidr_to_range(route) && cidr_to_range('160.75/16');
> QUERY PLAN
> ----------
> Bitmap Heap Scan on routes (cost=864.50..18591.45 rows=21173 width=19) (actual time=7.249..7.258 rows=7 loops=1)
> Recheck Cond: (inetrange(set_masklen((route)::inet, 0), set_masklen(broadcast((route)::inet), 0)) && '[160.75.0.0/0,160.75.255.255/0)'::inetrange)
> Heap Blocks: exact=3
> -> Bitmap Index Scan on routes_cidr_to_range_idx (cost=0.00..859.21 rows=21173 width=0) (actual time=7.242..7.242 rows=7 loops=1)
> Index Cond: (inetrange(set_masklen((route)::inet, 0), set_masklen(broadcast((route)::inet), 0)) && '[160.75.0.0/0,160.75.255.255/0)'::inetrange)
> Planning time: 1.456 ms
> Execution time: 7.346 ms
> (7 rows)

I have examined them about the performance problem:

* It splits pages by IP family [2] a lot of times, but deleting IPv6
rows from the table doesn't make it faster.
* It doesn't fail and do 50-50 split [3] as I expected.
* The previous posted version [4] of it works roughly twice faster,
but it is still too slow.

[1] https://github.com/RhodiumToad/ip4r
[2] network_gist.c:705
[3] network_gist.c:754
[4] CAE2gYzzioHNxdZXyWz0waruJuw7wKpEJ-2xPTihjd6Rv8YJF_w(at)mail(dot)gmail(dot)com

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Jean Cavallo 2015-08-27 17:21:22 Server slowing down over time
Previous Message Henrik Thostrup Jensen 2015-08-27 09:35:31 Re: Gist indexing performance with cidr types