Re: GiST support for inet datatypes

From: Emre Hasegeli <emre(at)hasegeli(dot)com>
To: Andreas Karlsson <andreas(at)proxel(dot)se>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: GiST support for inet datatypes
Date: 2014-01-19 10:10:12
Message-ID: CAE2gYzwKR2ib9JfV9U7fh4mAEPWEnQcUjd8H8zNV7JnuP9EgmA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

2014-01-19 Andreas Karlsson <andreas(at)proxel(dot)se>:
> Hi,
>
> I will review your two patches (gist support + selectivity). This is part 1
> of my review. I will look more into the actual GiST implementation in a
> couple of days, but thought I could provide you with my initial input right
> away.

Thank you for looking at it.

>
> inet-gist
> ---------
>
> General:
>
> I like the idea of the patch and think the && operator is useful for
> exclusion constraints, and that indexing the contains operator is useful for
> IP-address lookups. There is an extension, ip4r, which adds a GiST indexed
> type for IP ranges but since we have the inet type in core I think it should
> have GiST indexes.
>
> I am not convinced an adjacent operator is useful for the inet type, but if
> it is included it should be indexed just like -|- of ranges. We should try
> to keep these lists of indexed operators the same.

I added it just not to leave negotor field empty. It can also be useful with
exclusion constraints but not with GiST support. I did not add GiST support
for it and the not equals operator because they do not fit the index
structure. I can just remove the operator for now.

>
> Compilation:
>
> Compiled without errors.
>
> Regression tests:
>
> One of the broken regression tests seems unrelated to the selectivity
> functions.
>
> -- Make a list of all the distinct operator names being used in particular
> -- strategy slots.
>
> I think it would be fine just to add the newly indexed operators here, but
> the more indexed operators we get in the core the less useful this test
> becomes.

I did not add the new operators to the list because I do not feel right
about supporting <<, <<=, >>, >>= symbols for these operators.
They should be <@, <@=, @>, @>= to be consistent with other types.

>
> I am a bit suspicious about your memcmp based optimization in bitncommon,
> but it could be good. Have you benchmarked it compared to doing the same
> thing with a loop?

I did, when I was writing that part. I will be happy to do it again. I will
post the results.

>
> Documentation:
>
> Please use examples in the operator table which will evaluate to true, and
> for the && case an example where not both sides are the same.

I will change in the next version.

>
> I have not found a place either in the documentation where it is documented
> which operators are documented. I would suggest just adding a short note
> after the operators table.

I will add in the next version.

>
> inet-selfuncs
> -------------
>
> Compilation:
>
> There were some new warnings caused by this patch (with gcc 4.8.2). The
> warnings were all about the use of uninitialized variables (right,
> right_min_bits, right_order) in inet_hist_overlap_selectivity. Looking at
> the code I see that they are harmless but you should still rewrite the loop
> to silence the warnings.

I will fix in the next version.

>
> Regression tests:
>
> There are two broken
>
> -- Insist that all built-in pg_proc entries have descriptions
>
> Here you should just add descriptions for the new functions in pg_proc.h.

I will add in the next version.

>
> -- Check that all opclass search operators have selectivity estimators.
>
> Fails due to missing selectivity functions for the operators. I think you
> should at least for now do like the range types and use areajoinsel,
> contjoinsel, etc for the join selectivity estimation.

I did not support them in the first version because I could not decide
the way. I have better options than using the the geo_selfuncs for these
operators. The options are from simple to complex:

0) just use network_overlap_selectivity
1) fix network_overlap_selectivity with a constant between 0 and 1
2) calculate this constant by using masklens of all buckets of the histogram
3) calculate this constant by using masklens of matching buckets of
the histogram
4) store STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM for this
types, calculate this constant with it
5) create another kind of histogram for masklens, calculate this constant
with it

I do not know how to do 4 or 5, so I will try 3 for the next version. Do you
think it is a good idea?

>
> Source code:
>
> The selectivity estimation functions, network_overlap_selectivity and
> network_adjacent_selectivity, should follow the naming conventions of the
> other selectivity estimation functions. They are named things like
> tsmatchsel, tsmatchjoinsel, and rangesel.

I will rename it to inetoverlapsel, then.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Marko Tiikkaja 2014-01-19 11:43:24 Re: array_length(anyarray)
Previous Message Tom Lane 2014-01-19 08:22:10 Re: Re: Patch to add support of "IF NOT EXISTS" to others "CREATE" statements