Re: REVIEW Range Types

From: "Erik Rijkers" <er(at)xs4all(dot)nl>
To: "Jeff Davis" <pgsql(at)j-davis(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: REVIEW Range Types
Date: 2011-02-08 19:43:18
Message-ID: a0804f90179263648c96f89d481cd85e.squirrel@webmail.xs4all.nl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, February 6, 2011 07:41, Jeff Davis wrote:
> New patch. All known TODO items are closed, although I should do a
> cleanup pass over the code and docs.
>
> Fixed in this patch:
>
> * Many documentation improvements
> * Added INT8RANGE
> * Renamed PERIOD[TZ] -> TS[TZ]RANGE
> * Renamed INTRANGE -> INT4RANGE
> * Improved parser's handling of whitespace and quotes
> * Support for PL/pgSQL functions with ANYRANGE arguments/returns
> * Make "subtype_float" function no longer a requirement for GiST,
> but it should still be supplied for the penalty function to be
> useful.
>

I'm afraid even the review is WIP, but I thought I'd post what I have.

Context: At the moment we use postbio (see below) range functionality, to search ranges and
overlap in large DNA databases ('genomics'). We would be happy if a core data type could replace
that. It does look like the present patch is ready to do those same tasks, of which the main one
for us is gist-indexed ranges. We also use btree_gist with that, so to include that in core would
make sense in this regard.

test config:

./ configure \
--prefix=/var/data1/pg_stuff/pg_installations/pgsql.range_types \
--with-pgport=6563 \
--enable-depend \
--enable-cassert \
--enable-debug \
--with-perl \
--with-openssl \
--with-libxml \
--enable-dtrace

compile, make, check, install all OK.

------------------
Submission review:
------------------

* Is the patch in context diff format?

Yes.

* Does it apply cleanly to the current git master?

It applied cleanly. (after the large serialisation commit 2011.02.08 it will need some
changes/rebase)

* Does it include reasonable tests, necessary doc patches, etc?

Yes, there are many tests; the documentation is good. Small improvements below.

-----------------
Usability review
-----------------

Read what the patch is supposed to do, and consider:

* Does the patch actually implement that?

Yes.

* Do we want that?

Yes.

* Do we already have it?

contrib/seg has some similar functionalities: "seg is a data type for representing line segments,
or floating point intervals".

And on pgFoundry there is a seg spin-off "postbio", tailored to genomic data (with gist-indexing).
(see postbio manual: http://postbio.projects.postgresql.org/ )

* Does it follow SQL spec, or the community-agreed behavior?

I don't know - I couldn't find much in the SQL-spec on a range datatype.

The ranges behaviour has been discussed on -hackers.

* Does it include pg_dump support (if applicable)?

dump/restore were fine in the handful of range-tables
which I moved between machines.

* Are there dangers?

Not that I could find.

* Have all the bases been covered?

I think the functionality looks fairly complete.

-------------
Feature test:
-------------

* Does the feature work as advertised?

The patch seems very stable. My focus has been mainly on the intranges. I tested by parsing
documentation- and regression examples, and parametrising them in a perl harness, to generate many
thousands of range combinations. I found only a single small problem (posted earlier - Jeff Davis
solved it already apparently).

see: http://archives.postgresql.org/pgsql-hackers/2011-02/msg00387.php

* Are there corner cases the author has failed to consider?

No.

* Are there any assertion failures or crashes?

I haven't seen a single one.

--------------
Documentation:
--------------

Section 9.18:
table 9-42. range functions:
The following functions are missing (I encountered them in the regression tests):
contained_by()
range_eq()

section 'Constructing Ranges' (8.16.6):
In the code example, remove the following line:
"-- the int4range result will appear in the canonical format"
it doesn't make sense there. At this place "canonical format" has not been discussed;
maybe it is not even discussed anywhere.

also (same place):
'where "_" is used to mean "exclusive" and "" is used to mean "inclusive".'
should be:
'where "_" is used to mean "exclusive" and "i" is used to mean "inclusive".'

And btw: it should mention here that the range*inf* functions,
an underscore to separate 'range' from the rest of the function name, e.g.:
range_linfi_() => infinite lower bound, inclusive upper bound

I still want to do Performance review and Coding review.

FWIW, I would like to repeat that my impression is that the patch is very stable, especially with
regard to the intranges (tested extensively).

regards,

Erik Rijkers

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2011-02-08 19:48:53 Re: MVCC doc typo fix
Previous Message Magnus Hagander 2011-02-08 19:34:15 Re: Sync Rep for 2011CF1