Re: range_agg

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Paul A Jungwirth <pj(at)illuminatedcomputing(dot)com>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Jeff Davis <pgsql(at)j-davis(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, David Fetter <david(at)fetter(dot)org>
Subject: Re: range_agg
Date: 2020-12-07 23:45:57
Message-ID: CAPpHfdvKrecVTSMmWxNAMveReAwP2taxb-vLOCF--Ne=KwMiow@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Nov 30, 2020 at 11:39 PM Alexander Korotkov
<aekorotkov(at)gmail(dot)com> wrote:
> On Mon, Nov 30, 2020 at 10:35 PM Alexander Korotkov
> <aekorotkov(at)gmail(dot)com> wrote:
> > On Sun, Nov 29, 2020 at 11:53 PM Paul A Jungwirth
> > <pj(at)illuminatedcomputing(dot)com> wrote:
> > >
> > > On Sun, Nov 29, 2020 at 11:43 AM Alexander Korotkov
> > > <aekorotkov(at)gmail(dot)com> wrote:
> > > > Thank you. Could you please, update doc/src/sgml/catalogs.sgml,
> > > > because pg_type and pg_range catalogs are updated.
> > >
> > > Attached! :-)
> >
> > You're quick, thank you. Please, also take a look at cfbot failure
> > https://travis-ci.org/github/postgresql-cfbot/postgresql/builds/746623942
> > I've tried to reproduce it, but didn't manage yet.
>
> Got it. type_sanity test fails on any platform, you just need to
> repeat "make check" till it fails.
>
> The failed query checked consistency of range types, but it didn't
> take into account ranges of domains and ranges of records, which are
> exercised by multirangetypes test running in parallel. We could teach
> this query about such kinds of ranges, but I think that would be
> overkill, because we're not going to introduce such builtin ranges
> yet. So, I'm going to just move multirangetypes test into another
> group of parallel tests.

I also found a problem in multirange types naming logic. Consider the
following example.

create type a_multirange AS (x float, y float);
create type a as range(subtype=text, collation="C");
create table tbl (x __a_multirange);
drop type a_multirange;

If you dump this database, the dump couldn't be restored. The
multirange type is named __a_multirange, because the type named
a_multirange already exists. However, it might appear that
a_multirange type is already deleted. When the dump is restored, a
multirange type is named a_multirange, and the corresponding table
fails to be created. The same thing doesn't happen with arrays,
because arrays are not referenced in dumps by their internal names.

I think we probably should add an option to specify multirange type
names while creating a range type. Then dump can contain exact type
names used in the database, and restore wouldn't have a names
collision.

Another thing that worries me is the multirange serialization format.

typedef struct
{
int32 vl_len_; /* varlena header */
char flags; /* range flags */
char _padding; /* Bounds must be aligned */
/* Following the header are zero to two bound values. */
} ShortRangeType;

Comment says this structure doesn't contain a varlena header, while
structure obviously has it.

In general, I wonder if we can make the binary format of multiranges
more efficient. It seems that every function involving multiranges
from multirange_deserialize(). I think we can make functions like
multirange_contains_elem() much more efficient. Multirange is
basically an array of ranges. So we can pack it as follows.
1. Typeid and rangecount
2. Tightly packed array of flags (1-byte for each range)
3. Array of indexes of boundaries (4-byte for each range). Or even
better we can combine offsets and lengths to be compression-friendly
like jsonb JEntry's do.
4. Boundary values
Using this format, we can implement multirange_contains_elem(),
multirange_contains_range() without deserialization and using binary
search. That would be much more efficient. What do you think?

------
Regards,
Alexander Korotkov

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2020-12-08 00:00:10 Re: range_agg
Previous Message James Coleman 2020-12-07 23:45:50 Re: Consider parallel for lateral subqueries with limit