Re: range_agg

From: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: Paul A Jungwirth <pj(at)illuminatedcomputing(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Jeff Davis <pgsql(at)j-davis(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, David Fetter <david(at)fetter(dot)org>
Subject: Re: range_agg
Date: 2020-12-08 00:00:10
Message-ID: 20201208000010.GA2786@alvherre.pgsql
Lists: pgsql-hackers

On 2020-Dec-08, Alexander Korotkov wrote:

> I also found a problem in multirange types naming logic. Consider the
> following example.
>
> create type a_multirange AS (x float, y float);
> create type a as range(subtype=text, collation="C");
> create table tbl (x __a_multirange);
> drop type a_multirange;
>
> If you dump this database, the dump can't be restored. The
> multirange type is named __a_multirange because a type named
> a_multirange already existed when it was created. However, that
> a_multirange type may have been dropped since then. When the dump is
> restored, the multirange type is named a_multirange, and the
> corresponding table fails to be created. The same thing doesn't happen
> with arrays, because dumps don't refer to arrays by their internal names.
>
> I think we probably should add an option to specify multirange type
> names while creating a range type. Then the dump can contain the exact
> type names used in the database, and restore won't hit a name
> collision.

Hmm, good point. I agree that a dump must preserve the name, since once
created it is user-visible. I had not noticed this problem, but it's
obvious in retrospect.
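To make the failure mode concrete, here is a minimal Python sketch of a rename-on-collision rule. The helper `choose_multirange_name` is hypothetical and assumes only what the report above implies: the multirange name is derived from the range name, and underscores are prepended when it collides with an existing type. This sketch prepends one underscore per retry, while the report shows `__a_multirange`, so the patch's exact rule may differ. What it illustrates is that the chosen name depends on which types exist when the range type is created, and that set differs between the original database and the restore target.

```python
def choose_multirange_name(range_name, existing_types):
    """Hypothetical naming rule: try '<range>_multirange', then prepend
    underscores until the name no longer collides with an existing type."""
    name = f"{range_name}_multirange"
    while name in existing_types:
        name = "_" + name
    return name

# Original database: a composite type named a_multirange already exists
# when range type "a" is created, so the multirange gets a mangled name.
original = choose_multirange_name("a", {"a_multirange"})

# Restoring a dump taken after "drop type a_multirange": nothing collides,
# so the same CREATE TYPE now yields the unmangled name.
restored = choose_multirange_name("a", set())

print(original, restored)  # _a_multirange a_multirange
```

The dumped table definition still refers to the mangled name, so it no longer matches what the restore produced. Recording the multirange name explicitly in the dump, as proposed, removes the dependency on catalog state at restore time.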

> In general, I wonder if we can make the binary format of multiranges
> more efficient. It seems that every function involving multiranges
> starts with multirange_deserialize(). I think we can make functions like
> multirange_contains_elem() much more efficient. A multirange is
> basically an array of ranges, so we can pack it as follows:
> 1. Typeid and range count
> 2. Tightly packed array of flags (1 byte per range)
> 3. Array of boundary offsets (4 bytes per range). Or, even
> better, we can combine offsets and lengths to be compression-friendly,
> like jsonb's JEntry does.
> 4. Boundary values
> Using this format, we can implement multirange_contains_elem() and
> multirange_contains_range() without deserialization, using binary
> search. That would be much more efficient. What do you think?

I also agree. I spent some time staring at the I/O code a couple of
months back but was unable to focus on it for long enough. I don't know
JEntry's format, but I do remember that the storage format for JSONB was
widely discussed back then; it seems wise to apply similar logic or at
least similar reasoning.
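The packed layout proposed above can be sketched in Python to show why it permits binary search without a full deserialization step. This is an illustration under simplifying assumptions, not the patch's format: bounds are fixed-width 64-bit integers (real multiranges store arbitrary subtype values), offsets are plain byte offsets rather than JEntry-style offset/length entries, the flag bit values are made up, and `pack_multirange` and `contains_elem` are hypothetical helpers. Each lookup decodes only O(log n) ranges.

```python
import struct

# Hypothetical per-range flag bits (1 byte per range).
LB_INC, UB_INC, LB_INF, UB_INF = 0x01, 0x02, 0x04, 0x08
HEADER = struct.Struct("<II")   # typeid, range count
BOUNDS = struct.Struct("<qq")   # lower, upper (assumed int64 subtype)

def pack_multirange(typeid, ranges):
    """ranges: sorted, non-overlapping list of (lower, upper, flags)."""
    n = len(ranges)
    flags = bytes(f for _, _, f in ranges)
    # Byte offset of each range's bounds within the bounds area.
    offsets = struct.pack(f"<{n}I", *(i * BOUNDS.size for i in range(n)))
    bounds = b"".join(BOUNDS.pack(lo, hi) for lo, hi, _ in ranges)
    return HEADER.pack(typeid, n) + flags + offsets + bounds

def _range_at(buf, n, i):
    """Decode only the i-th range straight from the packed buffer."""
    flags = buf[HEADER.size + i]
    (off,) = struct.unpack_from("<I", buf, HEADER.size + n + 4 * i)
    bounds_start = HEADER.size + n + 4 * n
    lo, hi = BOUNDS.unpack_from(buf, bounds_start + off)
    return lo, hi, flags

def _cmp_elem(elem, lo, hi, flags):
    """-1 if elem lies below the range, 1 if above, 0 if inside."""
    if not flags & LB_INF and (elem < lo or (elem == lo and not flags & LB_INC)):
        return -1
    if not flags & UB_INF and (elem > hi or (elem == hi and not flags & UB_INC)):
        return 1
    return 0

def contains_elem(buf, elem):
    """Binary search over the sorted, disjoint ranges."""
    _, n = HEADER.unpack_from(buf, 0)
    left, right = 0, n - 1
    while left <= right:
        mid = (left + right) // 2
        c = _cmp_elem(elem, *_range_at(buf, n, mid))
        if c == 0:
            return True
        if c < 0:
            right = mid - 1
        else:
            left = mid + 1
    return False

# {[1,5), [10,20], [30,inf)} with a placeholder type OID of 0.
mr = pack_multirange(0, [(1, 5, LB_INC),
                         (10, 20, LB_INC | UB_INC),
                         (30, 0, LB_INC | UB_INF)])
print([contains_elem(mr, x) for x in (0, 1, 5, 10, 20, 25, 1000)])
# [False, True, False, True, True, False, True]
```

Because the ranges in a multirange are sorted and disjoint, comparing the element against one range is enough to tell the search which half to continue in.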
