Re: [PATCH] SortSupport for macaddr type

From: Brandur Leach <brandur(at)mutelight(dot)org>
To: Julien Rouhaud <julien(dot)rouhaud(at)dalibo(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] SortSupport for macaddr type
Date: 2017-02-05 21:56:41
Message-ID: CABR_9B_9HsZQ+QUb1_nR4WS_BHGcds7eV4HBJCRPLZ-r0WJM3A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Julien,

Thank you for taking the time to do this review, and my
apologies for the very delayed response. I lost track of
this work and have only jumped back into it today.

Please find attached a new version of the patch with your
feedback integrated. I've also rebased the patch against
the current master and selected a new OID because my old
one is now in use.

> * you used macaddr_cmp_internal() function name, for uuid
> the same function is named uuid_internal_cmp(). Using
> the same naming pattern is probably better.

I was a little split on this one! It's true that UUID uses
`_internal_cmp`, but `_cmp_internal` is also used in a
number of places like `enum`, `timetz`, and `network`. I
don't have a strong feeling about it either way, so I've
changed it to `_internal_cmp` to match UUID as you
suggested.

> * the function comment on macaddr_abbrev_convert()
> doesn't mention specific little-endian handling

I tried to bake this into the comment text. Here are the
relevant lines of the amended version:

* Packs the bytes of a 6-byte MAC address into a Datum and treats it as
an
* unsigned integer for purposes of comparison. On a 64-bit machine,
there
* will be two zeroed bytes of padding. The integer is converted to
native
* endianness to facilitate easy comparison.

> * "There will be two bytes of zero padding on the least
> significant end"
>
> "least significant bits" would be better

Also done. Here is the amended version:

* On a 64-bit machine, zero out the 8-byte datum and copy the 6 bytes of
* the MAC address in. There will be two bytes of zero padding on the end
* of the least significant bits.

> * This patch will trigger quite a lot modifications after
> a pgindent run. Could you try to run pgindent on mac.c
> before sending an updated patch?

Good call! I've run the new version through pgindent.

Let me know if you have any further feedback and/or
suggestions. Thanks!

Brandur

On Wed, Sep 14, 2016 at 3:14 AM, Julien Rouhaud <julien(dot)rouhaud(at)dalibo(dot)com>
wrote:

> On 26/08/2016 19:44, Brandur wrote:
> > Hello,
> >
>
> Hello,
>
> > I've attached a patch to add SortSupport for Postgres' macaddr which has
> the
> > effect of improving the performance of sorting operations for the type.
> The
> > strategy that I employ is very similar to that for UUID, which is to
> create
> > abbreviated keys by packing as many bytes from the MAC address as
> possible into
> > Datums, and then performing fast unsigned integer comparisons while
> sorting.
> >
> > I ran some informal local benchmarks, and for cardinality greater than
> 100k
> > rows, I see a speed up on `CREATE INDEX` of roughly 25% to 65%. (For
> those
> > interested, I put a few more numbers into a small report here [2].)
> >
>
> That's a nice improvement!
>
> > Admittedly, this is not quite as useful as speeding up sorting on a more
> common
> > data type like TEXT or UUID, but the change still seems like a useful
> > performance improvement. I largely wrote it as an exercise to familiarize
> > myself with the Postgres codebase.
> >
> > I'll add an entry into the current commitfest as suggested by the
> Postgres Wiki
> > and follow up here with a link.
> >
> > Thanks, and if anyone has feedback or other thoughts, let me know!
> >
>
> I just reviewed your patch. It applies and compiles cleanly, and the
> abbrev feature works as intended. There's not much to say since this is
> heavily inspired on the uuid SortSupport. The only really specific part
> is in the abbrev_converter function, and I don't see any issue with it.
>
> I have a few trivial comments:
>
> * you used macaddr_cmp_internal() function name, for uuid the same
> function is named uuid_internal_cmp(). Using the same naming pattern is
> probably better.
>
> * the function comment on macaddr_abbrev_convert() doesn't mention
> specific little-endian handling
>
> * "There will be two bytes of zero padding on the least significant end"
>
> "least significant bits" would be better
>
> * This patch will trigger quite a lot modifications after a pgindent
> run. Could you try to run pgindent on mac.c before sending an updated
> patch?
>
> Best regards.
>
> --
> Julien Rouhaud
> http://dalibo.com - http://dalibo.org
>

Attachment Content-Type Size
0002-Implement-SortSupport-for-macaddr-data-type.patch application/octet-stream 11.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Fetter 2017-02-05 22:08:05 Re: 3D Z-curve spatial index
Previous Message Tom Lane 2017-02-05 20:14:59 Re: Index corruption with CREATE INDEX CONCURRENTLY