Re: Patch for SortSupport implementation on inet/cdir

From: Brandur Leach <brandur(at)mutelight(dot)org>
To: Peter Geoghegan <pg(at)bowt(dot)ie>, ejrh00(at)gmail(dot)com
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Patch for SortSupport implementation on inet/cdir
Date: 2019-07-28 20:14:02
Message-ID: CABR_9B8ECZLUdH2wXOwuuiRXxHOSEYq9YYxgk+d=vk1gFH=Pwg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thanks the follow ups on this one Edmund/Peter!

I've attached a new V4 variant of the patch based on
Peter's V3, mostly containing comment amendments and a few
other minor stylistic fixes.

> An interesting thing about sorting IPv4 inets on 64-bit machines is that
when the inets are the same, the abbreviated comparitor will return 0 which
is taken by the sorting machinery to mean "the datums are the same up to
this point, so you need to call the full comparitor" — but, in this case, 0
means "the datums truly are the same, no need to call the full
comparitor". Since the full comparitor assumes its arguments to be the
original (typically pass-by-reference) datums, you can't do it there.
You'd need to add another optional comparitor to be called after the
abbreviated one. In inet's case on a 64-bit machine, it would look at the
abbreviated datums and if they're both in the IPv4 family, would return 0
(knowing that the abbreviated comparitor has already done the real work).
I have no reason to believe this particular optimisation is worth anything
much, though; it's outside the scope of this patch, besides.

Edmund: thanks a lot for the really quick review turn
around, and apologies for not following up sooner!

Agreed that this change is out-of-scope, but it could be an
interesting improvement. You'd have similar potential speed
improvements for other SortSupport data types like uuid,
strings (short ones), and macaddr. Low cardinality data
sets would probably benefit the most.

> When you say "comparison is done" it sounds like more comparing is going
to be done, but what I think you mean is that comparison is finished.

Peter's version of my patch ended up stripping out and/or
changing some of my original comments, so most of them are
fixed by virtue of that. And agreed about the ambiguity in
wording above — I changed "done" to "finished".

> I didn't see any links for [1], [2] and [3] in your email.

Good catch. Here are the original footnote links:

[1] https://github.com/brandur/inet-sortsupport-test
[2]
https://github.com/brandur/inet-sortsupport-test/blob/master/results.md
[3]
https://github.com/brandur/postgres/compare/master...brandur-inet-sortsupport-unit#diff-a28824d1339d3bb74bb0297c60140dd1

> The way that you put a Min() on the subnet size potentially constrains
> the size of the bitmask used for the network component of the
> abbreviated key (the component that comes immediately after the
> ipfamily status bit). Why not just let the bitmask be a bitmask,
> without bringing SIZEOF_DATUM into it? Doing it that way allowed for a
> more streamlined approach, with significantly fewer special cases.

Peter: thanks a lot for the very thorough look and revised
version! Generally agreed that fewer special cases is good,
but I was also trying to make sure that we didn't
compromise the code's understandability by optimizing for
fewer special cases above everything else (especially for
this sort of thing where tricky bit manipulation is
involved).

But I traced through your variant and it looks fine to me
(still looks correct, and readability is still good). I've
pulled most of your changes into V4.

> I'm not sure whether or not your approach had bugs, but I
> didn't like the way you sometimes did a straight "network
> = ipaddr_datum" assignment without masking.

For what it's worth, I believe this did work, even if it
did depend on being within that one branch of code. Agreed
though that avoiding it (as in the new version) is more
hygienic.

> I really liked your diagrams, but much of the text that went with them
> either seemed redundant (it described established rules about how the
> underlying types sort), or seemed to discuss things that were better
> discussed next to the relevant network_abbrev_convert() code.

Thanks! I kept most of your changes, but resurrected some
of my original introductory text. The fine points of the
code's implementation are intricate enough that I think
having some background included is useful to new entrants;
specifically:

1. Norms for naming the different "parts" (network, size,
subnet) of an inet/cidr value aren't broadly
well-established, so defining what they are before
showing diagrams is helpful.

2. Having examples of each part (`1.2.3.0`, `/24`,
`0.0.0.4`) helps mentally cement them.

3. I know that it's in the PG documentation, but the rules
for sorting inet/cidr are not very intuitive. Spending a
few lines re-iterating them so that they don't have to
be cross-referenced elsewhere is worth the space.

Anyway, once again appreciate the extreme attention to
detail on these reviews — this level of rigor would be a
very rare find in projects outside of the Postgres
community!

Brandur

Attachment Content-Type Size
v4-0001-SortSupport-for-inet-cidr-types.patch application/octet-stream 20.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fabien COELHO 2019-07-28 21:36:22 Re: psql - add SHOW_ALL_RESULTS option
Previous Message Fabien COELHO 2019-07-28 20:02:41 Re: LLVM compile failing in seawasp