Shorter archive URLs

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: PostgreSQL WWW <pgsql-www(at)lists(dot)postgresql(dot)org>
Subject: Shorter archive URLs
Date: 2019-07-14 10:52:46
Message-ID: CABUevExPMOGVMjyMyUO3oUnAGR+BWwSTvJL0VXqL2YLZ61g1OQ@mail.gmail.com
Lists: pgsql-www

We've been discussing shortening the archive URLs for some time, but never
got around to actually doing it.

I've now got a preliminary patch going that basically replaces the
message-id portion of the URL with a SHA-1 hash (base64 encoded) of the
message-id.

This brings the message-id part down to 28 characters. Looking at the
current archives, that will make 92% of the URLs shorter and 6% longer than
they are today. Looking at just 2018 and 2019, the numbers are 99% shorter
and 0.5% longer (only 429 messages in total get longer).

(The average message-id length has consistently increased, from 39
characters in 1996, through 41 in 2006, to 51 in 2016 and onward --
probably largely thanks to Gmail.)

This means that instead of being:
https://www.postgresql.org/message-id/CABUevEyqGVV-s1yXQBsTpoPDCHy79j-yDtJcucrPb9Hh4CFTNg%40mail.gmail.com

The URL would be:
https://www.postgresql.org/message-id/Z0oaTfo56bV4tke6-r_PKJstHF8=

We could of course also use the internal surrogate key that each message
has and make it even shorter, but if we expose that then it becomes a lot
harder to change the underlying representation (we'd need big annoying
mapping tables like we still carry from the old implementation). Using the
SHA-1 hash means we can still generate the URL just from the contents of
the message itself -- which also means that the URLs can be generated
offline for those who prefer to.
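
As a rough illustration of the offline generation mentioned above, here is
a minimal Python sketch. It assumes the hash input is the bare message-id
string (no angle brackets) encoded as UTF-8, and that the base64 variant is
the URL-safe alphabet, as the "-" and "_" in the example URL suggest. The
function names short_id and archive_url are hypothetical, and the actual
patch may normalize the message-id differently.

```python
import base64
import hashlib

# Assumed base URL, taken from the example URLs in this message.
BASE = "https://www.postgresql.org/message-id/"


def short_id(message_id: str) -> str:
    """URL-safe base64 of the SHA-1 digest of a message-id (assumed scheme)."""
    digest = hashlib.sha1(message_id.encode("utf-8")).digest()  # always 20 bytes
    # 20 bytes -> 27 base64 characters plus one '=' padding character = 28.
    return base64.urlsafe_b64encode(digest).decode("ascii")


def archive_url(message_id: str) -> str:
    """Build a short archive URL from nothing but the message-id."""
    return BASE + short_id(message_id)
```

Since a SHA-1 digest is always 20 bytes, the encoded part is 28 characters
regardless of how long the original message-id is.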

Any access to the old message-id based URLs would automatically receive a
permanent redirect to the new ones.

The postgr.es redirector would keep working, and accept both old and new
style URLs (the old ones would just cause a double redirect step).
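
To make the redirect behaviour concrete, here is a hedged Python sketch of
how a handler might decide whether a request needs the permanent redirect.
It is not the actual patch: the function name redirect_target, the regular
expression used to recognize new-style identifiers, and the choice to hash
the percent-decoded message-id as UTF-8 are all assumptions for
illustration.

```python
import base64
import hashlib
import re
from typing import Optional
from urllib.parse import unquote

# Assumed base URL, taken from the example URLs in this message.
BASE = "https://www.postgresql.org/message-id/"

# Assumption: a new-style id is exactly 27 URL-safe base64 characters plus '='.
NEW_STYLE = re.compile(r"[A-Za-z0-9_-]{27}=")


def redirect_target(path_id: str) -> Optional[str]:
    """Return the URL to redirect (301) to, or None if path_id is new-style."""
    if NEW_STYLE.fullmatch(path_id):
        return None  # already a short URL, serve it directly
    message_id = unquote(path_id)  # old-style URLs carry e.g. %40 for '@'
    digest = hashlib.sha1(message_id.encode("utf-8")).digest()
    return BASE + base64.urlsafe_b64encode(digest).decode("ascii")
```

A message-id that happened to look exactly like a 28-character hash would
be misclassified by this check. Message-ids normally contain an "@", which
the URL-safe base64 alphabet never does, so this should be rare, but a real
implementation would more likely look the value up in the database.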

I think the only thing we'd really lose is the ability to "determine from
the URL who sent an email", which in itself only works for some people, but
definitely does work for some who send a lot of mail (hi, Tom!).

Thoughts on this? Do people think that's a big enough win to go for?

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/
