Re: Proposal to add a new URL data type.

From: Alexander Borisov <lex(dot)borisov(at)gmail(dot)com>
To: Victor Yegorov <vyegorov(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposal to add a new URL data type.
Date: 2024-12-11 16:04:40
Message-ID: 70b01d6b-d6d4-43bd-977a-8a51cf5a0b8a@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

10.12.2024 13:59, Victor Yegorov пишет:
> чт, 5 дек. 2024 г. в 17:02, Alexander Borisov <lex(dot)borisov(at)gmail(dot)com
> <mailto:lex(dot)borisov(at)gmail(dot)com>>:
>

[..]

>
> Hey, I had a look at this patch and found its functionality mature and
> performant.
>
> As Peter mentioned pguri, I used it to compare with the proposed
> extension. This brought up
> the following differences:
> - pguri (uriparser 0.9.8) doesn't support Chinese symbols in the host
> part of URI (uri_test1.sh):
>
>     ERROR:  invalid input syntax for type uri at or near "事
> 例.com#comments <http://xn--3kq3x.com#comments>"
>
>   Therefore, I avoided Chinese or Cyrillic symbols in the pguri test
> script.
> - There are no SET functions in the pguri, changing specific portions of
> URI is troublesome. I used
>   replace() in the test, but this is an error prone approach.
> - It's even more troublesome to set parts of the URI that are not
> initially there. Probably, a full decomposition
>   into parts and the following wrap up is needed
> Suggested extension has no such limitations. Additionally, pguri
> extracts userinfo as a whole,
> while suggested extension can get/set user and password individually.
>
>
> Running tests (attached) I got the following numbers:
> $ ./url_test.sh
> NOTICE:  extension "url" already exists, skipping
> tps = 13068.287423 (without initial connection time)
> tps = 12888.937747 (without initial connection time)
> tps = 12830.642558 (without initial connection time)
> tps = 12846.341411 (without initial connection time)
> tps = 13187.955601 (without initial connection time)
>
> $ ./uri_test2.sh
> NOTICE:  extension "uri" already exists, skipping
> tps = 2441.934308 (without initial connection time)
> tps = 2513.277660 (without initial connection time)
> tps = 2484.641673 (without initial connection time)
> tps = 2519.312395 (without initial connection time)
> tps = 2512.364492 (without initial connection time)
>
> So it's 12.9k vs 2.5k, or 6x faster for a case where we replace 5 parts
> of the original URL.
>
> Given its performance and functionality, I find the suggested URL
> extension better than pguri.
>

Thanks for the constructive comments and the testing you have done.

>
> Now to the review part.
>
> 1. Applying patch causes indentation warning, please, bring spacing to
> the project's policy
>
> $ git apply ~/0001-Add-url-data-type-to-contrib.patch
> /home/vyegorov/0001-Add-url-data-type-to-contrib.patch:837: indent with
> spaces.
>     return lexbor_calloc(1, sizeof(lexbor_array_t));
> /home/vyegorov/0001-Add-url-data-type-to-contrib.patch:843: indent with
> spaces.
>     if (array == NULL) {
> /home/vyegorov/0001-Add-url-data-type-to-contrib.patch:844: indent with
> spaces.
>         return LXB_STATUS_ERROR_OBJECT_IS_NULL;
> /home/vyegorov/0001-Add-url-data-type-to-contrib.patch:845: indent with
> spaces.
>     }
> /home/vyegorov/0001-Add-url-data-type-to-contrib.patch:847: indent with
> spaces.
>     if (size == 0) {
> warning: squelched 6350 whitespace errors
> warning: 6355 lines add whitespace errors.

This will be fixed when the main URL parser code is rewritten to fit
the Postgres style/infrastructure.

> 2. There's a lexbor/ library that contains core and url parts. Feels
> like some commentary about what's
>   inside is required.

Yeah, that's a good point.

> 3. Do you think it's possible to adjust your code to use existing
> postgres infrastructure instead? I don't
>   think having its own infrastructure for a single extension is a good
> thing. Also, this might be a source
>   for performance improvements in the core.

To implement (rewrite/modify) a URL parser using exclusively Postgres
infrastructure is one of the main goals.

>
> 4. There's no user visible documentation, please, add one.

That's a good point.

> I've created a commitfest entry for the patch: https://
> commitfest.postgresql.org/51/5432/ <https://
> commitfest.postgresql.org/51/5432/>
> I was not able to find you, please, register a community account and set
> yourself as an author for the patch.

Done.

>
> --
> Victor Yegorov

--
Alexander Borisov

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2024-12-11 16:34:07 Re: FileFallocate misbehaving on XFS
Previous Message Tom Lane 2024-12-11 15:38:03 Re: Suggestion to standardize comment format in pg_dump