From: | Alexander Borisov <lex(dot)borisov(at)gmail(dot)com> |
---|---|
To: | Victor Yegorov <vyegorov(at)gmail(dot)com> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Proposal to add a new URL data type. |
Date: | 2024-12-11 16:04:40 |
Message-ID: | 70b01d6b-d6d4-43bd-977a-8a51cf5a0b8a@gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
10.12.2024 13:59, Victor Yegorov пишет:
> чт, 5 дек. 2024 г. в 17:02, Alexander Borisov <lex(dot)borisov(at)gmail(dot)com
> <mailto:lex(dot)borisov(at)gmail(dot)com>>:
>
[..]
>
> Hey, I had a look at this patch and found its functionality mature and
> performant.
>
> As Peter mentioned pguri, I used it to compare with the proposed
> extension. This brought up
> the following differences:
> - pguri (uriparser 0.9.8) doesn't support Chinese symbols in the host
> part of URI (uri_test1.sh):
>
> ERROR: invalid input syntax for type uri at or near "事
> 例.com#comments <http://xn--3kq3x.com#comments>"
>
> Therefore, I avoided Chinese or Cyrillic symbols in the pguri test
> script.
> - There are no SET functions in the pguri, changing specific portions of
> URI is troublesome. I used
> replace() in the test, but this is an error prone approach.
> - It's even more troublesome to set parts of the URI that are not
> initially there. Probably, a full decomposition
> into parts and the following wrap up is needed
> Suggested extension has no such limitations. Additionally, pguri
> extracts userinfo as a whole,
> while suggested extension can get/set user and password individually.
>
>
> Running tests (attached) I got the following numbers:
> $ ./url_test.sh
> NOTICE: extension "url" already exists, skipping
> tps = 13068.287423 (without initial connection time)
> tps = 12888.937747 (without initial connection time)
> tps = 12830.642558 (without initial connection time)
> tps = 12846.341411 (without initial connection time)
> tps = 13187.955601 (without initial connection time)
>
> $ ./uri_test2.sh
> NOTICE: extension "uri" already exists, skipping
> tps = 2441.934308 (without initial connection time)
> tps = 2513.277660 (without initial connection time)
> tps = 2484.641673 (without initial connection time)
> tps = 2519.312395 (without initial connection time)
> tps = 2512.364492 (without initial connection time)
>
> So it's 12.9k vs 2.5k, or 6x faster for a case where we replace 5 parts
> of the original URL.
>
> Given its performance and functionality, I find the suggested URL
> extension better than pguri.
>
Thanks for the constructive comments and the testing you have done.
>
> Now to the review part.
>
> 1. Applying patch causes indentation warning, please, bring spacing to
> the project's policy
>
> $ git apply ~/0001-Add-url-data-type-to-contrib.patch
> /home/vyegorov/0001-Add-url-data-type-to-contrib.patch:837: indent with
> spaces.
> return lexbor_calloc(1, sizeof(lexbor_array_t));
> /home/vyegorov/0001-Add-url-data-type-to-contrib.patch:843: indent with
> spaces.
> if (array == NULL) {
> /home/vyegorov/0001-Add-url-data-type-to-contrib.patch:844: indent with
> spaces.
> return LXB_STATUS_ERROR_OBJECT_IS_NULL;
> /home/vyegorov/0001-Add-url-data-type-to-contrib.patch:845: indent with
> spaces.
> }
> /home/vyegorov/0001-Add-url-data-type-to-contrib.patch:847: indent with
> spaces.
> if (size == 0) {
> warning: squelched 6350 whitespace errors
> warning: 6355 lines add whitespace errors.
This will be fixed when the main URL parser code is rewritten to fit
the Postgres style/infrastructure.
> 2. There's a lexbor/ library that contains core and url parts. Feels
> like some commentary about what's
> inside is required.
Yeah, that's a good point.
> 3. Do you think it's possible to adjust your code to use existing
> postgres infrastructure instead? I don't
> think having its own infrastructure for a single extension is a good
> thing. Also, this might be a source
> for performance improvements in the core.
To implement (rewrite/modify) a URL parser using exclusively Postgres
infrastructure is one of the main goals.
>
> 4. There's no user visible documentation, please, add one.
That's a good point.
> I've created a commitfest entry for the patch: https://
> commitfest.postgresql.org/51/5432/ <https://
> commitfest.postgresql.org/51/5432/>
> I was not able to find you, please, register a community account and set
> yourself as an author for the patch.
Done.
>
> --
> Victor Yegorov
--
Alexander Borisov
From | Date | Subject | |
---|---|---|---|
Next Message | Andres Freund | 2024-12-11 16:34:07 | Re: FileFallocate misbehaving on XFS |
Previous Message | Tom Lane | 2024-12-11 15:38:03 | Re: Suggestion to standardize comment format in pg_dump |