Re: [PATCH] regexp_positions ( string text, pattern text, flags text ) → setof int4range[]

From: "Joel Jacobson" <joel(at)compiler(dot)org>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Mark Dilger" <mark(dot)dilger(at)enterprisedb(dot)com>, "Postgres hackers" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "Andreas Karlsson" <andreas(at)proxel(dot)se>, "David Fetter" <david(at)fetter(dot)org>
Subject: Re: [PATCH] regexp_positions ( string text, pattern text, flags text ) → setof int4range[]
Date: 2021-03-08 18:46:56
Message-ID: fe07eb9b-a90d-4789-9f36-5a71f7c27333@www.fastmail.com
Lists: pgsql-hackers

On Mon, Mar 8, 2021, at 18:30, Tom Lane wrote:
> FWIW, I personally think that returning a start position and a length
> would be the most understandable way to operate.

Very good point. I agree. (And then ranges cannot be used, regardless of canonical form.)
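
To make the start-plus-length convention concrete, here is a minimal Python sketch (illustration only, not part of the patch). The helper name `span_to_start_length` and the 1-based start are assumptions, chosen to mirror how SQL string functions such as substring() count characters. It also shows a zero-length match, which start plus length can describe but which, as far as I know, an empty int4range cannot, since an empty range does not keep its bounds.

```python
import re


def span_to_start_length(span):
    """Convert a 0-based half-open (begin, end) span into (start, length).

    The 1-based start is an assumption made for this sketch; it follows
    the convention of SQL string functions, not Python's.
    """
    begin, end = span
    return (begin + 1, end - begin)


# A normal match: "bbb" inside "aabbbcc".
m = re.search(r"b+", "aabbbcc")
print(m.span(), "->", span_to_start_length(m.span()))

# A zero-length match at the very start of the string: still has a
# well-defined start position, with length 0.
empty = re.search(r"x*", "abc")
print(empty.span(), "->", span_to_start_length(empty.span()))
```

The half-open span (2, 5) and the pair (3, 3) describe the same substring; the second form just answers "where does it start and how long is it" directly.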

> Yeah: it's hard. The amount of catalog infrastructure needed by a
> composite type is dauntingly large, and genbki.pl doesn't offer any
> support for building composite types that aren't tied to catalogs.
> (I suppose if you don't mind hacking Perl, you could try to refactor
> it to improve that.)

I haven't studied genbki.pl in detail, but I've seen its name on the list many times.
Maybe I should go through it to understand it properly.

On the topic of Perl: I've written a lot of Perl code over the years.
Trustly was initially a Perl+PostgreSQL microservice project, with different components
written in Perl run as daemons, communicating with each other over TCP/IP,
via JSON-RPC. We had lots of strange problems that were difficult to debug.
In the end, we moved all the business logic from Perl into database functions in PostgreSQL,
and all problems went away. The biggest win was the nice UTF-8 support,
which was really awkward in Perl. It's kind of UTF-8, but not really and not always.

Most programming languages/compilers are obsessed
with the concept of "bootstrapping"/"dogfooding".

Thinking of PostgreSQL as a language/compiler, that would mean we should be obsessed with the idea
of implementing PostgreSQL in SQL or PL/pgSQL. That would be quite a challenge of course.

However, for certain tasks where a high-level language is preferred
and the raw performance of C isn't necessary, maybe SQL or PL/pgSQL
could be a serious alternative to Perl?

If I understand it correctly, we don't need to run genbki.pl to compile PostgreSQL,
so someone wanting to compile PostgreSQL without having a running PostgreSQL instance
could do so without problems.

A dependency on having a PostgreSQL instance running
is perhaps acceptable for hackers developing PostgreSQL?
But of course not for normal users who just want to compile PostgreSQL.

If we think there is at least a 1% chance this is a feasible idea,
I'm willing to try implementing an SQL or PL/pgSQL version of genbki.pl.
It would be a fun hack, but not if it's a guaranteed waste of time.

> It seems like you need it to return setof array(s), so the choices are
> array of composite, 2-D array, or two parallel arrays. I'm not sure
> the first of those is so much better than the others that it's worth
> the pain involved to set up the initial catalog data that way.

I agree, and I like the 2-D array version, but only if we could provide a C function
to allow unnesting N+1 dims to N dims. Is that a fruitful idea, or are there
reasons why it cannot be done easily? I could give it a try, if we think it's a good idea.
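
A minimal sketch of the unnesting idea, in Python rather than C and for illustration only. It assumes the array is held as a flat, row-major element list plus a list of dimension sizes; the function name `unnest_one_dim` is hypothetical and does not correspond to any existing PostgreSQL function.

```python
from math import prod


def unnest_one_dim(flat, dims):
    """Split an N+1-dimensional array into its N-dimensional slices.

    `flat` holds the elements in row-major order and `dims` the size of
    each dimension. Returns one (elements, dims) pair per slice along the
    first dimension. Both the layout and the name are assumptions of this
    sketch.
    """
    if len(dims) < 2:
        raise ValueError("need at least two dimensions to unnest one")
    if len(flat) != prod(dims):
        raise ValueError("element count does not match dimensions")
    sub_dims = dims[1:]
    size = prod(sub_dims)  # number of elements in each slice
    return [(flat[i * size:(i + 1) * size], sub_dims) for i in range(dims[0])]


# Two matches, each with two capture groups, each a (start, length) pair:
# a 2x2x2 array flattened in row-major order.
flat = [4, 3, 7, 5, 12, 5, 17, 4]
for elements, sub_dims in unnest_one_dim(flat, [2, 2, 2]):
    print(sub_dims, elements)
```

Each returned slice is itself a 2-D array (one row per capture group), which is the shape a set-returning variant would emit per match.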

>
> BTW, I don't know if you know the history here, but regexp_matches()
> is way older than regexp_match(); we eventually invented the latter
> because the former was just too hard to use for easy non-'g' cases.
> I'm inclined to think we should learn from that and provide equivalent
> variants regexp_position[s] right off the bat.

I remember! regexp_match() was a very welcome addition.
I agree that both regexp_position[s] variants would be good for the same reasons.
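
The relationship between the two variants can be sketched in Python (illustration only; this is not the patch's implementation). The function names mirror the proposed ones, while the return shape, a list of 1-based (start, length) pairs per match, is an assumption of this sketch.

```python
import re


def regexp_positions(string, pattern):
    """All matches: one list per match, one (start, length) per capture group.

    A group that did not participate in the match is reported as None.
    If the pattern has no capture groups, the whole match is reported.
    """
    rows = []
    for m in re.finditer(pattern, string):
        groups = range(1, m.re.groups + 1) if m.re.groups else [0]
        row = []
        for g in groups:
            if m.start(g) == -1:  # group did not participate
                row.append(None)
            else:
                row.append((m.start(g) + 1, m.end(g) - m.start(g)))
        rows.append(row)
    return rows


def regexp_position(string, pattern):
    """First match only, or None when there is no match."""
    rows = regexp_positions(string, pattern)
    return rows[0] if rows else None


text = "foobarbequebazilbarfbonk"
print(regexp_position(text, r"(b[^b]+)(b[^b]+)"))
print(regexp_positions(text, r"(b[^b]+)(b[^b]+)"))
```

The single-result variant spares the caller from handling a set when only the first match matters, which is the same convenience regexp_match() offers over regexp_matches().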

/Joel
