MAP syntax for arrays

From: Ildar Musin <i(dot)musin(at)postgrespro(dot)ru>
To: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: MAP syntax for arrays
Date: 2018-05-04 13:08:03
Message-ID: f86e5d8b-20a7-36ce-acd6-ac58b290c8f0@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello hackers,

Recently I was working with sql arrays in postgres and it turned out
that postgres doesn't have such very convinient functional constructions
as map, reduce and filter. Currently to map function over array user has
to make a subquery like:

select u.* from
my_table,
lateral (
select array_agg(lower(elem))
from unnest(arr) as elem
) as u;

Which is not only inconvenient but not very efficient as well (see
'Demo' section below).

When I dug into the code I found that postgres already has the needed
infrastructure for implementing map for arrays; actually array coercing
already works that way (it basically maps cast function).

In the attached patch there is a simple map implementation which
introduces new expression type and syntax:

MAP(<func_name> OVER <array_expression>)

For example:

SELECT MAP(upper OVER array['one', 'two', 'three']::text[]);
?column?
-----------------
{ONE,TWO,THREE}
(1 row)

This is probably not the most useful notation and it would be better to
have syntax for mapping arbitrary expressions over array, not just
function. I'm struggling to come up with a good idea of how it should
look like. It could look something like following:

MAP(<expr> FOR <placeholder> IN <array_expressin>)

For instance:

SELECT MAP(x*2 FOR x IN array[1, 2, 3]::int[]);

Looking forward for community's suggestions!

Demo
----

Here is a small comparison between map and unnest/aggregate ways for
per-element processing of arrays. Given a table with 1K rows which
contains single column of text[] type. Each array contains 5/10/100
elements.

create table my_table (arr text[]);
insert into my_table
select array_agg(md5(random()::text))
from generate_series(1, 1000) as rows,
generate_series(1, 10) as elements
group by rows;

There are two scripts for pgbench. One for 'map' syntax:

select map(upper over arr) from my_table;

And one for unnest/aggregate:

select u.* from my_table,
lateral (
select array_agg(upper(elem))
from unnest(arr) as elem
) as u;

Results are:

elements per array | map (tps) | unnest/aggregate (tps)
--------------------+------------+------------------------
5 | 139.105359 | 74.434010
10 | 74.089743 | 43.622554
100 | 7.693000 | 5.325805

Apparently map is more efficient for small arrays. And as the size of
array increases the difference decreases.

I'll be glad to any input from the community. Thanks!

--
Ildar Musin
i(dot)musin(at)postgrespro(dot)ru

Attachment Content-Type Size
map_v1.patch text/x-diff 18.4 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tatsuo Ishii 2018-05-04 13:08:38 Re: pg_rewind and postgresql.conf
Previous Message Michael Paquier 2018-05-04 13:02:48 Re: pg_rewind and postgresql.conf