Re: Does people favor to have matrix data type?

From: Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>
To: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>, Ants Aasma <ants(dot)aasma(at)eesti(dot)ee>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Does people favor to have matrix data type?
Date: 2016-05-28 14:12:24
Message-ID: 9A28C8860F777E439AA12E8AEA7694F8011F6486@BPXM15GP.gisp.nec.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On 5/25/16 7:46 AM, Kouhei Kaigai wrote:
> > My only concern is that domain type is not allowed to define type cast.
> > If we could add type cast on domain, we can define type transformation from
> > other array type to matrix.
>
> I've actually wished for that in the past, as well as casting to
> compound types. Having those would make it easier to mock up a real data
> type for experimentation.
>
> I strongly encourage you to talk to the MADlib community about
> first-class matrix support. They currently emulate matrices via arrays.
>
A MADLib folks contacted me in the past; when I initially made PG-Strom
several years before, however, no communication with them after that.

https://madlib.incubator.apache.org/docs/v1.8/group__grp__arraysmatrix.html
According to the documentation, their approach is different from what
I'd like to implement. They use a table as if a matrix. Because of its
data format, we have to adjust position of the data element to fit
requirement by high performance matrix libraries (like cuBLAS).

> I don't know offhand if they support NULLs in their regular matrices.
> They also support a sparsematrix "type" that is actually implemented as
> a table that contains coordinates and a value for each value in the
> matrix. Having that as a type might also be interesting (if you're
> sparse enough, that will be cheaper than the current NULL bitmap
> implementation).
>
Sparse matrix! It is a disadvantaged area for the current array format.

I have two ideas. HPC folks often split a large matrix into multiple
grid. A grid is typically up to 1024x1024 matrix, for example.
If a grid is consists of all zero elements, it is obvious we don't need
to have individual elements on the grid.
One other idea is compression. If most of matrix is zero, it is an ideal
data for compression, and it is easy to reconstruct only when calculation.

> Related to this, Tom has mentioned in the past that perhaps we should
> support abstract use of the [] construct. Currently point finds a way to
> make use of [], but I think that's actually coded into the grammar.
>
Yep, if we consider 2D-array is matrix, no special enhancement is needed
to use []. However, I'm inclined to have own data structure for matrix
to present the sparse matrix.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2016-05-28 15:32:21 Re: Parallel pg_dump's error reporting doesn't work worth squat
Previous Message Tatsuo Ishii 2016-05-28 13:04:42 Statement timeout