Re: Does people favor to have matrix data type?

From: "ktm(at)rice(dot)edu" <ktm(at)rice(dot)edu>
To: Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>
Cc: Simon Riggs <simon(at)2ndQuadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Does people favor to have matrix data type?
Date: 2016-05-25 13:22:43
Message-ID: 20160525132243.GD32767@aart.rice.edu
Lists: pgsql-hackers

On Wed, May 25, 2016 at 09:10:02AM +0000, Kouhei Kaigai wrote:
> > -----Original Message-----
> > From: Simon Riggs [mailto:simon(at)2ndQuadrant(dot)com]
> > Sent: Wednesday, May 25, 2016 4:39 PM
> > To: Kaigai Kouhei(海外 浩平)
> > Cc: pgsql-hackers(at)postgresql(dot)org
> > Subject: Re: [HACKERS] Does people favor to have matrix data type?
> >
> > On 25 May 2016 at 03:52, Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com> wrote:
> >
> >
> > For the past few days, I've been working on a data type that represents a
> > matrix in the mathematical sense. Would people like to have this data type
> > in core, not only in my extension?
> >
> >
> > If we understood the use case, it might help understand whether to include it or not.
> >
> > Multi-dimensionality of arrays isn't always useful, so this could be good.
> >
> As you may expect, one reason I've been working on a matrix data type is to
> lay groundwork for GPU acceleration, though it is not limited to that.
>
> What I am trying to do is run analytic algorithms inside the database, rather
> than exporting the entire dataset to the client side.
> My first target is k-means clustering, which is often used in data mining.
> When we categorize N items with M attributes into k clusters, the master
> data can be represented as an NxM matrix, equivalent to N vectors in
> M-dimensional space.
> The cluster centroids also lie in the M-dimensional space, so they can be
> represented as a kxM matrix, equivalent to k vectors in M dimensions.
> The k-means algorithm requires calculating the distance from each item to
> every cluster centroid, which produces an Nxk matrix, usually called the
> distance matrix. Next, it updates the cluster centroids using the distance
> matrix, then repeats the entire process until convergence.
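A minimal sketch of the iteration described above, in plain Python with no third-party libraries. It assumes standard Lloyd iteration with squared Euclidean distance, caller-supplied initial centroids, and exact equality as the convergence test; the function names are hypothetical and this is not the PG-Strom implementation.

```python
from typing import List

Matrix = List[List[float]]


def distance_matrix(data: Matrix, centroids: Matrix) -> Matrix:
    """Return the N x k matrix of squared Euclidean distances (items x centroids)."""
    return [[sum((x - c) ** 2 for x, c in zip(item, cent)) for cent in centroids]
            for item in data]


def update_centroids(data: Matrix, dist: Matrix, centroids: Matrix) -> Matrix:
    """Assign each item to its nearest centroid, then recompute each centroid as the mean."""
    k, m = len(centroids), len(data[0])
    sums = [[0.0] * m for _ in range(k)]
    counts = [0] * k
    for item, row in zip(data, dist):
        j = min(range(k), key=row.__getitem__)  # index of the nearest centroid
        counts[j] += 1
        for a in range(m):
            sums[j][a] += item[a]
    # Keep the old centroid if a cluster ends up empty.
    return [[s / counts[j] for s in sums[j]] if counts[j] else list(centroids[j])
            for j in range(k)]


def kmeans(data: Matrix, initial: Matrix, max_iter: int = 100) -> Matrix:
    """Repeat distance-matrix / centroid-update passes until the centroids stop changing."""
    centroids = [list(c) for c in initial]
    for _ in range(max_iter):
        dist = distance_matrix(data, centroids)        # N x k
        new = update_centroids(data, dist, centroids)  # k x M
        if new == centroids:  # converged: assignments no longer move the centroids
            break
        centroids = new
    return centroids
```

For example, `kmeans([[0, 0], [0, 1], [10, 10], [10, 11]], [[0, 0], [10, 10]])` settles on `[[0.0, 0.5], [10.0, 10.5]]` after two passes. Real code would compare against a tolerance rather than using exact float equality.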
>
> The heart of the workload is the calculation of the distance matrix. When I
> tried to write the k-means algorithm using SQL + R, its performance was poor.
> https://github.com/kaigai/toybox/blob/master/Rstat/pgsql-kmeans.r
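Since the distance matrix dominates the cost, one common reformulation (an assumption here, not necessarily what the linked R script does) expands the squared distance as ||x||^2 - 2 x.c + ||c||^2. The bulk of the work then becomes a single NxM by Mxk matrix product, which is exactly the kind of operation BLAS GEMM or a GPU handles well. The sketch below uses hypothetical function names and plain Python loops in place of a real GEMM call.

```python
from typing import List

Matrix = List[List[float]]


def matmul_transposed(a: Matrix, b: Matrix) -> Matrix:
    """Return a @ b^T for a (N x M) and b (k x M); this is the part a GEMM would do."""
    return [[sum(x * y for x, y in zip(row_a, row_b)) for row_b in b] for row_a in a]


def distance_matrix_gemm(data: Matrix, centroids: Matrix) -> Matrix:
    """Squared Euclidean N x k distance matrix via ||x||^2 - 2 x.c + ||c||^2."""
    x_sq = [sum(v * v for v in row) for row in data]       # length N
    c_sq = [sum(v * v for v in row) for row in centroids]  # length k
    cross = matmul_transposed(data, centroids)             # N x k
    # Clamp at zero: floating-point cancellation can yield tiny negative values.
    return [[max(0.0, x_sq[i] - 2.0 * cross[i][j] + c_sq[j])
             for j in range(len(centroids))]
            for i in range(len(data))]
```

The per-row norms cost only O(N*M + k*M), so nearly all of the arithmetic moves into the O(N*M*k) product.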
>
> If we had native functions to use instead of complicated SQL expressions,
> they would make sense for people who try in-database analytics.
>
> Also, fortunately, PostgreSQL's 2-D array format is binary compatible with
> what BLAS libraries require. This will allow a GPU to process large matrices
> with HPC-grade performance.
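The binary-compatibility point rests on memory layout. Assume a float8[] array without NULLs whose elements are (as I understand PostgreSQL's storage) laid out contiguously with the rightmost subscript varying fastest. An NxM matrix is then a dense row-major buffer. Classic BLAS expects column-major order, and the same buffer read column-major with leading dimension M is simply the MxN transpose. It can therefore be passed with a transpose flag, or through CBLAS's CblasRowMajor ordering, without copying. The Python sketch below only models the index arithmetic; the helper names are hypothetical.

```python
from array import array
from typing import List


def pack_row_major(rows: List[List[float]]) -> array:
    """Flatten an N x M matrix into one contiguous float64 buffer, last index fastest."""
    return array("d", (v for row in rows for v in row))


def row_major_get(buf: array, m: int, i: int, j: int) -> float:
    """Element (i, j) of an N x M row-major matrix lives at offset i*M + j."""
    return buf[i * m + j]


def col_major_get(buf: array, ld: int, r: int, c: int) -> float:
    """Element (r, c) of a column-major matrix with leading dimension ld (BLAS/Fortran order)."""
    return buf[c * ld + r]
```

For a 2x3 matrix packed this way, `row_major_get(buf, 3, i, j)` equals `col_major_get(buf, 3, j, i)` for every i and j. BLAS sees the transpose of the same bytes, which is why no conversion pass is needed.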
>
> Thanks,
> --
> NEC Business Creation Division / PG-Strom Project
> KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>

Hi,

Have you looked at Perl Data Language under pl/perl? It has pretty nice support
for matrix calculations:

http://pdl.perl.org

Regards,
Ken
