Quick Links

Re: Query to return normalized floats

From:	Kip Warner <kip(at)thevertigo(dot)com>
To:	Andreas Kretschmer <akretschmer(at)spamfence(dot)net>
Cc:	pgsql-novice(at)postgresql(dot)org
Subject:	Re: Query to return normalized floats
Date:	2016-04-12 00:17:00
Message-ID:	1460420220.13062.32.camel@thevertigo.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-novice

On Sun, 2016-02-28 at 09:21 +0100, Andreas Kretschmer wrote:
> Kip Warner <kip(at)thevertigo(dot)com> wrote:
>
> > The some_value_* fields contain floating point data and are at
> > least a
> > dozen in number.
>
> maybe a wrong table design, but that's another question.

Hey Andreas. I am very sorry for taking so long to acknowledge and
respond to your thoughtful answer. The software I am working on had
another aspect of it that sucked me away for the last few weeks before
I could return to the database schema.

> > I would like to be able to perform queries on the table on the
> > normalized versions of these values. By normalized I don't mean in
> > the database nomenclature, but where all some_value_i's are in the
> > range of [0,1].
> >
> > To do this, I must find the min() and max() of each row's
> > some_value_i within the table and divide each some_value_i by the
> > absolute difference of these two values.
> >
> > As an example, if a row contained the lowest some_value_3 of -4.0
> > and the largest row 1.5, then the normalized version of any
> > some_value_3 field is some_value_3 / (1.5 - -4.0) or some_value_3 /
> > 5.5.
> >
> > I am having difficulty expressing this as a query to just list
> > every row in the table for starters (e.g. SELECT * FROM my_table;).
> > I considered creating a VIEW, my_table_normalized, but I'm not sure
> > if that is the appropriate strategy here.
>
> simple example:
>
> test=*# select * from bla;
> id | col1 | col2
> ----+-------------------+--------------------
> 1 | -27.3061781190336 | 9.23637737520039
> 2 | -34.9188138730824 | 4.02728125452995
> 3 | 27.7425193693489 | -1.71850152313709
> 4 | 18.2173402048647 | 1.78571328520775
> 5 | -49.3932147044688 | 3.25902994722128
> 6 | -21.3868645019829 | 0.0395399890840054
> 7 | 48.6888256389648 | -0.219368590041995
> 8 | -26.943267416209 | -2.84633947536349
> 9 | -47.2060812171549 | 1.46993971429765
> 10 | -16.008263733238 | 5.15772333368659
> 11 | 12.651920504868 | 4.91552650928497
> 12 | 38.5760291945189 | 6.94333815015852
> 13 | -47.87487979047 | -4.18941779062152
> 14 | -2.24363747984171 | 0.686697596684098
> 15 | -2.6916132774204 | 7.83255377784371
> 16 | -4.97196828946471 | 4.0004417207092
> 17 | 35.3446557652205 | 2.0218435768038
> 18 | -9.86138512380421 | 2.61743502691388
> 19 | -48.4832897316664 | -8.32880898378789
> 20 | -27.9842584393919 | -9.57530088722706
> (20 rows)
>
> test=*# create view my_factor as select max(col1) - min(col1) as c1,
> max(col2) - min(col2) as c2 from bla;
> CREATE VIEW
> test=*# select id, col1 / f.c1, col2 / f.c2 from bla cross join
> my_factor f;
> id | ?column? | ?column?
> ----+---------------------+---------------------
> 1 | -0.278401407876724 | 0.490991672638172
> 2 | -0.356016389451263 | 0.214084102351125
> 3 | 0.282850145370229 | -0.0913529085052159
> 4 | 0.185735738582485 | 0.094925782819407
> 5 | -0.503590815724457 | 0.173245039690613
> 6 | -0.218050770835282 | 0.00210188525087517
> 7 | 0.496409184275543 | -0.0116612982096414
> 8 | -0.274701334942333 | -0.151307046381315
> 9 | -0.481291794622778 | 0.078139743503564
> 10 | -0.163212996764598 | 0.274176671625737
> 11 | 0.128993243417117 | 0.261301859446679
> 12 | 0.393303698204536 | 0.369097219997987
> 13 | -0.48811056155476 | -0.222703032242957
> 14 | -0.0228751101831245 | 0.0365037923307268
> 15 | -0.0274424682438878 | 0.416366560632055
> 16 | -0.0506919337327751 | 0.212657353847013
> 17 | 0.360358080250589 | 0.107478107407462
> 18 | -0.100542210268818 | 0.139138836546109
> 19 | -0.494313633381836 | -0.442746727197808
> 20 | -0.285314807291989 | -0.509008327361828
> (20 rows)
>
> test=*#
>
> is that okay?

Almost. I realized that the normalization is not calculated correctly
and should be calculated as...

norm(x) = (x - min)/(max - min)

This also ensures the values are in the range of [0.0, 1.0].

It would also be nice if the view contained the fully normalized
results. I've started with this...

CREATE VIEW my_view AS
SELECT
id,
(col1 - MIN(col1) / (MAX(col1) - MIN(col1)) AS col1_norm,
(col2 - MIN(col2) / (MAX(col2) - MIN(col2)) AS col2_norm
FROM my_table GROUP_BY id;

The problem is when I try to see what it's populated with via...

SELECT * FROM my_view;

I get an ERROR: division by zero. I can see how that would happen if
MAX(x) - MIN(x) was zero, but I've made sure it wasn't from the sample
rows in my_table. I suspect my_view schema is not declared correctly.

> (you should define aliases for the columns...)

Yes. Hopefully I'm using the AS keyword correctly here.

> create indexes on the columns, so it should use indexes for the
> min/max.

Done.

--
Kip Warner -- Senior Software Engineer
OpenPGP encrypted/signed mail preferred
http://www.thevertigo.com

In response to

Re: Query to return normalized floats at 2016-02-28 08:21:32 from Andreas Kretschmer

Responses

Re: Query to return normalized floats at 2016-04-14 08:21:35 from Kip Warner

Browse pgsql-novice by date

	From	Date	Subject
Next Message	Kip Warner	2016-04-14 08:21:35	Re: Query to return normalized floats
Previous Message	Kevin Struckhoff	2016-04-11 18:24:10	Re: Upgrading from Postgres 9.3 to 9.5