Re: Calcul de médiane

From: Stephane Bortzmeyer Sébastien DINOT pgsql-fr-generale(at)postgresql(dot)org Re: Calcul de médiane 2007-02-21 11:22:58 20070221112258.GA25559@nic.fr (view raw or whole thread) 2007-02-21 10:12:13 from Sébastien DINOT  2007-02-21 11:22:58 from Stephane Bortzmeyer   2007-02-21 12:59:30 from Stephane Bortzmeyer    2007-02-21 13:05:32 from Emmanuel Seyman     2007-02-21 13:34:33 from Stephane Bortzmeyer      2007-02-21 14:01:27 from Sébastien DINOT      2007-02-21 22:46:29 from Sébastien Dinot       2007-02-21 23:56:54 from Dimitri Fontaine        2007-02-22 00:15:45 from Sébastien Dinot         2007-02-22 00:44:44 from Dimitri Fontaine          2007-02-22 12:05:16 from Sébastien DINOT           2007-02-22 12:10:56 from Sébastien DINOT       2008-06-03 23:10:42 from Sébastien Dinot   2007-02-21 15:52:51 from Stéphane BUNEL    2007-02-21 16:27:13 from Stephane Bortzmeyer    2007-03-07 21:21:15 from Stephane Bortzmeyer pgsql-fr-generale
```On Wed, Feb 21, 2007 at 11:12:13AM +0100,
Sébastien DINOT <sebastien(dot)dinot(at)free(dot)fr> wrote
a message of 79 lines which said:

> Pour mémoire, la médiane est différente de la moyenne :

Oui, et, dans 90 % de cas où on utilise la moyenne, on devrait
utiliser la médiane à la place.

http://www.bortzmeyer.org/mediane-et-moyenne.html

Si on met vos données ainsi :

CREATE TABLE Data (
valeur FLOAT,
occurrences INTEGER);

INSERT INTO Data (valeur, occurrences) VALUES (0, 5);
INSERT INTO Data (valeur, occurrences) VALUES (10, 45);
INSERT INTO Data (valeur, occurrences) VALUES (12, 1);
INSERT INTO Data (valeur, occurrences) VALUES (20, 75);
INSERT INTO Data (valeur, occurrences) VALUES (25, 19);
INSERT INTO Data (valeur, occurrences) VALUES (30, 25);
INSERT INTO Data (valeur, occurrences) VALUES (480, 1);
INSERT INTO Data (valeur, occurrences) VALUES (50, 29);

Et que, pour aider, on crée les fonctions auxiliaires suivantes :

CREATE OR REPLACE FUNCTION occurrences_cumulees(FLOAT) RETURNS INTEGER AS
'SELECT sum(occurrences)::INTEGER FROM Data WHERE Valeur <= \$1;'
LANGUAGE SQL;

CREATE OR REPLACE FUNCTION Moitie_population() RETURNS INTEGER AS
'SELECT sum(occurrences)::INTEGER/2 FROM Data;'
LANGUAGE SQL;

La médiane peut alors se calculer facilement :

SELECT min(valeur) AS median FROM Data
WHERE occurrences_cumulees(valeur) >= Moitie_population();

Testons :

essais=> SELECT valeur, occurrences, occurrences_cumulees(valeur) FROM Data ORDER BY valeur;
valeur | occurrences | occurrences_cumulees
--------+-------------+----------------------
0 |           5 |                    5
10 |          45 |                   50
12 |           1 |                   51
20 |          75 |                  126
25 |          19 |                  145
30 |          25 |                  170
50 |          29 |                  199
480 |           1 |                  200
(8 rows)

essais=> SELECT min(valeur) AS median FROM Data
essais->    WHERE occurrences_cumulees(valeur) >= Moitie_population();
median
--------
20
(1 row)

```

pgsql-fr-generale by date

 Next: From: Stephane Bortzmeyer Date: 2007-02-21 12:59:30 Subject: Re: Calcul de médiane Previous: From: Sébastien DINOT Date: 2007-02-21 10:12:13 Subject: Calcul de médiane