Wiki editor request

From: "David M(dot) Kaplan" <dmkaplan2000(at)gmail(dot)com>
To: pgsql-www(at)postgresql(dot)org
Subject: Wiki editor request
Date: 2018-04-27 10:19:26
Message-ID: 9b106c7b-4d27-4211-401e-26c6a7b4d4fb@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-www

Hi,

I noticed something that I do not think is correct on the PostgreSQL
wiki and would like to edit it, but don't have the privileges. Could you
either edit it for me or give me the privileges? My account is based on
my gmail dmkaplan2000(at)gmail(dot)com

The think that I noticed that seems incorrect to me has to do with the
histogram aggregate code snippet here
https://wiki.postgresql.org/wiki/Aggregate_Histogram. I believed that
the author misinterpreted the fact that width_bucket uses nbuckets+1
buckets (actually nbuckets+2 buckets) as not appropriate and tries to
correct for that, but I don't think current code does what it is
supposed to. width_bucket returns values between 0 and nbuckets+1 where
a value of 0 means val<MIN and a value of nbuckets+1 means val>=MAX.
These first and last buckets are actually useful information that should
be kept, and the existing code puts into a bucket -1 all values less
than the MIN and bucket=nbuckets all values that exceed the MAX, which
is incorrect.

I believe the correct code should be:

CREATE OR REPLACE FUNCTION hist_sfunc (state INTEGER[], val DOUBLE PRECISION,
       MIN DOUBLE PRECISION, MAX DOUBLE PRECISION, nbuckets INTEGER) RETURNS INTEGER[] AS $$
DECLARE
  bucket INTEGER;
  i INTEGER;
BEGIN
  -- This will put values in buckets with a 0 bucket for <MIN and a (nbuckets+1) bucket for >=MAX
  bucket := width_bucket(val, MIN, MAX, nbuckets);

  -- Init the array with the correct number of 0's so the caller doesn't see NULLs
  IF state[0] IS NULL THEN
    FOR i IN SELECT * FROM generate_series(0, nbuckets + 1) LOOP
      state[i] := 0;
    END LOOP;
  END IF;

  state[bucket] = state[bucket] + 1;
  RETURN state;
END;
$$ LANGUAGE plpgsql IMMUTABLE;

-- Tell Postgres how to use the new function
DROP AGGREGATE IF EXISTS histogram (DOUBLE PRECISION, DOUBLE PRECISION, DOUBLE PRECISION, INTEGER);
CREATE AGGREGATE histogram (DOUBLE PRECISION, DOUBLE PRECISION, DOUBLE PRECISION, INTEGER) (
       SFUNC = hist_sfunc,
       STYPE = INTEGER[]
);

To confirm that this is correct, please compare the original histogram
aggregate with my version using the following query:

WITH a AS (
SELECT generate_series(-2,5,0.5) AS i
)
SELECT array_agg(i) AS values,
       histogram(i,0,3,3) AS counts,
(histogram(i,0,3,3))[1:3] AS counts_in_limits
FROM a;

Thanks,
David

Responses

Browse pgsql-www by date

  From Date Subject
Next Message Teodor Sigaev 2018-04-28 16:23:11 Re: [HACKERS] GSoC 2017: weekly progress reports (week 6)
Previous Message Erikjan Rijkers 2018-04-27 09:35:44 Re: New archives layout is not an improvement