Selecting "sample" data from large tables.

From: Joseph Turner <joseph(dot)turner(at)oakleynetworks(dot)com>
To: pgsql-sql(at)postgresql(dot)org
Subject: Selecting "sample" data from large tables.
Date: 2004-06-03 17:31:22
Message-ID: 200406031131.24535.joseph.turner@oakleynetworks.com
Lists: pgsql-sql

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I have a table with a decent number of rows (let's say for example a
billion rows). I am trying to construct a graph that displays the
distribution of that data. However, I don't want to read in the
complete data set (as reading a billion rows would take a while). Can
anyone think of a way to do this in PostgreSQL? I've been looking
online and most of the stuff I've found has been for other databases.
As far as I can tell, ANSI SQL doesn't provide for this scenario.
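[For what it's worth, one common workaround is to filter on random(),
which returns a uniform value in [0, 1) per row. This is only a sketch
(it assumes the "score" table described below), and it still scans the
whole table; it just shrinks the result set you have to plot:

-- Keep roughly 0.1% of rows, chosen (pseudo)randomly.
-- Note: every row is still read; only the output is reduced.
SELECT score
FROM score
WHERE random() < 0.001;

]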

I could potentially write a function to do this, but I'd prefer not
to. If that's what I'm going to be stuck doing, though, I'd like to
know sooner rather than later. Here's the description of the table:

create table score
(
    pageId Integer NOT NULL,
    ruleId Integer NOT NULL,
    score Double Precision NULL,
    rowAddedDate BigInt NULL,
    primary key (pageId, ruleId)
);

I also have an index on rowAddedDate, which is just the number of
milliseconds since the epoch (Jan 1, 1970 [Java-style timestamps]).
I'd be willing to accept that the rowAddedDate values are random
enough to serve as a random sample.
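[If the rowAddedDate values really are spread uniformly, the index can
be used to avoid a full scan by sampling random time windows instead
of random rows. A sketch under that assumption (the window bounds here
are hypothetical millisecond timestamps, not values from the post):

-- Fetch only the rows added during one arbitrary one-minute window;
-- the index on rowAddedDate turns this into a range scan.
SELECT score
FROM score
WHERE rowAddedDate BETWEEN 1086282000000 AND 1086282060000;

Repeating this over several randomly chosen windows approximates a
random sample without touching most of the table.]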

Thanks in advance,

-- Joe T.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)

iD8DBQFAv2Bqs/P36Z9SDAARAkmLAJ9dDB0sqACgFrxH8NukFUsizXz5zgCgt9IT
/wh3ryz4WQzc5qQY2cAZtVE=
=5dg+
-----END PGP SIGNATURE-----
