Interesting question for LARGE (> 10 Million Rows).

From: Larry Rosenman <ler(at)lerctr(dot)org>
To: PostgreSQL General List <pgsql-general(at)postgresql(dot)org>
Subject: Interesting question for LARGE (> 10 Million Rows).
Date: 2001-05-30 11:50:59
Message-ID: 20010530065059.A28696@lerami.lerctr.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

I am loading (for the first time..) a VERY LARGE (to me) data base of
traffic statistics for the ISP I work for.

The schema looks as follows:

--
-- Selected TOC Entries:
--
\connect - neteng
--
-- TOC Entry ID 3 (OID 108735)
--
-- Name: traffic Type: TABLE Owner: neteng
--

CREATE TABLE "traffic" (
"asn" integer,
"protocol" integer,
"pkts_src" bigint,
"pkts_dst" bigint,
"bytes_src" bigint,
"bytes_dst" bigint,
"secs_src" bigint,
"secs_dst" bigint,
"early" timestamp with time zone,
"late" timestamp with time zone
);

--
-- TOC Entry ID 2 (OID 108754)
--
-- Name: protocol_id_seq Type: SEQUENCE Owner: neteng
--

CREATE SEQUENCE "protocol_id_seq" start 1 increment 1 maxvalue 2147483647 minvalue 1 cache 1 ;

--
-- TOC Entry ID 4 (OID 108773)
--
-- Name: protocol Type: TABLE Owner: neteng
--

CREATE TABLE "protocol" (
"id" integer DEFAULT nextval('"protocol_id_seq"'::text) NOT NULL,
"protocol" character varying(256) NOT NULL,
Constraint "protocol_pkey" Primary Key ("id")
);

--
-- TOC Entry ID 10 (OID 7150793)
--
-- Name: "get_protocol_id" (character varying) Type: FUNCTION Owner: neteng
--

CREATE FUNCTION "get_protocol_id" (character varying) RETURNS integer AS 'SELECT id FROM protocol
WHERE protocol = $1;' LANGUAGE 'sql' WITH ( iscachable );

--
-- TOC Entry ID 11 (OID 7150794)
--
-- Name: "get_protocol" (integer) Type: FUNCTION Owner: neteng
--

CREATE FUNCTION "get_protocol" (integer) RETURNS character varying AS 'SELECT protocol FROM protocol
WHERE id = $1;' LANGUAGE 'sql' WITH ( iscachable );

--
-- TOC Entry ID 5 (OID 108735)
--
-- Name: "early_index" Type: INDEX Owner: neteng
--

CREATE INDEX "early_index" on "traffic" using btree ( "early" "timestamp_ops" );

--
-- TOC Entry ID 6 (OID 108735)
--
-- Name: "late_index" Type: INDEX Owner: neteng
--

CREATE INDEX "late_index" on "traffic" using btree ( "late" "timestamp_ops" );

--
-- TOC Entry ID 7 (OID 108735)
--
-- Name: "asn_index" Type: INDEX Owner: neteng
--

CREATE INDEX "asn_index" on "traffic" using btree ( "asn" "int4_ops" );

--
-- TOC Entry ID 8 (OID 108735)
--
-- Name: "protocol_index" Type: INDEX Owner: neteng
--

CREATE INDEX "protocol_index" on "traffic" using btree ( "protocol" "int4_ops" );

--
-- TOC Entry ID 9 (OID 108735)
--
-- Name: "asn_protocol_index" Type: INDEX Owner: neteng
--

CREATE INDEX "asn_protocol_index" on "traffic" using btree ( "asn" "int4_ops", "protocol" "int4_ops" );

I collect 5 minute data from my routers using netflow, and drop
summary records for nntp, smtp, pop3, web, web/ssl, and other into
this table.

My question is what can I do to make date and asn/protocol queries
relatively fast?

What improvements would the guru's out there recommend. This is with
7.1.2 or beter PG. The data is on a NetApp, and the server has 512Meg
real running on FreeBSD. I have *NOT* changed the default sizes of
SHMEM yet.

I have only loaded 2-3 days worth of data into it, and already have
7million rows, and I have data for this entire year.

The type of things I want to do is find who I'm sending lots of data
to (to make peering decisions), and graph what protocols are common
for any given random period of time.

Any suggestions would be appreciated.

Any questions I'll be happy to supply what data I can.

Thanks!

LER

--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 972-414-9812 E-Mail: ler(at)lerctr(dot)org
US Mail: 1905 Steamboat Springs Drive, Garland, TX 75044-6749

Browse pgsql-general by date

  From Date Subject
Next Message newsreader 2001-05-30 12:07:11 DBD::Pg 1.00
Previous Message Alexander Solianic 2001-05-30 08:43:00 Current database name