Re: PGSQL 9.3 - billion rows

From: Sékine Coulibaly <scoulibaly(at)gmail(dot)com>
To: Nicolas Paris <niparisco(at)gmail(dot)com>
Cc: pgsql-performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: PGSQL 9.3 - billion rows
Date: 2014-07-07 19:40:49
Message-ID: CAD8n-Fp6+ZxgExEYimEhXdhK_eu3a0m-GzuqXmnaNGzJ_3bfhQ@mail.gmail.com
Lists: pgsql-performance

PostgreSQL will definitely be able to handle it.

However, besides the schema, an important factor is the kind of queries you
will be submitting to PostgreSQL: reporting queries, or low-latency lookups?

You may find PostgreSQL weak if you mainly submit analytical queries (e.g.
SELECT count(1) FROM observation_fact might take a long time to complete).
However, based on the indexes you showed, a selective "SELECT * FROM
observation_fact WHERE ..." will most likely show decent performance.
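
For instance, a rough sketch of the two access patterns (the patient_num
value below is just a placeholder; EXPLAIN on your real queries will confirm
which plan you actually get):

-- full-table aggregate: sequential scan over all rows, expect this to be slow
SELECT count(1) FROM observation_fact;

-- lookup on the leading primary-key column: can use the index, should stay fast
SELECT * FROM observation_fact WHERE patient_num = 12345;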

Do you think the active set will fit in your RAM? If not, it might be
worth adding memory.
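
Once you have loaded a representative sample, something like this (assuming
the i2b2databeta schema from your DDL) shows the table-plus-indexes footprint
to compare against the RAM you can afford:

SELECT pg_size_pretty(pg_total_relation_size('i2b2databeta.observation_fact'));
-- reports heap + indexes + TOAST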

AFAIK, vanilla PostgreSQL cannot scale horizontally (yet), and a single
query is not parallelized across cores (yet). Hence, you could have a look
at Postgres-XC or Postgres-XL.

Sekine

2014-07-07 15:59 GMT+02:00 Nicolas Paris <niparisco(at)gmail(dot)com>:

> Hello,
>
> I have a fact table (table and indexes are below) that will probably
> reach around 2 billion rows.
>
> - Can PostgreSQL support such a table (this table is the fact table of a
> datamart -> many join queries with dimension tables)?
> - If yes, I would like to test it (say, insert 2 billion test rows); what
> server configuration do I need? How much RAM?
> - If not, would it be better to consider a cluster or another solution?
> - (Do you have any ideas for optimizing this table?)
>
> Thanks a lot !
>
>
> CREATE TABLE observation_fact
> (
> encounter_num integer NOT NULL,
> patient_num integer NOT NULL,
> concept_cd character varying(50) NOT NULL,
> provider_id character varying(50) NOT NULL,
> start_date timestamp without time zone NOT NULL,
> modifier_cd character varying(100) NOT NULL DEFAULT '@'::character
> varying,
> instance_num integer NOT NULL DEFAULT 1,
> valtype_cd character varying(50),
> tval_char character varying(255),
> nval_num numeric(18,5),
> valueflag_cd character varying(50),
> quantity_num numeric(18,5),
> units_cd character varying(50),
> end_date timestamp without time zone,
> location_cd character varying(50),
> observation_blob text,
> confidence_num numeric(18,5),
> update_date timestamp without time zone,
> download_date timestamp without time zone,
> import_date timestamp without time zone,
> sourcesystem_cd character varying(50),
> upload_id integer,
> text_search_index serial NOT NULL,
> CONSTRAINT observation_fact_pk PRIMARY KEY (patient_num, concept_cd,
> modifier_cd, start_date, encounter_num, instance_num, provider_id)
> )
> WITH (
> OIDS=FALSE
> );
>
>
> CREATE INDEX of_idx_allobservation_fact
> ON i2b2databeta.observation_fact
> USING btree
> (patient_num, encounter_num, concept_cd COLLATE pg_catalog."default",
> start_date, provider_id COLLATE pg_catalog."default", modifier_cd COLLATE
> pg_catalog."default", instance_num, valtype_cd COLLATE
> pg_catalog."default", tval_char COLLATE pg_catalog."default", nval_num,
> valueflag_cd COLLATE pg_catalog."default", quantity_num, units_cd COLLATE
> pg_catalog."default", end_date, location_cd COLLATE pg_catalog."default",
> confidence_num);
>
>
> CREATE INDEX of_idx_clusteredconcept
> ON i2b2databeta.observation_fact
> USING btree
> (concept_cd COLLATE pg_catalog."default");
>
>
> CREATE INDEX of_idx_encounter_patient
> ON i2b2databeta.observation_fact
> USING btree
> (encounter_num, patient_num, instance_num);
>
>
> CREATE INDEX of_idx_modifier
> ON i2b2databeta.observation_fact
> USING btree
> (modifier_cd COLLATE pg_catalog."default");
>
> CREATE INDEX of_idx_sourcesystem_cd
> ON i2b2databeta.observation_fact
> USING btree
> (sourcesystem_cd COLLATE pg_catalog."default");
>
>
> CREATE INDEX of_idx_start_date
> ON i2b2databeta.observation_fact
> USING btree
> (start_date, patient_num);
>
>
> CREATE INDEX of_idx_uploadid
> ON i2b2databeta.observation_fact
> USING btree
> (upload_id);
>
>
> CREATE UNIQUE INDEX of_text_search_unique
> ON i2b2databeta.observation_fact
> USING btree
> (text_search_index);
