Re: Load a database into memory

From: Guillaume Le'mery <glemery(at)comclick(dot)com>
To: Denis Perchine <dyp(at)perchine(dot)com>, pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Load a database into memory
Date: 2001-01-26 08:25:47
Message-ID: 3A71348B.9020000@comclick.com
Lists: pgsql-general


For reference:

>> CREATE TABLE accord_editeur
>> (
>> id_regie int4 not null,
>> num_editeur int4 not null,
>> num_site int4 not null,
>> num_emplacement int4 not null,
>> num_campagne int4 not null,
>> num_publicite int4 not null,
>> num_periode int4,
>> par_id_technologie int4 not null,
>> affichage_possible int4 default 0,
>> ponderation_calculee int4,
>> date_pilotage timestamp NULL,
>> id_ct1 int4,
>> PRIMARY KEY (id_regie, num_editeur, num_site, num_emplacement,
>> num_campagne, num_publicite)
>> );
>>
>> (I've got a primary key on multiple fields because of the complexity of
>> my database. It's the only way to guarantee unique records.)
>>
>> CREATE TABLE parametre
>> (
>> id_parametre int4 not null primary key,
>> id_regie int4 NULL ,
>> par_id_parametre int4 NULL ,
>> type INT4 not null,
>> valeur_str varchar null,
>> valeur_int int4 null,
>> valeur_fl float8 null,
>> valeur_txt varchar,
>> date_pilotage timestamp NULL,
>> id_ct1 int4
>> );
>> CREATE INDEX parametre_tracking_idx ON parametre(type, valeur_int);
>>
>> The query :
>> SELECT ae.id_regie,
>> ae.num_campagne,
>> ae.num_publicite,
>> ae.ponderation_calculee * random()
>> FROM accord_editeur ae,
>> parametre par
>> WHERE ae.id_regie=1
>> AND ae.num_editeur = 1494
>> AND ae.num_site = 1
>> AND ae.num_emplacement = 1
>> AND ae.affichage_possible = 1
>> AND ae.par_id_technologie = par.id_parametre
>> AND par.type = 10
>> AND par.valeur_int = 1
>>
>> And the Explain :
>> NOTICE: QUERY PLAN:
>>
>> Nested Loop (cost=0.00..228.27 rows=1 width=56)
>> -> Index Scan using accord_editeur_pkey on accord_editeur ae
>> (cost=0.00..225.50 rows=1 width=48)
>> -> Index Scan using parametre_tracking_idx on parametre par
>> (cost=0.00..2.02 rows=1 width=8)
>>
>> EXPLAIN

> How many tuples do you usually get from this query?
About 100.

> random() can be quite expensive.
OK, here is the result with that change:

Query without the random():
SELECT ae.id_regie,
ae.num_campagne,
ae.num_publicite,
ae.ponderation_calculee,
ae.num_periode
FROM accord_editeur ae,
parametre par
WHERE ae.id_regie=1
AND ae.num_editeur = 1494
AND ae.num_site = 1
AND ae.num_emplacement = 1
AND ae.affichage_possible = 1
AND ae.par_id_technologie = par.id_parametre
AND par.type = 10
AND par.valeur_int = 1

Explain :
NOTICE: QUERY PLAN:

Nested Loop (cost=0.00..228.27 rows=1 width=56)
-> Index Scan using parametre_tracking_idx on parametre par
(cost=0.00..2.02 rows=1 width=8)
-> Index Scan using accord_editeur_pkey on accord_editeur ae
(cost=0.00..225.50 rows=60 width=48)

EXPLAIN

So random() is not that expensive after all... I'll apply it in my C code instead...


> Also please give us output of
> vacuum verbose analyze parametre;
NOTICE: --Relation parametre--
NOTICE: Pages 3: Changed 0, reaped 0, Empty 0, New 0; Tup 235: Vac 0,
Keep/VTL 0/0, Crash 0, UnUsed 0, MinLen 64, MaxLen 148; Re-using:
Free/Avail. Space 0/0; EndEmpty/Avail. Pages 0/0. CPU 0.00s/0.00u sec.
NOTICE: Index parametre_pkey: Pages 2; Tuples 235. CPU 0.00s/0.00u sec.
NOTICE: Index parametre_tracking_idx: Pages 2; Tuples 235. CPU
0.00s/0.00u sec.
NOTICE: Index parametre_idct1_idx: Pages 2; Tuples 235. CPU 0.00s/0.00u
sec.
VACUUM

> vacuum verbose analyze accord_editeur;
NOTICE: --Relation accord_editeur--
NOTICE: Pages 7096: Changed 0, reaped 6790, Empty 0, New 0; Tup 447032:
Vac 0, Keep/VTL 0/0, Crash 0, UnUsed 23768, MinLen 124, MaxLen 124;
Re-using: Free/Avail. Space 721776/0; EndEmpty/Avail. Pages 0/0. CPU
0.21s/1.85u sec.
NOTICE: Index accord_editeur_pkey: Pages 3339; Tuples 447032: Deleted
0. CPU 0.19s/1.01u sec.
VACUUM

> But it seems the problem is your overly large index. It is quite
> expensive to traverse (note that almost all of the cost comes from the
> index scan over it).
>
> I do not know the exact statistics for your data, but if you create an
> index on a small number of fields that cover most of the selectivity,
> the query should become much faster, provided the optimizer decides to
> use it. As a first try you can create an index on num_editeur. Do not
> forget to run vacuum analyze accord_editeur; afterwards.

OK, I dropped the large index and only set one on num_editeur.

Vacuum verbose analyze accord_editeur :
NOTICE: --Relation accord_editeur--
NOTICE: Pages 7096: Changed 0, reaped 6790, Empty 0, New 0; Tup 447032:
Vac 0, Keep/VTL 0/0, Crash 0, UnUsed 23768, MinLen 124, MaxLen 124;
Re-using: Free/Avail. Space 721776/0; EndEmpty/Avail. Pages 0/0. CPU
0.33s/1.74u sec.
NOTICE: Index accord_editeur_tracking_idx: Pages 1330; Tuples 447032:
Deleted 0. CPU 0.05s/0.93u sec.
VACUUM

Here is the new EXPLAIN (still without the random()):
NOTICE: QUERY PLAN:

Nested Loop (cost=0.00..512.65 rows=1 width=56)
-> Index Scan using parametre_tracking_idx on parametre par
(cost=0.00..2.02 rows=1 width=8)
-> Index Scan using accord_editeur_tracking_idx on accord_editeur ae
(cost=0.00..509.88 rows=60 width=48)

EXPLAIN

So the large index seems to be better after all...
Or maybe I should try a hash index?

If I hash num_editeur into another field and set my index on those 2
fields, it gives me a better cost, but performance is still slow :-(

>> In fact it is CPU-bound... But why?
>> One query takes up to 14% of CPU usage! The server is a dual-processor
>> PIII 660, and I'd like to run more than 100 queries at the same time...
>>
>> Maybe if I set hash indices on single fields instead of one index on
>> multiple columns ?
>>
>> Thanks,
>>
>> Guillaume.
