Re: [SQL] [PERFORM] 7.3.1 index use / performance

From: Achilleus Mantzios <achill(at)matrix(dot)gatewaynet(dot)com>
To: Stephan Szabo <sszabo(at)megazone23(dot)bigpanda(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org, <pgsql-general(at)postgresql(dot)org>, <pgsql-sql(at)postgresql(dot)org>
Subject: Re: [SQL] [PERFORM] 7.3.1 index use / performance
Date: 2003-01-07 20:27:32
Message-ID: Pine.LNX.4.44.0301071754580.8183-200000@matrix.gatewaynet.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-performance pgsql-sql

On Tue, 7 Jan 2003, Stephan Szabo wrote:

>
> On Tue, 7 Jan 2003, Achilleus Mantzios wrote:
>
> > i am just in the stage of having migrated my test system to 7.3.1
> > and i am experiencing some performance problems.
> >
> > i have a table "noon"
> > Table "public.noon"
> > Column | Type | Modifiers
> > ------------------------+------------------------+-----------
> > v_code | character varying(4) |
> > log_no | bigint |
> > report_date | date |
> > report_time | time without time zone |
> > voyage_no | integer |
> > charterer | character varying(12) |
> > port | character varying(24) |
> > duration | character varying(4) |
> > rotation | character varying(9) |
> > ......
> >
> > with a total of 278 columns.
> >
> > it has indexes:
> > Indexes: noonf_date btree (report_date),
> > noonf_logno btree (log_no),
> > noonf_rotation btree (rotation text_ops),
> > noonf_vcode btree (v_code),
> > noonf_voyageno btree (voyage_no)
> >
> > On the test 7.3.1 system (a FreeBSD 4.7-RELEASE-p2, Celeron 1.2GHz
> > 400Mb, with 168Mb for pgsql),
> > i get:
> > dynacom=# EXPLAIN ANALYZE select
> > FUELCONSUMPTION,rpm,Steam_Hours,voyage_activity,ldin from noon where
> > v_code='4500' and rotation='NOON ' and report_date between
> > '2002-01-07' and '2003-01-07';
> > QUERY PLAN
> >
>
> > -------------------------------------------------------------------------------------------------------------------
> > Index Scan using noonf_date on noon (cost=0.00..4.46 rows=1 width=39)
> > (actual time=0.27..52.89 rows=259 loops=1)
>
>
> > Index Scan using noonf_vcode on noon (cost=0.00..3122.88 rows=1
> > width=39) (actual time=0.16..13.92 rows=259 loops=1)
>
>
> What do the statistics for the three columns actually look like and what
> are the real distributions and counts like?

The two databases (test 7.3.1 and development 7.2.3) are identical
(loaded from the same pg_dump).

About the stats on these 3 columns i get: (see also attachment 1 to avoid
identation/wraparound problems)

schemaname | tablename | attname | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs | histogram_bounds | correlation
------------+-----------+-------------+-----------+-----------+------------+-----------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------+-------------
public | noon | v_code | 0 | 8 | 109 | {4630,4650,4690,4670,4520,4610,4550,4560,4620,4770} | {0.0283333,0.028,0.0256667,0.0243333,0.024,0.0236667,0.0233333,0.0233333,0.0226667,0.0226667} | {2070,3210,4330,4480,4570,4680,4751,4820,4870,4940,6020} | -0.249905
public | noon | report_date | 0 | 4 | 3408 | {2001-11-14,1998-10-18,2000-04-03,2000-07-04,2000-12-20,2000-12-31,2001-01-12,2001-10-08,2001-12-25,1996-01-23} | {0.002,0.00166667,0.00166667,0.00166667,0.00166667,0.00166667,0.00166667,0.00166667,0.00166667,0.00133333} | {"0001-12-11 BC",1994-09-27,1996-03-26,1997-07-29,1998-08-26,1999-03-29,1999-11-30,2000-09-25,2001-05-25,2002-01-17,2002-12-31} | -0.812295
public | noon | rotation | 0 | 13 | 6 | {"NOON ","PORT LOG ","ARRIVAL ",DEPARTURE,"SEA ","NEXT PORT"} | {0.460333,0.268667,0.139,0.119667,0.007,0.00533333} | | 0.119698
(3 rows)

About distributions, i have:

dynacom=# SELECT rotation,count(*) from noon group by rotation;
rotation | count
-----------+-------
| 2
000000000 | 65
ARRIVAL | 1
ARRIVAL | 15471
DEPARTURE | 15030
NEXT PORT | 462
NOON | 50874
PORT LOG | 25688
SEA | 1202
(9 rows)

dynacom=# SELECT v_code,count(*) from noon group by v_code;
v_code | count
--------+-------
0004 | 1
1030 | 1
2070 | 170
2080 | 718
2110 | 558
2220 | 351
2830 | 1373
2840 | 543
2860 | 407
2910 | 418
3010 | 352
3020 | 520
3060 | 61
3130 | 117
3140 | 1
3150 | 752
3160 | 811
3170 | 818
3180 | 1064
3190 | 640
3200 | 998
3210 | 1512
3220 | 595
3230 | 374
3240 | 514
3250 | 13
3260 | 132
3270 | 614
4010 | 413
4020 | 330
4040 | 728
4050 | 778
4060 | 476
4070 | 534
4310 | 759
4320 | 424
4330 | 549
4360 | 366
4370 | 334
4380 | 519
4410 | 839
4420 | 183
4421 | 590
4430 | 859
4450 | 205
4470 | 861
4480 | 766
4490 | 169
4500 | 792
4510 | 2116
4520 | 2954
4530 | 2142
4531 | 217
4540 | 2273
4550 | 2765
4560 | 2609
4570 | 2512
4580 | 1530
4590 | 1987
4600 | 308
4610 | 2726
4620 | 2698
4630 | 2813
4640 | 1733
4650 | 2655
4660 | 2139
4661 | 65
4670 | 2607
4680 | 1729
4690 | 2587
4700 | 2101
4710 | 1830
4720 | 1321
4730 | 1258
4740 | 1506
4750 | 1391
4751 | 640
4760 | 1517
4770 | 2286
4780 | 1353
4790 | 1209
4800 | 2414
4810 | 770
4820 | 1115
4830 | 1587
4840 | 983
4841 | 707
4850 | 1297
4860 | 375
4870 | 1440
4880 | 456
4881 | 742
4890 | 210
4891 | 45
4900 | 2
4910 | 1245
4920 | 414
4930 | 1130
4940 | 1268
4950 | 949
4960 | 836
4970 | 1008
4980 | 1239
5510 | 477
5520 | 380
5530 | 448
5540 | 470
5550 | 352
5560 | 148
5570 | 213
5580 | 109
5590 | 55
6010 | 246
6020 | 185
9180 | 1

(Not all the above vessels are active or belong to me:) )

The distribution on the report_date has no probabilistic significance
since each report_date usually corresponds to one row.
So,
dynacom=# SELECT count(*) from noon;
count
--------
108795
(1 row)

dynacom=#

Now for the specific query the counts have as follows:

dynacom=# select count(*) from noon where v_code='4500';
count
-------
792
(1 row)

dynacom=# select count(*) from noon where rotation='NOON ';
count
-------
50874
(1 row)

dynacom=# select count(*) from noon where report_date between '2002-01-07'
and '2003-01-07';
count
-------
7690
(1 row)

dynacom=#

> Given an estimated cost of around 4 for the first scan, my guess would be
> that it's not expecting alot of rows between 2002-01-07 and 2003-01-07
> which would make that a reasonable plan.
>

As we see the rows returned for v_code='4500' (792) are much fewer than
the rows returned for the dates between '2002-01-07' and '2003-01-07'
(7690).

Is there a way to provide you with more information?

And i must note that the two databases were worked on after a fresh
createdb on both systems (and as i told they are identical).
But, for some reason the 7.2.3 *always* finds the best index to use :)

>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
> http://archives.postgresql.org
>

==================================================================
Achilleus Mantzios
S/W Engineer
IT dept
Dynacom Tankers Mngmt
Nikis 4, Glyfada
Athens 16610
Greece
tel: +30-10-8981112
fax: +30-10-8981877
email: achill(at)matrix(dot)gatewaynet(dot)com
mantzios(at)softlab(dot)ece(dot)ntua(dot)gr

Attachment Content-Type Size
query_stats text/plain 2.2 KB

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Tommi Mäkitalo 2003-01-07 20:29:41 release prepared statement
Previous Message Roman Fail 2003-01-07 20:23:03 Binary data migration from MSSQL

Browse pgsql-performance by date

  From Date Subject
Next Message Claiborne, Aldemaco Earl (Al) 2003-01-07 20:29:54 path
Previous Message Fred Moyer 2003-01-07 19:46:22 Re: [PERFORM] PostgreSQL and memory usage

Browse pgsql-sql by date

  From Date Subject
Next Message Achilleus Mantzios 2003-01-07 21:05:22 Re: [SQL] [PERFORM] 7.3.1 index use / performance
Previous Message Tom Lane 2003-01-07 19:26:08 Re: [SQL] [PERFORM] 7.3.1 index use / performance