Re: Why so slow?

From: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
To: Bealach-na Bo <bealach_na_bo(at)hotmail(dot)com>
Cc: akretschmer(at)spamfence(dot)net, pgsql-performance(at)postgresql(dot)org
Subject: Re: Why so slow?
Date: 2006-04-28 14:58:43
Message-ID: 20060428145842.GI97354@pervasive.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Fri, Apr 28, 2006 at 11:41:06AM +0000, Bealach-na Bo wrote:
> OK, here is a much more detailed output. I still don't quite
> understand why simple queries like counting the number of rows in a
> table should take minutes to complete. Surely, any performance
> enhancement to be had by vacuuming is closely related to indexes
> which, in turn, are closely related to sorting and searching. A simple
> count of 365590 does not involve indexes (or does it??) and should not take
> minutes. Should I be forcing the
> way postgresql plans my queries?
>
> Here is my first attempt at vacuum that got nowhere and I had to
> cancel it.
>
> ----------psql session start----------
> vacuum verbose analyze job_log;
> INFO: vacuuming "job_log"
> INFO: index "job_log_id_pkey" now contains 10496152 row versions in 59665
> pages
> DETAIL: 0 index row versions were removed.
> 28520 index pages have been deleted, 20000 are currently reusable.
> CPU 1.44s/3.49u sec elapsed 33.71 sec.
> INFO: index "idx_job_stop_filter" now contains 10496152 row versions in
> 71149 pages
> DETAIL: 0 index row versions were removed.
> 24990 index pages have been deleted, 20000 are currently reusable.
> CPU 2.11s/3.61u sec elapsed 115.69 sec.
> INFO: index "idx_job_start_filter" now contains 10496152 row versions in
> 57891 pages
> DETAIL: 0 index row versions were removed.
> 19769 index pages have been deleted, 19769 are currently reusable.
> CPU 1.58s/3.44u sec elapsed 23.11 sec.
> Cancel request sent
> ----------psql session finish----------
>
>
> I thought that combining indexes would improve things and dropped the
> 3 separate ones above and created this one
>
> ----------psql session start----------
> create index idx_job_log_filter on job_log(job_name,job_start,job_stop);
>
> select count(*) from job_log;
> count
> --------
> 365590
> (1 row)

The above shows that the indexes contained 10M rows and 160M of dead
space each. That means you weren't vacuuming nearly enough.

> explain analyse select count(*) from job_log;
> QUERY PLAN
> ---------------------------------------------------------------------------------------------------------------------------------
> Aggregate (cost=1382171.88..1382171.88 rows=1 width=0) (actual
> time=207011.882..207011.883 rows=1 loops=1)
> -> Seq Scan on job_log (cost=0.00..1381257.90 rows=365590 width=0)
> (actual time=199879.510..206708.523 rows=365590 loops=1)
> Total runtime: 207014.363 ms
> (3 rows)
> ----------psql session finish----------
>
> Then I tried another vacuum and decided to be very patient
>
> ----------psql session start----------
> vacuum verbose analyze job_log;
> INFO: vacuuming "job_log"
> INFO: index "job_log_id_pkey" now contains 10496152 row versions in 59665
> pages
> DETAIL: 0 index row versions were removed.
> 28520 index pages have been deleted, 20000 are currently reusable.
> CPU 1.39s/3.39u sec elapsed 24.19 sec.
> INFO: index "idx_job_log_filter" now contains 365590 row versions in 15396
> pages
> DETAIL: 0 index row versions were removed.
> 0 index pages have been deleted, 0 are currently reusable.
> CPU 0.59s/0.20u sec elapsed 10.28 sec.
> INFO: "job_log": removed 2795915 row versions in 368091 pages
> DETAIL: CPU 33.30s/30.11u sec elapsed 2736.54 sec.
> INFO: index "job_log_id_pkey" now contains 7700230 row versions in 59665
> pages
> DETAIL: 2795922 index row versions were removed.
> 37786 index pages have been deleted, 20000 are currently reusable.
> CPU 2.76s/6.45u sec elapsed 152.14 sec.
> INFO: index "idx_job_log_filter" now contains 365590 row versions in 15396
> pages
> DETAIL: 0 index row versions were removed.
> 0 index pages have been deleted, 0 are currently reusable.
> CPU 0.52s/0.20u sec elapsed 7.75 sec.
> INFO: "job_log": removed 2795922 row versions in 220706 pages
> DETAIL: CPU 19.81s/17.92u sec elapsed 1615.95 sec.
> INFO: index "job_log_id_pkey" now contains 4904317 row versions in 59665
> pages
> DETAIL: 2795913 index row versions were removed.
> 45807 index pages have been deleted, 20000 are currently reusable.
> CPU 2.22s/5.30u sec elapsed 129.02 sec.
> INFO: index "idx_job_log_filter" now contains 365590 row versions in 15396
> pages
> DETAIL: 0 index row versions were removed.
> 0 index pages have been deleted, 0 are currently reusable.
> CPU 0.50s/0.22u sec elapsed 7.61 sec.
> INFO: "job_log": removed 2795913 row versions in 188139 pages
> DETAIL: CPU 17.03s/15.37u sec elapsed 1369.45 sec.
> INFO: index "job_log_id_pkey" now contains 2108405 row versions in 59665
> pages
> DETAIL: 2795912 index row versions were removed.
> 53672 index pages have been deleted, 20000 are currently reusable.
> CPU 2.13s/4.57u sec elapsed 122.74 sec.
> INFO: index "idx_job_log_filter" now contains 365590 row versions in 15396
> pages
> DETAIL: 0 index row versions were removed.
> 0 index pages have been deleted, 0 are currently reusable.
> CPU 0.53s/0.23u sec elapsed 8.24 sec.
> INFO: "job_log": removed 2795912 row versions in 187724 pages
> DETAIL: CPU 16.84s/15.22u sec elapsed 1367.50 sec.
> INFO: index "job_log_id_pkey" now contains 365590 row versions in 59665
> pages
> DETAIL: 1742815 index row versions were removed.
> 57540 index pages have been deleted, 20000 are currently reusable.
> CPU 1.38s/2.85u sec elapsed 76.52 sec.
> INFO: index "idx_job_log_filter" now contains 365590 row versions in 15396
> pages
> DETAIL: 0 index row versions were removed.
> 0 index pages have been deleted, 0 are currently reusable.
> CPU 0.54s/0.31u sec elapsed 7.99 sec.
> INFO: "job_log": removed 1742815 row versions in 143096 pages
> DETAIL: CPU 12.77s/11.75u sec elapsed 1046.10 sec.
> INFO: "job_log": found 12926477 removable, 365590 nonremovable row
> versions in 1377602 pages

13M dead rows, and the table is 1.4M pages, or 11GB. No wonder it's
slow.

You need to run a vacuum full, and then you need to vacuum far more
often. If you're running 8.1, turn on autovacuum and cut each default
scale factor in half, to 0.2 and 0.1.

> DETAIL: 0 dead row versions cannot be removed yet.
> There were 7894754 unused item pointers.
> 0 pages are entirely empty.
> CPU 124.49s/117.57u sec elapsed 8888.80 sec.
> INFO: vacuuming "pg_toast.pg_toast_17308"
> INFO: index "pg_toast_17308_index" now contains 130 row versions in 12
> pages
> DETAIL: 2543 index row versions were removed.
> 9 index pages have been deleted, 0 are currently reusable.
> CPU 0.00s/0.00u sec elapsed 0.11 sec.
> INFO: "pg_toast_17308": removed 2543 row versions in 617 pages
> DETAIL: CPU 0.04s/0.05u sec elapsed 4.85 sec.
> INFO: "pg_toast_17308": found 2543 removable, 130 nonremovable row
> versions in 650 pages
> DETAIL: 0 dead row versions cannot be removed yet.
> There were 0 unused item pointers.
> 0 pages are entirely empty.
> CPU 0.06s/0.06u sec elapsed 5.28 sec.
> INFO: analyzing "rshuser.job_log"
> INFO: "job_log": scanned 3000 of 1377602 pages, containing 695 live rows
> and 0 dead rows; 695 rows in sample, 319144 estimated total rows
> VACUUM
>
>
> explain analyse select count(*) from job_log;
> QUERY PLAN
> ---------------------------------------------------------------------------------------------------------------------------------
> Aggregate (cost=1382171.88..1382171.88 rows=1 width=0) (actual
> time=207267.094..207267.095 rows=1 loops=1)
> -> Seq Scan on job_log (cost=0.00..1381257.90 rows=365590 width=0)
> (actual time=200156.539..206962.895 rows=365590 loops=1)
> Total runtime: 207267.153 ms
> (3 rows)
>
> ----------psql session finish----------
>
>
> I also took snapshots of top output while I ran the above
>
>
> ----------top output start----------
> Cpu(s): 0.7% us, 0.7% sy, 0.0% ni, 49.7% id, 48.5% wa, 0.5% hi, 0.0% si
> Mem: 1554788k total, 1538268k used, 16520k free, 6220k buffers
> Swap: 1020024k total, 176k used, 1019848k free, 1404280k cached
>
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 3368 postgres 18 0 37492 29m 11m D 2.7 1.9 3:00.54 postmaster
>
>
>
> Cpu(s): 0.7% us, 0.8% sy, 0.0% ni, 49.7% id, 48.5% wa, 0.3% hi, 0.0% si
> Mem: 1554788k total, 1538580k used, 16208k free, 2872k buffers
> Swap: 1020024k total, 176k used, 1019848k free, 1414908k cached
>
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 3368 postgres 15 0 37492 29m 11m D 2.3 1.9 5:26.03 postmaster
>
>
> Cpu(s): 0.5% us, 5.8% sy, 0.0% ni, 48.7% id, 44.4% wa, 0.5% hi, 0.0% si
> Mem: 1554788k total, 1538196k used, 16592k free, 1804k buffers
> Swap: 1020024k total, 176k used, 1019848k free, 1444576k cached
>
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 3368 postgres 15 0 20956 13m 11m D 11.0 0.9 6:25.10 postmaster
> ----------top output end----------
>
>
> I know my database needs a major redesign. But I'm having a hard time
> explaining the poor performance nevertheless.
>
>
> Regards,
>
> Bealach
>
>
> >From: Andreas Kretschmer <akretschmer(at)spamfence(dot)net>
> >To: pgsql-performance(at)postgresql(dot)org
> >Subject: Re: [PERFORM] Why so slow?
> >Date: Thu, 27 Apr 2006 20:28:23 +0200
> >
> >Bealach-na Bo <bealach_na_bo(at)hotmail(dot)com> schrieb:
> >> The node table is tiny (2500 records). What I'm pulling my hair out
> >> over is that ANY Query, even something as simple as select count(*)
> >> form job_log takes of the order of tens of minutes to complete. Just
> >> now I'm trying to run an explain analyze on the above query, but so
> >> far, it's taken 35min! with no result and there is a postgres process at
> >> the top of top
> >>
> >> What am I doing wrong??
> >
> >The 'explain analyse' don't return a result, but it returns the query
> >plan and importance details, how PG works.
> >
> >That's why you should paste the query and the 'explain analyse' -
> >output. This is very important.
> >
> >Anyway, do you periodical vacuum your DB? My guess: no, and that's why
> >you have many dead rows.
> >
> >20:26 < akretschmer|home> ??vacuum
> >20:26 < rtfm_please> For information about vacuum
> >20:26 < rtfm_please> see
> >http://developer.postgresql.org/~wieck/vacuum_cost/
> >20:26 < rtfm_please> or
> >http://www.postgresql.org/docs/current/static/sql-vacuum.html
> >20:26 < rtfm_please> or http://www.varlena.com/varlena/GeneralBits/116.php
> >
> >20:27 < akretschmer|home> ??explain
> >20:27 < rtfm_please> For information about explain
> >20:27 < rtfm_please> see
> >http://techdocs.postgresql.org/oscon2005/robert.treat/OSCON_Explaining_Explain_Public.sxi
> >20:27 < rtfm_please> or http://www.gtsm.com/oscon2003/toc.html
> >20:27 < rtfm_please> or
> >http://www.postgresql.org/docs/current/static/sql-explain.html
> >
> >
> >Read this links for more informations about vacuum and explain.
> >
> >
> >HTH, Andreas
> >--
> >Really, I'm not out to destroy Microsoft. That will just be a completely
> >unintentional side effect. (Linus Torvalds)
> >"If I was god, I would recompile penguin with --enable-fly." (unknow)
> >Kaufbach, Saxony, Germany, Europe. N 51.05082?, E 13.56889?
> >
> >---------------------------(end of broadcast)---------------------------
> >TIP 9: In versions below 8.0, the planner will ignore your desire to
> > choose an index scan if your joining column's datatypes do not
> > match
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
> http://archives.postgresql.org
>

--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Alan Hodgson 2006-04-28 15:02:59 Re: Why so slow?
Previous Message Tom Lane 2006-04-28 14:57:00 Re: CPU usage goes to 100%, query seems to ran forever