Re: Simple select hangs while CPU close to 100% - Analyze

From: "Jozsef Szalay" <jszalay(at)storediq(dot)com>
To: "Bill Moran" <wmoran(at)collaborativefusion(dot)com>
Cc: "Pavel Stehule" <pavel(dot)stehule(at)gmail(dot)com>, <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Simple select hangs while CPU close to 100% - Analyze
Date: 2007-08-17 22:54:29
Message-ID: E387E2E9622FDD408359F98BF183879EE4E966@dc1.storediq.com
Lists: pgsql-performance

In the limited time I had, I could not produce a test case to submit to
this forum.

I have found another issue in production though. In addition to the

SELECT COUNT(*) FROM table;

taking forever (>6 hours on a table with 100,000 rows and with no
indexes on a system with Linux Fedora 3 and with two 3.2GHz Xeon
processors plus hyperthreading), the

SELECT column_1 FROM table GROUP BY column_1 HAVING COUNT(*) > 1;

statement actually ran out of memory (on a table with 300,000 rows and
with no indexes, while the OS reported >3.5 GB virtual memory for the
postgres backend process).

To make a long story short, we found that both problems could be solved
with a single magic bullet, namely by calling ANALYZE every time a large
number of changes is introduced to the table. (Earlier, we called
ANALYZE only after we did some serious post-processing on freshly
bulk-loaded data.)
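For anyone hitting the same symptom, here is a minimal sketch of the
revised load sequence (the table name and file path are illustrative,
not from our actual system):

```sql
BEGIN;
DELETE FROM items;                     -- drop the old rows
COPY items FROM '/tmp/items.load';     -- bulk-load the replacement rows
COMMIT;

-- Refresh the planner's statistics right after the large change,
-- then reclaim dead-row space with a plain (non-FULL) VACUUM.
ANALYZE items;
VACUUM items;
```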

I don't know why ANALYZE would have any effect on a sequential scan of
a table, but it does appear to affect both performance and memory usage
significantly.
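If anyone wants to dig further, one way to see what ANALYZE changed is
to compare the planner's bookkeeping before and after the call (the
table name "items" is just a placeholder):

```sql
-- pg_class holds the page and row counts the planner works from;
-- stale values here can lead to poor plan and memory choices.
SELECT relname, relpages, reltuples
FROM pg_class
WHERE relname = 'items';

-- EXPLAIN ANALYZE shows estimated vs. actual row counts for the query.
EXPLAIN ANALYZE SELECT COUNT(*) FROM items;
```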

Both of our production issues have vanished since this simple change!
We no longer have to run VACUUM FULL on the table; a plain VACUUM is
sufficient.

Thanks for all the responses!
Jozsef

-----Original Message-----
From: Bill Moran [mailto:wmoran(at)collaborativefusion(dot)com]
Sent: Wednesday, July 25, 2007 3:29 PM
To: Jozsef Szalay
Cc: Pavel Stehule; pgsql-performance(at)postgresql(dot)org
Subject: Re: [PERFORM] Simple select hangs while CPU close to 100%

In response to "Jozsef Szalay" <jszalay(at)storediq(dot)com>:

> Our application is such that any update to the database is done by a
> single session in a batch process using bulk load. The frequency of
> these usually larger-scale updates is variable, but an update runs
> every 2-3 days on average.
>
> Originally a plain VACUUM ANALYZE was executed on every affected table
> after every load.

Any other insert/update activity outside of the bulk loads? What's
the vacuum policy outside the bulk loads? You say "originally" -- does
it still do so?

I agree with Pavel that the output of vacuum verbose when the problem
is occurring would be helpful.

> VACUUM FULL ANALYZE is scheduled to run on a weekly basis.

If you need to do this, then other settings are incorrect.

> I do understand the need for vacuuming. Nevertheless I expect Postgres
> to return data eventually even if I do not vacuum. In my case, the
> simple SELECT COUNT(*) FROM table; statement on a table that had
> around 100K "live" rows has not returned the result for more than
> 6 hours, after which I manually killed it.

It should; 6 hours is too long for that process unless you're running
a 486dx2. You didn't mention your hardware or your postgresql.conf
settings. What other activity is occurring during this long count()?
Can you give us a shot of the iostat and/or top output during this
phenomenon?

>
> Jozsef
>
>
> -----Original Message-----
> From: Bill Moran [mailto:wmoran(at)collaborativefusion(dot)com]
> Sent: Wednesday, July 25, 2007 1:12 PM
> To: Jozsef Szalay
> Cc: Pavel Stehule; pgsql-performance(at)postgresql(dot)org
> Subject: Re: [PERFORM] Simple select hangs while CPU close to 100%
>
> In response to "Jozsef Szalay" <jszalay(at)storediq(dot)com>:
>
> > Hi Pavel,
> >
> >
> > Yes I did vacuum. In fact the only way to "fix" this problem is
> > executing a "full" vacuum. The plain vacuum did not help.
>
> I read over my previous reply and picked up on something else ...
>
> What is your vacuum _policy_? i.e. how often do you vacuum/analyze?
> The fact that you had to do a vacuum full to get things back under
> control tends to suggest that your current vacuum schedule is not
> aggressive enough.
>
> An explicit vacuum of this table after the large delete/insert may
> be helpful.
>
> > -----Original Message-----
> > From: Pavel Stehule [mailto:pavel(dot)stehule(at)gmail(dot)com]
> > Sent: Sunday, July 22, 2007 10:53 AM
> > To: Jozsef Szalay
> > Cc: pgsql-performance(at)postgresql(dot)org
> > Subject: Re: [PERFORM] Simple select hangs while CPU close to 100%
> >
> > Hello
> >
> > did you vacuum?
> >
> > It's good technique do vacuum table after remove bigger number of
> rows.
> >
> > Regards
> > Pavel Stehule
> >
> > 2007/7/22, Jozsef Szalay <jszalay(at)storediq(dot)com>:
> > >
> > >
> > >
> > >
> > > I'm having this very disturbing problem. I got a table with about
> > > 100,000 rows in it. Our software deletes the majority of these rows
> > > and then bulk-loads another 100,000 rows into the same table. All
> > > this is happening within a single transaction. I then perform a
> > > simple "select count(*) from ..." statement that never returns. In
> > > the meantime, the backend Postgres process is taking close to 100%
> > > of the CPU. The hang-up does not always happen on the same
> > > statement, but eventually it happens 2 out of 3 times. If I dump
> > > and then restore the schema where this table resides, the problem
> > > is gone until the next time we run through the whole process of
> > > deleting, loading and querying the table.
> > >
> > >
> > >
> > > There is no other activity in the database. All requested locks
> > > are granted.
> > >
> > >
> > >
> > > Has anyone seen similar behavior?
> > >
> > >
> > >
> > > Some details:
> > >
> > >
> > >
> > > Postgres v 8.1.2
> > >
> > > Linux Fedora 3
> > >
> > >
> > >
> > > shared_buffers = 65536
> > >
> > > temp_buffers = 32768
> > >
> > > work_mem = 131072
> > >
> > > maintenance_work_mem = 131072
> > >
> > > max_stack_depth = 8192
> > >
> > > max_fsm_pages = 40000
> > >
> > > wal_buffers = 16
> > >
> > > checkpoint_segments = 16
> > >
> > >
> > >
> > >
> > >
> > > top reports
> > >
> > >
> > >
> > >   PID USER     PR NI VIRT RES  SHR S %CPU %MEM     TIME+ COMMAND
> > > 19478 postgres 25  0 740m 721m 536m R 99.7  4.4 609:41.16 postmaster
> > >
> > >
> > >
> > > ps -ef | grep postgres reports
> > >
> > >
> > >
> > > postgres 19478  8061 99 00:11 ?  10:13:03 postgres: user dbase
> > > [local] SELECT
> > >
> > >
> > >
> > > strace -p 19478
> > >
> > > no system calls reported
> > >
> > >
> > >
> > >
> > >
> > > Thanks for the help!
> > >
> > > Jozsef
> >
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 4: Have you searched our list archives?
> >
> >                http://archives.postgresql.org
> >
>
>
> --
> Bill Moran
> Collaborative Fusion Inc.
> http://people.collaborativefusion.com/~wmoran/
>
> wmoran(at)collaborativefusion(dot)com
> Phone: 412-422-3463x4023

--
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/

wmoran(at)collaborativefusion(dot)com
Phone: 412-422-3463x4023
