Re: sytem log audit/reporting and psql

From: Andreas Haumer <andreas(at)xss(dot)co(dot)at>
To: Fei Liu <fei(dot)liu(at)aepnetworks(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: sytem log audit/reporting and psql
Date: 2007-05-01 17:05:14
Message-ID: 4637734A.6070802@xss.co.at
Lists: pgsql-performance

Hi!

Fei Liu schrieb:
[...]
>
> Now here are my questions:

That's a lot of questions, and some of them are not related
to pgsql-performance or even to PostgreSQL.

I'll try to answer some of them, because I'm currently experimenting
with partitioned tables, too.

> 2) Is partitioning a good approach to speed up log query/view? The user
> comment in partitioning in pgsql manual seems to indicate partitioning
> may be slower than non-partitioned table under certain circumstances.

You can look at table partitioning under several points of view.
My experience with table partitioning under the aspect of "performance" is:

*) It is a benefit for large tables only, and the definition of
"large" depends on your data.
In tests for an application we are developing, table partitioning
did not seem to pay off for tables with fewer than about 10 million
rows (perhaps even more)
*) The performance benefit depends on your queries. Some queries get
a big improvement, but some queries might even run significantly
slower.

*) Depending on the way you set up your system, inserts can be much
slower with partitioned tables (e.g. if you are using triggers
to create your partitions automatically on demand)
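As a sketch of where that insert overhead comes from, here is the
inheritance-based partitioning style from the PostgreSQL 8.x manual
(table and column names are made up):

```sql
-- Parent table; child tables inherit its columns.
CREATE TABLE logs (
    logged_at  timestamptz NOT NULL,
    message    text
);

-- One child per month, with a CHECK constraint the planner can use
-- to skip the partition entirely (constraint exclusion).
CREATE TABLE logs_2007_05 (
    CHECK (logged_at >= '2007-05-01' AND logged_at < '2007-06-01')
) INHERITS (logs);

-- Redirect inserts into the right child. This per-row trigger is
-- exactly the kind of extra work that slows down bulk loading.
CREATE OR REPLACE FUNCTION logs_insert() RETURNS trigger AS $$
BEGIN
    INSERT INTO logs_2007_05 VALUES (NEW.*);
    RETURN NULL;  -- the row is already stored in the child table
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER logs_partition_insert
    BEFORE INSERT ON logs
    FOR EACH ROW EXECUTE PROCEDURE logs_insert();
```

You also have to SET constraint_exclusion = on, otherwise the planner
will scan every child table regardless of the CHECK constraints.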

> 3) How to avoid repetitive log entry scanning since my cron job script
> is run daily but logrotate runs weekly? This means everytime my script
> will be parsing duplicate entries.

This has nothing to do with PostgreSQL, but I wrote something similar
years ago. Here's what I did and what you could do:

Remember the last line of your logfile in some external file (or even
in the database). On the next run, read the logfile again line by line
and skip all lines until you find the line you saved on the last run.
* If you find the line that way, start parsing the logfile at the
next line.
* If you reach EOF without finding your line, start parsing again at
the beginning of the logfile.
* If this is your first run and you don't have a line stored yet, start
parsing at the beginning of the logfile.

When you are finished, remember the last line of the logfile again
at some place.
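The steps above can be sketched in a few lines of Python (file names
and the function name are made up; adapt them to your own script):

```python
import os


def parse_new_lines(logfile_path, state_path):
    """Return only the log lines added since the last run.

    The last line seen is remembered in a small state file,
    as described above.
    """
    last_line = None
    if os.path.exists(state_path):
        with open(state_path) as f:
            last_line = f.read()

    with open(logfile_path) as f:
        lines = f.readlines()

    start = 0
    if last_line is not None:
        # Skip everything up to and including the remembered line.
        # If it is not found (e.g. logrotate replaced the file),
        # start stays 0 and we parse from the beginning.
        for i, line in enumerate(lines):
            if line == last_line:
                start = i + 1
                break

    if lines:
        # Remember the new last line for the next run.
        with open(state_path, "w") as f:
            f.write(lines[-1])

    return lines[start:]
```

Note that this naive version resumes at the first occurrence of the
remembered line, so identical log lines can confuse it; storing a
byte offset or line count alongside the line is more robust.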

> 6) What are the best approaches to analyze postgresql query performance
> and how to improve postgresql query performance?

Here are some general recommendations for performance testing from
my experience:

*) Test with real data. For table partitioning this means you have
to create really large datasets to make your tests meaningful.
If possible, write a small program to generate your test data, or
use some other means to create your test database.
You also need time: creating a test database and importing 100
million rows of test data can take several hours or even days!

*) Test with the real queries from your application!
Testing with just a few simple standard queries will almost
certainly not reflect the performance you will see in your
application later on!
Look at the thread "Query performance problems with partitioned tables"
I started on pgsql-performance just yesterday to see what I mean!

*) Use "EXPLAIN ANALYZE" and look at the "cost" and "actual time"
numbers it gives you. It will also show you the query plan PostgreSQL
used to execute your query. You sometimes might be surprised what is
going on behind the scenes...
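For example (table and column names are again made up):

```sql
-- Compare the planner's estimates (cost, rows) with what actually
-- happened (actual time, rows); large gaps often mean the planner
-- is working with bad statistics. With partitioned tables, also
-- check which child tables appear in the plan at all.
EXPLAIN ANALYZE
SELECT count(*)
  FROM logs
 WHERE logged_at >= '2007-04-01'
   AND logged_at <  '2007-05-01';
```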

*) If you are just using some stopwatch to time your queries be aware
of other factors which might significantly influence your test:
Caching, other jobs running on the machine in parallel, cosmic rays, ...

*) Before running your tests, always try to get to a well-defined
starting point (this might even mean rebooting your server before
each test run). Always repeat each test several times, then
calculate a mean value (and the standard deviation, to see how
"good" your results are...)
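Computing the mean and standard deviation from repeated runs is
simple; here is a sketch in Python (the timings are made-up example
values in seconds):

```python
import math


def summarize(timings):
    """Mean and sample standard deviation of repeated measurements.

    A large deviation relative to the mean tells you the runs were
    not comparable (caching, other load on the machine, ...).
    """
    n = len(timings)
    mean = sum(timings) / n
    # Sample variance (divide by n - 1); needs at least two runs.
    variance = sum((t - mean) ** 2 for t in timings) / (n - 1)
    return mean, math.sqrt(variance)
```

For example, summarize([1.9, 2.1, 2.0]) gives a mean of 2.0 with a
standard deviation of 0.1, i.e. quite consistent runs.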

*) Document your test setup and procedure as well as your results
(otherwise, two days later you won't remember which test produced
which result)

HTH

- - andreas

- --
Andreas Haumer | mailto:andreas(at)xss(dot)co(dot)at
*x Software + Systeme | http://www.xss.co.at/
Karmarschgasse 51/2/20 | Tel: +43-1-6060114-0
A-1100 Vienna, Austria | Fax: +43-1-6060114-71
