pgBadger 3 released: now with parallel parsing

Posted on 2013-02-26 by Dalibo

DALIBO is proud to announce the release of pgBadger v3, the new PostgreSQL log analyzer. pgBadger is built for speed with fully detailed reports from your PostgreSQL log file.

This new release brings significant improvements. All pgBadger users should upgrade as soon as possible.

pgBadger 3 parallel log parsing

The first versions of pgBadger were bound to a single CPU: PostgreSQL log files were scanned sequentially, so analyzing very large log files could take several hours.

This limitation is now removed. You can use as many CPU cores as you want and scan your logs in parallel.

To enable parallel processing, simply use the -j N option, where N is the number of cores you want to use.

Please note that the parallel mode has a small drawback. Because each log file is split into chunks that are parsed independently, a query that spans the boundary between two chunks may be truncated. If you enable N cores, the results may differ by at most N queries per log file.

However, this is a minor issue: parallel mode is mostly useful when you have millions of queries to analyze, and with millions of queries in a log file, you can afford to lose a few, as it is quite unlikely that the lost queries would change the overall results.
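To see why the loss stays small, here is a deliberately naive Python sketch of chunked parsing. It is an illustration, not pgBadger's code: the log is split into N equal byte ranges, each worker keeps only the lines lying entirely inside its range, and a line that straddles a boundary is dropped. It assumes one query per line (real PostgreSQL logs can contain multi-line statements), and the function names and sample log are hypothetical. Threads only keep the example short; a real CPU-bound parser would use processes.

```python
from concurrent.futures import ThreadPoolExecutor


def split_ranges(size, n):
    """Split the byte range [0, size) into n contiguous chunks."""
    step = size // n
    bounds = [i * step for i in range(n)] + [size]
    return list(zip(bounds[:-1], bounds[1:]))


def parse_chunk(data, start, end):
    """Naive worker: keep only the lines lying entirely inside [start, end)."""
    lines = data[start:end].split(b"\n")
    # The last piece is either empty (chunk ended on a newline) or the head
    # of a line cut off at `end`: never a complete line.
    complete = lines[:-1]
    # If the chunk starts mid-line, the first piece is the tail of a line
    # that began in the previous chunk, so it is dropped too.
    if start > 0 and data[start - 1:start] != b"\n":
        complete = complete[1:]
    return complete


def parallel_parse(data, n):
    """Parse `data` with n workers, one byte range each, in file order."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        parts = pool.map(lambda r: parse_chunk(data, *r), split_ranges(len(data), n))
        return [line for part in parts for line in part]


# Hypothetical log: 1000 single-line queries.
log = b"".join(b"LOG:  duration: %d ms  statement: SELECT %d\n" % (i, i) for i in range(1000))
expected = log.split(b"\n")[:-1]
for n in (1, 2, 4, 8):
    lost = len(expected) - len(parallel_parse(log, n))
    print(f"{n} chunk(s): {lost} line(s) lost")
```

With N chunks there are only N - 1 internal boundaries, so this sketch can lose at most N - 1 lines, which stays within the "at most N queries per log file" bound stated above.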

If you still want to avoid this problem, you can use pgBadger's "per-file parallel mode", which analyzes your logs with lower performance than the standard parallel mode. To enable it, use the "-J N" option instead of "-j N". Per-file mode becomes really interesting when there are hundreds of small log files (e.g. with a 10MB rotation size limit) and at least 8 cores.
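Per-file mode avoids truncation because a file is never split: each worker receives whole files. The Python sketch below illustrates the idea under that assumption; it is not pgBadger's implementation, and the file names and contents are hypothetical. It also hints at why this mode is usually slower: the unit of work is a whole file, so one large log keeps only one worker busy, and the gain only appears when there are many files to spread across the cores.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def parse_file(path):
    """Parse one whole log file; its lines are never split between workers."""
    with open(path, "rb") as f:
        return [line.rstrip(b"\n") for line in f]


def per_file_parallel(paths, n):
    """Hand whole files to n workers and collect the lines in file order."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        return [line for lines in pool.map(parse_file, paths) for line in lines]


# Hypothetical rotated logs: 20 small files of 50 queries each.
with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for i in range(20):
        p = Path(tmp) / f"postgresql-{i:02d}.log"
        p.write_bytes(b"".join(b"LOG:  statement: SELECT %d\n" % j for j in range(50)))
        paths.append(p)
    lines = per_file_parallel(paths, 8)

print(len(lines))  # 1000: nothing lost, whatever the number of workers
```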

How fast is pgBadger 3?

The goal was to let pgBadger parse logs in parallel on as many cores as specified. Here are some performance results from running pgBadger on five log files totaling 9.5 GB:

  • with 1 core => 1h 41m 18s
  • with 2 cores => 50m 25s
  • with 4 cores => 25m 39s
  • with 8 cores => 15m 58s

We feel this performance gain is quite interesting :)
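The speedup and the parallel efficiency (speedup divided by the number of cores) can be worked out from the timings above with a few lines of Python:

```python
# Wall-clock times from the benchmark above (five log files, 9.5 GB total).
timings = {1: (1, 41, 18), 2: (0, 50, 25), 4: (0, 25, 39), 8: (0, 15, 58)}


def to_seconds(h, m, s):
    return h * 3600 + m * 60 + s


base = to_seconds(*timings[1])  # single-core reference time
for cores, hms in timings.items():
    t = to_seconds(*hms)
    speedup = base / t
    efficiency = speedup / cores
    print(f"{cores} core(s): {t:5d} s  speedup {speedup:4.2f}x  efficiency {efficiency:4.0%}")
```

This gives roughly 2.0x on 2 cores, 3.9x on 4 cores and 6.3x on 8 cores: near-linear scaling up to 4 cores, and about 79% efficiency on 8.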

New binary format

In addition to the classic HTML, TXT and Tsung output formats, pgBadger 3 can now read and write a binary format. This new format is useful if you only want to store the log statistics and generate the HTML report with graphs later.

In a nutshell, the two main activities of pgBadger are parsing and reporting. With this binary format, you can now split those activities and run them at different times. For example, you can parse your log once a day and generate the HTML reports only when needed.
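The parse-once, report-later split can be sketched in Python as follows. This only illustrates the idea: the statistics kept, the function names and the use of `pickle` are assumptions, and this is not pgBadger's own binary format.

```python
import os
import pickle
import re
import tempfile

DURATION = re.compile(r"duration: (\d+(?:\.\d+)?) ms")


def parse_stage(lines, out_path):
    """Parse once: reduce raw log lines to compact statistics and store them."""
    stats = {"queries": 0, "total_ms": 0.0}
    for line in lines:
        m = DURATION.search(line)
        if m:
            stats["queries"] += 1
            stats["total_ms"] += float(m.group(1))
    with open(out_path, "wb") as f:
        pickle.dump(stats, f)


def report_stage(in_path):
    """Report later: load the stored statistics without re-reading the log."""
    with open(in_path, "rb") as f:
        stats = pickle.load(f)
    avg = stats["total_ms"] / stats["queries"] if stats["queries"] else 0.0
    return f"{stats['queries']} queries, average {avg:.2f} ms"


# Hypothetical day of logs, parsed now and reported on whenever needed.
day = [
    "LOG:  duration: 10.0 ms  statement: SELECT 1",
    "LOG:  duration: 30.0 ms  statement: SELECT 2",
    "LOG:  connection received: host=[local]",
]
path = os.path.join(tempfile.mkdtemp(), "day.bin")
parse_stage(day, path)
print(report_stage(path))  # 2 queries, average 20.00 ms
```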

You can also combine several binary files to build a single report. For instance, you may create a binary report every week and aggregate the last 4 weekly reports to build a monthly report in HTML.
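Combining binary files amounts to merging stored statistics before reporting. A minimal Python sketch, assuming each weekly file holds a `collections.Counter` of query counts serialized with `pickle` (hypothetical data and file names; again, not pgBadger's format):

```python
import os
import pickle
import tempfile
from collections import Counter


def save_week(stats, path):
    """Store one week of statistics (query -> number of executions)."""
    with open(path, "wb") as f:
        pickle.dump(stats, f)


def merge_binaries(paths):
    """Aggregate several stored statistic sets into a single one."""
    total = Counter()
    for p in paths:
        with open(p, "rb") as f:
            total.update(pickle.load(f))  # Counter.update adds the counts
    return total


# Hypothetical weekly files, merged into a monthly view.
tmp = tempfile.mkdtemp()
paths = []
for week in range(4):
    p = os.path.join(tmp, f"week{week + 1}.bin")
    save_week(Counter({"SELECT ... FROM orders": 100 * (week + 1), "UPDATE stock ...": 10}), p)
    paths.append(p)

month = merge_binaries(paths)
print(month.most_common())  # [('SELECT ... FROM orders', 1000), ('UPDATE stock ...', 40)]
```

Keeping counts rather than derived values (such as averages) is what makes the merge a simple addition; an average can always be recomputed from summed totals at report time.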

This new binary format is also compatible with other tools such as pgShark https://github.com/dalibo/pgshark/.

More stats, more pie charts!

This major release also has additional features:

  • New pie charts showing the number of autovacuum runs per table and the number of tuples removed by autovacuum per table
  • No more distinction between the log_duration, log_statement and log_min_duration_statement formats
  • Tuples/pages removed are now reported in the VACUUM-by-table report
  • New VACUUM and ANALYZE hourly reports and graphs

... and many bug fixes.

For the complete list of changes, please check out the release notes at https://github.com/dalibo/pgbadger/blob/master/ChangeLog

Deprecated options

WARNING: for the sake of simplicity, the "--enable-log_min_duration" and "--enable-log_duration" command line options have been removed. pgBadger now parses all log_duration, log_statement and log_min_duration_statement lines without distinction and adapts its reports to the lines it finds.
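The three kinds of lines differ only in which parts are present, so a single pattern can accept all of them. A minimal Python sketch, assuming the standard `LOG:` message layout following whatever `log_line_prefix` is configured; it is not pgBadger's parser and, for brevity, ignores extended-protocol `execute` lines and multi-line statements:

```python
import re

# Message part of the three line kinds, after the log_line_prefix:
#   log_min_duration_statement: "LOG:  duration: 1.234 ms  statement: SELECT 1"
#   log_duration:               "LOG:  duration: 1.234 ms"
#   log_statement:              "LOG:  statement: SELECT 1"
LINE = re.compile(
    r"LOG:\s+(?:duration: (?P<ms>\d+(?:\.\d+)?) ms)?\s*(?:statement: (?P<sql>.*))?$"
)


def classify(line):
    """Return (duration_ms or None, statement or None), or None for other lines."""
    m = LINE.search(line)
    if not m or (m["ms"] is None and m["sql"] is None):
        return None
    return (float(m["ms"]) if m["ms"] else None, m["sql"])


prefix = "2013-02-26 10:00:00 CET [4242]: "  # hypothetical log_line_prefix output
for msg in ("LOG:  duration: 1.234 ms  statement: SELECT 1",
            "LOG:  duration: 5.000 ms",
            "LOG:  statement: SELECT 2",
            "LOG:  connection received: host=[local]"):
    print(classify(prefix + msg))
```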

If you are running pgBadger from cron, be careful: if one of these options appears on the command line, pgBadger will refuse to start.
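Before upgrading, one way to find affected jobs is to scan your crontab for the two removed options. A small hypothetical helper in Python (the crontab content and paths are examples only):

```python
REMOVED = ("--enable-log_min_duration", "--enable-log_duration")


def removed_options_in(crontab_text):
    """Return (line number, option) for each pgbadger call using a removed option."""
    hits = []
    for lineno, line in enumerate(crontab_text.splitlines(), start=1):
        # Skip comments and lines that do not invoke pgbadger at all.
        if line.lstrip().startswith("#") or "pgbadger" not in line:
            continue
        tokens = line.split()
        hits.extend((lineno, opt) for opt in REMOVED if opt in tokens)
    return hits


# Hypothetical crontab: the first job would fail after the upgrade.
crontab = """\
# nightly pgBadger reports
0 4 * * * /usr/bin/pgbadger --enable-log_min_duration /var/log/postgresql/postgresql.log
30 4 * * * /usr/bin/pgbadger -j 4 /var/log/postgresql/postgresql.log
"""
print(removed_options_in(crontab))  # [(2, '--enable-log_min_duration')]
```

You can feed it the output of `crontab -l` for each user that runs pgBadger.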

Links & Credits

DALIBO would like to thank the developers who submitted patches and the users who reported bugs and feature requests, especially Matt Romaine, Luke Cyca, Kevin Brannen, Adam Schroder, pilat66, Euler Taveira de Oliveira, stuntmunkee, pierrestroh, Vipul, Dirk-Jan Bulsink and Vincent Laborie.

pgBadger is an open project. Any contribution to build a better tool is welcome. You just have to send your ideas, feature requests or patches using the GitHub tools or directly on our mailing list.


About pgBadger:

pgBadger is a new generation log analyzer for PostgreSQL, created by Gilles Darold, also author of the ora2pg migration tool. pgBadger is a fast and easy tool to analyze your SQL traffic and create HTML5 reports with dynamic graphs. pgBadger is the perfect tool to understand the behavior of your PostgreSQL server and identify which SQL queries need to be optimized.

Docs, Download & Demo at http://dalibo.github.com/pgbadger/


About DALIBO:

DALIBO is the leading PostgreSQL company in France, providing support, training and consulting to its customers since 2005. The company contributes to the PostgreSQL community in various ways, including code, articles, translations, free conferences and workshops.

Check out DALIBO's open source projects at http://dalibo.github.com