Re: Syslog to postgresql - need peaks of 5.000/s

From: paul rivers <rivers(dot)paul(at)gmail(dot)com>
To: Michael Monnerie <michael(dot)monnerie(at)it-management(dot)at>
Cc: pgsql-admin(at)postgresql(dot)org
Subject: Re: Syslog to postgresql - need peaks of 5.000/s
Date: 2008-01-26 22:49:20
Message-ID: 479BB8F0.9090508@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

Michael Monnerie wrote:
> Dear list,
>
> I shall build up a central syslog server able to do 5.000 log lines per
> second peak, 1.000 sustained. I found some pages describing current
> syslog-ng with magic "log to file then insert":
> http://ben.muppethouse.com/SYSLOG-DOC.html
> http://www.kdough.net/docs/syslog_postgresql/
>
> But according to syslog, they will release the "direct syslog to postgresql"
> feature soon:
> http://www.balabit.com/network-security/syslog-ng/
>
> Anyway, I'd like to know if anybody has a syslog-ng with postgresql that
> does up to 5000 inserts/s, and what hardware I need for that. Of course,
> some selects for statistical purposes must also be possible, so not only
> writes here. Any ideas?
>
> mfg zmi
>

Hi Michael,

I do something like this with syslog-ng logs. The processing looks
something like this:

- Today, about 20 different devices feed the same kind of logs to a
central syslog server.
- For each device,a parse-and-load (p&l) daemon is running that parses
data elements out of the log and then stores the parsed data plus a copy
of the original log line to the db. The parsing is probably less trivial
than one would guess (much more complex and varied than say web server
logs, for example).
- This p&l daemon is safely following the tail of the file, and so this
parsing and loading is happening in near real time, where each log line
turns into an insert.
- For a given <device>/<year>/<month>/<day> file, the p&l of course
follows to <day+1> when the next day's file appears.
- Everything is of course transactional with a status table in the db
tracking where it is, so if a p&l is shutdown for whatever reason, it
knows what line in the log to pick up and start at again. There are
config options to commit every so many log lines or every x seconds and
that sort of thing.

Using the same libraries I wrote for the p&l daemon, there is a bulk
load utility to take a raw syslog file, parse and load it. The main
difference between this bulk utility and the p&l is the bulk utility
completely parses the syslog file and generates an intermediate file it
then does a COPY to load in one fell swoop.

There is some selects done on this throughout the day, but that's hard
to quantify -- it varies a lot depending on need. On a nightly basis,
though, we end up touching the entire day's input to generate summary
data that allows us to do rapid trend analysis. In addition to some
ad-hoc access, a slew of reports run periodically against both the raw
and rollup data. There has been talk of revising the rollup process, so
it can happen more or less continuously throughout the day. No work has
been done, but I believe it's entirely feasible.

We also use table partitioning. There is a partition for each day +
device. This dramatically help performance, so long as you're very up on
the documented caveats of Pg's partitioning.

I did extensive metrics on what our throughput was for both the p&l
process and the bulk-loader. In our case, the p&l seemed to peak around
1,600 lines/sec and the bulk loader around 5,500 lines/sec for the whole
process: read, parse, load and commit the data to the db. For each
additional p&l daemon running on the same host, we lose a small
percentage of peak (same for the bulk processor).

The machine is absolutely nothing special: a 32 bit dell blade, dual
core 2 cpu machine with 4gb running RHEL4 with postgres 8.2.6. The disk
is SAN-attached to a hitachi 9990, but that's because that's what where
I work insists on for any box. This box is running everything --
receiving the logs, parsing and loading them, and running the db. [There
is, incidentally, a redundant machine receiving the same logs and doing
the same thing.] I'm sorry I don't know offhand how many spindles in the
SAN back the machine's filesystems, but we're not talking dozens. This
machine is really considered nothing special. This isn't even the only
thing this machine is doing.

Also, the parsing of our files adds a significant amount of time. If you
are doing minimal parsing and more or less bulk-loading the data, I
would expect you to see far better numbers than the 1,600 l/s and 5,500
l/s.

Our system has been running for about a year and a half, and postgres
has always been rock solid. I just ran a query to see we processed about
1.5 trillion log lines last month. That works out to an average of
around 560 lines/s or so, which is less than what you're targeting. But
if rates doubled (either if we doubled the number of tracked devices or
the existing doubled their volumes), I'm confident there would be no
problems.

Were I in your shoes, I would start collecting the syslogs immediately,
and then prototype something up, so you can profile your process end to
end, think about what you do if you ever need to catch up the loading
[we never have needed to, but we know we can, which is great], and so
forth.

I think a lot will be determined by how much you need to process the
logs before loading, whether your loads are going to be via insert or
copy, and how many logfiles will you be feeding the db at the same time.
Hardware and all that will come into play, but until you answer these
questions first, it's going to be hard to say more about it.

Anyway, that's my experience with this, hope it helps.

Regards,
Paul

In response to

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message NUWAN LIYANAGE 2008-01-27 15:22:58 Re: backup including symbolic links?
Previous Message Scott Marlowe 2008-01-26 22:30:32 Re: backup including symbolic links?