Re: performance for high-volume log insertion

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: david(at)lang(dot)hm
Cc: Ben Chobot <bench(at)silentmedia(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: performance for high-volume log insertion
Date: 2009-04-21 17:25:57
Message-ID: 20090421172557.GZ8123@tamriel.snowman.net
Lists: pgsql-performance

* david(at)lang(dot)hm (david(at)lang(dot)hm) wrote:
> I think the key thing is that rsyslog today doesn't know anything about
> SQL variables, it just creates a string that the user and the database
> say looks like a SQL statement.

err, what SQL variables? You mean the $NUM stuff? They're just
placeholders.. You don't really need to *do* anything with them.. Or
are you worried that users would provide something that would break as a
prepared query? If so, you just need to figure out how to handle that
cleanly..

> an added headache is that the rsyslog config does not have the concept of
> arrays (the closest that it has is one special-case hack to let you
> specify one variable multiple times)

Argh. The array I'm talking about is a C array, and has nothing to do
with the actual config syntax.. I swear, I think you're making this
more difficult by half.

Alright, looking at the documentation on rsyslog.com, I see something
like:

$template MySQLInsert,"insert iut, message, receivedat values
('%iut%', '%msg:::UPPERCASE%', '%timegenerated:::date-mysql%')
into systemevents\r\n", SQL

Ignoring the fact that this is horrible, horrible non-SQL, I see that
you use %blah% to define variables inside your string. That's fine.
There's no reason why you can't use this exact syntax to build a
prepared query. No user-impact changes are necessary. Here's what you
do:

build your prepared query by doing:
    copy user string
    newstring = replace all %blah% strings with $1, $2, $3, etc.
    myvars = dynamically created C array with the %blah%'s in it
    call PQprepare(newstring)

when a record comes in:
    allocate a myvalues array of pointers
    loop through myvars
        for each myvar
            set the corresponding pointer in myvalues to the string which
            it corresponds to from the record
    call PQexecPrepared(myvalues)
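
A minimal sketch of the first half of that outline, in plain C with no libpq dependency. The names `prepared_template`, `template_to_prepared`, `template_free` and `MAX_VARS` are hypothetical and are not taken from rsyslog. The sketch assumes the template contains no escaped % characters and that variables are written without surrounding single quotes, since PostgreSQL treats a $1 inside quotes as part of a string literal rather than as a parameter.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_VARS 32   /* arbitrary limit chosen for this sketch */

/* Hypothetical holder for a converted template; not rsyslog's own structure. */
struct prepared_template {
    char *query;            /* template with each %name% replaced by $1, $2, ... */
    char *vars[MAX_VARS];   /* property name behind each placeholder, in order */
    int   nvars;
};

/* Release everything owned by the structure. */
void template_free(struct prepared_template *t)
{
    for (int i = 0; i < t->nvars; i++)
        free(t->vars[i]);
    free(t->query);
    t->query = NULL;
    t->nvars = 0;
}

/* Returns 0 on success, -1 on a malformed template, too many variables,
 * or allocation failure. */
int template_to_prepared(const char *tmpl, struct prepared_template *out)
{
    /* "%x%" is at least 3 bytes and "$32" is at most 3 bytes, so the
     * converted query never grows past the original length. */
    size_t cap = strlen(tmpl) + 1;
    size_t used = 0;
    const char *p = tmpl;

    out->nvars = 0;
    out->query = malloc(cap);
    if (out->query == NULL)
        return -1;

    while (*p != '\0') {
        if (*p != '%') {                 /* ordinary character: copy it */
            out->query[used++] = *p++;
            continue;
        }
        const char *end = strchr(p + 1, '%');
        if (end == NULL || end == p + 1 || out->nvars == MAX_VARS)
            goto fail;                   /* unterminated, empty, or too many */

        size_t len = (size_t)(end - (p + 1));
        char *name = malloc(len + 1);
        if (name == NULL)
            goto fail;
        memcpy(name, p + 1, len);
        name[len] = '\0';
        out->vars[out->nvars++] = name;  /* remember which property feeds $n */

        used += (size_t)snprintf(out->query + used, cap - used,
                                 "$%d", out->nvars);
        p = end + 1;
    }
    out->query[used] = '\0';
    return 0;

fail:
    template_free(out);
    return -1;
}
```

The resulting `query` string is what would be handed to PQprepare along with `nvars` as the parameter count. Per record, walking `vars` in order gives the slot in the values array that each property's string belongs in before calling PQexecPrepared.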

That's pretty much it. I assume you already deal with escaping %'s
somehow during the config load so that the prepared statement will be
what the user expects. As I mentioned before, the only obvious issue I
see with doing this implicitly is that the user might want to put
variables in places that you can't have variables in prepared queries.
You could deal with that by having the user indicate per template, using
another template option, if the query can be prepared or not. Another
option is adding to your syntax something like '%*blah%' which would
tell the system to pre-populate that variable before issuing PQprepare
on the resultant string. Of course, you might just use PQexecParams
there, unless you want to be gung-ho and actually keep a hash around of
prepared queries on the assumption that the variable the user gave you
doesn't change very often (eg, '%*month%') and it's cheap to keep a
small list of them around to use when they do match up.
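
A rough sketch of the '%*blah%' idea, again in plain C. It expands only the starred variables and passes ordinary %blah% variables through untouched, so the result can still be converted to $1, $2, ... afterwards. Everything here is an assumption made for illustration: the names `expand_static_vars`, `lookup_fn` and `demo_lookup`, the "month" property and its value, and the per-month table name. The value is pasted into the SQL text verbatim, so real code would have to validate or escape it first (libpq's PQescapeStringConn covers string literals, not identifiers).

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical callback: returns the current value of a property, or NULL. */
typedef const char *(*lookup_fn)(const char *name);

/* Demo lookup for illustration only: knows a single property, "month". */
const char *demo_lookup(const char *name)
{
    return strcmp(name, "month") == 0 ? "2009_04" : NULL;
}

/* Append n bytes of src to a growing, NUL-terminated buffer.
 * Returns 0 on success, -1 on allocation failure. */
static int append(char **buf, size_t *len, size_t *cap,
                  const char *src, size_t n)
{
    if (*len + n + 1 > *cap) {
        size_t newcap = (*len + n + 1) * 2;
        char *tmp = realloc(*buf, newcap);
        if (tmp == NULL)
            return -1;
        *buf = tmp;
        *cap = newcap;
    }
    memcpy(*buf + *len, src, n);
    *len += n;
    (*buf)[*len] = '\0';
    return 0;
}

/* Expand every %*name% through lookup(); copy %name% through unchanged.
 * Returns a malloc'd string the caller must free, or NULL on error. */
char *expand_static_vars(const char *tmpl, lookup_fn lookup)
{
    char *buf = NULL;
    size_t len = 0, cap = 0;
    const char *p = tmpl;

    if (append(&buf, &len, &cap, "", 0) != 0)    /* start as empty string */
        return NULL;

    while (*p != '\0') {
        const char *pct = strchr(p, '%');
        if (pct == NULL) {                       /* no more variables */
            if (append(&buf, &len, &cap, p, strlen(p)) != 0)
                goto fail;
            break;
        }
        const char *end = strchr(pct + 1, '%');
        if (end == NULL)
            goto fail;                           /* unterminated variable */

        if (pct[1] == '*') {
            /* literal text before the variable, then the looked-up value */
            if (append(&buf, &len, &cap, p, (size_t)(pct - p)) != 0)
                goto fail;
            size_t nlen = (size_t)(end - (pct + 2));
            char *name = malloc(nlen + 1);
            if (name == NULL)
                goto fail;
            memcpy(name, pct + 2, nlen);
            name[nlen] = '\0';
            const char *val = lookup(name);
            free(name);
            if (val == NULL ||
                append(&buf, &len, &cap, val, strlen(val)) != 0)
                goto fail;
        } else {
            /* ordinary variable: keep the literal text and %name% as is */
            if (append(&buf, &len, &cap, p, (size_t)(end + 1 - p)) != 0)
                goto fail;
        }
        p = end + 1;
    }
    return buf;

fail:
    free(buf);
    return NULL;
}
```

The expanded string also makes a natural key for the small cache of prepared queries mentioned above: as long as the starred values stay the same, the string is identical and the existing prepared statement can be reused.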

Thanks,

Stephen
