Re: performance for high-volume log insertion

From: david(at)lang(dot)hm
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Ben Chobot <bench(at)silentmedia(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: performance for high-volume log insertion
Date: 2009-04-21 18:35:02
Message-ID: alpine.DEB.1.10.0904211111040.12662@asgard.lang.hm
Lists: pgsql-performance

On Tue, 21 Apr 2009, Stephen Frost wrote:

> * david(at)lang(dot)hm (david(at)lang(dot)hm) wrote:
>> I think the key thing is that rsyslog today doesn't know anything about
>> SQL variables, it just creates a string that the user and the database
>> say looks like a SQL statement.
>
> err, what SQL variables?  You mean the $NUM stuff?  They're just
> placeholders..  You don't really need to *do* anything with them..  Or
> are you worried that users would provide something that would break as a
> prepared query?  If so, you just need to figure out how to handle that
> cleanly..
>
>> an added headache is that the rsyslog config does not have the concept of
>> arrays (the closest that it has is one special-case hack to let you
>> specify one variable multiple times)
>
> Argh.  The array I'm talking about is a C array, and has nothing to do
> with the actual config syntax..  I swear, I think you're making this
> more difficult by half.

not intentionally, but you may be right.

> Alright, looking at the documentation on rsyslog.com, I see something
> like:
>
> $template MySQLInsert,"insert iut, message, receivedat values
> ('%iut%', '%msg:::UPPERCASE%', '%timegenerated:::date-mysql%')
> into systemevents\r\n", SQL
>
> Ignoring the fact that this is horrible, horrible non-SQL,

that example is for MySQL, nuff said ;-) or are you referring to the 
modifiers that rsyslog has to manipulate the strings before inserting 
them? (as opposed to using sql to manipulate the strings)

> I see that
> you use %blah% to define variables inside your string.  That's fine.
> There's no reason why you can't use this exact syntax to build a
> prepared query.  No user-impact changes are necessary.  Here's what you
> do:

<snip pseudocode to replace %blah% with $num>
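
For readers without the snipped pseudocode, here is a rough sketch of what such a substitution could look like. It is not Stephen's code and not anything rsyslog actually does; the function name template_to_prepared is made up for illustration, and the template string is a tidied variant of the documentation example quoted above. Each %property% (with its surrounding single quotes, if any) becomes $1, $2, ... and the property names are collected in order so their values could later be handed to libpq as parameters.

```python
import re

# Matches an rsyslog-style property reference such as %msg% or
# %msg:::UPPERCASE%, either wrapped in single quotes or bare.
_PROPERTY = re.compile(r"'%([^%']+)%'|%([^%']+)%")


def template_to_prepared(template):
    """Return (sql, properties) with each %prop% replaced by $1, $2, ...

    Hypothetical helper for illustration only.
    """
    properties = []

    def substitute(match):
        # group(1) is set for the quoted form, group(2) for the bare form.
        properties.append(match.group(1) or match.group(2))
        return "$%d" % len(properties)

    return _PROPERTY.sub(substitute, template), properties


# Assumed, cleaned-up template; the property names come from the quoted example.
template = (
    "insert into systemevents (iut, message, receivedat) values "
    "('%iut%', '%msg:::UPPERCASE%', '%timegenerated:::date-mysql%')"
)
sql, properties = template_to_prepared(template)
```

The resulting sql is the kind of string that could be passed to PQprepare once, with the per-message values for properties supplied on each execution. This sketch does not handle a property embedded inside a longer quoted literal (the placeholder would end up inside the string), nor a literal percent sign in the template.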

for some reason I was stuck on the idea of the config specifying the
statement and variables separately, so I wasn't thinking this way. however,
there are headaches:

doing this will require changes to the structure of rsyslog. today the
string manipulation is done before calling the output (database) module,
so all the database module currently gets is a string. in an (IMHO
misguided) attempt at security in a multi-threaded program, the output
modules are not given access to the full data, only to the distilled
result.

also, this approach won't work if the user wants to combine fixed text
with the variable into a column. an example of doing that would be to have
a filter to match specific lines, and then use a slightly different
template for those lines. I guess that could be done in SQL instead of in
the rsyslog string manipulation (i.e. instead of 'blah-%host%' do
'blah-'||'%host%')
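
A minimal sketch of that rewrite, assuming properties only ever appear inside single-quoted literals and that the literals contain no escaped quotes or stray percent signs. The helper name literal_to_concat is hypothetical. Fixed text stays a quoted literal, each property becomes a placeholder, and the pieces are joined with the SQL || operator.

```python
import re

_LITERAL = re.compile(r"'([^']*)'")    # a single-quoted SQL literal (no escaped quotes)
_PROPERTY = re.compile(r"%([^%]+)%")   # an rsyslog-style %property% reference


def literal_to_concat(template):
    """Rewrite 'blah-%host%' as 'blah-' || $1 and collect property names.

    Hypothetical helper for illustration only.
    """
    properties = []

    def rewrite_literal(match):
        # Splitting on a pattern with one capture group yields alternating
        # fixed text (even indexes) and property names (odd indexes).
        pieces = _PROPERTY.split(match.group(1))
        parts = []
        for index, piece in enumerate(pieces):
            if index % 2 == 1:
                properties.append(piece)
                parts.append("$%d" % len(properties))
            elif piece:
                parts.append("'%s'" % piece)
        # A literal with no content at all stays an empty string literal.
        return " || ".join(parts) if parts else "''"

    return _LITERAL.sub(rewrite_literal, template), properties


sql, properties = literal_to_concat("values ('blah-%host%', '%msg%')")
```

Here sql comes out as values ('blah-' || $1, $2). Whether the server needs an explicit cast such as $1::text to resolve the parameter type in the concatenation has not been checked here, so treat that as an open question.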

> As I mentioned before, the only obvious issue I
> see with doing this implicitly is that the user might want to put
> variables in places that you can't have variables in prepared queries.

this problem space would be anywhere except the column contents, right?

> You could deal with that by having the user indicate per template, using
> another template option, if the query can be prepared or not.  Another
> option is adding to your syntax something like '%*blah%' which would
> tell the system to pre-populate that variable before issuing PQprepare
> on the resultant string.  Of course, you might just use PQexecParams
> there, unless you want to be gung-ho and actually keep a hash around of
> prepared queries on the assumption that the variable the user gave you
> doesn't change very often (eg, '%*month%') and it's cheap to keep a
> small list of them around to use when they do match up.
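
The "keep a hash around" idea could be sketched roughly as below. This is only an illustration under assumptions: the class name, the stmt_N naming scheme, the table names, and the prepare callback (standing in for a call to PQprepare) are all invented here. The SQL text, after any '%*blah%' style variables have been filled in, is the cache key, and a small least-recently-used limit keeps the list short.

```python
from collections import OrderedDict


class PreparedStatementCache:
    """Small LRU map from expanded SQL text to a prepared-statement name."""

    def __init__(self, prepare, max_size=16):
        self._prepare = prepare      # hypothetical callback: prepare(name, sql)
        self._max_size = max_size
        self._names = OrderedDict()
        self._counter = 0

    def name_for(self, sql):
        """Return the statement name for sql, preparing it on first use."""
        if sql in self._names:
            self._names.move_to_end(sql)      # mark as most recently used
            return self._names[sql]
        self._counter += 1
        name = "stmt_%d" % self._counter
        self._prepare(name, sql)
        self._names[sql] = name
        if len(self._names) > self._max_size:
            # Forget the least recently used entry. A real implementation
            # would also have to deallocate it on the server.
            self._names.popitem(last=False)
        return name


# Usage: the month is substituted before preparing, so each distinct month
# produces one prepared statement that is then reused.
prepared = []
cache = PreparedStatementCache(lambda name, sql: prepared.append((name, sql)))
names = [
    cache.name_for("insert into log_%s (host, msg) values ($1, $2)" % month)
    for month in ("2009_04", "2009_04", "2009_05")
]
```

With a variable that changes as rarely as a month, almost every message hits an existing entry, so the cost of the extra lookup stays small compared with re-preparing each time.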

rsyslog supports something similar for writing to disk, where you can use
variables as part of the filename/path (referred to as 'dynafiles' in the
documentation). that's a little easier to deal with as the filename is
specified separately from the format of the data to write. If we end up
doing prepared statements I suspect they initially won't support variables
outside of the columns.

David Lang
