[Gross] Weighted checks in production
Eino Tuominen
eino at utu.fi
Sat Apr 5 09:33:07 EEST 2008
Jesse Thompson wrote:
>
> Cool! Nice work, as always. I will try this out soon, I hope.
It would be nice to get some feedback on how the current version is behaving.
>> I have now uploaded gross-svn-236.tar.gz, it's running now in
>> production here. I couldn't get check_blocker to reuse connections as
>> Sophos' blockerd keeps shutting every connection down right after the
>> answer. And now there's a watchdog to see if blocker threads get stuck.
>
> Was this a bug? I can't remember the details.
I don't remember whether this has been discussed on the perl-mx
mailing list. I even configured a test server so that postfix makes
policy queries to blockerd, and even then it's one query per connection:
blockerd just closes the connection. I will do some reverse engineering
with sendmail. It would be really nice if those connections could be reused.
> Yeah, I think that this is the most efficient way to enable longer
> greylisted states. From what I understand, the current method of
> setting a greylist time period keeps the entry in memory and waits X
> seconds before updating the filter. This seems to me like it wouldn't
> scale well to larger time values.
Yes, the current implementation uses a delayed message queue implemented
in msgqueue.c. Message queues are basically linked lists with mutexes,
and a delay queue is just two message queues (inq and outq) with a handler
thread moving messages from inq to outq. It actually scales well
performance-wise. The biggest problem is that the queues are kept only
in memory: should both replicated grossd nodes fail at the same time,
the queue is lost.
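
As a rough illustration of the idea (not the actual msgqueue.c code), a
delay queue built from two queues and a handler thread could be sketched
in Python as below. The class and method names are made up for this
example, and Python's queue.Queue stands in for the mutex-protected
linked list.

```python
import queue
import threading
import time


class DelayQueue:
    """Hypothetical sketch: two FIFO queues plus a handler thread."""

    def __init__(self, delay_seconds):
        self.delay = delay_seconds
        self.inq = queue.Queue()   # messages still waiting out their delay
        self.outq = queue.Queue()  # messages that are ready for the consumer
        self._handler = threading.Thread(target=self._run, daemon=True)
        self._handler.start()

    def put(self, msg):
        # Stamp each message with the time at which it becomes due.
        self.inq.put((time.monotonic() + self.delay, msg))

    def get(self, timeout=None):
        # Blocks until a message has passed through the delay.
        return self.outq.get(timeout=timeout)

    def _run(self):
        while True:
            due, msg = self.inq.get()
            # With a fixed delay and FIFO order, the head of inq is
            # always the next message to become due.
            remaining = due - time.monotonic()
            if remaining > 0:
                time.sleep(remaining)
            self.outq.put(msg)
```

Everything here lives in process memory, which is exactly the weakness
mentioned above: if the process dies, whatever is still sitting in inq
is gone.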
Now that most of the messages will be blocked and only a fraction will
be greylisted, it would be possible to use a much larger number_buffers
with a somewhat smaller filter_bits and rotate_interval. How about 22 bits,
15 minutes and 48 buffers?
  PID COMMAND  RPRVT  RSHRD  RSIZE  VSIZE
13410 grossd     25M  200K-    25M    46M
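
The memory figure can be sanity-checked with a little arithmetic. The
sketch below assumes that each buffer is a bit array of 2^filter_bits
bits and that the filters dominate the memory use; the function name is
invented for the example.

```python
def filter_memory(filter_bits, number_buffers, rotate_interval_min):
    """Estimate filter memory and total retention window.

    Assumption: one buffer is a bit array of 2**filter_bits bits.
    """
    bytes_per_filter = 2 ** filter_bits // 8
    total_bytes = bytes_per_filter * number_buffers
    window_hours = number_buffers * rotate_interval_min / 60
    return bytes_per_filter, total_bytes, window_hours


per_filter, total, hours = filter_memory(22, 48, 15)
print(per_filter // 1024, "KiB per filter")    # 512 KiB per filter
print(total // (1024 * 1024), "MiB in total")  # 24 MiB in total
print(hours, "hours of history")               # 12.0 hours of history
```

Under that assumption, 48 buffers of 22 bits come to 24 MiB, which is in
line with the 25M shown above.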
New options to implement could be:

'skip_buffers' = the maximum number of buffers to skip
'skip_style' = how to do the skipping: 'absolute' means that grossd
always puts the update in the first buffer; 'weighed' (or something) would
place more suspicious triplets in the first buffer, and less suspicious
triplets nearer to the skip limit, so that the greylisting time would be
relative to susp_weight.
If you set skip_buffers = 16 (around 3 hours), skip_style = weighed,
block_threshold = 9 and grey_threshold = 1, then grossd would calculate
which filter to update like this:

    s = S * ( w - g ) / ( b - g - 1 )

where s = how many buffers to skip, S = skip_buffers, w = total weight
from the checks, b = block_threshold and g = grey_threshold.

With the given config this would be

    s = 16 * ( w - 1 ) / ( 9 - 1 - 1 ) = 16 / 7 * ( w - 1 )

If w = 1, then s = 0; with w = 8, s = 16.
With that, more suspicious hosts would wait longer, and less suspicious
ones would have to wait less.
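
To make the arithmetic concrete, here is a small Python sketch of the
skip calculation. The function name is made up, and rounding down to a
whole number of buffers is an assumption on my part, since the formula
itself does not say how fractions are handled.

```python
def buffers_to_skip(w, skip_buffers, block_threshold, grey_threshold):
    """s = S * (w - g) / (b - g - 1), rounded down (rounding is assumed)."""
    span = block_threshold - grey_threshold - 1
    if span <= 0:
        raise ValueError("block_threshold must exceed grey_threshold + 1")
    if not grey_threshold <= w < block_threshold:
        raise ValueError("only greylisted weights (g <= w < b) are valid")
    return skip_buffers * (w - grey_threshold) // span


# skip_buffers = 16, block_threshold = 9, grey_threshold = 1
for w in range(1, 9):
    print(w, buffers_to_skip(w, 16, 9, 1))
# w = 1 gives 0 and w = 8 gives 16, matching the figures above
```

Weights outside the greylisting range are rejected here because a total
below grey_threshold would pass and one at or above block_threshold
would be blocked outright, so neither needs a buffer.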
Thoughts? This is not an entirely trivial modification, as it means
changes to code that has been very stable since the early days.
The good thing is that it would be easy to stress-test once implemented.
--
Eino Tuominen