[Gross] Weighted checks in production

Sat Apr 5 09:33:07 EEST 2008

Jesse Thompson wrote:
> 
> Cool!  Nice work, as always.  I will try this out soon, I hope.

It would be nice to get some feedback on how the current version is behaving.

>> I have now uploaded gross-svn-236.tar.gz, it's running now in 
>> production here. I couldn't get check_blocker to reuse connections as 
>> Sophos' blockerd keeps shutting every connection down right after the 
>> answer. And, now there's watchdog to see if blocker threads get stuck.
> 
> Was this a bug?  I can't remember the details.

I don't remember whether this has been discussed on the perl-mx 
mailing list. I even configured a test server to have postfix make 
policy queries to blockerd, and even then it's one query per connection: 
blockerd just closes the connection. I will do some reverse engineering 
with sendmail. It would be really nice if those connections could be reused.

> Yeah, I think that this is the most efficient way to enable longer 
> greylisted states.  From what I understand, the current method of 
> setting a greylist time period keeps the entry in memory and waits X 
> seconds before updating the filter.  This seems to me like it wouldn't 
> scale well to larger time values.

Yes, the current implementation uses a delayed message queue implemented 
in msgqueue.c. Message queues are basically linked lists with mutexes, 
and a delay queue is just two message queues (inq and outq) with a handler 
thread moving messages from inq to outq. It actually scales well 
performance-wise. The biggest problem with the queues is that they are 
kept only in memory. Should both replicated grossd nodes fail at the same 
time, the queue is lost.
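
To make that structure concrete, here is a minimal sketch of a delay 
queue of the same shape, written in Python rather than the C of 
msgqueue.c. The class and method names are made up for illustration and 
are not taken from the grossd source; queue.Queue stands in for the 
mutex-protected linked list, and a single fixed delay is assumed.

```python
import queue
import threading
import time


class DelayQueue:
    """Two FIFO queues plus a handler thread that releases messages after a delay."""

    def __init__(self, delay_seconds):
        self.delay = delay_seconds
        self.inq = queue.Queue()   # messages still waiting for their delay to pass
        self.outq = queue.Queue()  # messages that are ready for the consumer
        self._handler = threading.Thread(target=self._run, daemon=True)
        self._handler.start()

    def put(self, msg):
        # Stamp each message with the time at which it becomes due.
        self.inq.put((time.monotonic() + self.delay, msg))

    def get(self, timeout=None):
        return self.outq.get(timeout=timeout)

    def _run(self):
        while True:
            due, msg = self.inq.get()      # blocks until a message arrives
            wait = due - time.monotonic()
            if wait > 0:
                # The delay is constant, so the head of inq is always due first.
                time.sleep(wait)
            self.outq.put(msg)


if __name__ == "__main__":
    dq = DelayQueue(0.5)
    dq.put("triplet-1")
    print(dq.get(timeout=5))  # printed about half a second later
```

Everything in this sketch lives in process memory, so a crash loses the 
pending messages, which is the same weakness described above.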

Now that most of the messages will be blocked and only a fraction will 
be greylisted, it would be possible to use a much larger number_buffers 
with a somewhat smaller filter_bits and rotate_interval. How about 
22 bits, 15 minutes and 48 buffers?

   PID COMMAND    RPRVT  RSHRD  RSIZE  VSIZE
13410 grossd      25M   200K-   25M    46M
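
A rough sanity check of those numbers, as a sketch only: it assumes each 
buffer is a plain bit array of 2^filter_bits bits and ignores every other 
allocation in grossd. The function name is hypothetical.

```python
def ring_footprint(filter_bits, number_buffers, rotate_interval_s):
    """Return (total bytes, covered time window in seconds) for the filter ring.

    Assumption: one buffer is a bit array of 2**filter_bits bits.
    """
    bytes_per_buffer = (1 << filter_bits) // 8
    total_bytes = bytes_per_buffer * number_buffers
    window_s = number_buffers * rotate_interval_s
    return total_bytes, window_s


if __name__ == "__main__":
    total, window = ring_footprint(22, 48, 15 * 60)
    print(total // 2**20, "MiB of filters covering", window // 3600, "hours")
```

Under that assumption, 22 bits, 48 buffers and a 15 minute rotate_interval 
come to 24 MiB of filters covering 12 hours, which is in the same range 
as the RSIZE figure above.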

New options to implement could be:

'skip_buffers' = how many (maximum) buffers to skip

'skip_style' = how to do the skipping. 'absolute' means that grossd 
always puts the update in the first buffer; 'weighed' (or something) 
would place more suspicious triplets in the first buffer, and less 
suspicious triplets nearer to the skip limit, so that the greylisting 
time would be relative to susp_weight.

If you set skip_buffers = 16 (around 3 hours), skip_style = weighed, 
block_threshold = 9 and grey_threshold = 1, then grossd would calculate 
which filter to update like this:

s = how many to skip, S = skip_buffers, w = total weight from the 
checks, b = block_threshold, g = grey_threshold

s = S * ( w - g ) / ( b - g - 1)

With the given config this would be

s = 16 * ( w - 1 ) / ( 9 - 1 - 1) = 16 / 7 * ( w - 1)

If w = 1, then s = 0; with w = 8, s = 16.

With that, more suspicious hosts would have to wait longer, and less 
suspicious ones would wait less.
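
The calculation could be sketched like this. It is only an illustration 
of the formula above, not code from grossd: the function name is made up, 
weights are assumed to be integers, and rounding down plus clamping to 
the range 0..skip_buffers are my assumptions, since the formula itself 
does not say how to handle fractions.

```python
def buffers_to_skip(w, skip_buffers, block_threshold, grey_threshold):
    """s = S * (w - g) / (b - g - 1), rounded down and clamped to [0, S]."""
    span = block_threshold - grey_threshold - 1
    if span <= 0:
        # Assumption: with no room between the thresholds, do not skip at all.
        return 0
    s = skip_buffers * (w - grey_threshold) // span
    return max(0, min(skip_buffers, s))


if __name__ == "__main__":
    # Config from the example: S = 16, b = 9, g = 1.
    for w in range(1, 9):
        print(w, buffers_to_skip(w, 16, 9, 1))
```

With the example config, w = 1 gives s = 0 and w = 8 gives s = 16, as 
above; the weights in between land on whole buffers because of the 
rounding.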

Thoughts? This is not an entirely trivial modification, as it means 
changes to code that has been very stable since the early days. 
The good thing is that this is easy to stress-test once implemented.

-- 
   Eino Tuominen