[Gross] Rotate stuck

Eino Tuominen eino at utu.fi
Sun Sep 27 11:26:05 EEST 2009


Eino Tuominen wrote:
> Steve Wardle wrote:
>> On Fri, 18 Sep 2009 19:00:20 +0300
>> Eino Tuominen <eino at utu.fi> wrote:
>>
>>> Hi,
>>>
>>> pstack looks fine to me. This is how the thing should work:
>>>
>>> 	...
>> Hi Eino,
>>
>> Thanks for the explanation.
>>
>> I installed 1.0.2 this morning but I'm still seeing the same problem.
>>
>> There is no "received rotate command" in the log once the rotation is "stuck".
>>
>> I'll send you the log off list.
> 
> [ a lot of debugging off list ]
> 
> Can anybody replicate the issue Steve is seeing? I know of at least two 
> major sites using Gross as a milter, but they are not running on 
> Solaris. One is running on NetBSD and another one on FreeBSD, I think.
> 
> The right thing to do is to separate milter from grossd and run it as a 
> separate process.

Hello,

I just had another look at the pstack output of the grossd process. This
is from Steve's main thread:

----------------  lwp# 1 / thread# 1  --------------------
  ff0cc21c pause    ()
  ff379cc0 sleep    (ff390000, ffbff820, 4, 0, 12c, 3dbe0) + f4
  0001aaec main     (ffbff740, 2, ffbffcac, 2a400, 2a400, 2a400) + 320
  00014590 _start   (0, 0, 0, 0, 0, 0) + 5c
-

And this is from a running grossd of our own MTA:

-----------------  lwp# 1 / thread# 1  --------------------
  ff19c648 nanosleep (ffbff7a0, ffbff798)
  ff09dc5c sleep    (1, a22dd, 0, 488ebd, 1, ff3cdb8c) + 58
  0001f51c main     (1, ffbffcc4, ffbffccc, 47000, 0, 0) + a3c
  00014110 _start   (0, 0, 0, 0, 0, 0) + 108
-

There it is: Steve's grossd is sitting in pause(), but I just cannot
work out what is causing that. It looks like sleep() is resolving to the
single-threaded implementation (which I think is built on alarm() and
pause()). Check the sleep() man page on Solaris 10.
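
For illustration, a sleep() built on alarm() and pause() has roughly the
shape sketched below. This is the textbook form of such an
implementation, not the Solaris libc source, and naive_sleep() and
on_alarm() are names invented for the example. The relevant property is
that pause() returns only after a signal handler has run in the calling
thread, so the call stays blocked for as long as SIGALRM does not reach
that thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <unistd.h>

/* Empty handler: its only job is to make pause() return. */
static void on_alarm(int signo)
{
    (void)signo;
}

/* Textbook alarm()/pause() sleep; hypothetical, not the Solaris code. */
unsigned int naive_sleep(unsigned int seconds)
{
    struct sigaction sa, old;
    unsigned int unslept;

    if (seconds == 0)
        return 0;           /* alarm(0) only cancels; pause() would hang */

    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGALRM, &sa, &old) == -1)
        return seconds;     /* could not install handler, nothing slept */

    alarm(seconds);         /* ask the kernel for SIGALRM later */
    pause();                /* block until a handled signal arrives */
    unslept = alarm(0);     /* cancel the timer, fetch leftover seconds */

    sigaction(SIGALRM, &old, NULL);   /* restore previous disposition */
    return unslept;
}
```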

Could you send me the output of configure and make? I'm interested to
see which options get used.

What you could do is replace the sleep(1) at the end of gross.c with
this code segment:

sleeptime.tv_sec = 1;
sleeptime.tv_nsec = 0;
do {
     ret = nanosleep(&sleeptime, &sleepleft);
     if (ret == -1 && errno == EINTR) {
         /* sleep was interrupted by a signal: sleep the remainder */
         sleeptime.tv_sec = sleepleft.tv_sec;
         sleeptime.tv_nsec = sleepleft.tv_nsec;
     }
} while (ret == -1 && errno == EINTR);

And of course add

struct timespec sleeptime, sleepleft;

(plus an int ret, if there isn't one already) at the beginning of the
main() function. The errno check needs <errno.h>.
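
The same retry loop can also be packaged as a small helper, which makes
it easy to try outside gross.c. The following is a minimal sketch under
that assumption; sleep_full() is a name made up for this example and is
not part of Gross. It retries only when nanosleep() reports EINTR, so any
other error (for example EINVAL for an out-of-range tv_nsec) is returned
to the caller instead of being retried forever.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <time.h>

/*
 * Sleep for the whole requested interval, resuming after signal
 * interruptions. Returns 0 on success, or -1 with errno set by
 * nanosleep() on any error other than EINTR.
 * Hypothetical helper, not part of Gross.
 */
int sleep_full(time_t sec, long nsec)
{
    struct timespec req, rem;

    req.tv_sec = sec;
    req.tv_nsec = nsec;

    while (nanosleep(&req, &rem) == -1) {
        if (errno != EINTR)
            return -1;      /* real error: retrying would not help */
        req = rem;          /* interrupted: sleep only what is left */
    }
    return 0;
}
```

With this in place, the sleep(1) call would become sleep_full(1, 0).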

There's also one sleep() in syncmgr.c (it looks a bit odd, and should
probably be replaced with a mutex), but it only gets called when a sync
peer connects.
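
As a generic illustration of replacing a sleep()-based wait with a mutex
and condition variable, one thread can block until another marks some
state as ready. This is not the syncmgr.c code, and what that sleep()
actually waits for is not covered here; struct ready_gate, gate_open()
and gate_wait() are hypothetical names used only for this sketch.

```c
#include <pthread.h>

/* Hypothetical "gate": waiters block until another thread opens it. */
struct ready_gate {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    int             ready;   /* protected by lock */
};

#define READY_GATE_INIT \
    { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0 }

/* Mark the gate as open and wake every thread waiting on it. */
void gate_open(struct ready_gate *g)
{
    pthread_mutex_lock(&g->lock);
    g->ready = 1;
    pthread_cond_broadcast(&g->cond);
    pthread_mutex_unlock(&g->lock);
}

/* Block until the gate is open; returns at once if it already is. */
void gate_wait(struct ready_gate *g)
{
    pthread_mutex_lock(&g->lock);
    while (!g->ready)                       /* guards spurious wakeups */
        pthread_cond_wait(&g->cond, &g->lock);
    pthread_mutex_unlock(&g->lock);
}
```

The waiting thread then wakes as soon as the state changes, rather than
after a fixed sleep interval.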

-- 
   Eino Tuominen


