[Gross] grossd problems after solaris 10 patching
Jesse Thompson
jesse.thompson at doit.wisc.edu
Tue Aug 5 18:11:23 EEST 2008
After patching Solaris 10 on one server to the latest patch sets. We
did not change gross on either host.
Start grossd (-dDD) on the server that was patched.
_STDOUT shows:_
Tue Aug 5 09:37:42 2008 #1: config: ...
...
Tue Aug 5 09:37:42 2008 #1: Listening host address 1.2.3.4
Tue Aug 5 09:37:42 2008 #1: Sync listen address 1.2.3.4
Tue Aug 5 09:37:42 2008 #1: Peer 2.3.4.5 configured. Replicating.
Tue Aug 5 09:37:42 2008 #1: Sync peer address 2.3.4.5
Tue Aug 5 09:37:42 2008 #1: updatestyle: GREY
Tue Aug 5 09:37:42 2008 #1: adding dnsbl: ...
...
Tue Aug 5 09:37:42 2008 #1: Blocker host address 127.0.0.1
Tue Aug 5 09:37:42 2008 #1: doubling the space for message queues from
1 to 2
Tue Aug 5 09:37:42 2008 #2: delay queue manager thread starting
Tue Aug 5 09:37:42 2008 #2: waiting for messages
Tue Aug 5 09:37:42 2008 #3: Found the correct state file magic string.
Tue Aug 5 09:37:42 2008 #3: fixing bloom ring queue memory pointers,
offset=0
Tue Aug 5 09:37:42 2008 #3: bloommgr starting...
Tue Aug 5 09:37:42 2008 #4: Peer fd 6
Tue Aug 5 09:37:42 2008 #3: received rotate command
Tue Aug 5 09:37:42 2008 #4: Waiting sync connection on host
144.92.197.229 port 1112
Tue Aug 5 09:37:42 2008 #6: rotate thread starting
Tue Aug 5 09:37:42 2008 #6: rotation not needed
(no activity for 10 minutes)
_Truss shows minor activity_
/5: read(6, "\0\0\0\0\0\0\088", 8) = 8
/5: read(6, "\0\0\0\0\0\0 Q !\0\0\0\0".., 136) = 136
/5: clock_gettime(4, 0x00058814) = 0
/5: lwp_unpark(3) = 0
/3: lwp_park(0x00000000, 0) = 0
/3: time() = 1217947218
(similar activity is repeated in truss)
(snoop indicates that the peer is seeing the activity)
_The peer's log shows:_
Aug 5 09:37:42 grossd: [ID 702911 local3.info] #4: Got sync connection
Aug 5 09:37:42 grossd: [ID 702911 local3.info] #4: Examining peer config
Aug 5 09:37:42 grossd: [ID 702911 local3.error] #4: freeze queue 0
(continues to process client requests normally)
_After the 10 minutes, the first gross server shows:_
Tue Aug 5 09:47:50 2008 #4: Got sync connection
Tue Aug 5 09:47:50 2008 #4: Examining peer config
(then nothing)
(truss shows: lwp_park(0x00000000, 0) (sleeping...) )
_And the peer's log shows:_
Aug 5 09:47:50 grossd: [ID 702911 local3.error] #1299: Grossd shutdown
with exit code 2: pthread_create Bad file number
Aug 5 09:47:50 grossd: [ID 702911 local3.info] #3: bloommgr starting...
Aug 5 09:47:50 grossd: [ID 702911 local3.info] #3: received rotate command
Aug 5 09:47:50 grossd: [ID 702911 local3.info] #4: Waiting sync
connection on host 2.3.4.5 port 1112
Aug 5 09:47:50 grossd: [ID 702911 local3.info] #6: rotate thread starting
(no activity, including client requests, for the next X minutes)
(truss shows: lwp_park(0x00000000, 0) (sleeping...) )
_So, I kill (-9 is required) gross on the first server, and the peer's
log shows:_
Aug 5 09:54:16 grossd: [ID 702911 local3.info] #5: connection closed by
client
(still no activity in log or truss)
I kill (-9 is required) gross on the peer, start is back up, and it is
functioning fine as long as I don't start gross on the server that was
just patched.
I also tried deleting and recreating the state file on the upgraded
server... same problem.
_Also, I see from this morning, log entries like this, but I can't seem
to trigger it now on demand:_
Aug 5 07:59:57 grossd: [ID 702911 local3.error] #4: Send filters: Bad
file number
Aug 5 07:59:57 last message repeated 291 times
Aug 5 07:59:57 grossd: [ID 702911 local3.error] #4: Send filters: Bad
file number
Aug 5 07:59:57 last message repeated 402 times
Aug 5 07:59:57 grossd: [ID 702911 local3.error] #4: Send filters: Bad
file number
Aug 5 07:59:57 last message repeated 21 times
...
Any ideas?
Jesse
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/x-pkcs7-signature
Size: 3353 bytes
Desc: S/MIME Cryptographic Signature
URL: <http://lists.utu.fi/pipermail/gross/attachments/20080805/5af6d00f/attachment.bin>
More information about the Gross
mailing list