Persistent problems after upgrade to 24.9.5

Myriad
(@myriad)
Joined: 13 years ago
Posts: 33
Topic starter  

Ok, I have had nothing but grief after upgrading to 24.9.5 and the problems are still continuing weeks after the upgrade. The mail server randomly stops responding with "unable to start zmconfigd" messages and when I restart the server I get this:

Host mail.xxx.ca
        Starting directory server...Done.
        Starting config service...Failed.
Starting zmconfigd...Failed to start zmconfigd.


        Starting mailbox...Done.
        Starting memcached...Done.
        Starting proxy...Done.
        Starting amavis...Done.
        Starting antispam...Done.
        Starting antivirus...Done.
        Starting opendkim...Done.
        Starting mta...Done.
        Starting stats...Done.
        Starting service webapp...Done.

So I issue the following command to check the status:

zextras@mail:~$ zmcontrol status
Host mail.xxx.ca
        amavis                  Running
        antispam                Running
        antivirus               Running
        directory-server        Running
        mailbox                 Running
        memcached               Running
        mta                     Running
        opendkim                Running
        proxy                   Running
        service webapp          Running
        service-discover        Running
        stats                   Running
        config service          Running

And lo and behold, the server reports that it is running normally. But it's not, because when we look at the logs we see:

Oct 25 10:33:11 mail zmconfigd[1499788]: Command not defined for service-discover
Oct 25 10:33:07 mail zmconfigd[1499788]: Command not defined for directory-server
Oct 25 10:32:50 mail zmconfigd[1499788]: All configs fetched in 0.11 seconds
Oct 25 10:32:50 mail zmconfigd[1499788]: Fetching All configs
Oct 25 10:31:49 mail zmconfigd[1499788]: All restarts completed in 0.00 sec
Oct 25 10:31:49 mail zmconfigd[1499788]: All rewrite threads completed in 0.02 sec
Oct 25 10:31:49 mail zmconfigd[1499788]: Watchdog: service antivirus status is OK.
Oct 25 10:31:46 mail zmconfigd[1499788]: Command not defined for service-discover
Oct 25 10:31:43 mail zmconfigd[1499788]: Command not defined for directory-server
Oct 25 10:31:16 mail zmconfigd[1499788]: All configs fetched in 0.36 seconds
Oct 25 10:31:16 mail zmconfigd[1499788]: Fetching All configs
Oct 25 10:30:15 mail zmconfigd[1499788]: All restarts completed in 0.00 sec
Oct 25 10:30:15 mail zmconfigd[1499788]: All rewrite threads completed in 0.13 sec
Oct 25 10:30:14 mail zmconfigd[1499788]: Watchdog: service antivirus status is OK.
Oct 25 10:30:10 mail zmconfigd[1499788]: Command not defined for service-discover
Oct 25 10:30:04 mail zmconfigd[1499788]: Command not defined for directory-server
Oct 25 10:29:48 mail zmconfigd[1499788]: All configs fetched in 0.10 seconds
Oct 25 10:29:48 mail zmconfigd[1499788]: Fetching All configs
Oct 25 10:28:46 mail zmconfigd[1499788]: All restarts completed in 0.00 sec
Oct 25 10:28:46 mail zmconfigd[1499788]: All rewrite threads completed in 0.01 sec
Oct 25 10:28:46 mail zmconfigd[1499788]: Watchdog: service antivirus status is OK.
Oct 25 10:28:43 mail zmconfigd[1499788]: Command not defined for service-discover
Oct 25 10:28:42 mail zmconfigd[1499788]: Command not defined for directory-server
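To see at a glance which services trigger the recurring message, the log lines can be tallied per service. This is only a minimal sketch, assuming the zmconfigd lines arrive on standard input in the format of the excerpt above; the function name count_undefined is made up for illustration and is not a Carbonio tool:

```shell
#!/bin/sh
# Tally "Command not defined for <service>" messages per service.
# Reads syslog-style lines on stdin, in the format of the excerpt above.
# count_undefined is a hypothetical helper name, not part of Carbonio.
count_undefined() {
    # Split each line on the message text; whatever follows is the service name.
    awk -F'Command not defined for ' '
        NF > 1 { n[$2]++ }                       # count matching lines per service
        END    { for (s in n) print n[s], s }    # emit "<count> <service>"
    ' | sort -k2                                 # stable, alphabetical output
}
```

Fed a saved copy of the excerpt (for example `count_undefined < excerpt.log`, where excerpt.log is a placeholder file name), it prints one line per service with the number of occurrences.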

So I issued a command to register the service, like so:

zextras@mail:~$ zmprov ms `zmhostname` -zimbraServiceEnabled zmconfigd -zimbraServiceInstalled zmconfigd

Doesn't work though, as I am still getting the same log errors as before, which can only mean that the server WILL AGAIN crash at some indeterminate point in the near future! This is really becoming a serious problem for me as I am losing faith in Carbonio as a stable platform. Any ideas how I can fix this?


   
(@emejia)
Joined: 2 months ago
Posts: 1
 

Could you provide more information about your environment, and also some logs from around the time the crash occurs?


   
Myriad
(@myriad)
Joined: 13 years ago
Posts: 33
Topic starter  

Just completed an uneventful upgrade to 24.9.7. Let's hope this one works out better than the last upgrade (I'm looking at you, broken LDAP module!). BTW, I upgraded with all services running.

This post was modified 1 month ago by Myriad

   
(@stefanodavid)
Joined: 3 years ago
Posts: 227
 

@myriad

Glad to hear it worked smoothly. Keeping services running (especially OpenLDAP) during the upgrade is the way to go now.


   
Myriad
(@myriad)
Joined: 13 years ago
Posts: 33
Topic starter  

@stefanodavid

Posted by: @stefanodavid

Keeping services running (especially OpenLDAP) during the upgrade is the way to go now.

That should be explicitly mentioned in the upgrade instructions btw.

 


   
(@stefanodavid)
Joined: 3 years ago
Posts: 227
 

@myriad

We removed the zmcontrol stop command from the upgrade procedure. That command was responsible for stopping the OpenLDAP service (among others), so we believe its removal suffices. If you disagree, we'll be happy to discuss this further.

 


   
Myriad
(@myriad)
Joined: 13 years ago
Posts: 33
Topic starter  

Happy to. I used this upgrade guide, which does not explicitly state that you should keep your server running while you upgrade to 24.9.7. This is at odds with the previous upgrade, which required a server shutdown, so it might confuse people. I only kept the server running as I upgraded because I remembered my nightmare from the previous upgrade with the broken LDAP module, where the fix was to keep your server running when you re-installed the fixed LDAP.


   
(@sigtrap)
Joined: 1 year ago
Posts: 38
 

Step one could be:

1. Verify that Carbonio is running and reports no critical errors, using:

$ zmcontrol status

# systemctl status carbonio*
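The first of those checks could be scripted roughly as follows. This is a sketch only: it assumes the `zmcontrol status` layout shown in the first post (a "Host" header, then one service per line ending in its state), and not_running is a hypothetical helper name, not a Carbonio command:

```shell
#!/bin/sh
# Pre-upgrade sanity check (sketch): list every line of `zmcontrol status`
# output whose state is not "Running".
# not_running is a hypothetical helper; it reads the status text on stdin.
not_running() {
    # Skip blank lines and the "Host ..." header; keep lines whose last
    # word is anything other than "Running".
    awk 'NF > 0 && $1 != "Host" && $NF != "Running" { print }'
}

# Possible usage, as the zextras user:
#   bad=$(zmcontrol status | not_running)
#   [ -z "$bad" ] || { echo "Not running:"; echo "$bad"; exit 1; }
```

As the first post in this thread shows, a clean status does not rule out errors in the logs, so this covers only the "is it running" half of the check.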


   
(@stefanodavid)
Joined: 3 years ago
Posts: 227
 

@myriad

@sigtrap

Fair points. I am going to start working on the 24.12 docs this week, and I'll take your suggestions into account. Well, unless we change something else... 🙂

 


   