Automatically restart job service?

Something about our setup (antivirus, firewalls, network instability?) occasionally stops our job service from running. It's not a big deal as long as I catch it quickly, log in to the server, and restart it. It usually happens after server patching weekends, but we've had a handful of instances with no obvious cause.

Is there any way to automatically restart the job service whenever it goes down? We would never stop the service outside of planned maintenance, since it drives our e-mails and notifications (the most end-user-facing items) and everything else about our community.

I'm hoping to go on a short vacation at the end of the month and I'm petrified about leaving my community alone!

  • Unfortunately, this is outside the scope of what the platform itself can do, though someone else may have an idea about how to monitor and restart the service with a third-party tool.

    I am curious about what may be causing the job service to stop. Are you seeing any exceptions when this occurs? And when restarted, do you see any jobs in a stuck/hung state? You can check in Administration > Jobs, and look for any jobs that claim to have been running for a long period of time.
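
As a rough illustration of the external monitoring mentioned above, here is a minimal watchdog sketch. It assumes the job service is installed as a Windows service and that the script runs with rights to start services; neither is confirmed in this thread. The service name `ExampleJobService` is a placeholder, and the function names are made up for the example. It relies only on the standard Windows `sc query` and `sc start` commands.

```python
import subprocess
import time

# Hypothetical service name: replace with the real name shown in services.msc.
SERVICE_NAME = "ExampleJobService"


def is_running(service_name, run=subprocess.run):
    """Return True if `sc query` reports the service state as RUNNING."""
    result = run(["sc", "query", service_name], capture_output=True, text=True)
    return "RUNNING" in result.stdout


def ensure_running(service_name, run=subprocess.run):
    """Request a start if the service is not running.

    Returns True when a start was requested, False when nothing was needed.
    """
    if is_running(service_name, run=run):
        return False
    run(["sc", "start", service_name], capture_output=True, text=True)
    return True


def watch(service_name, interval_seconds=60, max_checks=None,
          run=subprocess.run, sleep=time.sleep):
    """Poll the service and request a start whenever it is found stopped.

    max_checks=None polls forever; a number limits the loop (useful for testing).
    Returns how many times a start was requested.
    """
    checks = 0
    restarts = 0
    while max_checks is None or checks < max_checks:
        if ensure_running(service_name, run=run):
            restarts += 1
            print(f"{service_name} was not running; start requested")
        checks += 1
        sleep(interval_seconds)
    return restarts
```

Calling `watch(SERVICE_NAME)` from a scheduled task or a small always-on script would poll once a minute. If the service process is crashing rather than being stopped cleanly, the recovery options built into Windows services (the Recovery tab in services.msc, or `sc failure`) may be enough on their own, without a separate script.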

  • We haven't had a failure in a while, but the only job I currently see hung is the User Recommendation Calculation. It says "Running since 22 hours ago". The next time the job service stops, I'll check here first.

    Outside of that, I haven't come across any errors or exceptions that explain the job service not running. I think it's something outside the platform causing the interruption. We definitely have an overactive antivirus system and our network lacks stability.
