Updated for 2023!
We've created an all-new server maintenance checklist template for 2023.
Server maintenance is tricky because there is a lot to keep an eye on. If you miss a single required task, the outcome can be disastrous for any business: weeks wasted chasing a problem that could have been prevented. It’s hard to do by yourself.
But you don’t have to do it alone. Manifestly can help anyone in charge of server maintenance. With checklists, you simply follow the steps you defined, which helps you get the work done right every time. You can even go back to earlier runs and check any information you need. On our platform, you can create any checklist you need and arrange the steps in the order that works best for you. We also have a public-facing API that covers several common use cases. With it you can:
- Get your list of Checklist Workflows
- Create a new Checklist Run
- Complete a Checklist Run Step
- Get your list of Team Members
With our tools, you can build your own checklist however you see fit. But we know that doing something for the first time is difficult. Below you’ll find some general tips to help you get started, as well as a ready-made checklist for daily, weekly, and monthly server maintenance, which you can modify to suit your needs. One last thing before you continue: we are available if you need any help with our API. Just contact us and we’ll be more than happy to help.
“…Also, there should be a checklist for preventive maintenance as one for corrective. Plus, there should be an emergency plan to accompany the process.”
Libana Abdul, DevOps Engineer (LinkedIn: https://www.linkedin.com/in/libanaabdul)
- First and Last name of the Maintenance Technician
- Static IP address of the server computer
- MAC address of the server computer
- Maintenance Date
- Is it a daily, weekly, or monthly maintenance?
Data, Software and System checks
- Check backups are working
- Check and update OS
  - If the kernel is updated, reboot the server
- Update your control panel
- Check and update applications
- Check remote management tools
  - Remote console
  - Remote reboot
  - Rescue mode
- Check server usage
  - If too high, get rid of any old or outdated software
  - If too high, troubleshoot CPU utilization
  - If too high, troubleshoot RAM utilization
  - If too high, troubleshoot network utilization
- Review user accounts
- Free up server storage space
- Change server passwords
- Perform a server malware scan
- Check fans and power supplies
- Check RAID fault tolerance
- Check for disk read errors
- Perform all driver, controller firmware, and storage management application updates
- Run system consistency check
- Replace any drives that have failed or are showing signs of failing
- Check cable integrity
  - Cables are securely fixed at each connection point
  - Cables are not twisted or under unnecessary strain
  - Cables are all in good condition
- Check A/C unit at the facility
Server Maintenance Tips
Daily, Weekly, Monthly and more server checks
To maintain your servers properly, you need to run different checks at different intervals. For example, you probably don’t need to check your backups every day, but if you wait more than a week, problems could arise and you wouldn’t be ready for them. Each kind of check deserves its own checklist, so you should have a daily, a weekly, and a monthly checklist. We’ve seen network administrators create quarterly and annual ones as well. With Manifestly you can schedule daily, weekly, or quarterly workflow runs to ensure recurring tasks are completed on time. You’ll also get notifications on run activity, such as when a run has started.
Verify your backups are working
Be sure that your backups are working before making any changes to your production system. You may even want to run some test recoveries if you are going to delete critical data. You should already have automatic system backups scheduled regularly, but those efforts are in vain if you have never tested whether the backups do what they are supposed to do. It is also worth checking that you have selected the correct server location.
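One way to make this check concrete is a small script that confirms the newest backup file exists, is not empty, and is recent. The Python sketch below is only an illustration: the backup directory, the `*.tar.gz` file pattern, and the 24-hour limit are assumptions, and the function names are hypothetical. It does not replace an actual test recovery.

```python
import time
from pathlib import Path
from typing import Optional


def newest_backup(backup_dir: Path, pattern: str = "*.tar.gz") -> Optional[Path]:
    """Return the most recently modified file matching pattern, or None."""
    candidates = [p for p in backup_dir.glob(pattern) if p.is_file()]
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.stat().st_mtime)


def backup_looks_healthy(backup_dir: Path, max_age_hours: float = 24.0,
                         pattern: str = "*.tar.gz") -> bool:
    """True if the newest matching backup is non-empty and recent enough."""
    latest = newest_backup(backup_dir, pattern)
    if latest is None:
        return False  # no backup found at all
    info = latest.stat()
    age_hours = (time.time() - info.st_mtime) / 3600
    return info.st_size > 0 and age_hours <= max_age_hours
```

A failed check here is a reason to investigate; a passed check only tells you a recent file exists, not that it restores cleanly.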
Check application updates
By some estimates, web applications account for more than 95% of all security breaches. Running the most recent version means that any problem the developers have already corrected is no longer an issue for you. Remember to perform a complete backup before updating, just in case something breaks. With our Zapier integration, you can automate runs to start whenever a new update rolls out to your applications.
Check disk usage
Keep your production systems clean; they are not archival systems. Delete old logs, emails, and software versions that are no longer used. Keeping your system free of old software limits the security issues that can appear, and the less data you have, the faster it is to recover. Don’t let disk usage exceed 90% of capacity: either reduce usage or add more storage. If any partition reaches 100%, your server may stop responding, database tables can become corrupted, and data may be lost.
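The 90% rule is easy to check from a script. The following is a minimal Python sketch using the standard library's `shutil.disk_usage`; the function names and the default path are illustrative assumptions, and the percentage may differ slightly from what tools like `df` report because of reserved filesystem blocks.

```python
import shutil
from typing import Tuple


def percent_used(total: int, used: int) -> float:
    """Return used space as a percentage of total (0.0 if total is 0)."""
    return 100.0 * used / total if total else 0.0


def disk_status(path: str = "/", limit: float = 90.0) -> Tuple[float, bool]:
    """Return (percent used, True if usage exceeds the limit) for the
    filesystem that contains path."""
    usage = shutil.disk_usage(path)
    pct = percent_used(usage.total, usage.used)
    return pct, pct > limit
```

Run it once per mounted partition you care about, since a single full partition is enough to cause the problems described above.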
Check server utilization
Review your server’s disk, CPU, RAM, and network utilization. Be proactive if any of them is nearing its limit: you may need to plan on adding resources to your server or migrating to a new one. Most monitoring tools can send you a notification when usage reaches a certain threshold, and that notification can in turn trigger a run for your team.
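The threshold idea can be sketched in a few lines of Python. The limits below are hypothetical values chosen for illustration, and the readings are assumed to come from whatever monitoring tool you already use; nothing here is specific to Manifestly.

```python
from typing import Dict, List

# Hypothetical alert thresholds, as a percentage of capacity.
DEFAULT_LIMITS: Dict[str, float] = {
    "disk": 90.0,
    "cpu": 85.0,
    "ram": 85.0,
    "network": 80.0,
}


def metrics_over_limit(readings: Dict[str, float],
                       limits: Dict[str, float] = DEFAULT_LIMITS) -> List[str]:
    """Return the names of metrics whose reading meets or exceeds its limit."""
    return sorted(
        name for name, value in readings.items()
        if name in limits and value >= limits[name]
    )
```

A non-empty result is the kind of signal that could start a troubleshooting run for your team.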
Update Your OS
Linux distributions release updates frequently, so it’s hard to keep track of all of them. This is why you should use automated patch management tools and have monitoring in place to alert you when a system is out of date. If you are not updating your servers, or are only updating them manually, you may miss important security updates and leave your servers at risk. Hackers often scan for vulnerable systems within hours of an issue being disclosed, so rapid response is the key to safety.
If you cannot automate your updates, create a schedule for updating your system. We believe weekly checks work best, but for older OS versions you can do them monthly. Monitor the release notices from your distribution so you are aware of any major security threats. To help with this, you can set a run to start every time a new update goes live.
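The checklist also says to reboot the server if the kernel is updated. On Debian and Ubuntu, a marker file at `/var/run/reboot-required` signals that an installed update needs a reboot. The Python sketch below checks for that file; other distributions use different mechanisms, so treat the path as an assumption about your system.

```python
from pathlib import Path

# Debian and Ubuntu create this marker file when an installed update
# (for example a new kernel) needs a reboot. Other distributions differ.
REBOOT_MARKER = Path("/var/run/reboot-required")


def reboot_pending(marker: Path = REBOOT_MARKER) -> bool:
    """True if the reboot marker file exists."""
    return marker.exists()
```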
Change server passwords
Changing passwords on a regular basis reduces the danger of live passwords falling into the hands of a hacker. You should change passwords every 3 to 6 months. If you have given passwords to others for any reason, consider changing them once those people have finished their work. With our Departments & Locations feature you can make this activity private, so that only the people who need to know can see when the passwords are changed. This keeps your servers safer.
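To keep track of the 3 to 6 month interval, you can record when each password was last changed and flag the overdue ones. This Python sketch assumes you maintain such a record yourself; the account names and the 180-day limit (roughly six months) are illustrative.

```python
from datetime import date, timedelta
from typing import Dict, List

# Roughly six months, the upper end of the recommended range.
MAX_PASSWORD_AGE = timedelta(days=180)


def passwords_due(last_changed: Dict[str, date], today: date,
                  max_age: timedelta = MAX_PASSWORD_AGE) -> List[str]:
    """Return account names whose password is older than max_age."""
    return sorted(
        name for name, changed in last_changed.items()
        if today - changed > max_age
    )
```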
Update your Control Panel
Control panel software (such as cPanel) requires manual updates. When you update cPanel, only the control panel itself is updated; you still need to update the applications that it manages. For example, if you are using WHM/cPanel, you must manually update PHP versions to pick up fixes for known issues.
Check RAID status
You should be monitoring your RAID status. A single disk failure can cause a complete system failure. Data suggests that only about 1% of servers suffer a disk failure each year, but a missed failure can turn a simple drive replacement into a disaster recovery scenario. That’s not something you want to deal with. As with application updates, you can automate runs to start whenever a RAID alarm goes off.
“Verifying the RAID status highly depends on the kind of server you have, but, usually you need to do it very, very often.”
Libana Abdul, DevOps Engineer (LinkedIn: https://www.linkedin.com/in/libanaabdul)
Check remote management tools
If your server is co-located or with a dedicated server provider, you will want to check that your remote management tools work. Remote console, remote reboot, and rescue mode are the three essential tools for remote server management. You need to make sure that they will work when you need them.
Check for hardware errors
Hardware problems are not common, but when they occur they create big issues. Review your logs for any sign of hardware problems, such as disk read errors, network failures, and overheating notices. Even if these problems are rare, you don’t want to put your server at risk because you weren’t cautious enough.
Check cable integrity
Cable wear and tear is a big factor that is often forgotten when determining points of system failure. It’s a tedious task, but it needs to be included in the routine maintenance process for all wire-dependent hardware.
Review user accounts
Remove any user account that is no longer relevant, whether because of staff changes, client cancellations, or any other user changes. Stale accounts and their data are not just a security risk; they can also present legal problems. Depending on your service contracts, you may not have the right to retain a client’s data after they have ended services.
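On Linux, a first pass at this review can be a list of accounts that are able to log in, which you then compare against your current staff and client lists. The Python sketch below parses text in `/etc/passwd` format. The UID cutoff of 1000 for regular users is a common convention rather than a rule, and the set of non-login shells is an assumption you may need to extend.

```python
from typing import List

# Shells that commonly indicate an account cannot log in interactively.
NON_LOGIN_SHELLS = {"/usr/sbin/nologin", "/sbin/nologin", "/bin/false"}


def login_accounts(passwd_text: str, min_uid: int = 1000) -> List[str]:
    """Return names of regular accounts that have a login shell."""
    accounts = []
    for line in passwd_text.splitlines():
        fields = line.strip().split(":")
        if len(fields) != 7 or not fields[2].isdigit():
            continue  # skip blank, commented, or malformed lines
        name, uid, shell = fields[0], int(fields[2]), fields[6]
        if uid >= min_uid and shell not in NON_LOGIN_SHELLS:
            accounts.append(name)
    return accounts
```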
Be Smart About Scheduled Maintenance
Never schedule maintenance for a time when you won’t be able to get help. Always think ahead to avoid any national holidays or weekends. Instead, plan to execute during the week, when you can fall back on expert help if something goes wrong.
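If you schedule maintenance programmatically, the rule above reduces to a simple date check. In this Python sketch the holiday list is something you supply yourself; the function name is hypothetical.

```python
from datetime import date
from typing import Iterable


def is_good_maintenance_day(day: date, holidays: Iterable[date] = ()) -> bool:
    """True if day is a weekday (Monday to Friday) and not a listed holiday."""
    is_weekday = day.weekday() < 5  # Monday is 0, Sunday is 6
    return is_weekday and day not in set(holidays)
```

For example, with 4 July 2023 in the holiday list, that Tuesday is rejected while Wednesday 5 July is accepted.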
Check system security
We suggest a periodic review of your server’s security using a remote auditing tool such as Nessus. Regular security audits serve as a check on system configuration, OS updates, and other potential security risks. We recommend doing this monthly; at a minimum, we believe it should be done four times a year.
Perform a server malware scan
Running a malware check on your server machines should be part of your routine process. ClamAV is a useful tool for scanning Linux machines against databases of known viruses and malware.
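A scheduled scan can be wrapped in a short script. The Python sketch below builds a `clamscan` command and interprets its exit code: `clamscan` exits with 0 when nothing is found and 1 when malware is found, and other codes indicate an error. It assumes ClamAV is installed and on the `PATH`; the function names and the target directory are illustrative.

```python
import subprocess
from typing import List


def clamscan_command(target: str) -> List[str]:
    """Build a clamscan command line for the target path."""
    # -r scans directories recursively; --infected prints only infected files.
    return ["clamscan", "-r", "--infected", target]


def interpret_clamscan(returncode: int) -> str:
    """Map a clamscan exit code to a short status string."""
    if returncode == 0:
        return "clean"
    if returncode == 1:
        return "infected"
    return "error"


def scan(target: str) -> str:
    """Run clamscan on target and return 'clean', 'infected', or 'error'.

    Raises FileNotFoundError if clamscan is not installed.
    """
    result = subprocess.run(clamscan_command(target),
                            capture_output=True, text=True)
    return interpret_clamscan(result.returncode)
```

Remember to keep the virus definitions current before scanning, since the scan is only as good as the database behind it.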
Check TCP/IP settings
Typical business networks run on TCP/IP. An incorrect TCP/IP setting results in addressing and routing problems, so always make sure that your server’s TCP/IP settings are correct.
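One common misconfiguration is a default gateway that lies outside the server's own subnet. The Python sketch below checks for that with the standard `ipaddress` module. It covers only this one setting, and the addresses in the usage note are made-up examples.

```python
import ipaddress


def gateway_in_subnet(interface_cidr: str, gateway: str) -> bool:
    """True if the gateway is inside the interface's subnet and is not
    the server's own address.

    interface_cidr is the server address with prefix length,
    for example "192.168.1.10/24".
    """
    iface = ipaddress.ip_interface(interface_cidr)
    gw = ipaddress.ip_address(gateway)
    return gw in iface.network and gw != iface.ip
```

With an interface of `192.168.1.10/24`, a gateway of `192.168.1.1` passes the check and `192.168.2.1` does not.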
Remember that we have a lot of features in place to help you when creating a run:
- With our Role Based Assignments, you can have the workflow made for server maintenance be automatically assigned to you and anybody else working in maintenance.
- With our platform, you can improve and update your workflows easily. But don’t do it alone. With the help of the Process Improvement through Feedback feature, your team can give you feedback after completing a run.
- Use our Organize With Tags feature to classify the daily, weekly, monthly, and specific maintenance checklists.
- Leave comments on your runs so you can refer to that information later when needed.
Also, with our open API, you can develop any integration you need. Keep in mind that we already have 1000+ integrations through Zapier, so check those out first.
“[Checklists] Are completely necessary so that every step is properly followed, they’re shared knowledge.”
Julian Cuevas, DevOps Engineer at YellowPepper. (LinkedIn: https://www.linkedin.com/in/juliancuevas/)