The Server Maintenance Policy is intended to describe the procedures and guidelines by which the Infrastructure Systems Team maintains the server infrastructure necessary to sustain SPU's Enterprise Applications & Services.
Effective Date: February 1, 2017
Last Review/Update: May 2021
The Infrastructure Systems Team manages over 260 virtual servers, the campus perimeter firewall, and several dedicated physical servers and storage platforms (Vidnet, DFS, Faith&Co). We rely heavily on server virtualization software, together with machine image "templates", to standardize our server builds, automate the creation of newly requested servers, and dynamically manage compute and storage resources to best serve the SPU community. We currently offer Windows and Linux server builds in our virtual environment.
System Reviews and Updates
For systems and services that cannot be interrupted during the normal school year, IST applies updates during the Christmas and Summer breaks. During these windows, IST reviews these systems and then applies all necessary cumulative firmware, OS, and application patches and updates. In addition, we conduct ad hoc assessments of needed system maintenance as recommended by the system vendor or by industry advisories, as noted below:
All SPU server builds are configured to install security patches automatically. Linux machines check for and install patches nightly; Windows machines check for and install patches weekly based on a staggered schedule defined by machine group policy. Perimeter systems are updated automatically as pushed from our firewall vendor.
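Staggering the weekly Windows schedule means that not every server installs updates and reboots on the same day. The idea can be illustrated with the minimal Python sketch below. It is not our actual group policy configuration: the weekday list and the round-robin assignment are hypothetical assumptions, and in practice the schedule is applied through Windows Update settings per machine group.

```python
from collections import defaultdict

# Hypothetical weekly patch days; the real days are defined in machine group policy.
PATCH_DAYS = ["Tuesday", "Wednesday", "Thursday", "Saturday"]


def stagger(hostnames: list[str], days: list[str] = PATCH_DAYS) -> dict[str, list[str]]:
    """Assign Windows hosts round-robin to weekly patch days so that
    servers do not all install updates (and reboot) on the same day."""
    schedule: dict[str, list[str]] = defaultdict(list)
    # Sorting makes the assignment deterministic for a given host list.
    for i, host in enumerate(sorted(hostnames)):
        schedule[days[i % len(days)]].append(host)
    return dict(schedule)


if __name__ == "__main__":
    hosts = ["web1", "web2", "db1", "db2", "app1"]  # hypothetical host names
    for day, group in stagger(hosts).items():
        print(day, group)
```

With five hosts and four patch days, one day receives two hosts and the rest receive one each.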
Application and Firmware Patches
Application and firmware patches are reviewed as the respective vendors notify us of their availability. Our general process for application and firmware patches involves:
Immediate installation of high-level (zero-day) security patches that are recommended and verified by the vendor;
Application of feature/functionality patches and step releases as needed or recommended, but not necessarily immediately.
Unless there are extenuating circumstances, our goal is to keep systems on the latest major versions of software and firmware, applying point/step releases between major revisions at our discretion. In most instances, major updates are scheduled during the twice-annual maintenance windows (the Christmas and Summer breaks) rather than risk bringing systems down during periods of peak utilization. Lower-risk step upgrades are considered on a case-by-case basis.
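The triage rules above can be summarized as a small decision function. This is a minimal Python sketch, not a tool we run: the `Patch` fields and `Action` names are hypothetical, extenuating circumstances are not modeled, and treating an unverified zero-day patch like any other release is an assumption, since the policy does not address that case.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    IMMEDIATE = "install immediately"
    NEXT_WINDOW = "schedule for the next Christmas/Summer maintenance window"
    CASE_BY_CASE = "review case by case"


@dataclass
class Patch:
    name: str
    zero_day_security: bool  # high-level (zero-day) security fix
    vendor_verified: bool    # recommended and verified by the vendor
    major_version: bool      # major release, as opposed to a point/step release


def triage(patch: Patch) -> Action:
    """Map a patch to the handling described in the policy.
    Extenuating circumstances are not modeled here."""
    if patch.zero_day_security and patch.vendor_verified:
        return Action.IMMEDIATE
    if patch.major_version:
        # Major updates wait for a twice-annual window to avoid peak-use outages.
        return Action.NEXT_WINDOW
    # Point/step and feature releases are applied at our discretion.
    return Action.CASE_BY_CASE


if __name__ == "__main__":
    print(triage(Patch("firmware 7.0", False, True, True)).value)
```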
Monitoring
We use PRTG to monitor more than 2,000 data points across our server fleet. These metrics include criteria such as:
Network availability (Ping)
Disk space / usage trends
Custom SQL Queries
We use this data to establish baselines for what is considered "normal" behavior. Alerting is configured so that when a metric reports values outside that norm, the CIS branch responsible for the affected server or service is notified for further investigation and remediation.
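PRTG handles thresholds and notifications through its own configuration; the Python below is not PRTG configuration or its API. It is a minimal sketch of the underlying idea, assuming a simple rule (a reading is "abnormal" if it falls more than three sample standard deviations from the mean of recent history) and a hypothetical server-to-team ownership map with a hypothetical default team.

```python
import statistics
from typing import Optional


def is_abnormal(value: float, history: list[float], k: float = 3.0) -> bool:
    """Flag a reading more than k sample standard deviations from the
    historical mean (the "normal" baseline). Needs at least two samples."""
    mean = statistics.fmean(history)
    sd = statistics.stdev(history)
    return abs(value - mean) > k * sd


def route_alert(server: str, value: float, history: list[float],
                owners: dict[str, str],
                default_team: str = "Infrastructure Systems") -> Optional[str]:
    """Return the team to notify, or None if the reading is within the norm.
    The owners map and default team are hypothetical."""
    if not is_abnormal(value, history):
        return None
    return owners.get(server, default_team)


if __name__ == "__main__":
    disk_pct = [40.0, 42.0, 41.0, 43.0, 39.0]  # hypothetical disk-usage history (%)
    print(route_alert("db1", 60.0, disk_pct, {"db1": "Database team"}))
```

A fixed multiple of the standard deviation is only one way to define "outside the norm"; static limits per metric (for example, a disk-space floor) are often simpler to reason about.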
Server log files are aggregated and copied off the servers directly to a centralized platform. This process and architecture are currently under review.