Introduction

Architecture Overview

The Infrastructure Systems Team (IST) manages over 260 virtual servers, the campus perimeter firewall, and several dedicated physical servers and storage platforms (Vidnet, DFS, Faith&Co). We rely heavily on server virtualization software and machine image "templates" to standardize our server builds, automate the creation of newly requested servers, and dynamically manage compute and storage resources to best serve the SPU community. We currently offer Windows and Linux server builds in our virtual environment.

System Reviews and Updates

IST updates systems and services that cannot be interrupted during the normal school year over the Christmas and Summer breaks. During these breaks, IST reviews each system and applies all necessary cumulative firmware, OS, and application patches and updates. In addition, we conduct ad hoc assessments of needed maintenance as recommended by system vendors or industry advisories, as noted below:

Security Patches

All SPU server builds are configured to install security patches automatically. Linux machines check for and install patches nightly; Windows machines check for and install patches weekly based on a staggered schedule defined by machine group policy. Perimeter systems are updated automatically as pushed from our firewall vendor.
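
The staggered weekly schedule for Windows machines can be pictured with a minimal sketch. The group names, the weekday list, and the round-robin assignment below are hypothetical illustrations; the actual groups and install days are defined in group policy and are not described here.

```python
from itertools import cycle

# Assumption: patch installs are spread across weekdays. The real days
# and groups come from group policy and may differ.
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]


def stagger(groups, days=WEEKDAYS):
    """Assign each patch group a weekly install day, round-robin."""
    return {group: day for group, day in zip(groups, cycle(days))}


# Hypothetical group names, for illustration only.
schedule = stagger(["group-a", "group-b", "group-c"])
```

Spreading groups across days means a faulty patch would reach only part of the fleet before it is noticed.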

Application and Firmware Patches

Application and firmware patches are reviewed as we're notified of their availability from the respective vendors. Our general process for application and firmware patches involves:

Immediate installation of high-level (zero-day) security patches that are recommended and verified by the vendor;
Application of feature/functionality patches and step releases as needed or recommended, but not necessarily immediately.

Unless there are extenuating circumstances, our goal is to keep systems on the latest major versions of software and firmware, with discretionary application of point/step releases between major revisions. In most instances, major updates are scheduled during the twice-annual update periods (the Christmas and Summer breaks) rather than risking downtime during times of peak utilization. Lower-risk step upgrades are considered on a case-by-case basis.
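
The decision process above can be summarized as a small triage function. This is only a sketch: the `Patch` fields, the `Action` names, and the `triage` function are hypothetical, and it does not model extenuating circumstances. It also assumes that a security patch the vendor has not yet verified falls back to case-by-case review.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    INSTALL_NOW = "install immediately"
    NEXT_BREAK = "schedule for the next Christmas or Summer break"
    CASE_BY_CASE = "review case by case"


@dataclass
class Patch:
    # All fields are hypothetical labels for illustration.
    name: str
    is_zero_day_security: bool = False
    vendor_verified: bool = False
    is_major_update: bool = False


def triage(patch: Patch) -> Action:
    """Map a patch to the handling described in the policy text."""
    # Zero-day security patches that the vendor recommends and verifies
    # are installed right away.
    if patch.is_zero_day_security and patch.vendor_verified:
        return Action.INSTALL_NOW
    # Major updates wait for a break to avoid peak-utilization downtime.
    if patch.is_major_update:
        return Action.NEXT_BREAK
    # Feature patches and step releases are applied at IST's discretion.
    return Action.CASE_BY_CASE
```

For example, `triage(Patch("firmware 2.0", is_major_update=True))` returns `Action.NEXT_BREAK`.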

Backups

All SPU servers run daily backups; please see our Backup and Recovery Policy for more detail.

Monitoring

We use PRTG to monitor over 2,000 data points across our server fleet. These metrics include:

Network availability (Ping)
Disk space / usage trends
CPU Load
Memory Usage
Website Availability
Custom SQL Queries

We use this data to establish baselines for what is deemed "normal" behavior. Alerting is configured so that when a metric reports data outside the norm, the branch of CIS responsible for the particular server or service is notified for further investigation and remediation.
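
To illustrate the idea of a baseline with out-of-norm alerting, here is a minimal sketch that flags a reading more than a few standard deviations from the historical mean. It does not reflect how PRTG itself evaluates sensors; the function names, the sample values, and the threshold `k` are assumptions chosen for illustration.

```python
from statistics import mean, stdev


def baseline(samples):
    """Return (mean, standard deviation) of historical readings."""
    return mean(samples), stdev(samples)


def is_anomalous(value, samples, k=3.0):
    """True when value lies more than k standard deviations from the mean."""
    mu, sigma = baseline(samples)
    return abs(value - mu) > k * sigma


# Hypothetical CPU load percentages gathered during normal operation.
cpu_history = [21.0, 24.5, 19.8, 23.1, 22.4, 20.9, 25.0]

spike_flagged = is_anomalous(95.0, cpu_history)    # far above the usual range
normal_flagged = is_anomalous(23.0, cpu_history)   # within the usual range
```

A statistical threshold like this adapts to each server's own history, whereas a fixed limit (for example, alerting at 90% disk usage) is simpler to reason about; which one suits a given metric is a judgment call.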

Log Files

Server log files are aggregated and copied off the servers directly to a centralized platform. This process and its architecture are currently under review.

Server Maintenance Policy