Introduction



Server SLA Policy

The Server SLA Policy is intended to describe the procedures and guidelines by which the Infrastructure Systems Team maintains the server infrastructure necessary to sustain SPU's Enterprise Applications & Services.



Effective Date: February 1, 2017

Architecture Overview

The Core Services Team manages over 270 virtual servers, along with several dedicated physical servers and storage platforms (Vidnet, DFS, Faith&Co).  We use VMware ESXi as our preferred virtual environment hypervisor.  We use a "template" process in VMware to standardize our server builds and to automate the creation of newly requested servers.  We currently offer Windows and Linux server builds in our virtual environment.

OS Security Patches

All SPU server builds are configured to install OS security patches automatically.  Linux machines check for and install patches nightly; Windows machines check for and install patches nightly when possible, but no less frequently than weekly (during our Wednesday morning downtime).
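As an illustration only, the patch cadence described above could be checked with a small script like the following. This is a minimal sketch, not the team's actual tooling: the function name `is_patch_compliant`, the `MAX_PATCH_AGE` table, and the exact thresholds (one day for Linux, seven days for Windows) are assumptions derived from the "nightly" and "no less frequently than weekly" wording.

```python
from datetime import datetime, timedelta

# Hypothetical thresholds inferred from the policy text:
# Linux patches nightly, Windows at least weekly.
MAX_PATCH_AGE = {
    "linux": timedelta(days=1),
    "windows": timedelta(days=7),
}


def is_patch_compliant(os_family: str, last_patched: datetime, now: datetime) -> bool:
    """Return True if the last patch run falls within the allowed window."""
    max_age = MAX_PATCH_AGE[os_family.lower()]
    return now - last_patched <= max_age
```

A report built on such a check would list any server whose last successful patch run is older than its window, so that it can be investigated before the next Wednesday morning downtime.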

Application and Firmware Patches

Application and firmware patches are kept up to date as we're notified of their availability from the respective vendors.

Firmware Patches

Process under review

Our general process for application patches involves:
- Installing security patches immediately when they are recommended and verified by the vendor;
- Applying feature/functionality patches and step releases as needed or recommended, but not necessarily immediately.

Unless there are extenuating circumstances, our goal is to be on the latest major versions of software and firmware, with discretionary application of point/step releases between major revisions.
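The decision rule above can be summarized in a short sketch. The names `PatchKind` and `patch_action` and the returned strings are hypothetical, and the handling of a security patch that the vendor has not yet verified ("hold for review") is an assumption, since the policy does not address that case.

```python
from enum import Enum


class PatchKind(Enum):
    SECURITY = "security"
    FEATURE = "feature"
    POINT_RELEASE = "point_release"
    MAJOR_RELEASE = "major_release"


def patch_action(kind: PatchKind, vendor_verified: bool = True) -> str:
    """Map a patch type to the handling described in the policy text."""
    if kind is PatchKind.SECURITY:
        # Policy: install immediately when recommended and verified by the vendor.
        # Assumption: unverified security patches are held for review.
        return "install immediately" if vendor_verified else "hold for review"
    if kind is PatchKind.MAJOR_RELEASE:
        # Policy goal: stay on the latest major version.
        return "schedule upgrade"
    # Feature patches and point/step releases are discretionary.
    return "apply at discretion"
```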

Backups

All SPU servers run daily backups; please see our Backup and Recovery Policy for more detail.

Monitoring

We use PRTG to monitor over 2000 data points across our server fleet.  These metrics include:

  • Network availability (Ping)
  • Disk space / usage trends
  • CPU Load
  • Memory Usage
  • Website Availability
  • Custom SQL Queries

We use this data to establish baselines for what is deemed "normal" behavior.  Alerting is configured so that when a metric reports data outside the norm, the branch of CIS responsible for the affected server or service is notified for further investigation and remediation.
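To make the idea of a baseline concrete, the following sketch flags a reading that falls far outside recent history. It is a standalone illustration and does not reflect how PRTG computes or applies thresholds; the function name `is_outside_baseline` and the default tolerance of three standard deviations are assumptions chosen for the example.

```python
from statistics import mean, stdev


def is_outside_baseline(history, latest, tolerance=3.0):
    """Return True if `latest` deviates from the historical mean by more
    than `tolerance` sample standard deviations."""
    if len(history) < 2:
        raise ValueError("need at least two historical readings")
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        # Perfectly flat history: any change at all is out of the norm.
        return latest != baseline
    return abs(latest - baseline) > tolerance * spread


# Example: recent CPU load percentages for one server (made-up values).
cpu_history = [40, 42, 41, 39, 43]
alert = is_outside_baseline(cpu_history, 95)
```

In practice the tolerance would be tuned per metric: disk usage trends slowly and suits a tight band, while CPU load is bursty and usually needs a wider one to avoid noisy alerts.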

Log Files

Server log files are aggregated and copied off the servers directly to a centralized platform.  We do not currently have any active monitoring or alerting on this data; this process is under review.