CIS Service Disruption Notification Procedures
Statement and Purpose
Notification Procedures
Routine Updates, Patches, and Operational Tasks
CIS review and approval
- These types of tasks should be reviewed in advance by CIS Directors and teams. If the type of task is likely to be noticed and/or changes the basic operations or function of a system or resource, the recommended changes should be reviewed during the weekly CIS Change Control meetings (generally Tuesday morning).
- Broad categories of routine tasks might be reviewed and approved only occasionally. Only when there are substantial changes in the type of activity or level of impact would they require additional review.
- The tasks in this category might fall into several maintenance areas including security, performance, application patches from vendors, software or hardware updates, drivers, system level tools (SMS, ePO, etc…), data backups or restores.
Level of disruption
- Routine tasks affecting Critical, Essential or Departmental resources (defined below) should be scheduled for the Weekly Maintenance Window (see below) whenever possible.
- Routine tasks that impact individual or personal computing resources should be scheduled for off-hours (nightly reboot, system scans, patches and updates) whenever possible.
- The changes or impacts that a normal user would experience should be predictable, requiring minimal training or explanation, and not likely to seriously impact work or productivity.
Reasonable testing, planning and coordination
- It is expected that tasks in this category are reviewed and tested before being deployed on a wide scale. Testing within CIS and limited departments may be needed before pushing out to all of campus.
- Reasonable coordination for these routine tasks should be completed with other CIS teams and activities that might be planned. Cross-team coordination is maintained via the weekly CIS Change Control meeting.
Notification procedures
- Tasks that fall into this category should be expected and common-place to most campus users. Additional user communication is not required. Regular contact and input from campus users is important to assess whether the operational tasks performed are predictable and expected. When additional communication is needed, focus the information narrowly to the audience that will be impacted directly. Specific distribution lists can assist with this.
- There may be the need for an occasional “general update” to campus users from CIS regarding the types of tasks that are routinely performed in this category. Such tasks are communicated to campus via the CIS Technology Blog.
- If a specific update or task has the potential for broader impact, but is still categorized as routine or operational, a courtesy communication should be sent to !Help, !SDMG, and !Security Info.
- Notifications should be posted prior to the action. Follow-up messages may also be used to provide updates.
Planned System Maintenance or Disruption
CIS review and approval
- Items in this category should be reviewed and approved in advance by CIS Directors and teams. Significant upgrades, changes or modifications should be reviewed and approved during weekly CIS Change Control meetings.
- Tasks in this category may include items to be performed during the regular Weekly Maintenance Window, or that must be planned for other times.
Level of disruption
- Planned system maintenance for Critical, Essential or Departmental resources should be scheduled for the Weekly Maintenance Window whenever possible. If it is not possible to utilize the maintenance window, the task should be scheduled at times least disruptive to the majority of campus users. This might require major upgrades to be performed on weekends, holidays or other off-hours times. The academic cycles and time of the school year should also be considered in planning disruptions.
- Planned system maintenance for individual/personal resources should be coordinated with individual users but may be possible during normal working hours.
- Major upgrades that impact Banner (or related systems) should be reviewed in advance by the SDMG group.
- Major upgrades that impact Departmental resources should be reviewed in advance by departmental managers or directors.
Reasonable testing, planning and coordination
- It is expected that tasks in this category are carefully reviewed and tested before being deployed on a wide scale. Testing within CIS and limited departments may be needed before pushing out to all of campus.
- Reasonable coordination for these planned tasks should be completed with other CIS teams and activities that might be planned.
Notification procedures
- Reminders and notifications about the Weekly Maintenance Window should be provided on an occasional basis to campus users. A web page describing the schedule, types of disruptions and potential impacts is maintained within the JIRA ticket tracking system.
- Planned system maintenance tasks that are performed in the Weekly Maintenance Window should be reviewed for potential impact.
- Planned maintenance scheduled for Critical resources during the Weekly Maintenance Window with a predicted duration of 30 minutes or more requires notification in advance.
- Planned maintenance scheduled for Essential and Departmental resources during the Weekly Maintenance Window doesn’t require additional user notification.
- Planned system maintenance that takes place outside of the Weekly Maintenance Window should be announced in advance to the primary users of the resource that will be impacted.
- Planned maintenance scheduled for Critical resources OUTSIDE of the Weekly Maintenance Window with an expected duration of 30 minutes or more requires notification in advance.
- Planned maintenance scheduled for Essential and Departmental resources OUTSIDE of the Weekly Maintenance Window with an expected duration of 2 hours or more requires notification in advance.
- Planned system maintenance down times should be posted as a courtesy communication to !Help, !SDMG, and !Security Info.
- Follow-up messages may also be used to provide updates.
- A follow-up or “after the disruption has been resolved” message may also be appropriate in some circumstances.
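The planned-maintenance thresholds in the bullets above can be read as a small decision rule. The following is a minimal sketch of that reading, not an official CIS tool: the function name, the category strings, and the choice to treat a duration exactly at the threshold as requiring notice are all assumptions made for illustration. Personal resources are left out because the policy says they are coordinated with the individual user.

```python
from datetime import timedelta

# Thresholds taken from the bullets above; the constant names are illustrative.
THIRTY_MINUTES = timedelta(minutes=30)
TWO_HOURS = timedelta(hours=2)


def planned_notice_required(category: str, in_window: bool, duration: timedelta) -> bool:
    """Return True when planned maintenance calls for advance notification.

    category  -- "critical", "essential", or "departmental" (hypothetical labels)
    in_window -- True if the work falls inside the Weekly Maintenance Window
    duration  -- predicted length of the disruption
    """
    if category == "critical":
        # Critical resources: 30 minutes or more, inside or outside the window.
        return duration >= THIRTY_MINUTES
    if category in ("essential", "departmental"):
        if in_window:
            # No additional user notification inside the window.
            return False
        # Outside the window: 2 hours or more.
        return duration >= TWO_HOURS
    raise ValueError(f"unknown resource category: {category!r}")
```

For example, a 45-minute outage of a Critical resource inside the window would return `True`, while the same outage of an Essential resource would return `False`.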
Un-Planned/Emergency Disruptions or System Failures
CIS event management and assessment
- Unplanned disruptions or system failures may take place at any time. The steps to be taken will vary more widely than the planned tasks or events outlined above. Internal CIS coordination and communication are essential for the effective communication of unplanned disruptions.
- Disruptions during normal office hours may be reported to CIS in a wide variety of ways. Such disruptions may trigger system alerts and logs, be reported by a call to the CIS HelpDesk, be communicated by an individual staff member, or arrive by email inquiry. Regardless of the initial method of communication, a quick assessment to determine resource availability should be made. If it appears there may be a problem, the support team should be contacted via the CIS Teams Outages channel.
- If the event takes place during normal office hours and can’t be resolved quickly, a CIS Director and/or the Assistant Vice President for Technology Services/CIO (or the designate) should be notified. An ad hoc decision will be made about the severity of the problem, the level of event management required and the possible communication steps to be taken (see below).
- Disruptions during off-hours may be reported to CIS via email, by the user contacting Campus Security with an off-hours emergency, or may be observed by a CIS staff member. Regardless of the method of communication, a quick assessment to determine resource availability should be made.
- If the event takes place off-hours and can’t be resolved quickly, a CIS Director and/or the AVP for Information Technology / CIO (or the designate) will be notified. The initial CIS representative will follow the procedures outlined in the CIS Off-Hours Support Policy.
Level of disruption
- Disruptions to Critical and Essential resources should be deemed an “emergency” and will require the immediate attention of the support team representative, team manager, or CIS executive in charge.
- Disruptions to Departmental or Personal resources will be reviewed on a case-by-case basis and will be handled as a high priority service request. Action required will be determined by the respective CIS support representative or team manager.
Reasonable testing, planning and coordination
- It is expected that events in this category are carefully reviewed and tested. The goal is to minimize the length of disruption, while assuring that the corrective action is sound and effective.
Notification procedures
- CIS uses the Teams Outages channel as the primary means for internal service disruption notifications. Once CIS staff are notified, individual assignments for "Technical Lead" and "Incident (Communication) Management" are established. The responsibilities for these roles are further defined in the CIS Incident Response (CIS Internal) wiki.
- As soon as is reasonable following the identification of an unresolved disruption, a courtesy message should be sent to !Help, !SDMG, and !Security Info. Follow-up messages may also be used to provide updates.
- Unplanned/emergency disruptions of Critical, Essential or Departmental resources during normal office hours with a predicted duration of 30 minutes or more require notification. If an estimated time for completion/resolution can’t be provided, a statement regarding the next update will be provided. (example: “The <blank> is down. Corrective action is not yet determined. An update will be provided in approximately 2 hours.”)
- Unplanned/emergency disruptions of Critical, Essential and Departmental resources during off-hours with a predicted duration of 2 hours or more require notification. If an estimated time for completion/resolution can’t be provided, a statement regarding the next update will be provided. (example: “The <blank> is down. Corrective action is not yet determined. An update will be provided in approximately 4 hours.”)
- Planned system maintenance down times should be posted as a courtesy communication to !Help, !SDMG, and !Security Info. Follow-up messages may also be used to provide updates.
- A follow-up or “after the disruption has been resolved” message may also be appropriate in some circumstances.
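The unplanned-disruption rules above combine a duration threshold with a fallback message when no estimate is available. A minimal sketch of that logic follows. The function name and the wording of the "estimate known" message are hypothetical. The 2-hour and 4-hour next-update intervals are the values used in the policy's own example messages, not fixed requirements.

```python
from datetime import timedelta
from typing import Optional

# Keyed by off_hours: 30 minutes during normal hours, 2 hours off-hours.
NOTIFY_THRESHOLD = {False: timedelta(minutes=30), True: timedelta(hours=2)}
# Next-update intervals borrowed from the example messages in the policy text.
NEXT_UPDATE_HOURS = {False: 2, True: 4}


def unplanned_notice(resource: str, off_hours: bool,
                     predicted: Optional[timedelta]) -> Optional[str]:
    """Return a notification message, or None if no notice is called for.

    predicted is the estimated time to resolution, or None if unknown.
    """
    if predicted is None:
        # No estimate yet: state when the next update will be provided.
        hours = NEXT_UPDATE_HOURS[off_hours]
        return (f"The {resource} is down. Corrective action is not yet determined. "
                f"An update will be provided in approximately {hours} hours.")
    if predicted >= NOTIFY_THRESHOLD[off_hours]:
        # Hypothetical wording; the policy does not prescribe this message.
        minutes = int(predicted.total_seconds() // 60)
        return (f"The {resource} is down. Service is expected to be restored "
                f"in approximately {minutes} minutes.")
    return None
```

Treating an unknown estimate as always requiring a notice is one reading of the policy. A team might instead wait until the threshold has elapsed before sending the first message.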
Alternate Notification Procedures
SPU-Alert System
The SPU emergency alert system is called “SPU-Alert” and is hosted at an off-campus vendor. The primary purpose of this system is to provide rapid dissemination of information in the event of a significant campus-wide event or incident. System-related disruptions will not generally be announced on the SPU-Alert system unless other communication mechanisms have failed or are unavailable.
The SPU-Alert system has the capability to use phones (on- and off-campus), email (on- and off-campus), plus SMS/Text messaging through cell phones to alert campus constituents. If some of the on-campus hosted systems (SPU phone system, campus network, etc.) are not available, the SPU-Alert system could be used as an alternative notification option.
Notification Table
Type of Disruption | Critical Resources | Essential Resources | Departmental Resources | Personal Resources |
---|---|---|---|---|
Representative Systems Included: | Central Network Core, Campus-wide Phones, Email System, SPUWeb, Internet Connectivity (or similar) | Banner, Canvas, DataSync, Department Shares, Library Catalog, Campus-wide VMS, Wide spread network (or similar) | Argos, Velocity, Raiser’s Edge, Recruitment +, Cbord, Advance Web, Departmental Servers (or similar) | Personal computer |
Routine Patches, Updates, Operational tasks (reboots) | Use maintenance window | Use maintenance window | Use maintenance window | Nightly maintenance |
Planned: Maintenance Window | 30 minutes | 2 hours | 2 hours | Maintenance window Allowed |
Planned: Off-Hours | 30 minutes | 2 hours | Arranged with department | Arranged with individual |
Unplanned: During Normal Hours (emergencies only) | 30 minutes | 30 minutes | Contact department | Contact individual if needed |
Unplanned: Off-Hours | 2 hours | 2 hours | Contact department | Contact individual if needed |
Population to Notify if Time Frame Exceeded | !All-SPU | !All-SPU or !Fac/Staff - based on service impact | Relevant department, or broader based on impact | Contact individual if needed |
Notes:
- The time limits are approximate and provided to give general guidance.
- If a time-frame for completion/resolution can’t be provided in the notification, a statement regarding the next update will be provided.
- Narrowly targeted notification is preferred, when the disruption impacts only a limited audience (note other restricted lists).
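The "Population to Notify" row of the table can be expressed as a simple lookup. This is only an illustrative sketch. The dictionary keys and the function name are assumptions, and the values are copied from the table rather than from any actual distribution-list configuration.

```python
# Audience per resource category, as given in the "Population to Notify" row.
AUDIENCE = {
    "critical": "!All-SPU",
    "essential": "!All-SPU or !Fac/Staff, based on service impact",
    "departmental": "Relevant department, or broader based on impact",
    "personal": "Contact individual if needed",
}


def audience_for(category: str) -> str:
    """Look up who to notify once the time frame for a category is exceeded."""
    return AUDIENCE[category.strip().lower()]
```

In line with the notes above, the result is a starting point. A narrower list is preferred when the disruption affects only a limited audience.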
Definition of Terms
Types of Disruptions
- Routine Updates, Patches and Operational Tasks: The regular maintenance and upkeep actions that may cause short disruptions to systems on a frequent or scheduled basis. Examples include nightly reboots of PCs, installation of application and OS patches on personal computers, and minor adjustments to network resources that only impact a few users.
- Planned Maintenance or Disruptions: Work that can be scheduled in advance. Whenever possible, planned maintenance should be scheduled for the Weekly Maintenance Window (see below). If planned maintenance can’t be scheduled for the maintenance window, it should be performed at times least disruptive to the majority of system users.
- Weekly Maintenance Window: Planned maintenance of Critical and Essential servers, network devices, applications, and online resources will be scheduled to take place between 4:00am-7:00am (Seattle time) on Wednesday mornings whenever possible.
- Unplanned/Emergency Disruptions or System Failures: Events that make systems unavailable and that can’t be planned in advance. They are of an emergency nature and result from hardware, software, or network failures, or from accidental human actions.
Types of Resources
The categories described below are intended to provide general guidance for specific notification procedures. Domain services, storage systems, and network resources may all impact the availability of any or all systems and applications.
Critical Resources
Those few systems that are deemed critical for maximum availability. These resources include:
- Central Network Backbone systems (network devices in the PBX room)
- Campus-wide telephone service and connectivity to the public switched network
- Email resources (Exchange and related systems)
- SPUweb (primary web server)
- Internet connectivity
Personal Resources
Systems that might be assigned to an individual or department.
- Personal computer or laptop
- Personal phone line or network port
Essential Resources
Systems used by a wide range of campus users and support the day-to-day operations or activities of the university. While a disruption might be inconvenient, primary activities of the university can continue. These resources include:
- Banner Information System
- Canvas LMS
- File services for faculty and staff operations (data sync, departmental shares)
- Other file services (netstore, personal web space)
- Campus-wide voice mail service
- Library Catalog and Database Access
- Building level network resources (outside of the backbone/core)
Departmental Resources
Systems that may be heavily used by one or two departments.
- Argos
- Velocity
- Raiser’s Edge
- Talisma
- Cbord
- Departmental servers and network resources
Notification Tools
- Microsoft Teams "Outages" channel is the primary means for CIS communication during service disruptions.
- Email communication via the pre-populated restricted distribution lists is the primary means of communicating service disruptions (planned, unplanned) to the campus community at large.
- General system status is reported via the CIS Systems Status page.
- In the event that disruptions prevent the use of email as the means of communication, alternate options such as voice mail, posting signs, or other choices may be considered (see Alternate Notification Procedures).
- Some systems provide a splash, log-in or intermediary page that allows disruption information to be communicated to the end user.
Off-hours
- The determination of whether a proposed or actual disruption is during “normal hours” or “off-hours” will vary depending on the type of resource impacted. For many critical or essential resources the expectation is that service is available at all times (7 by 24). When planning a disruption of critical or essential resources, consideration should be given to determining a time of least impact on the broadest range of users.
- Currently, the Weekly Maintenance Window of 4:00am-7:00am on Wednesday mornings is a reasonable “off-hours” time.
- Other lower use times might include early morning hours on other days of the week, weekends, vacations, or holiday periods.
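As a rough illustration, a scheduling script could check whether a proposed start time falls inside the Weekly Maintenance Window described above. The sketch below assumes the datetime passed in is already expressed in Seattle local time and treats the 7:00am end of the window as exclusive. Both are assumptions, and the function and constant names are hypothetical.

```python
from datetime import datetime, time

WINDOW_WEEKDAY = 2          # datetime.weekday(): Monday is 0, so Wednesday is 2
WINDOW_START = time(4, 0)   # 4:00am
WINDOW_END = time(7, 0)     # 7:00am, treated as exclusive in this sketch


def in_weekly_maintenance_window(local_dt: datetime) -> bool:
    """Return True if local_dt (Seattle local time) is inside the window."""
    return (local_dt.weekday() == WINDOW_WEEKDAY
            and WINDOW_START <= local_dt.time() < WINDOW_END)
```

For instance, `in_weekly_maintenance_window(datetime(2024, 1, 3, 5, 0))` evaluates a Wednesday at 5:00am, which falls inside the window.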