Part three - Proactive Systems Performance Monitoring
Supporting a population of 24,000 users is a daunting task for any IT department. With a lean IT staff of only eight, CED believes in adopting the right technology to get things right the first time. The Centre therefore strives to automate mundane manual operational tasks using technologies available in the IT industry. The guiding principle is that the customer’s experience comes first and that eLearning services must be highly available.
To monitor the health of the core eLearning servers and network, CED uses Mercury Interactive’s SiteScope to track the performance of hardware resources (e.g. CPU and memory usage) and application software (e.g. the number of Apache web processes). SMS alerts are configured to inform key administrators of any server crash. Illustrations of SiteScope monitoring are shown in Figure 7 and Figure 8.
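The kind of threshold-based health check described above can be illustrated with a minimal sketch. This is not SiteScope's API or configuration format; the metric names (`cpu_percent`, `memory_percent`, `apache_processes`) and the limits are hypothetical values chosen only for illustration. The sketch compares one set of readings against per-metric limits and returns the messages that an alerting channel such as SMS would carry.

```python
from dataclasses import dataclass


@dataclass
class Threshold:
    metric: str    # hypothetical metric name, not a SiteScope identifier
    limit: float   # hypothetical upper limit for that metric


def evaluate(readings, thresholds):
    """Return one alert message per metric that is missing or above its limit."""
    alerts = []
    for t in thresholds:
        value = readings.get(t.metric)
        if value is None:
            # A missing reading may mean the server or the probe is down.
            alerts.append(f"{t.metric}: no data")
        elif value > t.limit:
            alerts.append(f"{t.metric}: {value} exceeds {t.limit}")
    return alerts


# Assumed example limits and readings, for illustration only.
THRESHOLDS = [
    Threshold("cpu_percent", 90.0),
    Threshold("memory_percent", 85.0),
    Threshold("apache_processes", 250.0),
]

if __name__ == "__main__":
    sample = {"cpu_percent": 97.5, "memory_percent": 60.0}
    for message in evaluate(sample, THRESHOLDS):
        print(message)
```

Treating a missing reading as an alert is a deliberate choice in this sketch: a crashed server typically stops reporting rather than reporting a high value.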
In addition to this internal health monitoring within the eLearning Servers Farm, CED engages SUN Remote Monitoring Services to track the status of all SUN servers hosted within the eLearning Operations Centre (EOC) on a 24x7 basis. If any SUN server malfunctions (e.g. a hardware component fails), the remote monitoring service team alerts the CED IT team via email and phone. If required, SUN proactively dispatches engineers to CED's Data Centre.
CED hosts a total of 35 Windows servers (Dell, HP, IBM, Compaq Blade and NEC Blade), 20 SUN domains/servers (1 SUN Fire F15K, 1 E10K and 5 V880), 2 Red Hat Linux servers, 1 SGI server and 2 Apple XServe servers in the EOC. Besides these servers, CED is also responsible for the technical support of 110 PCs located at the Lecture Theaters, Audio Video Control Rooms, Smart Classroom, CED Instruction Room and Nanyang Executive Centre. In such a multi-vendor server environment, and with this many PCs to administer, applying security and operating system patches to the servers and PCs is a challenging task. CED has acquired the Patch Link software management tool to perform effective patch management. It deploys and installs patches when the servers and PCs are not in use (e.g. at 4 am). A screen capture of a Patch Link deployment is shown in Figure 9.
A project is being planned to implement a software application called SystemSkan that tracks all PC activities (e.g. which application a particular user started and at what date and time, and which documents were opened). This would be useful when an audit trail is needed for suspicious PC activity, e.g. when a hard disk was reformatted on a certain date and the last user needs to be identified. Past information can be retrieved because all activities are tracked and captured in a remote database. SystemSkan also allows helpdesk staff to take control of a remote PC to troubleshoot a problem instead of travelling to the site.
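The audit-trail lookup described above (finding the last user of a PC before an incident) can be sketched as a simple query over logged activity records. This is only an illustration under stated assumptions: the record fields (`pc`, `user`, `time`, `action`), the PC name, the user names and the dates are all hypothetical, and the sketch does not reflect SystemSkan's actual database schema or interface.

```python
from datetime import datetime


def last_user_before(events, pc, when):
    """Return the user of the most recent activity on `pc` at or before `when`.

    Returns None if no activity was logged for that PC in that period.
    """
    candidates = [e for e in events if e["pc"] == pc and e["time"] <= when]
    if not candidates:
        return None
    return max(candidates, key=lambda e: e["time"])["user"]


# Hypothetical activity log; in practice the records would come from the remote database.
EVENTS = [
    {"pc": "LT1-PC03", "user": "alice", "time": datetime(2005, 3, 1, 9, 15), "action": "opened document"},
    {"pc": "LT1-PC03", "user": "bob", "time": datetime(2005, 3, 1, 14, 40), "action": "started application"},
    {"pc": "LT2-PC01", "user": "carol", "time": datetime(2005, 3, 1, 16, 5), "action": "started application"},
]

if __name__ == "__main__":
    incident = datetime(2005, 3, 1, 15, 0)  # assumed time at which the disk was reformatted
    print(last_user_before(EVENTS, "LT1-PC03", incident))
```

Filtering by PC and timestamp before taking the maximum keeps the lookup correct even when the log is not stored in chronological order.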
For server and network security, two load-balanced Fortigate Intrusion Prevention Systems have been set up in addition to the existing security firewalls at the Internet gateway. These provide intrusion prevention, content filtering and anti-virus scanning. CED has also adopted managed security monitoring services from a local service provider, which detect and proactively prevent potential wide-area and zero-day security attacks on the eLearning servers round the clock.
Figure 7: SiteScope monitors – eLearning Servers and Network
Figure 8: Details of eLearning Servers Parameters
Figure 9: Patch Link Management in operation