Dennis Adams Associates Limited: ensuring that Information Technology systems are Production-Ready.

Management Paper - IT Production made simple.

This article first appeared in the February 2007 issue of Quality World magazine.


Key control

Commercial IT systems impact our lives every day – but how best to manage them? Dennis Adams discusses some of the challenges of managing IT production, and how they can be addressed.

No doubt you use a PC or workstation. You might use a specialised financial, enterprise planning or HR software package, or perhaps a marketing, sales, invoicing or scheduling system. Even if your use of IT is confined to email and the internet, you still rely on IT production infrastructure on a day-by-day (if not minute-by-minute) basis.

Behind the scenes, each of these systems requires ongoing specialised monitoring, maintenance and support. This article gives a high-level view of some of the challenges involved in managing IT production and outlines some of the techniques that can be used to address them.

What is IT production?

The term IT production refers to the IT infrastructure (servers, applications, disk storage, networking, etc) that is used to provide an ongoing service to a business. This is in contrast to IT development, the part of the organisation responsible for writing or enhancing software applications.

As with any product, IT applications are subject to a life cycle (see figure 1), which comprises design, development, and test and deployment, before handover to production.

Figure 1. The application life cycle for IT applications: design, development, test and deployment, then production

The timescales for this life cycle are significant. Whereas application development may take months or years, the production part of the life cycle is typically many times longer – there are still companies today with IT systems dating back 20 or 30 years. Studies have shown that the cost of managing IT production is significantly more than the cost of development.

To a non-expert, the quantity of equipment used in IT production can seem staggering. For example, a small insurance company using IT to manage its activities would probably use:

  • PCs for all employees
  • systems for processing claims and tracking premiums
  • financial planning systems
  • specialised risk analysis and fraud detection systems
  • reporting systems to generate letters, reports etc
  • marketing information systems
  • connections enabling brokers to request data electronically
  • proxy servers to enforce acceptable use when employees browse the internet
  • a website to enable customers to browse prices
  • email servers
  • file and print servers

This is known as the production estate. All these machines need to be backed up regularly so that data can be recovered in the event of a failure. Linking them together is a spider's web of network cabling, routers and switches.

Figure 2 is a schematic of the production estate for a fictional company, which might include hundreds of computers. Since these servers need to be accessible 24 hours a day, a team of support people must work around the clock to identify and fix problems. Consequently, IT production can be a frantic, stressful occupation, as teams of skilled specialists struggle with the sheer weight of numbers and the mathematical near-certainty that 'something will go wrong this week'.
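The 'near-certainty' is simple probability. As a rough sketch, assume (hypothetically) that each server independently has the same small chance p of a fault in any given week; the chance that at least one of n servers fails that week is then 1 - (1 - p)^n. The 1% weekly rate used here is an illustrative assumption, not a figure from the article.

```python
def prob_at_least_one_failure(n_servers: int, weekly_fault_prob: float) -> float:
    """Chance that at least one of n independent servers has a fault in a week.

    Assumes every server has the same fault probability and that faults are
    independent, which is a simplification of any real estate.
    """
    if n_servers < 0 or not 0.0 <= weekly_fault_prob <= 1.0:
        raise ValueError("need n_servers >= 0 and 0 <= weekly_fault_prob <= 1")
    # All n servers survive the week with probability (1 - p) ** n.
    return 1.0 - (1.0 - weekly_fault_prob) ** n_servers


if __name__ == "__main__":
    # Hypothetical 1% weekly fault chance per server.
    for n in (10, 100, 300):
        print(f"{n:>3} servers: {prob_at_least_one_failure(n, 0.01):.0%}")
```

Under that assumed rate, ten servers give roughly a 10% chance of at least one fault in a week, 100 servers about 63% and 300 servers about 95%. Real faults are neither independent nor equally likely, so this only illustrates why scale makes failure routine.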

Figure 2. The IT production infrastructure for a typical company: users connect to specialist and general application software, through the company firewall and out to the Internet

The fire-fighting syndrome

Suppose, for example, that the company's website becomes unavailable due to a hardware error. It would be all hands to the pump to bring the system back online as soon as possible. Failure to do so would cost the company money and damage its reputation in the marketplace.

So IT production is often under pressure to provide quick fix solutions. But adopting a short-term approach can compromise the medium- and long-term goals of the organisation. For example, it may be tempting to continually maintain old, inefficient computers beyond their realistic lifespan, whereas the better long-term approach would be to replace them with more modern machines which would deliver a greater return on investment to the company. This is why management needs to take a strategic business approach to running IT production, which can be a complex process.

A systematic approach

This can be addressed systematically by examining the four main interconnected aspects of IT production management, known by the acronym MOPS: metrics, operational tools, processes and procedures, and standards. Together, these four aspects combine management and technical practices. Metrics and processes are very familiar to quality managers, whereas operational tools (specialised IT software) and standards are highly technical.

It is also important to distinguish between evaluation and evolution techniques. Metrics, such as key performance indicators, and operational tools, such as software to manage work orders and timesheets, are classic means of evaluation and tracking. Processes, procedures and standards, on the other hand, can be used appropriately to evolve the way people are managed (see figure 3).

Figure 3. The different aspects of the MOPS techniques

              Management                  Technical
Evaluation    Metrics                     Operational tools
Evolution     Processes and procedures    Standards

Metrics – understanding the scope of the activity

To manage and control a system (whether a physical computer system or a procedural system such as an IT organisation), it is necessary to have some objective numerical indication of its current state. For example, a team that handles incoming IT user-support calls needs to keep track of how many calls are outstanding, their priorities, time elapsed, service levels and so on, so that it can prioritise its workload and, if necessary, request additional support resources.
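As an illustration of the kind of numbers such a team might track, here is a minimal Python sketch that summarises an open-call queue. The SupportCall fields, the priority scheme and the SLA_TARGETS resolution times are all hypothetical; a real help desk tool would supply its own.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical service-level targets: maximum time to resolve, by priority
# (1 = most urgent). Real targets come from the organisation's SLAs.
SLA_TARGETS = {1: timedelta(hours=4), 2: timedelta(hours=24), 3: timedelta(days=5)}


@dataclass
class SupportCall:
    ref: str
    priority: int
    opened: datetime
    closed: Optional[datetime] = None


def queue_summary(calls, now):
    """Headline numbers for the support queue at time `now`."""
    open_calls = [c for c in calls if c.closed is None]
    return {
        "outstanding": len(open_calls),
        "by_priority": dict(Counter(c.priority for c in open_calls)),
        # Open calls already older than their priority's resolution target.
        "sla_breached": [c.ref for c in open_calls
                         if now - c.opened > SLA_TARGETS[c.priority]],
        "oldest_age": max((now - c.opened for c in open_calls), default=timedelta(0)),
    }


now = datetime(2007, 2, 1, 12, 0)
calls = [
    SupportCall("INC001", 1, now - timedelta(hours=6)),
    SupportCall("INC002", 2, now - timedelta(hours=3)),
    SupportCall("INC003", 3, now - timedelta(days=2), closed=now - timedelta(days=1)),
]
print(queue_summary(calls, now))
```

Here two calls are outstanding, and the priority-1 call has already exceeded its assumed four-hour target.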

It is ironic that the IT industry, guardian of most things numeric within the organisation, can often be lax in collecting and publishing numbers about its own activity. This is because, like any organisation, IT has a limited budget. Priority is often given to IT investment which is directly business-related. Indirect investment, such as improving the efficiency of IT support, is often given a lower priority. Yet collecting statistical data is essential for proper management, and also enables the department to:

  • explain to the business what the IT production team is actually doing
  • justify the current IT expenditure, and/or build a case for future investment
  • identify problem applications which require a disproportionate percentage of the total support manpower
  • be more efficient in its own internal planning processes

Experience suggests that metrics should be captured under four categories: infrastructure assets, activity, technical behaviour and business behaviour. Infrastructure assets are all the physical assets for which IT production is responsible, such as computer servers, disk storage and networking, as well as more intangible assets such as software licences. Each asset is part of the technical solution for one or more business functions, so it is essential to put in place a mapping (or cross-index) between the assets and the business functions they perform. Stored in a database, this mapping allows the team to identify all the business functions related to an asset or, conversely, all the assets that contribute to a specific business function.
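A many-to-many table in a relational database is one straightforward way to hold this cross-index. The sketch below uses Python's built-in sqlite3 module with an in-memory database; the table names, asset identifiers and business functions are hypothetical.

```python
import sqlite3

# In-memory database for illustration; a real register would be persistent.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE asset (id TEXT PRIMARY KEY, kind TEXT NOT NULL);
    CREATE TABLE business_function (name TEXT PRIMARY KEY);
    -- Cross-index: one asset can serve many functions, and vice versa.
    CREATE TABLE asset_function (
        asset_id      TEXT NOT NULL REFERENCES asset(id),
        function_name TEXT NOT NULL REFERENCES business_function(name),
        PRIMARY KEY (asset_id, function_name)
    );
    """
)
conn.executemany("INSERT INTO asset VALUES (?, ?)",
                 [("srv-db01", "server"), ("san-02", "disk storage"),
                  ("lic-erp", "software licence")])
conn.executemany("INSERT INTO business_function VALUES (?)",
                 [("Claims processing",), ("Premium tracking",)])
conn.executemany("INSERT INTO asset_function VALUES (?, ?)",
                 [("srv-db01", "Claims processing"), ("srv-db01", "Premium tracking"),
                  ("san-02", "Claims processing"), ("lic-erp", "Premium tracking")])


def functions_for_asset(asset_id):
    """All business functions that depend on one asset."""
    rows = conn.execute(
        "SELECT function_name FROM asset_function WHERE asset_id = ? ORDER BY function_name",
        (asset_id,))
    return [name for (name,) in rows]


def assets_for_function(function_name):
    """All assets that contribute to one business function."""
    rows = conn.execute(
        "SELECT asset_id FROM asset_function WHERE function_name = ? ORDER BY asset_id",
        (function_name,))
    return [asset for (asset,) in rows]


print(functions_for_asset("srv-db01"))
print(assets_for_function("Claims processing"))
```

Because the cross-index is a single table, the same data answers both questions: which business functions are affected if srv-db01 fails, and which assets support claims processing.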

Activity refers to the other significant investment in IT production: human resources. A timesheet system is useful for gaining a meaningful understanding of where this investment is actually being spent.
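Once time is booked against applications, the same data can help identify problem applications that absorb a disproportionate share of support effort. A minimal sketch, assuming a hypothetical timesheet format of (engineer, application, activity, hours) and an illustrative 40% threshold:

```python
from collections import defaultdict

# Hypothetical timesheet entries: (engineer, application, activity, hours).
timesheet = [
    ("alice", "Claims", "incident", 12.0),
    ("bob", "Claims", "incident", 10.0),
    ("alice", "Email", "change", 3.0),
    ("carol", "Website", "incident", 5.0),
    ("bob", "Website", "project", 10.0),
]


def hours_by_application(entries):
    """Total booked hours per application."""
    totals = defaultdict(float)
    for _engineer, application, _activity, hours in entries:
        totals[application] += hours
    return dict(totals)


def disproportionate_applications(entries, threshold=0.4):
    """Applications taking more than `threshold` of all booked hours."""
    totals = hours_by_application(entries)
    grand_total = sum(totals.values())
    if grand_total == 0:
        return []
    return sorted(app for app, hours in totals.items() if hours / grand_total > threshold)


print(hours_by_application(timesheet))
print(disproportionate_applications(timesheet))
```

Here Claims accounts for 22 of the 40 hours booked, so it is the only application flagged.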

The third and fourth categories – technical behaviour and business behaviour – are closely related. Technical behaviour includes statistics relating to the hardware itself: central processing unit consumption, disk space utilisation and so on. Business behaviour statistics typically represent business throughput or activity, such as the number of orders taken or the number of new customers.
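Normalising a technical measure by a business one shows how closely the two are linked and lets capacity be discussed in business terms. The sketch below assumes hypothetical daily samples of CPU utilisation and orders taken, and a linear relationship between them, which would need checking against real measurements.

```python
from statistics import mean

# Hypothetical daily samples: (average CPU utilisation %, orders taken).
daily_samples = [(35.0, 1000), (42.0, 1200), (56.0, 1600), (70.0, 2000)]


def cpu_per_thousand_orders(samples):
    """Technical behaviour normalised by business behaviour, per day."""
    return [round(cpu / orders * 1000, 2) for cpu, orders in samples]


def projected_cpu(samples, forecast_orders):
    """Estimate CPU % for a forecast order volume.

    Assumes CPU load scales linearly with orders, a simplification that
    should be checked before making capacity decisions.
    """
    ratio = mean(cpu / orders for cpu, orders in samples)
    return ratio * forecast_orders


print(cpu_per_thousand_orders(daily_samples))
print(projected_cpu(daily_samples, 2500))
```

In this made-up data the load is a steady 35% CPU per thousand orders, so a forecast of 2,500 orders a day points to roughly 87.5% utilisation.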

Operational tools – software for managing IT

The second aspect of MOPS is operational tools, or specialised software. There is a whole range of software tools available to the IT production manager, which fall into three major categories: metrics collection tools, technical support tools and workflow tools. Metrics collection tools enable the team to track the activity of IT production; examples include timesheets, help desk, incident management, change control and asset management.

Technical support tools, on the other hand, facilitate the technical support function itself. Examples include monitoring tools for operating systems, networks, databases and applications, and tools for backup, recovery and business continuity.
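At its simplest, a monitoring tool compares a measured value with a threshold. As a minimal sketch (not a substitute for a real monitoring product), Python's standard shutil.disk_usage can report filesystem usage; the 85% threshold is an assumption.

```python
import shutil


def disk_space_alert(path: str, threshold_pct: float = 85.0):
    """Return (percentage used, alert flag) for the filesystem holding `path`.

    The 85% default is a hypothetical threshold; real values depend on the
    server's growth rate and how much warning the support team needs.
    """
    usage = shutil.disk_usage(path)  # named tuple of bytes: total, used, free
    used_pct = usage.used / usage.total * 100
    return round(used_pct, 1), used_pct >= threshold_pct


if __name__ == "__main__":
    pct, alert = disk_space_alert(".")
    print(f"{pct}% used{' - ALERT' if alert else ''}")
```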

The third category is workflow tools. A typical example is an automated help desk, which ensures that incidents are routed to the correct resolving team, where their progress can be tracked and the resolution recorded.
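The routing and progress-tracking behaviour of an automated help desk can be sketched in a few lines. The categories, team names and status model here are hypothetical and far simpler than a real workflow tool.

```python
from dataclasses import dataclass, field

# Hypothetical routing table: incident category -> resolving team.
ROUTING = {"network": "Network Support", "database": "Database Team", "email": "Messaging Team"}
DEFAULT_TEAM = "Service Desk"  # anything unrecognised stays with first-line support

# Allowed status changes, so progress is recorded in a consistent order.
TRANSITIONS = {
    "open": {"in progress"},
    "in progress": {"resolved"},
    "resolved": {"closed", "in progress"},  # reopen if the fix did not hold
    "closed": set(),
}


@dataclass
class Incident:
    ref: str
    category: str
    status: str = "open"
    team: str = ""
    resolution: str = ""
    history: list = field(default_factory=list)

    def __post_init__(self):
        # Route to the resolving team as soon as the incident is logged.
        self.team = ROUTING.get(self.category, DEFAULT_TEAM)

    def move_to(self, new_status: str, note: str = "") -> None:
        if new_status not in TRANSITIONS[self.status]:
            raise ValueError(f"cannot move from {self.status!r} to {new_status!r}")
        self.history.append((self.status, new_status, note))
        self.status = new_status
        if new_status == "resolved":
            self.resolution = note


inc = Incident("INC042", "database")
inc.move_to("in progress", "Assigned to on-call DBA")
inc.move_to("resolved", "Extended full tablespace")
print(inc.team, inc.status, inc.resolution)
```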

Most IT professionals recognise that no single software product can provide the sheer breadth of tools required. Instead, they need a collection of tools, often from different vendors, linked together. These tools must share common naming conventions and data structures to give a single view of the estate. For example, if support incidents are recorded against financial cost centres, the same cost-centre codes should be used when reporting on PC usage. Integrating tools in this way simplifies reporting and minimises inconsistency.
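The value of a shared convention shows up as soon as two tools' data are combined. In this sketch the incident tool and the PC inventory are assumed (hypothetically) to record the same cost-centre codes; one deliberately mistyped code shows how an inconsistency silently drops out of a joined report.

```python
from collections import Counter

# Hypothetical extracts from two different tools. Both are meant to record
# the same cost-centre codes; "cc100" is a deliberate inconsistency.
incidents = [("INC1", "CC100"), ("INC2", "CC100"), ("INC3", "CC200"), ("INC4", "cc100")]
pc_inventory = [("PC-01", "CC100"), ("PC-02", "CC100"), ("PC-03", "CC100"),
                ("PC-04", "CC200"), ("PC-05", "CC300")]


def unmatched_codes(incidents, pcs):
    """Cost centres used by the incident tool but unknown to the PC inventory."""
    return sorted({cc for _ref, cc in incidents} - {cc for _pc, cc in pcs})


def incidents_per_pc(incidents, pcs):
    """Combined report keyed on the shared cost-centre code."""
    incident_count = Counter(cc for _ref, cc in incidents)
    pc_count = Counter(cc for _pc, cc in pcs)
    return {cc: round(incident_count[cc] / pc_count[cc], 2) for cc in sorted(pc_count)}


print(unmatched_codes(incidents, pc_inventory))  # codes that break the join
print(incidents_per_pc(incidents, pc_inventory))
```

The incident logged against cc100 matches nothing in the inventory, so it vanishes from the per-PC figures; checking for unmatched codes before reporting is a cheap guard.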

Processes and procedures – implementing a system view

Quality managers are well aware of the benefits of appropriate processes in organising teams: reduced costs, predictability, repeatability, auditability and verification. A number of methodologies address process improvement in a quality context, one of the best known being Six Sigma. MOPS encourages the use of these techniques, where applicable, to improve the processes within IT production.

A recommended framework for processes and procedures in IT production is the Information Technology Infrastructure Library (ITIL), developed by the UK government's Office of Government Commerce. This is a non-prescriptive set of best practices that is gaining increasing recognition in the IT industry. Its core books cover two disciplines: service delivery and service support. Service delivery includes procedures for availability, capacity, service level, continuity and IT financial management. The service support book encompasses:

  • service desk – strictly speaking a function rather than a process
  • incident management – responding to problems raised by end-users and resolving them
  • problem management – proactively identifying the underlying causes of incidents
  • change management – standardising procedures for controlling change to the infrastructure
  • release management – controlling quality of releases which impact the infrastructure
  • configuration management – ensuring IT assets are appropriately tracked and managed
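Problem management often starts by looking for configuration items that keep generating incidents. A minimal sketch of that heuristic, using a hypothetical incident log and an illustrative threshold of three occurrences:

```python
from collections import Counter

# Hypothetical incident log: (incident reference, configuration item affected).
incident_log = [
    ("INC1", "web01"), ("INC2", "web01"), ("INC3", "db02"),
    ("INC4", "web01"), ("INC5", "db02"), ("INC6", "mail01"),
]


def candidate_problems(incidents, min_occurrences=3):
    """Configuration items with repeated incidents, most frequent first.

    Recurrence is only a prompt for root-cause investigation; the threshold
    is an illustrative assumption, not an ITIL rule.
    """
    counts = Counter(ci for _ref, ci in incidents)
    return [(ci, n) for ci, n in counts.most_common() if n >= min_occurrences]


print(candidate_problems(incident_log))
```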

In addition to using the ITIL framework, MOPS also highlights process interfaces between IT production and the outside world: IT project handover, development project initiation, and sponsorship of research and development.

Under the heading of IT project handover, procedures need to be put in place to ensure that IT production will be able to run with the ball from the first day that a new application is handed over from development. One useful technique is to create a support checklist listing all the technologies in a project (servers from Dell, database from Oracle, etc) so that IT production can ensure the proper skills and training are in place before the live date for handover.
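The support checklist can be compared mechanically with a skills register. In this sketch the technologies, staff counts and the minimum of two trained people per technology are all hypothetical.

```python
# Hypothetical checklist of technologies used by a new project.
project_technologies = {"Dell servers", "Oracle database", "Linux", "Message queuing middleware"}

# Hypothetical skills register: technology -> number of trained support staff.
trained_staff = {"Dell servers": 6, "Oracle database": 3, "Linux": 1}


def skills_gaps(technologies, staff, minimum=2):
    """Technologies with fewer than `minimum` trained staff.

    Anything returned needs training or recruitment before the live date.
    """
    return sorted(t for t in technologies if staff.get(t, 0) < minimum)


print(skills_gaps(project_technologies, trained_staff))
```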

Development project initiation highlights an important lesson: modern application development must be a collaborative process from the start. Architectural decisions made during development have major repercussions for later infrastructure choices. For example, a banking project might propose a new system for worldwide electronic messaging that requires a costly investment in dozens of new machines, when IT production may already support a similar technology that could carry the additional messages with minimal extra investment.

The third area – sponsorship of research and development – can sometimes be neglected. It is best illustrated by a couple of examples:

  • a major operating-system upgrade is being mandated by the suppliers. IT production needs to validate that the new upgrade is capable of running existing applications, backup and recovery, monitoring and other infrastructure tools
  • a new type of network attached storage has been developed by an unknown supplier. IT production needs to assess whether purchasing this technology will reduce costs of ownership to the business, and at the same time deliver the performance and reliability needed

This research and development activity is 'funded', in the sense that it takes resources from the short-term day-to-day operational support role. The challenge for IT production managers is how to equitably balance the short- and long-term activities.

Standards – a statement of capabilities

In some organisations, the choice of technology for a new application can be driven by the development function. A product might be chosen because of the availability of useful tools that make the application development exercise much faster.

Although this approach can produce applications with low development costs, the support costs may be high or even prohibitive. It is essential to consider IT support costs when making technology choices, and defining IT production standards can redress this balance. This is achieved through an IT production architecture role, responsible for defining a set of production-readiness criteria against which any new or proposed application can be judged. Based on these criteria, the production architect publishes a technology menu of production standards: a list of all the technologies that IT production can currently support.

This menu gives IT developers a list of solution options. If they choose these technologies, they can be confident that they are choosing infrastructure with the optimum cost of support for the application. These standards will change over time, for a number of reasons:

  • the introduction of new technologies or newer versions of current technologies which need to be added to the standards
  • the removal of old technologies that are no longer supportable in the future
  • mergers and acquisitions, which lead to the introduction of a new set of supportable standards, and the need to put in place a new roadmap of technologies
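Judging a proposal against the technology menu and the production-readiness criteria can be expressed as two set differences. The menu entries, criteria names and Proposal structure are hypothetical illustrations, not a published standard; the example package mirrors the missing point-in-time backup described in the standards case study.

```python
from dataclasses import dataclass, field

# Hypothetical technology menu: everything IT production can currently support.
TECHNOLOGY_MENU = {"Dell servers", "Linux", "Oracle database", "Windows Server"}

# Hypothetical production-readiness criteria: capability -> why it matters.
READINESS_CRITERIA = {
    "point_in_time_backup": "data can be restored after a hardware failure or disaster",
    "monitoring_hooks": "operational tools can detect problems early",
    "documented_restart": "the service can be restarted in a controlled way",
}


@dataclass
class Proposal:
    name: str
    technologies: set
    capabilities: set = field(default_factory=set)


def assess(proposal: Proposal) -> dict:
    """Judge a proposed application against the menu and the criteria."""
    off_menu = sorted(proposal.technologies - TECHNOLOGY_MENU)
    missing = sorted(set(READINESS_CRITERIA) - proposal.capabilities)
    return {
        "production_ready": not off_menu and not missing,
        "off_menu": off_menu,
        "missing_criteria": missing,
    }


package = Proposal("Off-the-shelf package", {"Linux", "Proprietary datastore"},
                   {"monitoring_hooks", "documented_restart"})
print(assess(package))
```

When the menu changes for any of the reasons listed, only TECHNOLOGY_MENU needs updating; the assessment logic stays the same.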

Implementing MOPS

When looking to implement MOPS, the approach has to be systematic. It should follow a classical project-management cycle which:

  • analyses the existing environment under the MOPS headings
  • identifies the gaps under each of these headings
  • prioritises the gaps
  • engages with the sponsors and the corporate business
  • creates an IT production strategy

Changes should be rolled out incrementally. Stability and reliability are, after all, the watchwords of successful IT production. It is vital to remember that any changes to that environment constitute a risk to the ongoing operations.
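The analyse, identify gaps and prioritise steps can be sketched as a weighted gap analysis. The 0 to 5 scoring scale, the scores themselves and the weights are hypothetical; in practice they would come from the assessment and from discussion with sponsors.

```python
# Hypothetical self-assessment scores (0 to 5) under each MOPS heading.
current = {"Metrics": 2, "Operational tools": 3, "Processes and procedures": 1, "Standards": 2}
target = {"Metrics": 4, "Operational tools": 4, "Processes and procedures": 4, "Standards": 3}
# Hypothetical weighting agreed with sponsors: importance of each heading to the business.
weight = {"Metrics": 1.0, "Operational tools": 0.5, "Processes and procedures": 1.0, "Standards": 0.8}


def prioritised_gaps(current, target, weight):
    """Weighted gaps, largest first, ignoring headings already at target."""
    gaps = ((heading, (target[heading] - current[heading]) * weight[heading])
            for heading in target)
    return sorted((item for item in gaps if item[1] > 0),
                  key=lambda item: item[1], reverse=True)


for heading, score in prioritised_gaps(current, target, weight):
    print(f"{heading}: {score:.1f}")
```

With these numbers, processes and procedures would be tackled first, and the remaining headings could follow one increment at a time.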

MOPS in practice

Each IT team is different, as is the business context in which it operates (see case studies box). In many respects, however, the challenges faced by an IT production team are no different from those facing management in other disciplines.

Biography

Dennis Adams is MD of Dennis Adams Associates Limited, a specialised consultancy that focuses on working with IT production managers.

Dennis can be contacted by email at dennis@dennisadams.co.uk or via www.dennisadams.co.uk

Case studies

Metrics

A hosting company commissioned a database to act as a central register of the IT infrastructure for which it was responsible (servers, storage, network equipment etc) and of each of the key applications that it hosted.

The company had several data centres in various parts of the UK. Shortly after the information was captured, the company experienced a major power outage that brought down one entire data centre over a weekend.

They needed to restart all the servers in a controlled, prioritised way. Some of these hosted mission-critical applications. Others were only required at periodic intervals (eg at the end of the month), or had backup systems. What was needed was a means of prioritising the start-up work. In a matter of minutes, the database was able to provide sufficient information to enable the company to appropriately prioritise this work, and minimise the financial loss that would be incurred by not having systems available when they were needed.
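The case study does not describe how the hosting company's database ranked its servers. As a hypothetical sketch, a register holding criticality, backup cover and periodic-use flags could produce a restart order with a single sort:

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    criticality: int           # 1 = mission-critical (hypothetical scale)
    has_backup_system: bool    # another system can cover in the meantime
    needed_periodically: bool  # e.g. only required at month end


def restart_order(servers):
    """Most critical first; covered or periodic servers later; name breaks ties."""
    return [s.name for s in sorted(
        servers,
        key=lambda s: (s.criticality, s.has_backup_system, s.needed_periodically, s.name))]


servers = [
    Server("month-end-rpt", 3, False, True),
    Server("web-01", 1, True, False),
    Server("hr-app", 2, False, False),
    Server("trading-db", 1, False, False),
]
print(restart_order(servers))
```

Because False sorts before True in Python, a server without backup cover is restarted ahead of an equally critical one that has it.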

Operational tools

An international banking organisation introduced proactive monitoring software to give advance warning of servers likely to run out of disk space.

One server, critical to US trading, generated a threshold alert just before 4 July, a holiday in the US but a normal working day in the UK.

Armed with this information, the UK teams arranged for the server to be taken out of service on 4 July, the problem to be addressed, and the server to be reinstated for the following day. Consequently, all the work could be done in normal business hours and the bank experienced no computer outage.
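The bank's monitoring software and its method are not described. One simple approach, shown here as a hypothetical sketch, projects recent disk growth forward linearly to estimate how many days remain, giving the kind of advance warning that allows work to be scheduled for a quiet day.

```python
def days_until_full(daily_used_gb, capacity_gb):
    """Days until a disk fills, projecting the average daily growth forward.

    Returns None if there is too little history or usage is not growing.
    """
    if len(daily_used_gb) < 2:
        return None
    growth_per_day = (daily_used_gb[-1] - daily_used_gb[0]) / (len(daily_used_gb) - 1)
    if growth_per_day <= 0:
        return None
    return (capacity_gb - daily_used_gb[-1]) / growth_per_day


def needs_early_action(daily_used_gb, capacity_gb, lead_time_days=7):
    """True if the disk is projected to fill within the planning lead time."""
    days = days_until_full(daily_used_gb, capacity_gb)
    return days is not None and days <= lead_time_days


# Hypothetical daily readings (GB used) for a 500 GB volume.
history = [400, 410, 420, 430, 440]
print(days_until_full(history, 500), needs_early_action(history, 500))
```

Growth of 10 GB a day with 60 GB free projects six days of headroom, inside the assumed seven-day lead time, so early action is flagged.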

Processes and procedures

A software company introduced more rigorous change-control gates for moving software changes into production.

Each software change (or release) now needed additional administrative work and testing before it could be deployed into production. This was seen as unnecessary bureaucracy by the developers, who typically gauge their success by the number of software releases they have been able to hand over.

However, it soon became apparent that the additional testing requirements were producing far higher-quality software, which broke the negative cycle of continuous release, 'patch' and re-release. Freed from the backlog of fixing bugs introduced in earlier releases, the developers were able to concentrate their effort on adding new business features such as better search routines, more efficient validation, faster integration and so on.

The net result was improved code quality, a reduction in support costs, and improved customer satisfaction.
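The company's actual gates are not listed. As a hypothetical sketch, a release gate can be expressed as a set of named conditions that must all be met before a change is deployed:

```python
from dataclasses import dataclass


@dataclass
class ChangeRequest:
    ref: str
    tests_passed: bool
    approved_by_change_board: bool
    rollback_plan: bool


def gate_failures(cr: ChangeRequest):
    """Gate conditions not yet met; an empty list means the change may be deployed."""
    checks = {
        "testing not complete": cr.tests_passed,
        "no change-board approval": cr.approved_by_change_board,
        "no rollback plan": cr.rollback_plan,
    }
    return [reason for reason, met in checks.items() if not met]


print(gate_failures(ChangeRequest("REL-17", tests_passed=True,
                                  approved_by_change_board=False, rollback_plan=True)))
```

Returning the reasons, rather than a bare pass or fail, tells developers exactly what administrative work remains.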

Standards

An international company was searching for a new software product to address key business needs. It interviewed a number of software houses, looking for an off-the-shelf package.

However, when the IT production architecture team assessed the package under consideration, it showed that the package would have been unsupportable in production. For example, the package lacked any ability to take a point-in-time backup, which is essential if the system needs to be restored after a hardware failure or a major disaster.

In this case, the definition of production-readiness criteria saved the organisation hundreds of thousands of euros that might otherwise have been spent on the wrong product, not to mention the potential risk to its business and reputation.

PDF copy of this article, as it first appeared in the February 2007 issue of Quality World magazine and later on the Chartered Quality Institute website - QW_ keycontrol.pdf - 140 KB.

Link to original article in Quality World on the Chartered Quality Institute website.


Dennis Adams Associates
114 Pinner View, Harrow
Middx. UK. HA1 4RL
tel: +44 (0)845 055 8935
fax: +44 (0)845 055 8935
email:info@dennisadams.co.uk
http://www.dennisadams.co.uk
(c) Dennis Adams Associates Limited: 2004-2010