Friday, May 9, 2008

Attrition Rate SLA? Devil in the details

One of the most annoying aspects of outsourcing is Supplier FTE turnover. Especially with offshore firms in countries experiencing enormous growth and internal opportunity (India, for example), staff turnover can be a serious issue.

Crafting an SLA for attrition is a great way to monitor account consistency.

The actual calculation is fairly straightforward. I recommend rolling 12-month measurement windows, with some adjustment made in the beginning of the agreement so that the SLA can be implemented at Month 6 rather than waiting for a full 12 months of data.
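As a rough sketch of that calculation: the formula below (departures divided by average headcount over the window) and the early-term adjustment (scaling a short window up to 12 months) are my assumptions for illustration, not language from any particular agreement. The function name and parameters are hypothetical.

```python
from typing import Optional, Sequence


def rolling_attrition_rate(
    departures: Sequence[int],
    headcount: Sequence[int],
    window: int = 12,
    min_months: int = 6,
) -> Optional[float]:
    """Annualized attrition rate over the most recent `window` months.

    departures[i] and headcount[i] are the departures and the headcount
    for month i of the agreement, oldest first. Returns None until
    `min_months` of data exist, i.e. before the SLA takes effect.
    """
    if len(departures) != len(headcount):
        raise ValueError("departures and headcount must have the same length")

    months = min(len(departures), window)
    if months < min_months:
        return None  # SLA not yet in force

    recent_departures = sum(departures[-months:])
    average_headcount = sum(headcount[-months:]) / months
    if average_headcount == 0:
        return None  # no staff on the account, rate is undefined

    # Assumed early-term adjustment: scale a short window up to a full year.
    return (recent_departures / average_headcount) * (window / months)
```

With this adjustment, the Month 6 figure is directly comparable to the full 12-month figure later on, at the cost of being more volatile in the early months.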

The challenge is that, in practice, the Supplier will want to factor certain circumstances out of the calculation. At best, this results in a misunderstanding between Client and Supplier regarding the Attrition Rate; at worst, it enables outright manipulation of the Attrition Rate. Specifically, the parties need to agree on how certain classifications of FTE will be treated in the calculation:

- FTE who are partially re-assigned or support additional accounts
- FTE who are promoted or serving different roles in the account
- FTE who are terminated involuntarily
- FTE who take extended leave or absence (medical leave, maternity leave, vacation, sabbatical).
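To see how much these classifications matter, here is a minimal sketch that counts departures under whatever inclusion policy the parties agree on. The `Departure` categories mirror the list above plus a plain resignation; the names and the function are hypothetical.

```python
from collections import Counter
from enum import Enum
from typing import Iterable, Set


class Departure(Enum):
    RESIGNED = "resigned"
    PARTIALLY_REASSIGNED = "partially re-assigned"
    ROLE_CHANGE = "promoted or changed role"
    TERMINATED = "terminated involuntarily"
    EXTENDED_LEAVE = "extended leave"


def counted_departures(events: Iterable[Departure], counted: Set[Departure]) -> int:
    """Count only the events whose classification the agreement includes."""
    tally = Counter(events)
    return sum(n for kind, n in tally.items() if kind in counted)
```

Take a made-up year with 3 resignations, 2 involuntary terminations, and 1 extended leave. A Client-friendly policy that counts every category yields 6 departures, while a Supplier-friendly policy that counts only resignations yields 3. On an account of 50 FTE, that is the difference between 12% and 6% attrition from the same underlying events.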

The devil is in the details on this one. Agreeing on what is in and what is out of the calculation can be a lively negotiation, and has far-reaching effects on managing the agreement.

Sunday, May 4, 2008

ITIL SLAs for Incident Management

Ever wonder what a reasonable SLA is for Incident Management? If you have a powerful Service Desk that can rapidly triage calls, SLAs become routine. There are additional SLAs, but here’s a good start for ITIL-like Service Desks. Note that ITIL v3 removes Request Management from Incident Management, so it’s now possible to design SLAs for diminishing numbers of Incidents:
  • Severity Level 1 Incidents Resolved Within 4 Hours
  • Severity Level 2 Incidents Resolved Within 8 Hours
  • Severity Level 1 and 2 Incidents Responded to Within 30 Minutes
  • Severity Level 3 Incidents Resolved Within Applicable Time Frame
  • Problems Resolved Within Applicable Time Frame
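The fixed targets in that list can be checked mechanically. This is a minimal sketch, assuming the clock runs on elapsed wall-clock time from when the Incident is opened (not business hours) and that hitting the target exactly counts as a pass; both are assumptions an actual agreement would need to spell out. The function names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional

# Targets taken from the list above. Severity 3 and Problems use an
# "applicable time frame" that is not defined here, so they are omitted.
RESOLUTION_TARGETS = {1: timedelta(hours=4), 2: timedelta(hours=8)}
RESPONSE_TARGET = timedelta(minutes=30)  # applies to Severity 1 and 2 only


def met_resolution_sla(severity: int, opened: datetime, resolved: datetime) -> Optional[bool]:
    """True/False against the fixed target, or None if no fixed target applies."""
    target = RESOLUTION_TARGETS.get(severity)
    if target is None:
        return None
    return resolved - opened <= target


def met_response_sla(severity: int, opened: datetime, responded: datetime) -> Optional[bool]:
    """True/False for Severity 1 and 2, or None for other severities."""
    if severity not in (1, 2):
        return None
    return responded - opened <= RESPONSE_TARGET
```

Returning `None` rather than `True` for Severity 3 keeps Incidents without a fixed target out of the pass/fail percentages until the applicable time frame is agreed.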
Believe it or not, those basic SLAs cover the vast majority of Incidents. Under ITIL, the multitude of SLAs is consolidated into just a few, thus preserving penalty dollars for other metrics (like availability).

Common definitions of Severity Levels:

Severity 1 Incident
An Incident shall be categorized as a “Severity 1 Incident” if the Incident is characterized by the following attributes: the Incident (a) renders a business critical System, Service, Software, Equipment or network component un-Available, substantially un-Available or seriously impacts normal business operations, in each case prohibiting the execution of productive work, and (b) affects either (i) a group or groups of people, or (ii) a single individual performing a critical business function.

Severity 2 Incident
An Incident shall be categorized as a “Severity 2 Incident” if the Incident is characterized by the following attributes: the Incident (a) does not render a business critical System, Service, Software, Equipment or network component un-Available or substantially un-Available, but a function or functions are not Available, substantially Available or functioning as they should, in each case prohibiting the execution of productive work, and (b) affects either (i) a group or groups of people, or (ii) a single individual performing a critical business function.

Severity 3 Incident
An Incident shall be categorized as a “Severity 3 Incident” if the Incident is characterized by the following attributes: the Incident causes a group or individual to experience difficulty accessing or using a System, Service, Software, Equipment or network component or a key feature thereof and a reasonable workaround is not available, but does not prohibit the execution of productive work.

Severity 4 Incident
An Incident shall be categorized as a “Severity 4 Incident” if the Incident is characterized by the following attributes: the Incident may require an extended Resolution Time, but does not prohibit the execution of productive work and a reasonable workaround is available.
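Read together, the four definitions boil down to a small decision tree. The sketch below is one possible reading of them, not contract language; the `Incident` fields and the `classify` function are hypothetical names. Note that the definitions as written leave a gap: an Incident that prohibits productive work but affects only a single individual in a non-critical function fits none of the four levels, so the sketch returns `None` for it rather than guessing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Incident:
    prohibits_productive_work: bool
    # Clause (a) of Severity 1: a business critical component is unavailable,
    # substantially unavailable, or normal operations are seriously impacted.
    critical_component_down: bool
    # Clause (b): affects a group, or an individual in a critical function.
    affects_group_or_critical_function: bool
    workaround_available: bool


def classify(incident: Incident) -> Optional[int]:
    """Map an Incident to Severity 1-4, or None if no definition applies."""
    if incident.prohibits_productive_work:
        if not incident.affects_group_or_critical_function:
            return None  # gap in the definitions as written
        return 1 if incident.critical_component_down else 2
    # Productive work can continue: the workaround decides between 3 and 4.
    return 4 if incident.workaround_available else 3
```

A Service Desk would typically capture these four attributes at triage, which is what makes the rapid classification described above possible.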