The Issue of Pricing Software Using T-shirt Sizes

I was recently involved in a project to migrate a legacy software application, built primarily on TIBCO BusinessWorks, to a new, pure Java-based platform. To establish a complexity baseline, I asked the team to assign a T-shirt size (S, M, L, or XL) to each of the legacy platform’s capabilities. The aim of this exercise was to obtain a cost factor that we could use to estimate the price of migrating the application to the new platform.

In the initial draft, the team sized a subset of the capabilities as follows:

Size | Capability
-----|--------------------------------------------
S    | Suspend Customer
M    | Verify Customer Address
XL   | Change Order (Pre-seller acknowledgement)
XL   | Change Order (Post-seller acknowledgement)

A few weeks later, I asked the team to estimate the number of days it would take to implement each T-shirt size. They said they would rather estimate each capability individually than give a general average per T-shirt size, which was already a bad sign. At the end of the process, I got the following:

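Size | Capability                                  | Estimate (man days)
-----|---------------------------------------------|--------------------
S    | Suspend Customer                            | 5
M    | Verify Customer Address                     | 10
XL   | Change Order (Pre-seller acknowledgement)   | 80
XL   | Change Order (Post-seller acknowledgement)  | 5,000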
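
To see why a single number per size would have been misleading, here is a minimal sketch in Python that groups the estimates from the table above by size and compares the mean of each bucket with its spread. The data comes from the table; the function name `bucket_stats` is illustrative only.

```python
from collections import defaultdict
from statistics import mean

# Estimates from the table: (size, capability, man days)
estimates = [
    ("S", "Suspend Customer", 5),
    ("M", "Verify Customer Address", 10),
    ("XL", "Change Order (Pre-seller acknowledgement)", 80),
    ("XL", "Change Order (Post-seller acknowledgement)", 5000),
]

def bucket_stats(rows):
    """Return {size: (mean, min, max)} of man days per T-shirt size."""
    buckets = defaultdict(list)
    for size, _capability, days in rows:
        buckets[size].append(days)
    return {size: (mean(d), min(d), max(d)) for size, d in buckets.items()}

for size, (avg, lo, hi) in bucket_stats(estimates).items():
    # max/min shows how far apart the items sharing one size are
    print(f"{size}: mean={avg:g} min={lo} max={hi} max/min={hi / lo:g}")
```

Averaging the XL bucket gives 2,540 days, which overstates the first XL capability more than 30-fold and understates the second by roughly half; the 62.5x ratio inside the bucket is what makes a per-size average meaningless here.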

I was shocked. How could one XL capability require 80 man days and another 5,000? I asked for evidence: UI screens, code, database tables, anything that could justify such an inflated number.

It turned out that the team was right. The Change Order (Post-seller acknowledgement) capability involved over 300 discrete use cases. Changing an order after the seller had acknowledged it was radically different from changing one that had not yet been acknowledged. For example, the team had to handle cases such as sellers with a “no cancellation after shipment” policy who had failed to update the order’s shipment status.

Whereas the capabilities sized S, M, and L had fairly consistent relative complexity, the developers had dumped anything above “trivial” into the XL bucket, regardless of how much those capabilities differed in complexity from one another. At this point, I said to the legacy platform’s team lead: “This means that the four-point T-shirt scale is not large enough to capture the wide differences between one capability and another. We need either a larger scale or to break the capabilities down into more atomic ones.”

The conclusion is that a T-shirt size model in software is appropriate only when the items to which the scale is applied (capabilities, use cases, features, etc.) have a relatively normal distribution of complexity, so that items of the same size are genuinely comparable. That is to say, if the complexity index (say, McCabe’s cyclomatic complexity) of one XL capability is 100, another XL capability should not score 150 or higher.
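
The rule of thumb above can be expressed as a simple check: within one size, the most complex item should stay below 1.5 times the least complex one (100 vs. 150 in the example). This is a minimal sketch, assuming each item already has a numeric complexity score such as a McCabe cyclomatic complexity total; the function name `inconsistent_sizes`, the `max_ratio` parameter, and the sample scores are all hypothetical.

```python
from collections import defaultdict

def inconsistent_sizes(scores, max_ratio=1.5):
    """Return the sizes whose items are too far apart in complexity.

    scores: iterable of (size, item, complexity) tuples with positive
    complexity values. A size is flagged when its largest score is at
    least max_ratio times its smallest (100 vs. 150 fails at 1.5).
    """
    buckets = defaultdict(list)
    for size, _item, complexity in scores:
        buckets[size].append(complexity)
    return sorted(
        size for size, values in buckets.items()
        if max(values) >= max_ratio * min(values)
    )

# Hypothetical complexity scores, for illustration only.
sample = [
    ("L", "Capability A", 40),
    ("L", "Capability B", 55),
    ("XL", "Capability C", 100),
    ("XL", "Capability D", 150),
]
print(inconsistent_sizes(sample))  # ['XL']
```

When a size is flagged, the two remedies from the conversation with the team lead apply: widen the scale, or split the flagged items into more atomic ones and size them again.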

Note: The domain and scenario have been changed from the original for illustration purposes.