The US Department of Transportation (DOT) has for decades collected and reported “on-time” data for US airlines. This metric is widely reported in the media and is regularly used as a proxy for “reliability”. Headlines like “Airline X is the most on-time for March” or “Airline Y stuck at the bottom of the on-time list again” are common. So customers should flock to the on-time airlines and be wary of the less reliable ones, right? Not so fast, because the way the DOT measures on-time is far from an accurate picture of reliability.

The DOT considers a flight on-time if it arrives no more than 14 minutes after its scheduled arrival time. Historically, the statistic has also excluded international flights and flights sold by one carrier but operated by another (“codeshare” or “commuter feed” flights). Here are the problems with this measurement:
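To make the threshold concrete, here is a minimal sketch (Python, with illustrative names of my own) of the DOT rule as just described: a flight counts as on-time if it arrives no more than 14 minutes after its scheduled time.

```python
def dot_on_time(arrival_delay_minutes: float) -> bool:
    """On-time under the DOT rule described above: no more than
    14 minutes after the scheduled arrival (early arrivals are <= 0)."""
    return arrival_delay_minutes <= 14

# 14 minutes late still counts as on-time; 15 minutes late does not.
assert dot_on_time(14)
assert not dot_on_time(15)
```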

  1. It Doesn’t Measure Consistency

Take two airlines, A and B. Airline A has every flight arrive exactly 15 minutes late, and reports a 0% on-time record. Airline B reports a 90% on-time record, but the 10% of their flights that arrived late were each four hours late. Which airline would you trust more, the 90% airline or the 0% airline? Fifteen minutes late is easy to deal with, but a 10% chance of a long delay could be highly penalizing.
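A quick sketch with hypothetical delay data shows how differently the DOT metric and a customer’s actual risk can look for these two airlines:

```python
# Hypothetical per-flight delays in minutes for the example above.
airline_a = [15] * 100                 # every flight exactly 15 minutes late
airline_b = [0] * 90 + [240] * 10      # 90% on time, but 10% four hours late

def dot_on_time_rate(delays):
    """Share of flights arriving no more than 14 minutes late."""
    return sum(d <= 14 for d in delays) / len(delays)

for name, delays in [("A", airline_a), ("B", airline_b)]:
    print(f"Airline {name}: {dot_on_time_rate(delays):.0%} on-time, "
          f"worst delay {max(delays)} minutes")
# Airline A: 0% on-time, worst delay 15 minutes
# Airline B: 90% on-time, worst delay 240 minutes
```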

  2. It Measures Airplane On-Time, Not Customer On-Time

Again, let’s consider airlines A and B. Each has a flight scheduled to depart at noon and arrive at the same destination at 3pm. Each airline has a maintenance problem at the time of departure. Airline A cancels the flight and books everyone on its 4pm flight that same day, getting them to their destination at 7pm instead of 3pm. Airline B fixes the problem, departs two hours late at 2pm, and gets everyone to their destination at 5pm. Since Airline A’s 4pm flight was on-time according to the DOT, the customers on that flight are not counted as “late”. Yet all of Airline B’s customers are “late” because the flight left two hours late. Airline B’s customers got to their destination at 5pm, while Airline A’s customers got there at 7pm. The DOT metric captures none of this.

  3. The DOT Metric Raises Consumer Prices

How does a metric raise prices? Indirectly, of course. Since there is media pressure to look good on the on-time metric, airlines “pad” their schedules to give themselves room to deal with uncertainties. A flight that should take two hours may be scheduled to take 2:45, for example, allowing 45 minutes of extra time to absorb problems that come up. This puts pressure on consumer prices in two ways. First, many airlines pay their crews for the actual time of a flight or the scheduled time, whichever is greater. With a lot of padding in the schedule, the airline ends up paying crews for time they are not actually flying, and customers must cover this cost in their fares. Second, a plane cannot be scheduled to fly one route while it is committed to another. Padding flight times increases the amount of time a plane sits on the ground, meaning that all of the airline’s costs must be spread over fewer total flights. This means higher fares for everyone. Everyone pays so that airlines can “game” the on-time system.
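As a rough, purely illustrative calculation (the aircraft utilization figure below is an assumption, not industry data), padding a two-hour flight to 2:45 cuts the number of flights one aircraft can operate in a day:

```python
# Illustrative numbers only: assume an aircraft is available 14 hours a day
# and each flight on the route truly takes 2 hours to fly.
available_hours = 14
actual_block = 2.0    # hours the flight really takes
padded_block = 2.75   # hours as scheduled (2:45) to protect the on-time stat

flights_unpadded = int(available_hours // actual_block)   # 7 flights per day
flights_padded = int(available_hours // padded_block)     # 5 flights per day

# Crews paid on the greater of actual or scheduled time also earn for the padding.
paid_hours_per_flight = max(actual_block, padded_block)   # 2.75 paid vs 2.0 flown

print(flights_unpadded, flights_padded, paid_hours_per_flight)
# The same fixed costs now spread over fewer flights, pushing fares up.
```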

  4. Not All Flights Are Included

International flights and flights sold by one airline but operated by another are not included in an airline’s on-time statistic. This can result in an on-time metric that covers fewer than 50% of the airline’s total flights as sold to consumers.

  5. Connections Take Two Flights or More!

Let’s get back to our airlines A and B. Airline A has big hubs, and 50% of its customers make at least one connection on their trips. Airline B flies only point-to-point routes and does not carry connecting traffic. Both airlines have an 85% on-time rate. But the customers who connect on Airline A have an on-time rate of 85% × 85%, which is only 72%! And it could be even worse if the delay on the first flight causes the customer to miss their connecting flight at the hub.
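Expressed as a quick calculation (assuming, as in the example, that the two legs are independent and that a late leg makes the whole trip late):

```python
leg_on_time = 0.85   # each leg's individual on-time rate

# Probability that both legs of a connecting trip arrive on time,
# ignoring missed connections (which would make it worse).
trip_on_time = leg_on_time * leg_on_time
print(f"{trip_on_time:.0%}")   # 72%
```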

  6. Not All Delays Are Equal

Most would agree that an 18-minute delay is not the same as a four-hour delay. Long delays are disproportionately disruptive to customers, yet they count no more in the DOT metric than a delay of a few minutes.

So if you agree that the current measurement is vague at best and misleading at worst, what should be done? I suggest a three-part metric that would be easy to collect and report.

First, report the percentage of flights delayed, with no buffer time. If a flight arrives at its scheduled time or earlier, it is “on time”; otherwise it is delayed.

Second, report the average delay for each delayed flight by airline, rather than a fixed 14-minute threshold. It would be meaningful to know, for example, that customers on Airline A had an average delay of 16 minutes while those on Airline B had an average delay of 24 minutes. The longer the average, the higher the risk of being caught on a significantly delayed flight. Averaging the delay over only the delayed flights is essential, as averaging in the on-time flights would make this number less useful to consumers.

Third, report the percentage of all flights that were more than 45 minutes late. Some may argue this should be 60 or 30 minutes, but the point is that at some point the delay causes significant disruption, and these flights should be the primary concern of regulators who wish to modify behavior with reporting.
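Putting the three measures together, here is a minimal sketch, assuming only a list of per-flight arrival delays in minutes (zero or negative meaning on time or early), of how they could be computed:

```python
def reliability_report(delays_minutes, long_delay_threshold=45):
    """Compute the three proposed measures from per-flight arrival delays.

    delays_minutes: minutes late for each flight (<= 0 means on time or early).
    """
    delayed = [d for d in delays_minutes if d > 0]

    pct_delayed = len(delayed) / len(delays_minutes)
    avg_delay_when_delayed = sum(delayed) / len(delayed) if delayed else 0.0
    pct_long_delay = sum(d > long_delay_threshold for d in delays_minutes) / len(delays_minutes)

    return pct_delayed, avg_delay_when_delayed, pct_long_delay

# Hypothetical month of flights for one airline.
sample = [0, -5, 3, 20, 0, 50, 0, 8, 0, -2]
pct_delayed, avg_delay, pct_long = reliability_report(sample)
print(f"{pct_delayed:.0%} of flights delayed, average delay {avg_delay:.0f} minutes, "
      f"{pct_long:.0%} of flights more than 45 minutes late")
```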

These three measures should include all flights sold by a carrier. If the airline’s code is sold on a flight, a customer could buy it believing they are flying that airline, so the airline should be held accountable for its partners’ reliability.

The result would be something like this: for the month of X, Airline Y had 45% of its flights delayed, with an average delay of 18 minutes, and 3% of its flights were delayed by more than 45 minutes. This provides real, customer-centric information rather than the arbitrary and often deceptive “83% on-time” figure published today.