Statistics: Stuff Happening Roughly Hourly
Karl Muth explores the relationship between statistics, hypotheses, headlines and policies.
When I roam around the Internet, I collect statistics.
And LOLcats (hey, quit looking in my download folder!).
But mostly statistics.
I’ve been on a few long flights recently and I thought I’d compile a list of things that happen between twenty and twenty-five times per day on average (taken primarily from annualised statistics but, as you’ll see, also monthly tabulations), since large datasets are a nice thing to look at when you don’t have wifi (for those of us who don’t think international air travel is time set aside for watching Game of Thrones). It’s an interesting pool of statistical data I’ve compiled as something of a hobby, a frolic – and something worthwhile for students to dig into as they ponder statistics and what they mean for public policy. To interpret data takes more than good statistical, spreadsheet, and database skills. It takes other research skills and the ability to draw from multiple disciplines.
I try to demonstrate this here with two kinds of fatality statistics. Fatality statistics are nice to work with because deaths are 1) binary and certain (people are either alive or dead), 2) often noted or recorded, and 3) likely to be aggregated into reporting for a variety of agencies, bureaucracies, or reports. They are also whole numbers reported with regular frequency across a range of categories and uniformly across geographies (unlike ethnic categorisations or qualitative filters, the definitions of alive and dead are basically the same in all regions). Finally, unlike other things, cause is often reported for deaths, which is useful (meanwhile, the cause of burglaries or the cause of homelessness is a bit more nuanced to unravel than the cause of death).
So, I look at the list of things that happen about hourly and it includes two particularly interesting things.
It includes American military veterans committing suicide (happens 22.8 times per day on average, according to the newest Veterans Administration statistics, which cover 2013 year-to-date), which seems like a high number. But can we dig further into that number? Of course we can. First, we should taxonomize the data by conflict and non-conflict periods. We see that the suicide rate is roughly three times higher for veterans who have served during periods when the U.S. is engaged in active conflict, which tracks with statistics on PTSD and other mental health diagnoses.
We don’t have discrete data (though the U.S. military undoubtedly does, since it is obsessed with service records) on which soldiers were actually engaged in conflict, though we can use a proxy here. If we split the suicide dataset by branch of service, combining reserve services since these are samples during conflict, we see that the Coast Guard – which has essentially zero contact with enemy forces – has a very low suicide rate compared to the Marines and Army. This seems to suggest that it is conflict-exposed or conflict-period troops who drive the elevated rate (I assume the Coast Guard’s mental health is essentially unaffected by overseas conflicts, since its main contacts with hostile forces involve drug smugglers and others who are active whether or not the U.S. is fighting a war).
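As a sketch of what that branch-of-service split might look like in code; every count below is invented purely for illustration (real figures would come from VA reporting):

```python
# Branch-of-service proxy sketch. All counts are INVENTED for
# illustration only; real figures would come from VA reporting.
suicides = {"Army": 120, "Marines": 45, "Navy": 40,
            "Air Force": 35, "Coast Guard": 2}
veterans = {"Army": 550_000, "Marines": 190_000, "Navy": 330_000,
            "Air Force": 310_000, "Coast Guard": 42_000}

# Per-branch suicide rate per 100,000 veterans.
rate_per_100k = {branch: suicides[branch] / veterans[branch] * 100_000
                 for branch in suicides}

# Under the proxy argument, the Coast Guard (minimal conflict
# exposure) should sit well below the Army and Marines.
for branch, rate in sorted(rate_per_100k.items(), key=lambda kv: -kv[1]):
    print(f"{branch}: {rate:.1f} per 100,000")
```

The point of the sketch is only the shape of the comparison: a per-branch rate normalised by population, not raw counts, is what lets the Coast Guard serve as a control group.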
Now we bifurcate the data into Vietnam and post-Vietnam, and isolate and purge the data in the period between the end of the war in Vietnam and Bush 43 (the first Gulf War during Bush 41 is pretty uninteresting from a suicide standpoint; it also involved a very brief mobilisation). This leaves two peaks in the data that I decouple, stack, clean, and merge. Now, my dataset’s rough “middle” is the end of the war in Vietnam, which is immediately followed by late 2001. This leaves you with some interesting data, but we must remember that we are organising this data by the date range the soldier was in the U.S. military, not the date range during which the soldier committed suicide. This is interesting and we’ll come back to it.
There is also an argument that suicide has been historically under-reported. While this may be true (the argument is that deaths that were suicides were intentionally mis-classified for social or cultural reasons), it is very difficult to figure out at what rate deaths by suicide were mis-classified, as it is not a systematic error (if it were a systematic error, e.g. a medical examiner mis-classifies every tenth suicide as an accidental motor-vehicle death, then we could unwind it accurately). I’m not that interested in this mis-reporting idea, but let’s say we were. There are three main methodologies to deal with this sort of problem: proxy coefficients (basically increasing the sample by a fixed multiplier), pattern-and-practice surveys (an anthropological or ethnographic methodology attempting to figure out the frequency of the mis-reporting), and incentive studies.
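To see why a truly systematic error would be easy to unwind, consider the every-tenth-suicide hypothetical above: a fixed misclassification fraction is corrected with a fixed multiplier. The numbers here are invented for illustration:

```python
# Unwinding a hypothetical systematic misclassification: if a fixed
# fraction of true suicides is recorded as something else, the true
# count is recovered exactly from the reported count.
reported = 900          # deaths actually recorded as suicides (invented)
misclass_rate = 0.10    # every tenth true suicide recorded as an accident

true_count = reported / (1 - misclass_rate)
print(round(true_count))  # 1000
```

The real-world problem is that the misclassification rate is neither known nor constant, which is why the correction above is unavailable in practice.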
Of these three options, incentive studies interest me most. Here, people look at what incentives drive people to behave differently (in this case, what incentives drive the mis-reporting of suicide as some other type of death) and attempt to build a model of what a person thus incentivised might do. Obviously, we do not have enough data to build such a model, but it seems there are few incentives in view that would be compelling enough to convince a medical examiner to falsify a death certificate. Committing a crime to help an unrelated party not violate social norms is hardly commonplace in other parts of our society. There may have been cases where a family concerned about appearances bribed a medical examiner, but it’s awfully hard to believe these cases were commonplace or routine.
Okay, enough talk about under-/over-reporting, as I don’t see any incentive of any value that could have persisted over decades and consistently suppressed the reported incidence of suicide. Let’s just assume the reported statistics, which seem plenty damning, are accurate. We return to our stacked and cleaned data of people who saw combat duty or had a higher probability of seeing combat since they served during wartime. Comparing this to the raw set, we see that people who serve during wartime contribute substantially to the suicide total. This is unsurprising even without any psychological angle, particularly since wartime periods generally have more people serving in the armed forces. What is interesting is that the suicides are not skewed incredibly far toward recent service. In fact, there seems to be substantial latency between the in-service date of a soldier and the suicide of that same person.
This brings me to my hypothesis. It seems that we now live in a decade where people who served during Vietnam are reaching an age where their physical health is failing, their mental health may be failing, and the social and veterans services they depend upon are threatened. At the same time, recent veterans (Iraq and Afghanistan) are leaving the military and entering the worst job market for young men in America in three generations, and the first American job market in history with such a large employment and wage gap between college-educated and non-college-educated male workers under 35. Despite a great deal of marketing around hiring veterans and a range of companies with pro-veteran hiring preferences, the unemployment rate for male veterans in New York City remains 50% higher than the city average, and cities like Detroit, Philadelphia, and Pittsburgh (all of which have substantial veteran populations) have worse statistics.
As suicides do not seem to be concentrated according to age within either the early (Vietnam) or late (Iraq and Afghanistan) sample, I hypothesise that it is neither the conflict in Vietnam, nor the recent conflicts, that drives the veteran suicide rate, but an unfortunate combination of the two.
Let’s change topics. The list of things that happen about once an hour also includes people killed in a vehicle built by Toyota or one of its subsidiaries (happens 20.3 times per day globally, which is an estimate including all 204.3 million vehicles that have been built by Toyota, of which Toyota estimates 30 to 40 percent are still in working condition), which seems like a low number. Let’s unpack it a bit. It suggests that Toyotas may be safer than the average car in America, where 88 people per day die in car crashes (despite having 14.9% market share in car sales, Toyota’s share is only five, rather than the expected 13, of the 88 vehicle deaths).
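A quick back-of-envelope check of that share argument, using only the figures quoted above:

```python
# Expected Toyota share of daily U.S. road deaths if deaths tracked
# market share exactly (all figures as quoted in the text).
us_deaths_per_day = 88
toyota_market_share = 0.149
toyota_deaths_per_day = 5

expected = us_deaths_per_day * toyota_market_share
print(round(expected, 1))                 # 13.1 expected, versus 5 observed
print(toyota_deaths_per_day / expected)   # Toyotas at well under half the expected rate
```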
However, Toyota’s rise in sales in America is a relatively recent phenomenon, meaning the older cars in America (and older cars are generally less safe) tend to be non-Toyotas. While Toyotas are known to last longer than some domestic brands, this longevity factor is not sufficient to offset the ratio of initial sales. In other words, Toyotas are underrepresented among older cars. I could not find a good dataset for any major market sorted by model year of vehicle, but this would be interesting. I suspect people driving cars built more than twenty years ago are substantially less likely to survive a major accident.
There is likely some international heterogeneity that results from multiple crash test standards. German cars built ten to twelve years ago are still very safe, as the German crash standards today are not substantially different from those of 1999, though the safety of small cars has likely improved since the elimination of a special crash test for small cars, which Mercedes-Benz had lobbied for on behalf of its MCC Smart Car brand (witness how much safer a BMW Mini Cooper is than, say, a Ford F-150 pickup truck in an offset collision). Many small Japanese cars sold only in Japan (the K-cars, for instance) are unsafe by German standards, but are essentially never driven at autobahn speeds, and hence may be as safe or safer in their intended use.
Twenty-six of Toyota’s current 129 models are sold in Germany. These vehicles presumably are just as safe as other vehicles that pass the strict German crash tests which, most automotive experts agree, are among the most difficult in the world. As we see from the Japanese multi-tiered crash tests, the Toyota models seem to have been designed primarily to meet various legislative challenges, with some vehicles (the Landcruiser FZJ79 pickup truck, for instance) not having been substantially redesigned in decades, while other vehicles (the Lexus LS, for instance) are constantly revised to obtain the highest crash-test ratings. This latter group of vehicles is available in every Toyota market, or the vast majority of Toyota markets, probably at least in part to recoup the massive engineering costs of frequent redesigns.
I take the total deaths in Toyotas, which is in part empirical and in part estimated, and look instead at broader causes of death. Then I look at the geography of these deaths. We find, oddly, that deaths in Toyotas are more strongly correlated with people dying overall than with other vehicle death statistics. Why would this be?
Well, many people dying in Toyotas are dying “in bulk” so to speak. They are dying in speeding minibuses in places like Venezuela and Uganda where twenty people sandwiched into a minibus built for ten all collide with an oncoming two-tonne truck. Toyota’s global safety initiatives suggest that vastly more people are dying in Toyotas in the developing world than in the developed world. While it’s tempting to blame this on lack of resources or poor maintenance, it’s almost certainly a confluence of factors. These people are driving inferior Toyota products (often ones that are more than twenty years old) that have been poorly-maintained. They are also driving these vehicles on roads that are poorly-maintained, poorly-lit, and populated with cars, oxcarts, and bicycles that probably aren’t being helped by the poor road conditions, either.
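The correlation comparison two paragraphs up (Toyota deaths tracking overall mortality more closely than other vehicle-death statistics) could be sketched like this. The country-level series below are entirely invented; a real analysis would use per-country mortality and vehicle-registration data:

```python
import statistics

def pearson(xs, ys):
    """Pearson correlation coefficient, computed directly."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# One entry per (hypothetical) country -- invented numbers.
toyota_deaths  = [2, 5, 40, 90, 3]
all_deaths     = [50_000, 80_000, 900_000, 2_000_000, 60_000]
vehicle_deaths = [400, 900, 3_000, 9_000, 500]

r_overall = pearson(toyota_deaths, all_deaths)
r_vehicle = pearson(toyota_deaths, vehicle_deaths)
print(r_overall, r_vehicle)  # which series tracks Toyota deaths more closely?
```

Whichever coefficient comes out higher on real data is what distinguishes "Toyotas are where people die" from "Toyotas are why people die".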
This leads to my hypothesis: That Toyotas are incredibly safe cars in general and that if you’re driving a relatively new Toyota in a market that enforces crash standards, you’re unlikely to die in that Toyota. In fact, in some countries, like Denmark, you’re far more likely to die just about anywhere else but in your Toyota. But if you’re in the developing world on poor roads in poor conditions, you’re simply more likely to die than a person on the autobahn. And because their legendary reliability and toughness makes them the vehicle of choice in developing countries, you’re more likely to be in a Toyota when you die in a developing country. But it’s probably not Toyota’s fault.
Obviously, veterans and Toyota drivers (and passengers) dying are not the only things that happen about once an hour. I’ll very briefly run through two more to get you thinking about how to look at a statistic and drill down a bit.
The State of Alaska experiences an earthquake of at least 2.5 on the Richter scale about every hour (27 times per day). This sounds like a lot, but only two earthquakes per year are above 4 in magnitude on the same scale within 250 miles of a populated area and Alaska averages only one quake every two years of a 5 magnitude within 250 miles of a populated area (remember that the Richter scale is logarithmic: it is a base-10 log scale of ground-motion amplitude, so a magnitude-5 quake produces ten times the amplitude of a magnitude-4 quake). Moreover, the number of earthquakes in Alaska causing more than five million dollars in damage – often used in other states to define a quake’s size or relevance – is also somewhat confusing, as a leak in an oil pipeline or damage to the runway of a military airfield can easily cost five million dollars to repair, particularly as the repair equipment is rarely on-site or even near-site.
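The log-scale arithmetic is worth making explicit; a minimal sketch:

```python
# Each whole step on the (base-10 logarithmic) Richter scale
# multiplies the measured ground-motion amplitude by ten.
def amplitude_ratio(m1, m2):
    return 10 ** (m1 - m2)

print(amplitude_ratio(5, 4))  # 10: one magnitude step, tenfold amplitude
print(amplitude_ratio(6, 4))  # 100: two steps compound multiplicatively
```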
About 31,000 people in Japan commit suicide each year (between 80 and 100 per day, depending upon the year). This sounds like a lot (and it is in a country with only 127 million people), but the distribution of these suicides is not uniform across society. Young people commit suicide more often in Japan (over a quarter of 12-year-olds in Japan suffer from depression), with youth suicide peaking the day after the dreaded “Center” (National Center Test for University Admissions) exam. The Center posts the answer sheet at 9 p.m. the night after the second day of the exam, and typically the wave of suicides begins shortly thereafter, in January. This means, somewhat shockingly, the Japanese youth suicide rate is about one person per hour even in the slowest month of the year (September). In January, for every U.S. veteran who commits suicide, roughly five Japanese teenagers – who have likely never seen combat and never visited Vietnam, Iraq, or Afghanistan – commit suicide.
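As a sanity check on the quoted range, dividing the annual total by 365:

```python
# Converting the annual Japanese suicide total into a daily figure.
annual_suicides = 31_000
per_day = annual_suicides / 365
print(round(per_day))  # about 85, consistent with the 80-100 per day cited
```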
The top student from last term's microeconomics course reached out to me recently about the statistics class she’s now taking. The basic tools of descriptive statistics are, in my view, crucial to someone’s ability to function in society. And the ability to perform basic analytical tasks with data is important, too. But so are the research and contextual pieces I’ve tried to highlight in this article. Too often, creating the statistic is seen as the end of the task. Policy, however, is not made from statistics. It’s made from hypotheses drawn from statistics. And it is in the journey toward these hypotheses that we can all learn to be more careful.