Yesterday the Guardian published an article about rail suicides in the UK and Europe. It ends with the chart series below, which shows the total number of rail suicides in several European countries from 2008 to 2011.
This made me pause because of the huge gap between Germany and the other countries. Is it because Germany has a bigger rail network than any other European country? That is probably a factor, but it doesn't explain differences this large.
Looking back at the Guardian chart, the Netherlands had 215 suicides in 2011 compared to 853 in Germany, but the Netherlands has about 17 million inhabitants and Germany about 80 million.
So does it make sense to look at the absolute numbers alone? I don't think so in this case, so I created the new chart below, which takes the population of each country into account.
The new chart tells a different story, showing the number of suicides per 100,000 inhabitants for each country. It turns out that the Czech Republic is the country with the highest rate, followed, after a considerable gap, by countries like Hungary, Luxembourg, Austria, the Netherlands, and Germany.
The UK (GB in my chart), which ranked in the top 3 to 5 in the Guardian charts, actually has a rather low rail suicide rate. That of course doesn't mean it isn't a problem there.
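The normalization behind the second chart can be sketched in a few lines of Python. This is only an illustration using the two countries whose 2011 figures are quoted above and the rough population values given in the text; the variable names and the `rate_per_100k` helper are made up for this example and are not taken from the linked source code.

```python
# Figures quoted in the text for 2011. The populations are the rough
# values given there, not official statistics.
rail_suicides_2011 = {"Netherlands": 215, "Germany": 853}
population = {"Netherlands": 17_000_000, "Germany": 80_000_000}


def rate_per_100k(count, inhabitants):
    """Return the number of cases per 100,000 inhabitants."""
    return count / inhabitants * 100_000


rates = {
    country: rate_per_100k(count, population[country])
    for country, count in rail_suicides_2011.items()
}

# Sort from highest to lowest rate, as a ranked bar chart would.
for country, rate in sorted(rates.items(), key=lambda item: item[1], reverse=True):
    print(f"{country}: {rate:.2f} per 100,000 inhabitants")
```

With these rough inputs the Netherlands comes out at about 1.26 and Germany at about 1.07 per 100,000 inhabitants, so the order of the two countries flips compared to the absolute counts.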
Another, more technical difference between the charts is that I chose to use the same scale across all years, a nice feature Datawrapper offers. This makes it clearer that rail suicides are an increasing problem in Europe.
As mentioned before, railway density would be another factor to consider. The main point, however, is to show how important it is to provide context, especially when making comparisons. Absolute and relative numbers often tell very different stories.
Published on September 12, 2013 (updated on August 29, 2019) by Ramiro Gómez (@yaph).
Tags: data story, datawrapper, guardian, europe, pandas, bar chart.
Check out the source code used to process the data and create the visualizations.
Code Repository