One of the most frequent questions I get about the regional economies within North Dakota concerns proper comparisons. The question boils down to a search for comparable peers, and while there are jokes to be made about how nothing compares, it is an excellent question.
So I start this process with a simple cluster analysis (k-means) looking only at the annual percentage change in farm and non-farm income from 2016 to 2017. The interesting constraint here looks like it might be the data. There are many suppression flags in the data set for counties because of disclosure concerns. However, all counties in the analysis report those gross categories.
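As a rough illustration of the data preparation described here, the sketch below computes the annual percentage change in farm and non-farm income and drops any county with a suppressed value. The county names, the income figures, and the "(D)" suppression flag are all assumptions made for the example, not values from the actual data set.

```python
# Hypothetical rows: (county, farm_2016, farm_2017, nonfarm_2016, nonfarm_2017).
# All figures are invented for illustration; "(D)" is an assumed suppression flag.
ROWS = [
    ("County A", "120.0", "132.0", "900.0", "945.0"),
    ("County B", "(D)", "80.0", "400.0", "410.0"),
    ("County C", "50.0", "45.0", "300.0", "306.0"),
]


def parse(value):
    """Return a float, or None when the cell holds a non-numeric flag."""
    try:
        return float(value)
    except ValueError:
        return None


def pct_change(old, new):
    """Annual percentage change from old to new.

    abs() in the denominator keeps the sign meaningful when the base
    is negative, which can happen with farm income.
    """
    return (new - old) / abs(old) * 100.0


def growth_table(rows):
    """Map county -> (farm % change, non-farm % change), skipping gaps."""
    table = {}
    for county, *cells in rows:
        values = [parse(c) for c in cells]
        if any(v is None for v in values):
            continue  # at least one suppressed value: drop the county
        f16, f17, n16, n17 = values
        if f16 == 0 or n16 == 0:
            continue  # percentage change is undefined for a zero base
        table[county] = (pct_change(f16, f17), pct_change(n16, n17))
    return table


GROWTH = growth_table(ROWS)
```

Each surviving county ends up as one point in a two-dimensional space (farm growth, non-farm growth), which is the input a k-means routine expects.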
There will be a need to look at levels and rates in the analysis at some point, but this seems to be a decent starting point. I set it up with three, four, and six clusters to see how the groupings changed, if at all.
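For readers who want to try the same setup, here is a minimal sketch of k-means (Lloyd's algorithm with random restarts) run at three, four, and six clusters. It is not the code behind the results discussed here: the points are made-up growth rates, and the function names, the number of restarts, and the seeds are all assumptions for the example.

```python
import random

# Hypothetical (farm % change, non-farm % change) pairs; not real county data.
POINTS = [
    (12.0, 9.0), (11.0, 10.5), (13.5, 8.0),
    (2.0, 1.5), (1.0, 2.5), (3.0, 2.0), (2.5, 0.5),
    (-15.0, -6.0), (-14.0, -7.5),
]


def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def kmeans(points, k, seed=0, max_iter=100):
    """Lloyd's algorithm: alternate assignment and center updates."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k distinct observations
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: dist2(p, centers[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = mean(members)
    return labels, centers


def best_of(points, k, restarts=20):
    """Keep the run with the lowest within-cluster sum of squares."""
    best = None
    for seed in range(restarts):
        labels, centers = kmeans(points, k, seed)
        wss = sum(dist2(p, centers[lab]) for p, lab in zip(points, labels))
        if best is None or wss < best[0]:
            best = (wss, labels)
    return best


RESULTS = {k: best_of(POINTS, k) for k in (3, 4, 6)}
```

`RESULTS[k]` holds the within-cluster sum of squares and the label of each point for that k, so comparing the label lists across k shows which groupings split and which stay together. The restarts matter because k-means depends on its starting centers and can settle on a poor local solution from a single run.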
The three clusters give us an interesting outcome. There are two major clusters: an upper grouping that includes several Bakken counties, and a group of the others. Then there are two outliers on the negative side, Ramsey and Oliver counties. To be clear, given the way the data are set up, these outliers could still be growing; they are simply below the average performance of the other counties in the state.
When adding a fourth cluster, the Bakken region really sets itself apart. The lower outliers again remain in their own grouping. While it does not hold completely, you almost have a vertical pecking order here, with the highest cluster being thought of as the best. Lastly, we have six clusters.
The larger groupings stay the same at this point, and the outliers are each in their own group. I do not like setting off individual observations on their own in these analyses (the goal is comparable peers, after all), so I think fewer than six clusters seems to be optimal. We also have another cluster of lower performers in this case (Cluster 1 in the above graph).
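One quick way to apply the "no county on its own" rule is to count cluster sizes and flag any cluster with a single member. The sketch below does that for two hypothetical label assignments; the label lists are invented to mimic the pattern described here and are not the actual results.

```python
from collections import Counter


def singleton_clusters(labels):
    """Return the sorted ids of clusters that contain exactly one observation."""
    sizes = Counter(labels)
    return sorted(c for c, n in sizes.items() if n == 1)


# Hypothetical cluster labels for the same ten counties at two values of k.
LABELS_K4 = [0, 0, 0, 1, 1, 1, 2, 2, 3, 3]
LABELS_K6 = [0, 0, 1, 1, 2, 2, 3, 3, 4, 5]

SINGLETONS_K4 = singleton_clusters(LABELS_K4)  # no cluster stands alone
SINGLETONS_K6 = singleton_clusters(LABELS_K6)  # clusters 4 and 5 each hold one county
```

A value of k that produces singletons gives those counties no peers at all, which defeats the purpose of the exercise.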
Further work will add in new variables, and probably some levels as well, because that is typically an important check on the North Dakota data (any of my former students reading this want to hazard a guess why?).