Two of my colleagues, Dan Owens and Dara Morehouse (I believe in full attribution), and I are working on population projections for the state. One of the side questions we asked ourselves (always a danger) was how we would divide the state when looking at subregions within it. There are likely candidates, such as dividing the state into thirds or quarters. Highway 2 and I-94 make interesting dividers as well. There are many interesting ways to do this, and many constraints you might want to impose. We also decided to take a more empirical approach: we ran a simple k-means cluster analysis to see what we get with just a few variables.
This is all preliminary, and we are adding more data to see what we get. We are also exploring which algorithm is best to use, since this seems to be an interesting little side project. Based on the current data, the optimal number of county clusters seems to be either two or three. We will see what happens as we add more data to the analysis.
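For readers who want to try something similar, here is a minimal sketch of k-means (Lloyd's algorithm) in plain Python. It is an illustration only, not necessarily how the analysis here was run: the county names and numbers are made up, and comparing the within-cluster sum of squares across different numbers of clusters is just one common heuristic for choosing how many clusters to keep.

```python
import random

# Hypothetical counties: (personal income, taxable sales) in made-up units.
counties = {
    "County A": (9.0, 8.5),
    "County B": (4.0, 3.8),
    "County C": (3.6, 4.1),
    "County D": (0.3, 0.2),
    "County E": (0.4, 0.3),
    "County F": (0.2, 0.25),
    "County G": (0.5, 0.4),
}


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm. Returns (labels, centroids, inertia)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:  # an empty cluster keeps its old centroid
                centroids[j] = tuple(
                    sum(col) / len(members) for col in zip(*members)
                )
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
    inertia = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, inertia


def best_of(points, k, restarts=20):
    """Keep the lowest-inertia run across several random starts."""
    runs = (kmeans(points, k, seed=s) for s in range(restarts))
    return min(runs, key=lambda run: run[2])


points = list(counties.values())
for k in (1, 2, 3, 4):
    labels, centroids, inertia = best_of(points, k)
    print(k, round(inertia, 3), dict(zip(counties, labels)))
```

The number of clusters where the inertia stops dropping sharply is a rough guide to a reasonable choice. Because k-means can settle into a local optimum, the sketch restarts from several random seeds and keeps the best run.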
Taking a look at the clusters in the level data, we see that only two counties really change groups when we move from two to three clusters. Cass moves into a category on its own and Stark moves into the cluster of larger counties. It is pretty clear that these data (personal income and taxable sales and purchases) are highly correlated with population. In fact, the two smaller clusters contain all the larger counties in the state. This is not really a surprise, since North Dakota is still a small-population state and there are clear advantages to scale.
This ends up with maps that look like this:
This is pretty unremarkable when all is said and done. The first map, with only two clusters allowed, simply separates out the bigger counties. When we switch to three clusters, Cass is on its own and Stark joins the other group of larger counties. Not very interesting.
The issue of course is that scale does matter and we want to account for that at some level. There are economic activities possible in these large counties that are not available in the smaller counties simply due to more constrained economic diversification and opportunities.
So the first pass was not a real eye-opener, but when we switch to percentage changes in personal income and in taxable sales and purchases, plus the ratio of farm income to nonfarm income, we get a different picture.
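To make those three variables concrete, here is a small sketch of how they could be computed for each county. The two counties, the dollar figures, and the field names are all hypothetical, chosen only to show the arithmetic.

```python
# Hypothetical two-year figures in $ millions (made-up numbers).
# "income" and "sales" hold (previous year, current year).
data = {
    "Large county": {
        "income": (5000.0, 5150.0),
        "sales": (3000.0, 3060.0),
        "farm": 100.0,
        "nonfarm": 5050.0,
    },
    "Small county": {
        "income": (100.0, 112.0),
        "sales": (20.0, 26.0),
        "farm": 45.0,
        "nonfarm": 67.0,
    },
}


def pct_change(previous, current):
    """Year-over-year change as a percentage of the previous value."""
    return 100.0 * (current - previous) / previous


def features(row):
    """(pct change in income, pct change in sales, farm/nonfarm ratio)."""
    return (
        pct_change(*row["income"]),
        pct_change(*row["sales"]),
        row["farm"] / row["nonfarm"],
    )


for name, row in data.items():
    income_chg, sales_chg, farm_ratio = features(row)
    print(f"{name}: income {income_chg:.1f}%, sales {sales_chg:.1f}%, "
          f"farm/nonfarm {farm_ratio:.2f}")
```

In this made-up example the small county's taxable sales rise by only $6 million, against $60 million in the large county, yet that is a far larger percentage move because it starts from a much smaller base.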
The dispersion of counties is very different in this situation. There is significant volatility in these variables. This is a weakness of the k-means approach, and we are adding some variables to see if they can smooth out the results, as well as looking at new approaches. The breakdown is very interesting, though, with some poorer-performing counties clustering together and then some other combinations.
Again there is an issue: the larger counties seem to have less volatility in their measures, for example in taxable sales and purchases. With lower levels of taxable sales, changes in a given year represent a bigger percentage movement. The combinations are certainly more interesting, though. Here are the maps:
There is not an easy geographic divide in these situations, although many of the counties around the major economic centers seem to be highlighted to some extent. There is certainly more to consider here as far as approach and implications.
One of the things we are considering is the inclusion of both levels data and percentage changes to get at both the importance of scale and growth performance, especially over multiple years.
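One practical wrinkle with mixing levels and percentage changes is that k-means relies on Euclidean distance, so whichever variable has the largest numeric range tends to dominate the clustering. A common remedy, shown here only as an assumption about how one might proceed, is to standardize each column to z-scores before clustering. The rows below are invented for illustration.

```python
import statistics

# Hypothetical rows, one per county:
# (personal income level in $ millions, pct change in personal income).
rows = [
    (5000.0, 3.0),
    (1200.0, 5.0),
    (100.0, 12.0),
    (80.0, -4.0),
]


def zscore_columns(rows):
    """Rescale every column to mean 0 and (population) standard deviation 1.

    Assumes no column is constant; a constant column would have a
    standard deviation of zero and cause a division by zero.
    """
    columns = list(zip(*rows))
    means = [statistics.mean(c) for c in columns]
    stdevs = [statistics.pstdev(c) for c in columns]
    return [
        tuple((v - m) / s for v, m, s in zip(row, means, stdevs))
        for row in rows
    ]


scaled = zscore_columns(rows)
for original, z in zip(rows, scaled):
    print(original, tuple(round(v, 2) for v in z))
```

After scaling, the income level and the growth rate contribute on a comparable footing, so neither scale nor growth performance drowns out the other purely because of its units.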
This still fits into our population projections in terms of identifying appropriate subregions of the state for alternatives in our analysis. These early outcomes do not really add much beyond the notion that counties that are already bigger in terms of population are different.