Car Brand Popularity by State

June 12, 2021

Where I’m from in Michigan, most people drive a Ford. Or maybe a Chevrolet, or a Jeep. Traveling elsewhere, however, I noticed that the dominant auto brands change. You don’t see Subarus too often in the South, and the West Coast is a stronghold for Toyota and Honda.

Ever-fascinated with regional variation, I was surprised to learn there’s little data about this online. Media stories report which car brands are the most popular nationally, but seldom where in the US they are sold. So I made this map.

Here’s what the map shows: for each state, the selected brand’s share of used vehicle sales, as a percentage. This is not a perfect approach, since people may for various reasons be more likely to sell one vehicle brand online than another, but it still yields interesting results. Here are some very quick observations.

Ford, GM

The two largest American automakers do best in the middle of the country. This isn’t surprising, as they’re likely best known for their trucks and SUVs. Note that the F-150 is the best-selling automobile in the country.

Ford and GM perform well everywhere. Relative to the center of the country, however, they appear to struggle in wealthy, urban coastal regions. This makes sense; I’m probably not very interested in a large F-150 if I live in Boston, for example.

Honda, Toyota

Like Ford and GM, Honda and Toyota have relatively similar profiles and do well everywhere. One interesting observation is that Honda’s performance appears relatively stronger on the East Coast, in areas where Toyota possibly underperforms. Both automakers do well on the West Coast, as expected.

Overall, we see a reversal of the pattern for the previous two automakers. Honda and Toyota do somewhat better in coastal regions, and slightly worse in the interior.


Subaru

Subaru is indeed a brand with strong regional variation. Why does everyone in Vermont love Subaru? The states where Subaru performs well mostly have snowy, mountainous terrain. That said, Subaru is far from absent in the South. Surprisingly, the brand appears about as strong in Florida as in Michigan.

Luxury Brands

We see a familiar trend here: American luxury brands do well in the interior and the South. BMW performs well on the East Coast. Nevada is one state where luxury vehicles consistently perform well.

For various reasons, there aren’t very many Teslas on the used car market. Outside of the most populated states, that brand’s data here isn’t meaningful.
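One way to keep thin data like this off the map is to blank out any state where the brand has too few listings. The following is only a sketch of that idea: the function name `reliable_shares`, the cutoff of 30 listings, and the sample counts are all assumptions made for illustration, and none of them come from the script used for this post.

```python
def reliable_shares(brand_counts_by_state, brand, min_listings=30):
    """Return {state: share or None}.

    None marks states where `brand` has fewer than `min_listings` listings.
    The default cutoff of 30 is an arbitrary, assumed value.
    """
    result = {}
    for state, counts in brand_counts_by_state.items():
        total = sum(counts.values())
        n = counts.get(brand, 0)
        result[state] = n / total if total and n >= min_listings else None
    return result


# Made-up counts for illustration only; not taken from the dataset.
counts = {
    "CA": {"tesla": 120, "toyota": 4000, "ford": 3000},
    "WY": {"tesla": 2, "toyota": 150, "ford": 400},
}
tesla = reliable_shares(counts, "tesla")
```

With these invented numbers, California keeps its Tesla share while Wyoming comes back as `None`, which a map could render as "no data" rather than as a misleadingly precise percentage.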

Methodology

These observations are derived from this Kaggle dataset. 426,881 transactions were analyzed. The data was processed and formatted for display using the following Python code. The auxiliary modules it imports (regionToState and stateToId) can be found on GitHub.

from collections import defaultdict
import csv
import os
import regionToState
from stateToId import getValFromState
import json

with open('./vehicles.csv', 'r') as read_obj:
    csv_reader = csv.reader(read_obj)
    stateToBrandToCount = defaultdict(lambda: defaultdict(lambda: 0))
    for row in csv_reader:
        location: str = row[2]
        manufacturer: str = row[6]
        if not (None in [location, manufacturer] or '' in [location, manufacturer]):
            state = ''
            try:
                state = regionToState.regionToStateMapping[location]
            except KeyError:  # region is missing from the mapping
                print(f'failed to map {location} to a state abbrev.')
            if state != '':
                stateToBrandToCount[state][manufacturer] += 1

# Every brand that appears in at least one state.
manufacturers = {brand for brands in stateToBrandToCount.values()
                 for brand in brands}

# Total listings per state, used as the denominator for each share.
stateToTotalVehicles = {state: sum(brands.values())
                        for state, brands in stateToBrandToCount.items()}

stateBrandToPercent = defaultdict(lambda: defaultdict(lambda: 0))
for state in filter(lambda x: x != "washington, DC", stateToBrandToCount.keys()):
    for brand in stateToBrandToCount[state].keys():
        count: int = stateToBrandToCount[state][brand]
        stateBrandToPercent[state][brand] = count / stateToTotalVehicles[state]

def BrandPercentages(stateBrandToPercent: dict, brand: str):
    # Write ./<brand>.json with one record per state for the given brand.
    results = sorted([
        {
            "id": state,
            "percentage": str(stateBrandToPercent[state][brand]),
            "numericId": f'{idx+1:02d}',  # zero-padded to two digits: 01, 02, ..., 10, 11
            "val": getValFromState(state)
        }
        for idx, state in enumerate(sorted(stateBrandToPercent.keys()))
    ], key=lambda x: float(x["percentage"]))  # sort numerically, not as strings
    jsonRes: str = json.dumps(results, indent=2)
    filename = f'./{brand}.json'
    if os.path.exists(filename):
        os.remove(filename)
    with open(filename, "w") as outputFile:
        outputFile.write(jsonRes)

# Write one JSON file per brand.
for brand in manufacturers:
    BrandPercentages(stateBrandToPercent, brand)
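To make the metric concrete, here is a minimal, self-contained sketch of the same share calculation run on a handful of made-up listings. The function name `brand_share_by_state` and the sample data are assumptions for illustration; the script above remains the code that actually produced the map.

```python
from collections import Counter, defaultdict


def brand_share_by_state(listings):
    """Return {state: {brand: share}} from an iterable of (state, brand) pairs.

    A brand's share is its listing count divided by all listings in that state.
    """
    counts = defaultdict(Counter)
    for state, brand in listings:
        counts[state][brand] += 1

    shares = {}
    for state, brand_counts in counts.items():
        total = sum(brand_counts.values())
        shares[state] = {brand: n / total for brand, n in brand_counts.items()}
    return shares


# Made-up listings for illustration only; not taken from the dataset.
sample = [("MI", "ford"), ("MI", "ford"), ("MI", "jeep"), ("VT", "subaru")]
shares = brand_share_by_state(sample)
```

In this toy example Ford accounts for two of Michigan's three listings, so `shares["MI"]["ford"]` is about 0.67. The single Vermont listing gives Subaru a share of 1.0, which shows how a state with very few listings can produce an extreme value.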

© Bryce Smith, 2021