COGS 108 - Stepwise Health: Examining the Relationship Between Walkability and Physical Health in Counties and States

Permissions

Place an X in the appropriate bracket below to specify if you would like your group's project to be made available to the public. (Note that student names will be included, but PIDs will be scraped from any groups who include their PIDs.)

Names

Abstract

This study investigates the relationship between walkability and health problems at both county and state levels in the United States. Building upon existing literature that highlights the importance of walkable environments, we explore whether states with higher walkability scores exhibit lower levels of health problems and improved health outcomes.

Drawing from publicly available datasets, including a walkability index across cities and states, physical health data rankings across counties and states, and a Core-Based Statistical Areas (CBSAs) county to city dataset, this study employs a multi-faceted approach to data analysis. By merging and cleaning these datasets, we construct comprehensive frameworks to assess the relationship between walkability and health outcomes across different geographic scales.

Our findings suggest a nuanced relationship between walkability and health indicators. While states with higher walkability scores generally exhibit lower levels of physical inactivity, the correlation is influenced by various socio-economic and environmental factors. Through visualizations and statistical analyses, this study highlights the importance of considering intersectional factors in promoting walkable communities and public health interventions.

Research Question

What is the relationship between walkability and health problems at the county and state level? Specifically, we will examine health problems associated with a lack of physical exercise, particularly those attributable to 'lack of walking'.

Background and Prior Work

For many young individuals, determining their future while navigating the complexities of academic and social life often takes precedence over health concerns. As we grapple with adulthood, a recurring issue among college students is the dilemma of transportation. Undergraduates who have cars struggle with the costs of parking and gas, while other students are left with public transportation, which is often time-consuming. The seemingly simple solution is to walk; however, this is hindered by the unwalkable nature of certain places. Thus, we began to ponder the varying degrees of walkability in cities we have visited or lived in, and the subjective criteria we use to define a location as walkable. Recognizing walking as a form of physical activity, we wanted to investigate whether states we deemed more walkable were associated with a healthier population.

The first reference we found online [1] is from the Boston University School of Public Health. The researchers searched for links between neighborhood walkability and BMI levels, which is very similar to our project. Their findings supported the conclusion that walkability is correlated with BMI levels, but they also found a distinction between white and nonwhite people. This disparity has been "borne from systemic racism and policies that have created barriers for many communities of color to embrace health-protective behaviors." We did not have this in mind when we drafted the project, but we will investigate whether our data lead to the same conclusion.

The second reference we found [2] is hosted by the National Library of Medicine. Its authors investigated whether there was a link between walkable communities and physical and social health in a newly developed walkable community in Austin, Texas. They discovered a positive correlation between the two, as well as evidence that residents were driving less. However, they acknowledge that their conclusions were based on survey data, so the pool of respondents was limited by factors such as computer and internet availability, as well as overall interest in the study.

To set our approach apart from previous research, we intend to carry out a thorough analysis using publicly available census data from several locations. Although walkability, health outcomes, and demographic characteristics have all been studied before, our focus on analyzing a wide variety of urban environments will offer a more nuanced understanding of these correlations. By using census data, we hope to capture a wider range of demographics, taking into account factors such as population density, racial diversity, and socioeconomic status.

  1. ^ Morris, Vivien. “US Neighborhood Walkability Influences Physical Activity, BMI Levels.” Boston University School of Public Health, 2 Feb. 2023, https://www.bu.edu/sph/news/articles/2023/us-neighborhood-walkability-influences-physical-activity-bmi-levels/.
  2. ^ Zhu, Xuemei, et al. “Walkable Communities: Impacts on Residents’ Physical and Social Health: Researchers from Texas A&M University Studied Residents in a Newly Developed ‘Walkable Community’ in Austin, Texas to See How It Changed Their Habits for Physical Activity and Whether It Increased Social Interaction and Cohesion in the Community.” World Health Design, U.S. National Library of Medicine, July 2013, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8776244/.

Hypothesis

Our main hypothesis is that states with a higher walkability score will have a lower level of health problems, since greater walkability may encourage residents to walk more throughout their day. Not only will everyday stores be within walking distance, but residents may also be more inclined to walk or bike to work instead of driving.

Data

Data overview

The walkability dataset contains data that ranks block groups according to their relative walkability and assigns each block group a walkability index score using the formula (w/3) + (x/3) + (y/6) + (z/6) where w = block group's ranked score for intersection density, x = block group's ranked score for proximity to transit stops, y = block group's ranked score for employment mix, and z = block group's ranked score for employment and household mix.
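The weighting in this formula can be sketched as a small Python function. The function name and the example ranked scores are illustrative assumptions and do not come from the dataset. Because the four weights (1/3, 1/3, 1/6, 1/6) sum to 1, the index stays on the same scale as the ranked scores that feed it.

```python
def walkability_index(w, x, y, z):
    """Combine four ranked scores into a single walkability index.

    w: ranked score for intersection density
    x: ranked score for proximity to transit stops
    y: ranked score for employment mix
    z: ranked score for employment and household mix
    """
    return (w / 3) + (x / 3) + (y / 6) + (z / 6)


# Hypothetical block group; these ranked scores are chosen only for illustration.
example_score = walkability_index(w=15, x=12, y=18, z=6)
print(example_score)
```

Intersection density and transit proximity each carry twice the weight of the two mix measures, so a block group with dense intersections and nearby transit can score well even when its land-use mix is low.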

The dataset's important variables include 'STATEFP', 'CBSA_Name', and 'NatWalkInd'. 'STATEFP' contains integer FIPS codes that identify a state, 'CBSA_Name' contains strings naming metropolitan areas in the United States (for example, 'Dallas-Fort Worth-Arlington, TX'), and 'NatWalkInd' contains a float of at most 20, the maximum possible walkability score.

To clean the walkability dataset, we removed all columns that were not relevant to this project, removed entries from the District of Columbia and Puerto Rico because they are not states, calculated the mean of NatWalkInd for each city because there are multiple entries per city, and created a new dataframe with all of the changes.

The health dataset contains health information about states in the US, broken down by county. The dataset contains a large number of measures, so we focus on those that directly affect a person’s physical health.

The important variables in this dataset are: ‘State Abbreviation’; ‘Name’, which contains strings naming each state and its counties; ‘Poor or fair health raw value’, a float representing the fraction of adults who report poor or fair health; ‘Poor physical health days raw value’, a float representing the number of poor physical health days; ‘Physical inactivity raw value’, a float representing the fraction of adults who report no leisure-time physical activity; ‘Frequent physical distress raw value’, a float representing the fraction of adults who have frequent physical distress; ‘Median household income raw value’, an integer representing the median household income, with separate columns that break it down by ethnicity; ‘Percentage of households with high housing costs’, a float representing the share of households that pay a large amount for housing; the share of the population in each ethnic group, as a float; and ‘Population raw value’, the number of people living in each state or county.

To clean the health dataset, we kept only the columns that measure physical health, income, and ethnicity. From the walkability dataset we took the two most walkable states and the two least walkable states, along with California, and used the ‘Name’ variable to keep only the rows for these five states, removing the county rows. Then, we created a new dataframe with the rows ordered from most walkable to least walkable.
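The state selection described here can be sketched as follows. This is a minimal illustration under stated assumptions: the helper `pick_states`, its parameters, and the toy scores are hypothetical, and only the column names 'State' and 'NatWalkInd' follow the notebook.

```python
import pandas as pd


def pick_states(ranked, n=2, always_include=("CALIFORNIA",)):
    """Return the n most walkable states, the n least walkable states, and any
    always-included states, ordered from most to least walkable."""
    ordered = ranked.sort_values("NatWalkInd", ascending=False)
    extremes = pd.concat([ordered.head(n), ordered.tail(n)])
    extra = ordered[ordered["State"].isin(always_include)]
    # drop_duplicates guards against an always-included state that is
    # already among the extremes
    picked = pd.concat([extremes, extra]).drop_duplicates(subset="State")
    return picked.sort_values("NatWalkInd", ascending=False).reset_index(drop=True)


# Hypothetical scores, for illustration only.
toy_ranking = pd.DataFrame({
    "State": ["STATE A", "STATE B", "CALIFORNIA", "STATE C", "STATE D", "STATE E"],
    "NatWalkInd": [12.0, 11.5, 10.0, 9.0, 7.0, 6.5],
})
selected = pick_states(toy_ranking)
print(selected)
```

The resulting 'State' column can then be used to filter the health data, for example with `isin`, so that the health rows appear in the same most-to-least-walkable order.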

The county dataset has many additional features; however, we will only use the Core-Based Statistical Area (CBSA) and county-equivalent columns. These let us match city names to their corresponding counties, so that we can study the national walkability index in relation to each county, resulting in a more thorough analysis. Any cities that the dataset is unable to match to a county will be matched manually.

To clean the county dataset, we needed to eliminate any columns that were irrelevant to our project and generate a new DataFrame with all of the adjustments.

We use all three datasets. Our merging approach assigns the NatWalkInd data to the appropriate county or state, producing two separate DataFrames: one at the county level and one at the state level.
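A minimal sketch of this two-level merge is shown below. The toy frames, their values, and the variable names are placeholders rather than the project's data; the sketch assumes both sides have already been cleaned to share a 'State' key (state level) or a 'STATE' and 'County' pair (county level).

```python
import pandas as pd

# Toy stand-ins for the cleaned frames; every value below is a placeholder.
walk_state = pd.DataFrame({
    "State": ["STATE A", "STATE B"],
    "NatWalkInd": [12.0, 6.5],
})
health_state = pd.DataFrame({
    "State": ["STATE A", "STATE B"],
    "Physical inactivity raw value": [0.18, 0.30],
})
walk_county = pd.DataFrame({
    "STATE": ["STATE A", "STATE A", "STATE B"],
    "County": ["First County", "Second County", "First County"],
    "NatWalkInd": [13.0, 11.0, 6.5],
})
health_county = pd.DataFrame({
    "STATE": ["STATE A", "STATE A", "STATE B"],
    "County": ["First County", "Second County", "First County"],
    "Physical inactivity raw value": [0.17, 0.19, 0.30],
})

# State level: one row per state on each side.
merged_state = pd.merge(
    walk_state, health_state, on="State", how="inner", validate="one_to_one"
)

# County level: join on state AND county, because county names repeat across states.
merged_county = pd.merge(
    walk_county, health_county, on=["STATE", "County"], how="inner", validate="one_to_one"
)
print(merged_state)
print(merged_county)
```

Passing `validate="one_to_one"` makes pandas raise an error if a key is duplicated on either side, which is a cheap way to catch accidental row multiplication during a merge.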

Walkability Dataset

import pandas as pd

walkability_df = pd.read_csv('walkability.csv')

# Drop the District of Columbia (FIPS 11) and Puerto Rico (FIPS 72), which are not states
walkability_df = walkability_df[~walkability_df['STATEFP'].isin([11, 72])]

# Keep only the columns relevant to this project
selected_columns = ['STATEFP', 'CBSA_Name', 'NatWalkInd']
walkability_df = walkability_df[selected_columns].copy()

walkability_df
        STATEFP  CBSA_Name                        NatWalkInd
0       48       Dallas-Fort Worth-Arlington, TX  14.000000
1       48       Dallas-Fort Worth-Arlington, TX  10.833333
2       48       Dallas-Fort Worth-Arlington, TX  8.333333
3       48       Dallas-Fort Worth-Arlington, TX  15.666667
4       48       Dallas-Fort Worth-Arlington, TX  10.166667
...     ...      ...                              ...
220735  78       NaN                              7.333333
220736  78       NaN                              7.333333
220737  78       NaN                              7.333333
220738  78       NaN                              4.000000
220739  78       NaN                              4.666667

217696 rows × 3 columns

state_mapping = {
    '1': 'ALABAMA',
    '2': 'ALASKA',
    '4': 'ARIZONA',
    '5': 'ARKANSAS',
    '6': 'CALIFORNIA',
    '8': 'COLORADO',
    '9': 'CONNECTICUT',
    '10': 'DELAWARE',
    '12': 'FLORIDA',
    '13': 'GEORGIA',
    '15': 'HAWAII',
    '16': 'IDAHO',
    '17': 'ILLINOIS',
    '18': 'INDIANA',
    '19': 'IOWA',
    '20': 'KANSAS',
    '21': 'KENTUCKY',
    '22': 'LOUISIANA',
    '23': 'MAINE',
    '24': 'MARYLAND',
    '25': 'MASSACHUSETTS',
    '26': 'MICHIGAN',
    '27': 'MINNESOTA',
    '28': 'MISSISSIPPI',
    '29': 'MISSOURI',
    '30': 'MONTANA',
    '31': 'NEBRASKA',
    '32': 'NEVADA',
    '33': 'NEW HAMPSHIRE',
    '34': 'NEW JERSEY',
    '35': 'NEW MEXICO',
    '36': 'NEW YORK',
    '37': 'NORTH CAROLINA',
    '38': 'NORTH DAKOTA',
    '39': 'OHIO',
    '40': 'OKLAHOMA',
    '41': 'OREGON',
    '42': 'PENNSYLVANIA',
    '44': 'RHODE ISLAND',
    '45': 'SOUTH CAROLINA',
    '46': 'SOUTH DAKOTA',
    '47': 'TENNESSEE',
    '48': 'TEXAS',
    '49': 'UTAH',
    '50': 'VERMONT',
    '51': 'VIRGINIA',
    '53': 'WASHINGTON',
    '54': 'WEST VIRGINIA',
    '55': 'WISCONSIN',
    '56': 'WYOMING'
}
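# Label rows that have no CBSA name with their state name. Rows whose STATEFP is not
# in state_mapping (such as code 78) stay NaN and are removed by dropna() below.
state_names = walkability_df['STATEFP'].astype(str).map(state_mapping)
walkability_df['CBSA_Name'] = walkability_df['CBSA_Name'].fillna(state_names)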

walkability_df = walkability_df.dropna()
walkability_df
        STATEFP  CBSA_Name                        NatWalkInd
0       48       Dallas-Fort Worth-Arlington, TX  14.000000
1       48       Dallas-Fort Worth-Arlington, TX  10.833333
2       48       Dallas-Fort Worth-Arlington, TX  8.333333
3       48       Dallas-Fort Worth-Arlington, TX  15.666667
4       48       Dallas-Fort Worth-Arlington, TX  10.166667
...     ...      ...                              ...
217734  56       Casper, WY                       7.833333
217735  56       Casper, WY                       10.333333
217736  56       Casper, WY                       10.833333
217737  56       WYOMING                          5.500000
217738  56       WYOMING                          3.166667

217289 rows × 3 columns

state = (
    walkability_df.groupby('STATEFP')
    .agg({'NatWalkInd': 'mean'})
    .reset_index()
    .sort_values(by='NatWalkInd', ascending=False)
)

# Translate FIPS codes to state names and keep only the two columns of interest
state['State'] = state['STATEFP'].astype(str).map(state_mapping)
state = state[['State', 'NatWalkInd']].reset_index(drop=True)
state
State NatWalkInd
0 RHODE ISLAND 12.587935
1 CALIFORNIA 12.224970
2 NEW JERSEY 11.868249
3 MASSACHUSETTS 11.627215
4 OREGON 11.471147
5 NEW YORK 11.247009
6 UTAH 11.004635
7 NEVADA 10.979484
8 COLORADO 10.530861
9 WASHINGTON 10.503868
10 MARYLAND 10.482722
11 DELAWARE 10.481417
12 FLORIDA 10.470168
13 ILLINOIS 10.466429
14 PENNSYLVANIA 10.165075
15 ARIZONA 10.104197
16 CONNECTICUT 10.084462
17 HAWAII 9.988952
18 NEBRASKA 9.349561
19 MINNESOTA 9.184181
20 TEXAS 9.081610
21 VIRGINIA 8.976213
22 WISCONSIN 8.874471
23 NEW MEXICO 8.816885
24 OHIO 8.705997
25 KANSAS 8.611867
26 MISSOURI 8.577008
27 VERMONT 8.556194
28 NORTH DAKOTA 8.305070
29 OKLAHOMA 8.242608
30 ALASKA 8.204744
31 IOWA 8.100887
32 MONTANA 8.054236
33 IDAHO 7.976636
34 MICHIGAN 7.963254
35 INDIANA 7.809756
36 LOUISIANA 7.656007
37 SOUTH DAKOTA 7.625127
38 GEORGIA 7.590638
39 WYOMING 7.478455
40 TENNESSEE 7.309697
41 NORTH CAROLINA 7.271080
42 KENTUCKY 7.209691
43 NEW HAMPSHIRE 7.148952
44 SOUTH CAROLINA 7.101940
45 MAINE 7.070135
46 ALABAMA 6.831200
47 ARKANSAS 6.722559
48 WEST VIRGINIA 6.285385
49 MISSISSIPPI 6.004544

County Dataset

county_df = pd.read_csv('county.csv')
county = (
    walkability_df.groupby(['STATEFP', 'CBSA_Name'])
    .agg({'NatWalkInd': 'mean'})
    .reset_index()
    .sort_values(by='NatWalkInd', ascending=False)
)
city_to_county_mapping = {
    'Albany-Lebanon, OR': 'Linn County',
    'Atlanta-Sandy Springs-Alpharetta, GA': ['Barrow County', 'Butts County', 'Carroll County', 'Clayton County', 
                                             'Coweta County', 'Dawson County', 'DeKalb County', 'Douglas County', 
                                             'Fayette County', 'Forsyth County', 'Fulton County', 'Gwinnett County', 
                                             'Heard County', 'Henry County', 'Jasper County', 'Lumpkin County', 
                                             'Meriwether County', 'Morgan County', 'Newton County', 'Pickens County', 
                                             'Pike County','Rockdale County', 'Spalding County', 'Walton County'],
    'Ashtabula, OH': 'Ashtabula County',
    'Atmore, AL': 'Escambia County',
    'Austin-Round Rock-Georgetown, TX': ['Bastrop County', 'Caldwell County', 'Hays County', 'Travis County', 
                                         'Williamson County'],
    'Bakersfield, CA': 'Kern County',
    'Bardstown, KY': 'Nelson County',
    'Bennettsville, SC': 'Marlboro County',
    'Berlin, NH': 'Coos County',
    'Big Stone Gap, VA': 'Wise County',
    'Birmingham-Hoover, AL': ['Bibb County', 'Blount County', 'Chilton County', 'Coosa County', 'Cullman County', 
                              'Jefferson County', 'St. Clair County', 
                              'Shelby County', 'Talladega County', 'Walker County'],
    'Blacksburg-Christiansburg, VA': ['Floyd County', 'Giles County', 'Montgomery County', 'Pulaski County', 
                                      'Radford city'],
    'Bridgeport-Stamford-Norwalk, CT': 'Fairfield County',
    'Brownsville, TN': 'Haywood County',
    'Brunswick, GA': ['Brantley County', 'Glynn County', 'McIntosh County'],
    'Bucyrus-Galion, OH': 'Crawford County',
    'California-Lexington Park, MD': ['Calvert County', "St. Mary's County"],
    'Carbondale-Marion, IL': ['Jackson County', 'Williamson County'],
    'Chambersburg-Waynesboro, PA': 'Franklin County',
    'Central City, KY': 'Muhlenberg County',
    'Chicago-Naperville-Elgin, IL-IN-WI': ['Cook County', 'DeKalb County', 'DuPage County', 'Grundy County', 
                                           'Kane County', 'Kendall County', 'Lake County', 'McHenry County',
                                           'Will County', 'Jasper County', 'Lake County', 'Newton County', 
                                           'Porter County'],
    'Cleveland-Elyria, OH': ['Ashtabula County', 'Cuyahoga County', 'Geauga County', 'Lake County', 'Lorain County', 
                             'Medina County'],
    'Coffeyville, KS': 'Montgomery County',
    'Coos Bay, OR': 'Coos County',
    'Craig, CO': 'Moffat County',
    'Cullowhee, NC': 'Jackson County',
    'Dayton, TN': 'Rhea County',
    'Dayton-Kettering, OH': ['Greene County', 'Miami County', 'Montgomery County'],
    'Denver-Aurora-Lakewood, CO': ['Adams County', 'Arapahoe County', 'Broomfield County', 'Clear Creek County', 
                                   'Denver County', 'Douglas County', 'Elbert County',
                                   'Gilpin County', 'Jefferson County', 'Park County'],
    'Elizabethtown-Fort Knox, KY': ['Hardin County', 'Larue County'],
    'Evanston, WY': ['Rich County', 'Uinta County'],
    'Evansville, IN-KY': ['Posey County', 'Vanderburgh County', 'Warrick County'],
    'Fairbanks, AK': 'Fairbanks North Star Borough',
    'Fairfield, IA': 'Jefferson County',
    'Fernley, NV': 'Lyon County',
    'Fort Collins, CO': 'Larimer County',
    'Fort Madison-Keokuk, IA-IL-MO': 'Lee County',
    'Fort Polk South, LA': 'Vernon Parish County',
    'Gardnerville Ranchos, NV': ['Alpine County', 'Douglas County'],
    'Georgetown, SC': 'Georgetown County',
    'Glenwood Springs, CO': 'Garfield County',
    'Grand Rapids-Kentwood, MI': ['Barry County', 'Ionia County', 'Kent County', 'Montcalm County', 'Muskegon County', 
                                  'Ottawa County', 'Itasca County'],
    'Grants, NM': 'Cibola County',
    'Greenville-Anderson, SC': ['Greenville County', 'Anderson County', 'Laurens County', 'Pickens County'],
    'Hartford-East Hartford-Middletown, CT': ['Hartford County', 'Litchfield County', 'Middlesex County', 
                                              'New London County', 'Tolland County', 'Windham County'],
    'Helena-West Helena, AR': 'Phillips County',
    'Hilo, HI': 'Hawaii County',
    'Hilton Head Island-Bluffton, SC': ['Beaufort County', 'Jasper County'],
    'Hope, AR': 'Hempstead County',
    'Houma-Thibodaux, LA': ['Lafourche Parish', 'Terrebonne Parish'],
    'Houston-The Woodlands-Sugar Land, TX': ['Austin County', 'Brazoria County', 'Fort Bend County', 'Galveston County', 
                                             'Harris County', 'Liberty County', 'Montgomery County', 
                                             'San Jacinto County', 'Waller County'],
    'Indianapolis-Carmel-Anderson, IN': ['Boone County', 'Brown County', 'Hamilton County', 'Hancock County', 
                                         'Hendricks County', 'Johnson County', 'Madison County', 'Marion County', 
                                         'Morgan County', 'Shelby County', 'Tipton County'],
    'Indianola, MS': 'Sunflower County',
    'Jackson, OH': 'Jackson County',
    'Jamestown-Dunkirk-Fredonia, NY': 'Chautauqua County',
    'Jasper, AL': 'Walker County',
    'Jennings, LA': 'Jefferson Davis Parish',
    'Joplin, MO': ['Cherokee County', 'Jasper County', 'Newton County'],
    'Kahului-Wailuku-Lahaina, HI': ['Kalawao County', 'Maui County'],
    'Key West, FL': 'Monroe County',
    'Lamesa, TX': 'Dawson County',
    'Las Vegas-Henderson-Paradise, NV': 'Clark County',
    'Lebanon, NH-VT': ['Grafton County', 'Sullivan County'],
    'Levelland, TX': 'Hockley County',
    'London, KY': 'Laurel County',
    'Longview, WA': 'Cowlitz County',
    'Madera, CA': 'Madera County',
    'Malone, NY': 'Franklin County',
    'Maysville, KY': 'Mason County',
    'Miami-Fort Lauderdale-Pompano Beach, FL': ['Broward County', 'Miami-Dade County', 'Palm Beach County'],
    'Mount Gay-Shamrock, WV': 'Logan County',
    'Muskegon, MI': 'Muskegon County',
    'Myrtle Beach-Conway-North Myrtle Beach, SC-NC': 'Horry County',
    'New Castle, PA': 'Lawrence County',
    'New Haven-Milford, CT': 'New Haven County',
    'New York-Newark-Jersey City, NY-NJ-PA': ['Bergen County', 'Essex County', 'Hudson County', 'Hunterdon County', 
                                              'Middlesex County', 'Monmouth County', 'Morris County','Ocean County', 
                                              'Passaic County', 'Somerset County', 'Sussex County', 'Union County', 
                                              'Bronx County', 'Kings County','Nassau County', 'New York County', 
                                              'Putnam County', 'Queens County', 'Richmond County', 'Rockland County', 
                                              'Suffolk County','Westchester County'],
    'North Port-Sarasota-Bradenton, FL': ['Manatee County', 'Sarasota County'],
    'North Vernon, IN': 'Jennings County',
    'Norwich-New London, CT': 'New London County',
    'Ocean City, NJ': 'Cape May County',
    'Ogden-Clearfield, UT': ['Davis County', 'Morgan County', 'Weber County'],
    'Ogdensburg-Massena, NY': 'St. Lawrence County',
    'Omaha-Council Bluffs, NE-IA': ['Harrison County', 'Mills County', 'Pottawattamie County', 'Cass County', 
                                    'Douglas County', 'Sarpy County', 'Saunders County', 'Washington County'],
    'Panama City, FL': ['Bay County', 'Washington County'],
    'Pearsall, TX': 'Frio County',
    'Pecos, TX': 'Reeves County',
    'Point Pleasant, WV-OH': 'Mason County',
    'Portales, NM': 'Roosevelt County',
    'Poughkeepsie-Newburgh-Middletown, NY': ['Dutchess County', 'Orange County'],
    'Prineville, OR': 'Crook County',
    'Provo-Orem, UT': ['Juab County', 'Utah County'],
    'Racine, WI': 'Racine County',
    'Parsons, KS': 'Labette County',
    'Rockport, TX': 'Aransas County',
    'Salisbury, MD-DE': ['Somerset County', 'Wicomico County'],
    'Salt Lake City, UT': ['Salt Lake County', 'Tooele County'],
    'San Francisco-Oakland-Berkeley, CA': ['Alameda County', 'Contra Costa County', 'Marin County', 'San Francisco County', 
                                           'San Mateo County'],
    'Scottsburg, IN': 'Scott County',
    'Sebastian-Vero Beach, FL': 'Indian River County',
    'Sebring-Avon Park, FL': 'Highlands County',
    'Shelby, NC': 'Cleveland County',
    'Sioux Falls, SD': ['Rock County', 'Lincoln County', 'McCook County', 'Minnehaha County', 'Turner County'],
    'St. Marys, GA': 'Camden County',
    'Staunton, VA': ['Augusta County', 'Staunton city', 'Waynesboro city'],
    'Stevens Point, WI': 'Portage County',
    'Stockton, CA': 'San Joaquin County',
    'The Villages, FL': 'Sumter County',
    'Union, SC': 'Union County',
    'Vineland-Bridgeton, NJ': 'Cumberland County',
    'Virginia Beach-Norfolk-Newport News, VA-NC': ['Camden County', 'Currituck County', 'Gates County', 
                                                   'Gloucester County','Isle of Wight County', 'James City County', 
                                                   'Matthews County','Surry County', 'York County', 
                                                   'Chesapeake city', 'Hampton city', 'Newport News city',
                                                   'Norfolk city', 'Poquoson city', 'Portsmouth city','Suffolk city', 
                                                   'Virginia Beach city', 'Williamsburg city'],
    'Wauchula, FL': 'Hardee County',
    'Wausau-Weston, WI': 'Marathon County',
    'Wenatchee, WA': ['Chelan County', 'Douglas County'],
    'West Point, MS': 'Clay County',
    'Whitewater, WI': 'Walworth County',
    'Winfield, KS': 'Cowley County',
    'Worcester, MA-CT': 'Worcester County',
    'Youngstown-Warren-Boardman, OH-PA': ['Mahoning County', 'Trumbull County'],
}
selected_columns = ['cbsatitle', 'countycountyequivalent']
county_df = county_df[selected_columns]

# Attach county names from the CBSA-to-county table
county = pd.merge(county, county_df, left_on='CBSA_Name', right_on='cbsatitle', how='left')
county.drop(columns=['cbsatitle'], inplace=True)

# Fall back to the manual mapping for CBSAs the table could not match,
# then expand list-valued entries into one row per county
county['countycountyequivalent'] = county['countycountyequivalent'].fillna(county['CBSA_Name'].map(city_to_county_mapping))
county = county.explode('countycountyequivalent')
county.dropna(inplace=True)
county.drop('CBSA_Name', axis=1, inplace=True)

county['STATEFP'] = county['STATEFP'].astype(str)
county['STATE'] = county['STATEFP'].map(state_mapping)

desired_order = ['STATE', 'countycountyequivalent', 'NatWalkInd']
county = county[desired_order]
county.rename(columns={'countycountyequivalent': 'County'}, inplace=True)

county.reset_index(drop=True, inplace=True)
county
      STATE           County                NatWalkInd
0     MONTANA         Silver Bow County     13.693694
1     CALIFORNIA      Los Angeles County    13.484239
2     CALIFORNIA      Orange County         13.484239
3     CALIFORNIA      San Benito County     13.442451
4     CALIFORNIA      Santa Clara County    13.442451
...   ...             ...                   ...
2286  NORTH CAROLINA  Portsmouth city       4.569892
2287  NORTH CAROLINA  Suffolk city          4.569892
2288  NORTH CAROLINA  Virginia Beach city   4.569892
2289  NORTH CAROLINA  Williamsburg city     4.569892
2290  LOUISIANA       Vernon Parish County  4.401515

2291 rows × 3 columns
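The last rows of this output pair Virginia independent cities such as Portsmouth and Virginia Beach with NORTH CAROLINA. This appears to happen because the state label is taken from the STATEFP of the walkability rows, while the county list covers every county in a multi-state CBSA such as 'Virginia Beach-Norfolk-Newport News, VA-NC'. One way to flag such CBSAs for manual review is sketched below. The helper names are hypothetical, and the check would need to run before 'CBSA_Name' is dropped from the county DataFrame.

```python
import re


def cbsa_states(cbsa_name):
    """Return the state abbreviations listed after the final comma of a CBSA title."""
    match = re.search(r",\s*([A-Z]{2}(?:-[A-Z]{2})*)$", cbsa_name)
    return match.group(1).split("-") if match else []


def spans_multiple_states(cbsa_name):
    """True when the CBSA title lists more than one state abbreviation."""
    return len(cbsa_states(cbsa_name)) > 1


# CBSA titles taken from the manual mapping in this notebook.
names = [
    "Virginia Beach-Norfolk-Newport News, VA-NC",
    "Bakersfield, CA",
    "Chicago-Naperville-Elgin, IL-IN-WI",
]
flags = {name: spans_multiple_states(name) for name in names}
print(flags)
```

Rows from flagged CBSAs could then be checked against a county-to-state lookup before they are merged with the health data.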

Health Dataset

health_df = pd.read_csv('health.csv', low_memory=False)
selected_columns = ['State Abbreviation', 'Name', 'Poor or fair health raw value', 
                     'Poor physical health days raw value', 'Physical inactivity raw value', 
                     'Frequent physical distress raw value', 'Median household income raw value', 
                     'Median household income (Asian)', 'Median household income (Black)',
                     'Median household income (Hispanic)', 'Median household income (White)', 
                     'Percentage of households with high housing costs', 
                     '% Non-Hispanic Black raw value', '% American Indian & Alaska Native raw value', 
                     '% Asian raw value', '% Native Hawaiian/Other Pacific Islander raw value', 
                     '% Hispanic raw value', '% Non-Hispanic White raw value','Population raw value']

health_df = health_df[selected_columns]

health_df
State Abbreviation Name Poor or fair health raw value Poor physical health days raw value Physical inactivity raw value Frequent physical distress raw value Median household income raw value Median household income (Asian) Median household income (Black) Median household income (Hispanic) Median household income (White) Percentage of households with high housing costs % Non-Hispanic Black raw value % American Indian & Alaska Native raw value % Asian raw value % Native Hawaiian/Other Pacific Islander raw value % Hispanic raw value % Non-Hispanic White raw value Population raw value
0 state county v002_rawvalue v036_rawvalue v070_rawvalue v144_rawvalue v063_rawvalue v063_race_asian v063_race_black v063_race_hispanic v063_race_white v136_other_data_1 v054_rawvalue v055_rawvalue v081_rawvalue v080_rawvalue v056_rawvalue v126_rawvalue v051_rawvalue
1 US United States 0.1650958341 3.7419351474 0.227 0.1141986695 65712 88204 41935 51811 68785 0.1439905287 0.1253581154 0.0127592557 0.0594226491 0.0024583785 0.1845366958 0.601115369 328239523
2 AL Alabama 0.2137580922 4.4169654739 0.293 0.140394172 51771 63149 33928 41584 57935 0.1210690614 0.2646799988 0.0070972235 0.0150341054 0.0010421797 0.0455373395 0.6528058803 4903185
3 AL Autauga County 0.1983917887 4.501498764 0.306 0.1325295812 58233 NaN 28808 86220 65992 0.1203259827 0.1986432548 0.0047611377 0.011741753 0.0010381428 0.029909252 0.7377078523 55869
4 AL Baldwin County 0.1646067529 3.6479777573 0.247 0.1161164107 59871 47269 36616 41851 61872 0.1198750748 0.0860755978 0.0078034708 0.0106614584 0.0006898591 0.0471881523 0.8320730713 223234
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
3190 WY Sweetwater County 0.1644323415 3.5948660902 0.244 0.1154084806 80639 76731 NaN 57409 78796 0.0894080997 0.0118555605 0.0147840257 0.0106983445 0.0016295492 0.159931984 0.7925985405 42343
3191 WY Teton County 0.1141171801 2.935198538 0.108 0.0888580926 98837 NaN NaN 58764 92595 0.1019362187 0.0061796795 0.0088646437 0.0168342994 0.0014064098 0.1514660757 0.8097511081 23464
3192 WY Uinta County 0.1691336951 4.0252764735 0.251 0.1251476862 70756 NaN NaN 46375 64605 0.0811271298 0.0062296055 0.0144368634 0.0049441313 0.0015326807 0.0925046969 0.8729852665 20226
3193 WY Washakie County 0.1665080133 3.7223337554 0.287 0.1172175047 55122 NaN NaN 51071 54493 0.0856725146 0.0048686739 0.0176809737 0.0081998719 0.000768738 0.1419602819 0.8221652787 7805
3194 WY Weston County 0.1688594111 4.0037039624 0.255 0.1241565431 59410 NaN NaN NaN 58372 0.120190779 0.0064963188 0.0189115057 0.0168904288 0.0002887253 0.0411433521 0.9002454165 6927

3195 rows × 19 columns
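Row 0 of this output holds variable codes (v002_rawvalue and so on) rather than data, which suggests that the CSV carries a second header line and that the measure columns may be read as strings. A minimal sketch of one way to handle this is shown below, using a small in-memory CSV as a stand-in for health.csv. The toy file keeps only one measure column, and whether the real file parses cleanly in the same way is an assumption.

```python
import io

import pandas as pd

# Stand-in for health.csv: a header line, a variable-code line, then data rows.
toy_csv = io.StringIO(
    "State Abbreviation,Name,Physical inactivity raw value\n"
    "state,county,v070_rawvalue\n"
    "AL,Alabama,0.293\n"
    "AL,Autauga County,0.306\n"
)

# skiprows=[1] skips the second line of the file (the variable-code row).
toy_health = pd.read_csv(toy_csv, skiprows=[1])

# Coerce the measure column to numeric; values that cannot be parsed become NaN.
toy_health["Physical inactivity raw value"] = pd.to_numeric(
    toy_health["Physical inactivity raw value"], errors="coerce"
)
print(toy_health.dtypes)
```

With numeric columns in place, later steps such as sorting, averaging, or computing correlations operate on numbers rather than strings.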

Merging Datasets by County and State

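# Keep only state-level rows by dropping any row whose name looks like a county or county equivalent
exclude_keywords = 'county|borough|area|municipality|city|parish|District of Columbia'
health_by_state = health_df[~health_df['Name'].str.contains(exclude_keywords, case=False)].copy()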
health_by_state['Name'] = health_by_state['Name'].str.upper()
health_by_state.rename(columns={'Name': 'State'}, inplace=True)
health_by_state.reset_index(drop=True, inplace=True)
health_by_state
State Abbreviation State Poor or fair health raw value Poor physical health days raw value Physical inactivity raw value Frequent physical distress raw value Median household income raw value Median household income (Asian) Median household income (Black) Median household income (Hispanic) Median household income (White) Percentage of households with high housing costs % Non-Hispanic Black raw value % American Indian & Alaska Native raw value % Asian raw value % Native Hawaiian/Other Pacific Islander raw value % Hispanic raw value % Non-Hispanic White raw value Population raw value
0 US UNITED STATES 0.1650958341 3.7419351474 0.227 0.1141986695 65712 88204 41935 51811 68785 0.1439905287 0.1253581154 0.0127592557 0.0594226491 0.0024583785 0.1845366958 0.601115369 328239523
1 AL ALABAMA 0.2137580922 4.4169654739 0.293 0.140394172 51771 63149 33928 41584 57935 0.1210690614 0.2646799988 0.0070972235 0.0150341054 0.0010421797 0.0455373395 0.6528058803 4903185
2 AK ALASKA 0.1556596328 4.0631462386 0.193 0.1170578796 77203 73014 62191 69463 85841 0.1158771825 0.033067002 0.1557703217 0.0653397945 0.0143504501 0.0727392026 0.6015733824 731545
3 AZ ARIZONA 0.1862322982 4.1929086684 0.212 0.133480194 62027 78785 47386 48649 64657 0.1389793167 0.0447574758 0.0530179975 0.0369190065 0.0027709554 0.3174446815 0.5412615987 7278717
4 AR ARKANSAS 0.2327179193 4.822260347 0.304 0.1530059648 49020 71716 32070 42532 51681 0.1106856188 0.1541548093 0.0101785934 0.0167018799 0.0039054889 0.078411653 0.7203410162 3017804
5 CA CALIFORNIA 0.1761357003 3.8561413861 0.177 0.1161426032 80423 96962 51837 58703 87089 0.1965490281 0.0562196412 0.0164471131 0.15465961 0.0050567896 0.3941787836 0.3650452165 39512223
6 CO COLORADO 0.1378578425 3.2870247012 0.148 0.0953377412 77104 80261 51677 53929 78571 0.1377906991 0.0405510862 0.0161462515 0.035192445 0.001967272 0.218260396 0.6765552371 5758736
7 CT CONNECTICUT 0.1299142198 3.2953510849 0.199 0.0975555779 78920 96689 49000 47753 89527 0.160890666 0.1034514192 0.0057686801 0.0495272891 0.001087991 0.1685572578 0.6591679716 3565287
8 DE DELAWARE 0.1626761339 3.7034240552 0.273 0.113205773 70348 96966 50361 55321 74014 0.1283948833 0.2204415033 0.006740853 0.0409370238 0.0010875325 0.0959072219 0.616524127 973764
9 FL FLORIDA 0.1951156433 4.0102702468 0.258 0.1260930717 59198 72205 41702 49266 61682 0.1698282079 0.1552893119 0.0050726946 0.0295574436 0.001147281 0.2637084158 0.5324902246 21477737
10 GA GEORGIA 0.184150232 3.9486340908 0.264 0.1221256529 61950 80977 44670 49897 67955 0.1415380248 0.3164090759 0.005275103 0.043698551 0.0011689277 0.0987738738 0.5202164405 10617423
11 HI HAWAII 0.1542635517 3.2169431006 0.196 0.0966864443 83734 86443 69678 70468 82185 0.1793322204 0.0196204177 0.0039339714 0.3758743728 0.1013855772 0.1065520047 0.2165605365 1415872
12 ID IDAHO 0.1510581348 3.7389496377 0.204 0.1080348563 60830 53243 43034 47526 57543 0.1143231492 0.0076012904 0.017414028 0.0155064309 0.0022069706 0.1284172652 0.8160178841 1787065
13 IL ILLINOIS 0.1592408629 3.5757270926 0.216 0.1026618688 69212 90278 38573 55836 73686 0.1449008098 0.1407408612 0.0059581018 0.0590041479 0.0006617044 0.1751825566 0.6078566766 12671821
14 IN INDIANA 0.1819021442 3.9501272025 0.267 0.123349915 57617 63722 34895 47149 59861 0.1097344318 0.0958636372 0.0042096076 0.0259572661 0.0006742205 0.0726882177 0.7841369985 6732219
15 IA IOWA 0.1346237524 3.0569401642 0.226 0.0903022814 61807 59890 32139 47502 62628 0.0986489705 0.0387908985 0.00540717 0.0266615321 0.0015226287 0.062930458 0.8502809763 3155070
16 KS KANSAS 0.1627626831 3.6184392565 0.239 0.1103166264 62028 70987 38079 47203 63078 0.1039506953 0.0574345917 0.0120354346 0.0318750399 0.001265569 0.1222226646 0.7540769721 2913314
17 KY KENTUCKY 0.2184181907 4.5824123003 0.287 0.1441414324 52256 64044 36424 43804 52387 0.1148196218 0.0823468951 0.0030002196 0.0160047524 0.0009382961 0.0391044734 0.8414812364 4467673
18 LA LOUISIANA 0.2142376946 4.3229398684 0.28 0.1376391364 51108 60955 30540 43717 60959 0.1351924264 0.3228775033 0.0078721492 0.0181105035 0.0006139227 0.0531260366 0.5840830977 4648794
19 ME MAINE 0.1707057841 4.1927183156 0.208 0.1301245242 58824 63763 42901 52925 58522 0.1232700852 0.0160346731 0.007265967 0.0129592654 0.0003384883 0.0176311475 0.9296130372 1344212
20 MD MARYLAND 0.1517442226 3.372639409 0.219 0.1004547276 86644 105691 67583 72758 95238 0.140889779 0.2994314949 0.0060936073 0.0672055087 0.0011090564 0.1064929007 0.5004864631 6045680
21 MA MASSACHUSETTS 0.1351861633 3.495073488 0.2 0.1061313756 85700 96556 51842 44885 88656 0.1556137785 0.0733778426 0.0049778723 0.0721823407 0.0010750811 0.1240343312 0.7105981673 6892503
22 MI MICHIGAN 0.1834210504 4.3059490921 0.231 0.1337841224 59522 86611 35322 48256 61750 0.1289376678 0.1376695391 0.0073962209 0.0336826691 0.0004191509 0.0528900133 0.7474485717 9986857
23 MN MINNESOTA 0.1289091628 3.1410338557 0.196 0.0926863649 74529 79482 37811 51426 74945 0.1095977549 0.0678450296 0.0137383077 0.0518663274 0.0007472119 0.0558777594 0.7908581624 5639632
24 MS MISSISSIPPI 0.2205976941 4.4538195324 0.304 0.1449976804 45928 59529 30714 43929 56214 0.1262618304 0.3741892627 0.0062849676 0.0110989067 0.0006068245 0.0336374288 0.5638938104 2976149
25 MO MISSOURI 0.1945809511 4.2476027818 0.255 0.131264974 57375 68497 37179 47978 59138 0.1145952122 0.115960953 0.0058394168 0.0217211509 0.0016060474 0.0437818578 0.7914572684 6137428
26 MT MONTANA 0.1406327761 3.6331329707 0.217 0.108529949 57248 61022 44614 46342 56501 0.1204614903 0.0053949464 0.0665208303 0.0092002268 0.0008607962 0.0405032663 0.8586544633 1068778
27 NE NEBRASKA 0.1379458709 3.2081459775 0.227 0.0962297295 63290 58586 35976 49436 64768 0.1017830258 0.0490227501 0.0151389986 0.0274952337 0.001214325 0.1135463666 0.7822403547 1934408
28 NV NEVADA 0.19103824 4.2311596151 0.225 0.1316925083 63268 68965 41034 51995 66440 0.1494484904 0.093003731 0.0168647302 0.0870624735 0.0079817386 0.2923877882 0.4817720271 3080156
29 NH NEW HAMPSHIRE 0.1284475527 3.5473532347 0.208 0.1044216715 78571 87364 57925 60389 77493 0.1263275626 0.0147487223 0.0030256429 0.0296386512 0.0004942227 0.0401475019 0.8975708809 1359711
30 NJ NEW JERSEY 0.1553329131 3.729284726 0.266 0.1105831382 85786 121111 53247 57068 94462 0.1851954936 0.1293109019 0.0062404655 0.099841368 0.0011528688 0.2090524972 0.5461485287 8882190
31 NM NEW MEXICO 0.2025407755 4.2505316841 0.19 0.1337240079 52021 65144 40528 42421 59815 0.1360252157 0.0188775527 0.109591197 0.0179079935 0.0015933584 0.4926210006 0.3684754455 2096829
32 NY NEW YORK 0.1625275531 3.5614499438 0.234 0.1057357849 72038 76341 48557 49159 78782 0.1951656882 0.1446405108 0.0097538954 0.090122266 0.0013896685 0.1928211498 0.5528766687 19453561
33 NC NORTH CAROLINA 0.1798921561 3.5953899323 0.233 0.1141986695 57388 84513 39108 42397 62036 0.1281900409 0.2136337772 0.0157890612 0.0318741726 0.0012518969 0.0978090946 0.6261488752 10488084
34 ND NORTH DAKOTA 0.1363311476 3.1840303589 0.231 0.0966241187 67402 64953 37872 50466 68524 0.093272568 0.032629891 0.0557238125 0.0168752674 0.000807021 0.0413772108 0.8365631668 762062
35 OH OHIO 0.1776966347 4.075839218 0.261 0.123995401 58704 76054 33158 44500 61427 0.1195569032 0.1267293461 0.0029095482 0.0249358804 0.0006035537 0.0402479233 0.7844138556 11689100
36 OK OKLAHOMA 0.2085687326 4.4952823516 0.278 0.1416185901 54447 60082 35296 44709 57071 0.1090489806 0.0741622822 0.0937848673 0.0238194821 0.0021685779 0.1107185269 0.6500901826 3956971
37 OR OREGON 0.1819015807 4.6515192148 0.173 0.1490591325 66955 78790 41773 52537 64384 0.1575000885 0.0195863327 0.0182835013 0.0485423819 0.0045628734 0.1343960043 0.7505171612 4217737
38 PA PENNSYLVANIA 0.1759009922 3.9622106834 0.22 0.1202839749 63455 76682 38560 41725 66184 0.1300889609 0.1089385407 0.0039593066 0.0376407916 0.0007806599 0.0781245789 0.7571931205 12801989
39 RI RHODE ISLAND 0.1650958341 3.8807855506 0.235 0.1153938599 70383 77420 45727 41293 73652 0.1610837438 0.0613256482 0.0108112343 0.0373206112 0.0020059262 0.1629699413 0.7135726159 1059361
40 SC SOUTH CAROLINA 0.1781996122 4.0081422634 0.26 0.1242986231 56360 66846 35092 44166 62388 0.1264001177 0.2642100532 0.0054840879 0.0182785449 0.0009837408 0.0596494581 0.6365748806 5148714
41 SD SOUTH DAKOTA 0.1344573017 2.9717675909 0.22 0.0896283119 60414 52786 38706 44967 61746 0.0933898431 0.0219824814 0.0904348455 0.0154850626 0.0008896083 0.0422207879 0.8150632051 884659
42 TN TENNESSEE 0.2117597323 4.7187787359 0.272 0.1493978039 56047 76677 38791 43885 57216 0.1202510671 0.167192987 0.0047825696 0.0196328282 0.0009632204 0.0573102984 0.7350142199 6829174
43 TX TEXAS 0.1874316863 3.8026173365 0.232 0.1162345445 64044 88486 46572 49260 75879 0.1274549172 0.1207623248 0.010170479 0.0520925714 0.0014902806 0.3974901815 0.4121541953 28995881
44 UT UTAH 0.1480754639 3.5454067708 0.167 0.1043527258 75705 73139 41752 53547 75227 0.1020062392 0.0118703988 0.0155086249 0.0267399011 0.010598704 0.1441225992 0.7778514254 3205958
45 VT VERMONT 0.1276301545 3.6584868744 0.184 0.109752249 63293 59241 39400 47701 62770 0.1471636385 0.0130643329 0.0039119279 0.0191734149 0.0003958403 0.0203833721 0.9255595852 623989
46 VA VIRGINIA 0.1660634779 3.5405507843 0.222 0.1060954009 76471 105931 51654 68772 80036 0.1265358966 0.1912275047 0.0054597734 0.0690889447 0.0011823534 0.0977587889 0.6124881217 8535519
47 WA WASHINGTON 0.1501877382 3.7419351474 0.164 0.1124074974 78674 96975 52742 54962 76454 0.1363102685 0.0399511851 0.0192778809 0.0956002927 0.0079251278 0.1302343973 0.6750704179 7614893
48 WV WEST VIRGINIA 0.2358605757 5.3021883334 0.28 0.1720548359 48659 64567 33133 48729 47128 0.0946524175 0.0350278186 0.0025583839 0.0082041261 0.0002957347 0.0173880826 0.9198531147 1792147
49 WI WISCONSIN 0.1477416114 3.681936492 0.203 0.1108173476 64177 71786 31351 46266 64927 0.1197576998 0.0639376934 0.0117868232 0.0301040767 0.0005872115 0.0709682583 0.8087794555 5822434
50 WY WYOMING 0.1525964698 3.4919852788 0.231 0.1045686708 66152 54516 47386 52717 65727 0.094708253 0.0112654836 0.0272617791 0.0113536031 0.0010297896 0.1012666758 0.8369286698 578759
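# Attach each state's walkability index (NatWalkInd, from `state`) to its health measures, matching on the state name.
health_by_state = pd.merge(state, health_by_state, on='State')
# The abbreviation is redundant once the full state name identifies each row.
health_by_state.drop('State Abbreviation', axis=1, inplace=True)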
health_by_state
State NatWalkInd Poor or fair health raw value Poor physical health days raw value Physical inactivity raw value Frequent physical distress raw value Median household income raw value Median household income (Asian) Median household income (Black) Median household income (Hispanic) Median household income (White) Percentage of households with high housing costs % Non-Hispanic Black raw value % American Indian & Alaska Native raw value % Asian raw value % Native Hawaiian/Other Pacific Islander raw value % Hispanic raw value % Non-Hispanic White raw value Population raw value
0 RHODE ISLAND 12.587935 0.1650958341 3.8807855506 0.235 0.1153938599 70383 77420 45727 41293 73652 0.1610837438 0.0613256482 0.0108112343 0.0373206112 0.0020059262 0.1629699413 0.7135726159 1059361
1 CALIFORNIA 12.224970 0.1761357003 3.8561413861 0.177 0.1161426032 80423 96962 51837 58703 87089 0.1965490281 0.0562196412 0.0164471131 0.15465961 0.0050567896 0.3941787836 0.3650452165 39512223
2 NEW JERSEY 11.868249 0.1553329131 3.729284726 0.266 0.1105831382 85786 121111 53247 57068 94462 0.1851954936 0.1293109019 0.0062404655 0.099841368 0.0011528688 0.2090524972 0.5461485287 8882190
3 MASSACHUSETTS 11.627215 0.1351861633 3.495073488 0.2 0.1061313756 85700 96556 51842 44885 88656 0.1556137785 0.0733778426 0.0049778723 0.0721823407 0.0010750811 0.1240343312 0.7105981673 6892503
4 OREGON 11.471147 0.1819015807 4.6515192148 0.173 0.1490591325 66955 78790 41773 52537 64384 0.1575000885 0.0195863327 0.0182835013 0.0485423819 0.0045628734 0.1343960043 0.7505171612 4217737
5 NEW YORK 11.247009 0.1625275531 3.5614499438 0.234 0.1057357849 72038 76341 48557 49159 78782 0.1951656882 0.1446405108 0.0097538954 0.090122266 0.0013896685 0.1928211498 0.5528766687 19453561
6 UTAH 11.004635 0.1480754639 3.5454067708 0.167 0.1043527258 75705 73139 41752 53547 75227 0.1020062392 0.0118703988 0.0155086249 0.0267399011 0.010598704 0.1441225992 0.7778514254 3205958
7 NEVADA 10.979484 0.19103824 4.2311596151 0.225 0.1316925083 63268 68965 41034 51995 66440 0.1494484904 0.093003731 0.0168647302 0.0870624735 0.0079817386 0.2923877882 0.4817720271 3080156
8 COLORADO 10.530861 0.1378578425 3.2870247012 0.148 0.0953377412 77104 80261 51677 53929 78571 0.1377906991 0.0405510862 0.0161462515 0.035192445 0.001967272 0.218260396 0.6765552371 5758736
9 WASHINGTON 10.503868 0.1501877382 3.7419351474 0.164 0.1124074974 78674 96975 52742 54962 76454 0.1363102685 0.0399511851 0.0192778809 0.0956002927 0.0079251278 0.1302343973 0.6750704179 7614893
10 MARYLAND 10.482722 0.1517442226 3.372639409 0.219 0.1004547276 86644 105691 67583 72758 95238 0.140889779 0.2994314949 0.0060936073 0.0672055087 0.0011090564 0.1064929007 0.5004864631 6045680
11 DELAWARE 10.481417 0.1626761339 3.7034240552 0.273 0.113205773 70348 96966 50361 55321 74014 0.1283948833 0.2204415033 0.006740853 0.0409370238 0.0010875325 0.0959072219 0.616524127 973764
12 FLORIDA 10.470168 0.1951156433 4.0102702468 0.258 0.1260930717 59198 72205 41702 49266 61682 0.1698282079 0.1552893119 0.0050726946 0.0295574436 0.001147281 0.2637084158 0.5324902246 21477737
13 ILLINOIS 10.466429 0.1592408629 3.5757270926 0.216 0.1026618688 69212 90278 38573 55836 73686 0.1449008098 0.1407408612 0.0059581018 0.0590041479 0.0006617044 0.1751825566 0.6078566766 12671821
14 PENNSYLVANIA 10.165075 0.1759009922 3.9622106834 0.22 0.1202839749 63455 76682 38560 41725 66184 0.1300889609 0.1089385407 0.0039593066 0.0376407916 0.0007806599 0.0781245789 0.7571931205 12801989
15 ARIZONA 10.104197 0.1862322982 4.1929086684 0.212 0.133480194 62027 78785 47386 48649 64657 0.1389793167 0.0447574758 0.0530179975 0.0369190065 0.0027709554 0.3174446815 0.5412615987 7278717
16 CONNECTICUT 10.084462 0.1299142198 3.2953510849 0.199 0.0975555779 78920 96689 49000 47753 89527 0.160890666 0.1034514192 0.0057686801 0.0495272891 0.001087991 0.1685572578 0.6591679716 3565287
17 HAWAII 9.988952 0.1542635517 3.2169431006 0.196 0.0966864443 83734 86443 69678 70468 82185 0.1793322204 0.0196204177 0.0039339714 0.3758743728 0.1013855772 0.1065520047 0.2165605365 1415872
18 NEBRASKA 9.349561 0.1379458709 3.2081459775 0.227 0.0962297295 63290 58586 35976 49436 64768 0.1017830258 0.0490227501 0.0151389986 0.0274952337 0.001214325 0.1135463666 0.7822403547 1934408
19 MINNESOTA 9.184181 0.1289091628 3.1410338557 0.196 0.0926863649 74529 79482 37811 51426 74945 0.1095977549 0.0678450296 0.0137383077 0.0518663274 0.0007472119 0.0558777594 0.7908581624 5639632
20 TEXAS 9.081610 0.1874316863 3.8026173365 0.232 0.1162345445 64044 88486 46572 49260 75879 0.1274549172 0.1207623248 0.010170479 0.0520925714 0.0014902806 0.3974901815 0.4121541953 28995881
21 VIRGINIA 8.976213 0.1660634779 3.5405507843 0.222 0.1060954009 76471 105931 51654 68772 80036 0.1265358966 0.1912275047 0.0054597734 0.0690889447 0.0011823534 0.0977587889 0.6124881217 8535519
22 WISCONSIN 8.874471 0.1477416114 3.681936492 0.203 0.1108173476 64177 71786 31351 46266 64927 0.1197576998 0.0639376934 0.0117868232 0.0301040767 0.0005872115 0.0709682583 0.8087794555 5822434
23 NEW MEXICO 8.816885 0.2025407755 4.2505316841 0.19 0.1337240079 52021 65144 40528 42421 59815 0.1360252157 0.0188775527 0.109591197 0.0179079935 0.0015933584 0.4926210006 0.3684754455 2096829
24 OHIO 8.705997 0.1776966347 4.075839218 0.261 0.123995401 58704 76054 33158 44500 61427 0.1195569032 0.1267293461 0.0029095482 0.0249358804 0.0006035537 0.0402479233 0.7844138556 11689100
25 KANSAS 8.611867 0.1627626831 3.6184392565 0.239 0.1103166264 62028 70987 38079 47203 63078 0.1039506953 0.0574345917 0.0120354346 0.0318750399 0.001265569 0.1222226646 0.7540769721 2913314
26 MISSOURI 8.577008 0.1945809511 4.2476027818 0.255 0.131264974 57375 68497 37179 47978 59138 0.1145952122 0.115960953 0.0058394168 0.0217211509 0.0016060474 0.0437818578 0.7914572684 6137428
27 VERMONT 8.556194 0.1276301545 3.6584868744 0.184 0.109752249 63293 59241 39400 47701 62770 0.1471636385 0.0130643329 0.0039119279 0.0191734149 0.0003958403 0.0203833721 0.9255595852 623989
28 NORTH DAKOTA 8.305070 0.1363311476 3.1840303589 0.231 0.0966241187 67402 64953 37872 50466 68524 0.093272568 0.032629891 0.0557238125 0.0168752674 0.000807021 0.0413772108 0.8365631668 762062
29 OKLAHOMA 8.242608 0.2085687326 4.4952823516 0.278 0.1416185901 54447 60082 35296 44709 57071 0.1090489806 0.0741622822 0.0937848673 0.0238194821 0.0021685779 0.1107185269 0.6500901826 3956971
30 ALASKA 8.204744 0.1556596328 4.0631462386 0.193 0.1170578796 77203 73014 62191 69463 85841 0.1158771825 0.033067002 0.1557703217 0.0653397945 0.0143504501 0.0727392026 0.6015733824 731545
31 IOWA 8.100887 0.1346237524 3.0569401642 0.226 0.0903022814 61807 59890 32139 47502 62628 0.0986489705 0.0387908985 0.00540717 0.0266615321 0.0015226287 0.062930458 0.8502809763 3155070
32 MONTANA 8.054236 0.1406327761 3.6331329707 0.217 0.108529949 57248 61022 44614 46342 56501 0.1204614903 0.0053949464 0.0665208303 0.0092002268 0.0008607962 0.0405032663 0.8586544633 1068778
33 IDAHO 7.976636 0.1510581348 3.7389496377 0.204 0.1080348563 60830 53243 43034 47526 57543 0.1143231492 0.0076012904 0.017414028 0.0155064309 0.0022069706 0.1284172652 0.8160178841 1787065
34 MICHIGAN 7.963254 0.1834210504 4.3059490921 0.231 0.1337841224 59522 86611 35322 48256 61750 0.1289376678 0.1376695391 0.0073962209 0.0336826691 0.0004191509 0.0528900133 0.7474485717 9986857
35 INDIANA 7.809756 0.1819021442 3.9501272025 0.267 0.123349915 57617 63722 34895 47149 59861 0.1097344318 0.0958636372 0.0042096076 0.0259572661 0.0006742205 0.0726882177 0.7841369985 6732219
36 LOUISIANA 7.656007 0.2142376946 4.3229398684 0.28 0.1376391364 51108 60955 30540 43717 60959 0.1351924264 0.3228775033 0.0078721492 0.0181105035 0.0006139227 0.0531260366 0.5840830977 4648794
37 SOUTH DAKOTA 7.625127 0.1344573017 2.9717675909 0.22 0.0896283119 60414 52786 38706 44967 61746 0.0933898431 0.0219824814 0.0904348455 0.0154850626 0.0008896083 0.0422207879 0.8150632051 884659
38 GEORGIA 7.590638 0.184150232 3.9486340908 0.264 0.1221256529 61950 80977 44670 49897 67955 0.1415380248 0.3164090759 0.005275103 0.043698551 0.0011689277 0.0987738738 0.5202164405 10617423
39 WYOMING 7.478455 0.1525964698 3.4919852788 0.231 0.1045686708 66152 54516 47386 52717 65727 0.094708253 0.0112654836 0.0272617791 0.0113536031 0.0010297896 0.1012666758 0.8369286698 578759
40 TENNESSEE 7.309697 0.2117597323 4.7187787359 0.272 0.1493978039 56047 76677 38791 43885 57216 0.1202510671 0.167192987 0.0047825696 0.0196328282 0.0009632204 0.0573102984 0.7350142199 6829174
41 NORTH CAROLINA 7.271080 0.1798921561 3.5953899323 0.233 0.1141986695 57388 84513 39108 42397 62036 0.1281900409 0.2136337772 0.0157890612 0.0318741726 0.0012518969 0.0978090946 0.6261488752 10488084
42 KENTUCKY 7.209691 0.2184181907 4.5824123003 0.287 0.1441414324 52256 64044 36424 43804 52387 0.1148196218 0.0823468951 0.0030002196 0.0160047524 0.0009382961 0.0391044734 0.8414812364 4467673
43 NEW HAMPSHIRE 7.148952 0.1284475527 3.5473532347 0.208 0.1044216715 78571 87364 57925 60389 77493 0.1263275626 0.0147487223 0.0030256429 0.0296386512 0.0004942227 0.0401475019 0.8975708809 1359711
44 SOUTH CAROLINA 7.101940 0.1781996122 4.0081422634 0.26 0.1242986231 56360 66846 35092 44166 62388 0.1264001177 0.2642100532 0.0054840879 0.0182785449 0.0009837408 0.0596494581 0.6365748806 5148714
45 MAINE 7.070135 0.1707057841 4.1927183156 0.208 0.1301245242 58824 63763 42901 52925 58522 0.1232700852 0.0160346731 0.007265967 0.0129592654 0.0003384883 0.0176311475 0.9296130372 1344212
46 ALABAMA 6.831200 0.2137580922 4.4169654739 0.293 0.140394172 51771 63149 33928 41584 57935 0.1210690614 0.2646799988 0.0070972235 0.0150341054 0.0010421797 0.0455373395 0.6528058803 4903185
47 ARKANSAS 6.722559 0.2327179193 4.822260347 0.304 0.1530059648 49020 71716 32070 42532 51681 0.1106856188 0.1541548093 0.0101785934 0.0167018799 0.0039054889 0.078411653 0.7203410162 3017804
48 WEST VIRGINIA 6.285385 0.2358605757 5.3021883334 0.28 0.1720548359 48659 64567 33133 48729 47128 0.0946524175 0.0350278186 0.0025583839 0.0082041261 0.0002957347 0.0173880826 0.9198531147 1792147
49 MISSISSIPPI 6.004544 0.2205976941 4.4538195324 0.304 0.1449976804 45928 59529 30714 43929 56214 0.1262618304 0.3741892627 0.0062849676 0.0110989067 0.0006068245 0.0336374288 0.5638938104 2976149
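One way to confirm that the state-level merge above lost no rows is to let pandas check the join itself. The sketch below is illustrative only: it rebuilds three-row stand-ins for `state` and `health_by_state` from values shown in the tables above (the exact column layout of `state` is an assumption), then uses the `validate` and `indicator` options of `pd.merge` to flag repeated or unmatched state names.

```python
import pandas as pd

# Toy stand-ins for the notebook's frames; values are copied from the tables above.
state = pd.DataFrame({
    "State": ["RHODE ISLAND", "CALIFORNIA", "NEW JERSEY"],
    "NatWalkInd": [12.587935, 12.224970, 11.868249],
})
health = pd.DataFrame({
    "State Abbreviation": ["CA", "NJ", "RI"],
    "State": ["CALIFORNIA", "NEW JERSEY", "RHODE ISLAND"],
    "Physical inactivity raw value": [0.177, 0.266, 0.235],
})

# validate="one_to_one" raises MergeError if a state name repeats in either frame;
# indicator=True adds a `_merge` column saying which side each row came from.
merged = pd.merge(state, health, on="State", how="outer",
                  validate="one_to_one", indicator=True)

# Any state present in only one of the two frames ends up in this list.
unmatched = merged.loc[merged["_merge"] != "both", "State"].tolist()
print(unmatched)
```

An empty list means every state name matched on both sides, so an inner join (the default used above) would keep all of them.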
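# Keep county-level rows only: names containing a county-equivalent term (case-insensitive match).
health_by_county = health_df[health_df['Name'].str.contains('county|borough|area|municipality|city|parish', case=False)].copy()
# Remove the row with index label 0, which passes the name filter above.
health_by_county.drop(0, inplace=True)
# Rename the name column to match the walkability data, then renumber the rows from 0.
health_by_county.rename(columns={'Name': 'County'}, inplace=True)
health_by_county.reset_index(drop=True, inplace=True)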
health_by_county
State Abbreviation County Poor or fair health raw value Poor physical health days raw value Physical inactivity raw value Frequent physical distress raw value Median household income raw value Median household income (Asian) Median household income (Black) Median household income (Hispanic) Median household income (White) Percentage of households with high housing costs % Non-Hispanic Black raw value % American Indian & Alaska Native raw value % Asian raw value % Native Hawaiian/Other Pacific Islander raw value % Hispanic raw value % Non-Hispanic White raw value Population raw value
0 AL Autauga County 0.1983917887 4.501498764 0.306 0.1325295812 58233 NaN 28808 86220 65992 0.1203259827 0.1986432548 0.0047611377 0.011741753 0.0010381428 0.029909252 0.7377078523 55869
1 AL Baldwin County 0.1646067529 3.6479777573 0.247 0.1161164107 59871 47269 36616 41851 61872 0.1198750748 0.0860755978 0.0078034708 0.0106614584 0.0006898591 0.0471881523 0.8320730713 223234
2 AL Barbour County 0.2984149992 5.5692666827 0.28 0.181321046 35972 52841 22357 30563 47175 0.1259425999 0.4782872883 0.0068864944 0.0046990197 0.0021064571 0.0452483189 0.4551162602 24686
3 AL Bibb County 0.2385328355 4.8943765041 0.334 0.1506592018 47918 NaN NaN 46103 52543 0.0826373626 0.2107260873 0.0045994463 0.0021434313 0.0011610253 0.0278199518 0.7440832366 22394
4 AL Blount County 0.2198560956 4.9866219497 0.333 0.1481247144 52902 101071 77518 47529 49529 0.0746885899 0.0150797219 0.0063985059 0.0031992529 0.0011586484 0.0965309722 0.867706568 57826
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
3136 WY Sweetwater County 0.1644323415 3.5948660902 0.244 0.1154084806 80639 76731 NaN 57409 78796 0.0894080997 0.0118555605 0.0147840257 0.0106983445 0.0016295492 0.159931984 0.7925985405 42343
3137 WY Teton County 0.1141171801 2.935198538 0.108 0.0888580926 98837 NaN NaN 58764 92595 0.1019362187 0.0061796795 0.0088646437 0.0168342994 0.0014064098 0.1514660757 0.8097511081 23464
3138 WY Uinta County 0.1691336951 4.0252764735 0.251 0.1251476862 70756 NaN NaN 46375 64605 0.0811271298 0.0062296055 0.0144368634 0.0049441313 0.0015326807 0.0925046969 0.8729852665 20226
3139 WY Washakie County 0.1665080133 3.7223337554 0.287 0.1172175047 55122 NaN NaN 51071 54493 0.0856725146 0.0048686739 0.0176809737 0.0081998719 0.000768738 0.1419602819 0.8221652787 7805
3140 WY Weston County 0.1688594111 4.0037039624 0.255 0.1241565431 59410 NaN NaN NaN 58372 0.120190779 0.0064963188 0.0189115057 0.0168904288 0.0002887253 0.0411433521 0.9002454165 6927

3141 rows × 19 columns
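The name filter above relies on substring matching, so it helps to see which kinds of names it keeps and which it drops. The following is a minimal sketch on a hypothetical sample of names (not taken from `health_df`), assuming the `Name` column mixes state-level rows with county-level rows; `na=False` is added here so that missing names count as non-matches instead of propagating as missing values.

```python
import pandas as pd

# Hypothetical sample: one state-level name, several county equivalents,
# one name with no county-equivalent term, and one missing value.
names = pd.Series([
    "Alabama", "Autauga County", "Aleutians East Borough", "Bethel Census Area",
    "Anchorage Municipality", "Carson City", "Orleans Parish",
    "District of Columbia", None,
])

# Same alternation pattern as the notebook cell above.
pattern = "county|borough|area|municipality|city|parish"
mask = names.str.contains(pattern, case=False, na=False)

kept = names[mask].tolist()
dropped = names[~mask].tolist()
print(kept)
```

Under this pattern "District of Columbia" is dropped along with the state-level name, because it contains none of the listed terms.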

state_abbreviations_to_names = {
    'AL': 'ALABAMA',
    'AK': 'ALASKA',
    'AZ': 'ARIZONA',
    'AR': 'ARKANSAS',
    'CA': 'CALIFORNIA',
    'CO': 'COLORADO',
    'CT': 'CONNECTICUT',
    'DE': 'DELAWARE',
    'FL': 'FLORIDA',
    'GA': 'GEORGIA',
    'HI': 'HAWAII',
    'ID': 'IDAHO',
    'IL': 'ILLINOIS',
    'IN': 'INDIANA',
    'IA': 'IOWA',
    'KS': 'KANSAS',
    'KY': 'KENTUCKY',
    'LA': 'LOUISIANA',
    'ME': 'MAINE',
    'MD': 'MARYLAND',
    'MA': 'MASSACHUSETTS',
    'MI': 'MICHIGAN',
    'MN': 'MINNESOTA',
    'MS': 'MISSISSIPPI',
    'MO': 'MISSOURI',
    'MT': 'MONTANA',
    'NE': 'NEBRASKA',
    'NV': 'NEVADA',
    'NH': 'NEW HAMPSHIRE',
    'NJ': 'NEW JERSEY',
    'NM': 'NEW MEXICO',
    'NY': 'NEW YORK',
    'NC': 'NORTH CAROLINA',
    'ND': 'NORTH DAKOTA',
    'OH': 'OHIO',
    'OK': 'OKLAHOMA',
    'OR': 'OREGON',
    'PA': 'PENNSYLVANIA',
    'RI': 'RHODE ISLAND',
    'SC': 'SOUTH CAROLINA',
    'SD': 'SOUTH DAKOTA',
    'TN': 'TENNESSEE',
    'TX': 'TEXAS',
    'UT': 'UTAH',
    'VT': 'VERMONT',
    'VA': 'VIRGINIA',
    'WA': 'WASHINGTON',
    'WV': 'WEST VIRGINIA',
    'WI': 'WISCONSIN',
    'WY': 'WYOMING'
}
# Replace the two-letter abbreviation with the full upper-case state name and make it the first column.
# Abbreviations that are missing from the dictionary map to NaN.
state_column = health_by_county.pop('State Abbreviation').map(state_abbreviations_to_names)
health_by_county.insert(0, 'STATE', state_column)

health_by_county
STATE County Poor or fair health raw value Poor physical health days raw value Physical inactivity raw value Frequent physical distress raw value Median household income raw value Median household income (Asian) Median household income (Black) Median household income (Hispanic) Median household income (White) Percentage of households with high housing costs % Non-Hispanic Black raw value % American Indian & Alaska Native raw value % Asian raw value % Native Hawaiian/Other Pacific Islander raw value % Hispanic raw value % Non-Hispanic White raw value Population raw value
0 ALABAMA Autauga County 0.1983917887 4.501498764 0.306 0.1325295812 58233 NaN 28808 86220 65992 0.1203259827 0.1986432548 0.0047611377 0.011741753 0.0010381428 0.029909252 0.7377078523 55869
1 ALABAMA Baldwin County 0.1646067529 3.6479777573 0.247 0.1161164107 59871 47269 36616 41851 61872 0.1198750748 0.0860755978 0.0078034708 0.0106614584 0.0006898591 0.0471881523 0.8320730713 223234
2 ALABAMA Barbour County 0.2984149992 5.5692666827 0.28 0.181321046 35972 52841 22357 30563 47175 0.1259425999 0.4782872883 0.0068864944 0.0046990197 0.0021064571 0.0452483189 0.4551162602 24686
3 ALABAMA Bibb County 0.2385328355 4.8943765041 0.334 0.1506592018 47918 NaN NaN 46103 52543 0.0826373626 0.2107260873 0.0045994463 0.0021434313 0.0011610253 0.0278199518 0.7440832366 22394
4 ALABAMA Blount County 0.2198560956 4.9866219497 0.333 0.1481247144 52902 101071 77518 47529 49529 0.0746885899 0.0150797219 0.0063985059 0.0031992529 0.0011586484 0.0965309722 0.867706568 57826
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
3136 WYOMING Sweetwater County 0.1644323415 3.5948660902 0.244 0.1154084806 80639 76731 NaN 57409 78796 0.0894080997 0.0118555605 0.0147840257 0.0106983445 0.0016295492 0.159931984 0.7925985405 42343
3137 WYOMING Teton County 0.1141171801 2.935198538 0.108 0.0888580926 98837 NaN NaN 58764 92595 0.1019362187 0.0061796795 0.0088646437 0.0168342994 0.0014064098 0.1514660757 0.8097511081 23464
3138 WYOMING Uinta County 0.1691336951 4.0252764735 0.251 0.1251476862 70756 NaN NaN 46375 64605 0.0811271298 0.0062296055 0.0144368634 0.0049441313 0.0015326807 0.0925046969 0.8729852665 20226
3139 WYOMING Washakie County 0.1665080133 3.7223337554 0.287 0.1172175047 55122 NaN NaN 51071 54493 0.0856725146 0.0048686739 0.0176809737 0.0081998719 0.000768738 0.1419602819 0.8221652787 7805
3140 WYOMING Weston County 0.1688594111 4.0037039624 0.255 0.1241565431 59410 NaN NaN NaN 58372 0.120190779 0.0064963188 0.0189115057 0.0168904288 0.0002887253 0.0411433521 0.9002454165 6927

3141 rows × 19 columns
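Because `Series.map` with a dictionary returns NaN for any key the dictionary lacks, a quick check after the mapping step can reveal abbreviations that were silently left without a state name. The sketch below uses a deliberately truncated dictionary and a made-up list of abbreviations; both are stand-ins for illustration, not the notebook's data.

```python
import pandas as pd

# Truncated stand-in for state_abbreviations_to_names.
abbr_to_name = {"AL": "ALABAMA", "WY": "WYOMING"}

# Hypothetical abbreviation column; "DC" is not a key in the dictionary.
abbrs = pd.Series(["AL", "AL", "WY", "DC"])

mapped = abbrs.map(abbr_to_name)

# Abbreviations whose mapped value is missing.
missing = sorted(abbrs[mapped.isna()].unique())
print(missing)
```

Running the same check on the real column before dropping `State Abbreviation` would show whether any rows need a dictionary entry added.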

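# Outer-join the county walkability scores (`county`) with the health measures on state and county name.
health_by_county = pd.merge(county, health_by_county, on=['STATE', 'County'], how='outer')
# Keep only rows with a walkability score. Repeated (STATE, County) keys carry through the merge, which is why Teton County, Wyoming appears twice in the output.
health_by_county.dropna(subset=['NatWalkInd'], inplace=True)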

health_by_county
STATE County NatWalkInd Poor or fair health raw value Poor physical health days raw value Physical inactivity raw value Frequent physical distress raw value Median household income raw value Median household income (Asian) Median household income (Black) Median household income (Hispanic) Median household income (White) Percentage of households with high housing costs % Non-Hispanic Black raw value % American Indian & Alaska Native raw value % Asian raw value % Native Hawaiian/Other Pacific Islander raw value % Hispanic raw value % Non-Hispanic White raw value Population raw value
0 ALABAMA Autauga County 9.626874 0.1983917887 4.501498764 0.306 0.1325295812 58233 NaN 28808 86220 65992 0.1203259827 0.1986432548 0.0047611377 0.011741753 0.0010381428 0.029909252 0.7377078523 55869
1 ALABAMA Baldwin County 6.450355 0.1646067529 3.6479777573 0.247 0.1161164107 59871 47269 36616 41851 61872 0.1198750748 0.0860755978 0.0078034708 0.0106614584 0.0006898591 0.0471881523 0.8320730713 223234
2 ALABAMA Barbour County 5.188406 0.2984149992 5.5692666827 0.28 0.181321046 35972 52841 22357 30563 47175 0.1259425999 0.4782872883 0.0068864944 0.0046990197 0.0021064571 0.0452483189 0.4551162602 24686
3 ALABAMA Bibb County 8.566756 0.2385328355 4.8943765041 0.334 0.1506592018 47918 NaN NaN 46103 52543 0.0826373626 0.2107260873 0.0045994463 0.0021434313 0.0011610253 0.0278199518 0.7440832366 22394
4 ALABAMA Blount County 8.566756 0.2198560956 4.9866219497 0.333 0.1481247144 52902 101071 77518 47529 49529 0.0746885899 0.0150797219 0.0063985059 0.0031992529 0.0011586484 0.0965309722 0.867706568 57826
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
3599 WYOMING Sheridan County 7.746032 0.1308850285 3.3129790996 0.228 0.1022218479 64030 51442 NaN 48788 62330 0.1089792786 0.0079383303 0.0133180253 0.0090208299 0.0009840905 0.043956044 0.9141873052 30485
3601 WYOMING Sweetwater County 7.210784 0.1644323415 3.5948660902 0.244 0.1154084806 80639 76731 NaN 57409 78796 0.0894080997 0.0118555605 0.0147840257 0.0106983445 0.0016295492 0.159931984 0.7925985405 42343
3602 WYOMING Teton County 7.833333 0.1141171801 2.935198538 0.108 0.0888580926 98837 NaN NaN 58764 92595 0.1019362187 0.0061796795 0.0088646437 0.0168342994 0.0014064098 0.1514660757 0.8097511081 23464
3603 WYOMING Teton County 7.833333 0.1141171801 2.935198538 0.108 0.0888580926 98837 NaN NaN 58764 92595 0.1019362187 0.0061796795 0.0088646437 0.0168342994 0.0014064098 0.1514660757 0.8097511081 23464
3604 WYOMING Uinta County 6.979167 0.1691336951 4.0252764735 0.251 0.1251476862 70756 NaN NaN 46375 64605 0.0811271298 0.0062296055 0.0144368634 0.0049441313 0.0015326807 0.0925046969 0.8729852665 20226

2291 rows × 20 columns
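The merged output above lists Teton County, Wyoming twice with identical values. A minimal sketch of how such repeats could be found and removed is shown below, using a four-row stand-in built from the walkability values in the table above. Whether repeated rows should be dropped or aggregated depends on why the walkability data lists a county more than once, which the output alone does not settle, so treat this as one option among several.

```python
import pandas as pd

# Stand-in for the county-level walkability frame; values copied from the output above.
county = pd.DataFrame({
    "STATE": ["WYOMING"] * 4,
    "County": ["Sweetwater County", "Teton County", "Teton County", "Uinta County"],
    "NatWalkInd": [7.210784, 7.833333, 7.833333, 6.979167],
})

keys = ["STATE", "County"]

# keep=False marks every member of a repeated key, not just the later copies.
dupes = county[county.duplicated(subset=keys, keep=False)]

# Keep the first row for each (STATE, County) pair and renumber the index.
deduped = county.drop_duplicates(subset=keys).reset_index(drop=True)
print(len(county), len(deduped))
```

Deduplicating before the merge keeps each county from being counted more than once in later county-level summaries.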

# Subset to California counties and rank them from most to least walkable.
health_by_ca = health_by_county[health_by_county['STATE'] == 'CALIFORNIA'].copy()
health_by_ca.sort_values(by='NatWalkInd', ascending=False, inplace=True)
health_by_ca
STATE County NatWalkInd Poor or fair health raw value Poor physical health days raw value Physical inactivity raw value Frequent physical distress raw value Median household income raw value Median household income (Asian) Median household income (Black) Median household income (Hispanic) Median household income (White) Percentage of households with high housing costs % Non-Hispanic Black raw value % American Indian & Alaska Native raw value % Asian raw value % Native Hawaiian/Other Pacific Islander raw value % Hispanic raw value % Non-Hispanic White raw value Population raw value
227 CALIFORNIA Los Angeles County 13.484239 0.2129827029 4.3303300828 0.165 0.1299599491 72721 80046 48823 56076 88038 0.2374281282 0.0795169331 0.0143407178 0.1539424772 0.0036508227 0.4862952452 0.2605756667 10039107
239 CALIFORNIA Orange County 13.484239 0.1689277729 4.023714906 0.145 0.115366049 95761 93777 76136 68971 101958 0.1957701268 0.0166067742 0.0102232836 0.2171624326 0.003874746 0.3404269054 0.3983390077 3175692
252 CALIFORNIA Santa Clara County 13.442451 0.14330467 3.7071646064 0.158 0.0999489103 132444 148942 76200 79914 133447 0.1603179436 0.0236522306 0.0117571266 0.3901959279 0.0045937136 0.2501737685 0.305804595 1927852
244 CALIFORNIA San Benito County 13.442451 0.2007368961 4.2152665268 0.26 0.1276600689 84209 118571 144643 71870 102884 0.1472533794 0.009903197 0.0307126481 0.0391032989 0.0044739524 0.6084256783 0.3278722456 62808
247 CALIFORNIA San Francisco County 13.138018 0.1264334194 3.1789773911 0.146 0.0942089287 121795 95057 34237 77074 146569 0.1671475039 0.0502966937 0.0072951135 0.3597871474 0.0045726329 0.152355683 0.4019844614 881549
230 CALIFORNIA Marin County 13.138018 0.1175500845 3.3462642479 0.131 0.0962142731 112069 107849 48602 67125 126501 0.1869887403 0.0255113474 0.0098599059 0.066021188 0.0027508828 0.1628893542 0.7113504826 258826
250 CALIFORNIA San Mateo County 13.138018 0.1305779065 3.4460570979 0.156 0.0972452527 135234 141341 70519 79761 138628 0.1710485796 0.0227532146 0.0085888754 0.3060152132 0.0150096599 0.2400006262 0.3868476975 766573
209 CALIFORNIA Alameda County 13.138018 0.149698913 3.8085572858 0.146 0.1053695245 107589 124079 51749 77990 114427 0.1708784276 0.1016879382 0.0106484121 0.3233301163 0.0094302199 0.2232085963 0.3064232117 1671329
215 CALIFORNIA Contra Costa County 13.138018 0.1492119463 3.8454953879 0.161 0.1100645248 106555 119516 66852 74373 111774 0.1636113954 0.0873825124 0.0101506165 0.1830197152 0.0059565194 0.2604362624 0.4268590392 1153526
257 CALIFORNIA Solano County 12.223977 0.1716460804 4.0038370583 0.224 0.1209397218 85704 97551 62015 71436 88013 0.1687480754 0.1368590596 0.0128718644 0.1615528446 0.0103855081 0.2727687018 0.3717002165 447643
246 CALIFORNIA San Diego County 12.186630 0.1578115667 3.9467092368 0.149 0.1229586341 83576 96856 55842 59850 89392 0.1979365021 0.047063352 0.013207502 0.1258090123 0.0058253678 0.3414979945 0.4495855712 3338330
259 CALIFORNIA Stanislaus County 12.022727 0.2209808781 4.8328123372 0.233 0.1474218877 62761 72225 48773 54190 66097 0.1760150752 0.0269876149 0.0199342607 0.0614244725 0.0093015654 0.4759161733 0.4037754694 550660
253 CALIFORNIA Santa Cruz County 11.977041 0.1626374638 3.9785162194 0.111 0.120523309 85770 70396 58971 58197 92871 0.2152965495 0.0099482821 0.0183300209 0.0528086145 0.0019984408 0.3401082672 0.5676669851 273213
236 CALIFORNIA Monterey County 11.879113 0.2356204729 4.5630146701 0.19 0.1415619143 76509 83180 63813 59486 86107 0.1826834264 0.0251554505 0.0259825232 0.067294689 0.0060521447 0.5937437365 0.2937306047 434061
243 CALIFORNIA Sacramento County 11.829810 0.1830917825 4.2461144162 0.198 0.1259565638 71891 74804 48321 57031 75110 0.1809532879 0.0976696747 0.015455608 0.1701051121 0.0129357279 0.2363081792 0.4379514168 1552058
217 CALIFORNIA El Dorado County 11.829810 0.1329780036 3.6612589677 0.182 0.1101170805 86202 122306 87944 71866 83123 0.1621982919 0.008794719 0.0133476455 0.0483139134 0.0022609065 0.1315992802 0.7721462537 192843
240 CALIFORNIA Placer County 11.829810 0.1254256596 3.6604678868 0.137 0.1078008978 97688 121425 85429 72709 90077 0.1564476619 0.017053742 0.0106068099 0.0824368801 0.0026887322 0.1440542868 0.7152881161 398329
266 CALIFORNIA Yolo County 11.829810 0.175560255 4.0647722406 0.147 0.1255786766 70951 63271 39813 54451 83307 0.1945869947 0.024521542 0.0178095238 0.150585034 0.0057505669 0.3192380952 0.4595056689 220500
265 CALIFORNIA Ventura County 11.691473 0.1764140634 4.200514117 0.181 0.1244308962 91446 113102 87083 69894 97969 0.1757478433 0.0178663035 0.0186381657 0.0788280461 0.0028604998 0.4324260112 0.4468337104 846006
228 CALIFORNIA Madera County 11.118846 0.2566778818 5.15433756 0.298 0.1608098308 61105 58036 42703 48228 65905 0.1646788991 0.0318826393 0.0443789051 0.0255391637 0.0030573265 0.587839341 0.3317739485 157327
218 CALIFORNIA Fresno County 11.118846 0.2487206531 4.7012412009 0.227 0.1543155486 56926 63293 33397 44049 69100 0.194422043 0.0463156378 0.0297517468 0.1114531964 0.002817533 0.5376633594 0.2863063894 999101
248 CALIFORNIA San Joaquin County 11.056962 0.2233631278 4.6523550991 0.256 0.1412599399 68458 80513 46119 55282 74162 0.1817832437 0.0712210752 0.0196825289 0.1735148554 0.0082398694 0.4203041929 0.3045642054 762148
233 CALIFORNIA Merced County 10.952546 0.2840865983 5.3921370184 0.274 0.1672944114 59733 63559 40809 48145 64537 0.16992 0.0294979833 0.0253565255 0.0776721406 0.003820945 0.6102059925 0.2648588303 277680
258 CALIFORNIA Sonoma County 10.863480 0.1512406708 4.0067363598 0.157 0.1177340029 87084 85992 68975 67701 85314 0.1820255112 0.0162258059 0.0218393967 0.0456976631 0.0041429311 0.2730005502 0.6291307936 494336
245 CALIFORNIA San Bernardino County 10.826422 0.2433102517 5.026449363 0.221 0.1516458921 67398 82040 51063 60222 67894 0.1923912601 0.0810835357 0.0207413931 0.0795959791 0.0047319256 0.5443861134 0.2729333948 2180085
242 CALIFORNIA Riverside County 10.826422 0.2027058312 4.6853022028 0.195 0.1343328851 72905 84703 63167 57903 74951 0.1952762758 0.0621923251 0.0192200429 0.0715327705 0.0044965769 0.5003072195 0.3407453251 2470546
224 CALIFORNIA Kings County 10.825103 0.2539287008 4.905479532 0.243 0.155522852 57297 77727 48427 46733 71086 0.1551889048 0.0631751013 0.0320714005 0.0438995685 0.0035111809 0.5525957892 0.3134431803 152940
249 CALIFORNIA San Luis Obispo County 10.744376 0.152139385 4.0323913968 0.156 0.1193626208 76599 76286 54145 60346 76812 0.1856660232 0.0173182957 0.0139909788 0.0399101413 0.0018402676 0.2290974212 0.6850422626 283111
220 CALIFORNIA Humboldt County 10.554012 0.1822640173 4.8951302161 0.162 0.1466199345 51134 45417 NaN 42816 50158 0.2041721729 0.0124374806 0.0637144248 0.0289912805 0.0033122354 0.1206420868 0.7382670149 135558
251 CALIFORNIA Santa Barbara County 10.495740 0.1930458391 4.2067599554 0.14 0.1269017295 74530 81520 53983 60418 87460 0.2018599754 0.0180224368 0.0214199808 0.0601837854 0.0025912712 0.460325779 0.4379785845 446499
223 CALIFORNIA Kern County 10.323099 0.2672410968 5.3428335009 0.251 0.1672083288 53245 73797 36812 45017 64354 0.1679498966 0.0521838432 0.0263474198 0.0536312961 0.0027260548 0.5460385558 0.3284984926 900202
260 CALIFORNIA Sutter County 10.274775 0.2213808876 4.8777042276 0.295 0.1472776906 60910 59215 87438 44196 67656 0.142319159 0.0217384579 0.0239040538 0.1696795949 0.0042693176 0.3187241546 0.4493405245 96971
267 CALIFORNIA Yuba County 10.274775 0.2119278686 4.8596761987 0.317 0.1488845061 56607 63897 86971 53465 58434 0.1534246575 0.0361137947 0.0285757869 0.0734606193 0.0055168556 0.2914145523 0.5396730564 78668
225 CALIFORNIA Lake County 9.440972 0.2121813885 5.0948131472 0.233 0.1562266578 46897 NaN 31638 44045 48819 0.1956777996 0.0179542136 0.0447302209 0.0137918181 0.00304414 0.2200944305 0.6890783711 64386
232 CALIFORNIA Mendocino County 9.016878 0.2013682119 4.9351384708 0.164 0.1473511203 52309 64904 NaN 44068 55466 0.216958717 0.0081614774 0.0633436697 0.0228244706 0.0026282724 0.2579741553 0.6434656307 86749
254 CALIFORNIA Shasta County 8.937659 0.1719840019 4.617080041 0.17 0.1378710063 61464 80135 41250 43734 55975 0.1925792507 0.0108784984 0.0318025322 0.0313971568 0.0022767659 0.1051477121 0.791625944 180080
229 CALIFORNIA Madera County 8.916667 0.2566778818 5.15433756 0.298 0.1608098308 61105 58036 42703 48228 65905 0.1646788991 0.0318826393 0.0443789051 0.0255391637 0.0030573265 0.587839341 0.3317739485 157327
216 CALIFORNIA Del Norte County 8.691667 0.2263008808 5.2045981326 0.328 0.1601692441 48979 82875 103661 41803 45974 0.174282678 0.0329713793 0.0967927513 0.0309578599 0.0019416079 0.2012081116 0.6197324896 27812
221 CALIFORNIA Imperial County 8.569444 0.2940808172 5.1343785783 0.272 0.1642263516 48102 87356 30917 44026 68500 0.1749440716 0.0242032944 0.0248103082 0.0210964876 0.0019700356 0.8503048865 0.1002179731 181215
261 CALIFORNIA Tehama County 8.414729 0.2192593451 4.9946234859 0.281 0.1577568398 51672 NaN 80123 37460 46945 0.1732283465 0.0080050396 0.0330956917 0.0144121443 0.0023200787 0.2577592035 0.6734681335 65084
237 CALIFORNIA Napa County 8.127358 0.1591147685 3.7421883086 0.176 0.1142836289 90230 114806 66528 68493 97374 0.1620014355 0.020755895 0.0126756882 0.088991172 0.0041889302 0.3460041817 0.5175615635 137744
238 CALIFORNIA Nevada County 7.786667 0.139272944 3.8192977586 0.138 0.115571218 69550 79637 77898 56569 66268 0.1823875736 0.0052729187 0.0127913388 0.0146258333 0.0021252068 0.0975590196 0.8473760714 99755
226 CALIFORNIA Lassen County 7.740000 0.205531187 4.7720027887 0.336 0.1450281513 53613 NaN 61450 52284 56911 0.1381914894 0.079547313 0.0425538874 0.015994505 0.0086023616 0.1933078206 0.6474994276 30573
263 CALIFORNIA Tulare County 7.356173 0.2802614161 5.2099530736 0.263 0.1702771347 56776 56563 40840 42717 62518 0.1881655917 0.0123060093 0.0277437553 0.0397752014 0.0022930319 0.6560001716 0.2766117183 466195
212 CALIFORNIA Butte County 7.058974 0.1885248337 4.722006569 0.231 0.1418333643 58394 48668 30360 47729 54549 0.193510394 0.0160867939 0.0253072733 0.0501126897 0.0028651465 0.1721414689 0.7090553229 219186
264 CALIFORNIA Tuolumne County 6.302083 0.169067007 4.4580909367 0.223 0.1325730335 64729 NaN NaN 49402 61319 0.1649170643 0.0184845259 0.0227981938 0.0148500312 0.002753405 0.1267667682 0.7971107603 54478

Results

Exploratory Data Analysis

This study uses the National Walkability Index (NatWalkInd) to investigate the association between walkability and health at the county and state level. The first bar graphs, which display NatWalkInd scores for each US state and for each California county, provide the foundation for further investigation. Scatter plots and joint plots reveal a potential negative relationship between walkability and physical inactivity, with a modest trend indicating that better walkability is associated with reduced physical inactivity. We also compare counties in the three most walkable states with counties in the three least walkable states. That comparison suggests a relationship between walkability and overall health among the most walkable states, lending support to the notion that a more walkable environment may lead to better health outcomes.

Simple Graph Visualizations

import matplotlib.pyplot as plt
import seaborn as sns
plt.figure(figsize=(15, 8))

sns.barplot(data=health_by_state, x='State', y='NatWalkInd', color='skyblue', dodge=True)

plt.title('Bar Graph of National Walking Index by State')
plt.xticks(rotation=90)
plt.show()

[Figure: Bar Graph of National Walking Index by State]

The graph shows the fifty US states, ranked from best to worst in terms of walkability. The index score is calculated using this formula: ((ranked score for intersection density) / 3) + ((ranked score for closeness to transit stations) / 3) + ((ranked score for employment mix) / 6) + ((ranked score for employment and household mix) / 6). The divisors act as weights: intersection density and closeness to transit each contribute one third of the final score, while the two mix measures contribute one sixth each.
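
As a minimal sketch of the weighting in the formula above, the function below combines four ranked scores into one index value. The function name and the sample inputs are hypothetical and are not taken from the dataset.

```python
def walkability_index(intersection_density, transit_proximity,
                      employment_mix, employment_household_mix):
    """Combine four ranked scores using the weights from the formula above."""
    return (intersection_density / 3
            + transit_proximity / 3
            + employment_mix / 6
            + employment_household_mix / 6)


# Hypothetical ranked scores for a single area (not real data).
example = walkability_index(15, 12, 9, 6)
print(example)  # 5 + 4 + 1.5 + 1 = 11.5
```

Because the four weights sum to one, an area that receives the same ranked score on every component gets that score as its index.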

plt.figure(figsize=(15, 8))

sns.barplot(data=health_by_ca, x='County', y='NatWalkInd', color='skyblue', dodge=True)

plt.title('Bar Graph of National Walking Index by CA County')
plt.xticks(rotation=90)
plt.show()

[Figure: Bar Graph of National Walking Index by CA County]

This is the same graph, but for the counties of California. San Diego County is among the counties with a higher walkability score.

Exploring the Link: Walkability and its Impact on Health

fig, axes = plt.subplots(1, 2, figsize=(12, 6))

health_by_ca_sorted = health_by_ca.sort_values(by='Physical inactivity raw value', ascending=True)
health_by_ca_sorted['Physical inactivity raw value'] = pd.to_numeric(
    health_by_ca_sorted['Physical inactivity raw value'], errors='coerce')
sns.scatterplot(x='NatWalkInd', y='Physical inactivity raw value', data=health_by_ca_sorted, ax=axes[0])
sns.regplot(x='NatWalkInd', y='Physical inactivity raw value', data=health_by_ca_sorted, scatter=False, ax=axes[0])
axes[0].set_xlabel('NatWalkInd')
axes[0].set_ylabel('Physical inactivity raw value')
axes[0].set_title('NatWalkInd vs Physical inactivity raw value (California counties)')

health_by_state_sorted = health_by_state.sort_values(by='Physical inactivity raw value', ascending=True)
health_by_state_sorted['Physical inactivity raw value'] = pd.to_numeric(
    health_by_state_sorted['Physical inactivity raw value'], errors='coerce')
sns.scatterplot(x='NatWalkInd', y='Physical inactivity raw value', data=health_by_state_sorted, ax=axes[1])
sns.regplot(x='NatWalkInd', y='Physical inactivity raw value', data=health_by_state_sorted, scatter=False, ax=axes[1])
axes[1].set_xlabel('NatWalkInd')
axes[1].set_ylabel('Physical inactivity raw value')
axes[1].set_title('NatWalkInd vs Physical inactivity raw value (states)')

plt.tight_layout()
plt.show()

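[Figure: scatter plots of NatWalkInd vs Physical inactivity raw value; left: California counties, right: states]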

The left graph compares walkability and physical inactivity across California counties. The right graph makes the same comparison for all fifty states. The two look broadly similar. For California, however, we believe the downward trend line is misleading because the points are widely scattered around it. For all fifty states, we believe the trend line reflects a genuine downward tendency: the more walkable a state is, the lower its rate of physical inactivity.
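
One way to back up a visual judgment about a trend line is to compute a correlation coefficient. The following is only a sketch: the `sample` frame holds made-up values standing in for a few rows of `health_by_state`, and Pearson's r is our choice of summary here, not a statistic reported elsewhere in this analysis.

```python
import pandas as pd

# Hypothetical values standing in for a few rows of health_by_state (not real data).
sample = pd.DataFrame({
    'NatWalkInd': [6.0, 8.0, 10.0, 12.0, 14.0],
    'Physical inactivity raw value': ['0.30', '0.27', '0.26', 'NaN', '0.20'],
})

# Coerce to numeric as in the plotting code; unparseable entries become NaN.
sample['Physical inactivity raw value'] = pd.to_numeric(
    sample['Physical inactivity raw value'], errors='coerce')

# Series.corr skips pairs with missing values and uses Pearson's r by default.
r = sample['NatWalkInd'].corr(sample['Physical inactivity raw value'])
print(f"Pearson r = {r:.2f}")
```

A value of r close to zero would support the reading that a fitted line is misleading, while a value closer to -1 would support a real downward tendency.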

fig, axes = plt.subplots(1, 2, figsize=(12, 6))

health_by_ca_sorted = health_by_ca.sort_values(by='Poor or fair health raw value', ascending=True)
health_by_ca_sorted['Poor or fair health raw value'] = pd.to_numeric(
    health_by_ca_sorted['Poor or fair health raw value'], errors='coerce')
sns.scatterplot(x='NatWalkInd', y='Poor or fair health raw value', data=health_by_ca_sorted, ax=axes[0])
sns.regplot(x='NatWalkInd', y='Poor or fair health raw value', data=health_by_ca_sorted, scatter=False, ax=axes[0])
axes[0].set_xlabel('NatWalkInd')
axes[0].set_ylabel('Poor or fair health raw value')
axes[0].set_title('NatWalkInd vs Poor or fair health raw value (California counties)')

health_by_state_sorted = health_by_state.sort_values(by='Poor or fair health raw value', ascending=True)
health_by_state_sorted['Poor or fair health raw value'] = pd.to_numeric(
    health_by_state_sorted['Poor or fair health raw value'], errors='coerce')
sns.scatterplot(x='NatWalkInd', y='Poor or fair health raw value', data=health_by_state_sorted, ax=axes[1])
sns.regplot(x='NatWalkInd', y='Poor or fair health raw value', data=health_by_state_sorted, scatter=False, ax=axes[1])
axes[1].set_xlabel('NatWalkInd')
axes[1].set_ylabel('Poor or fair health raw value')
axes[1].set_title('NatWalkInd vs Poor or fair health raw value (states)')

plt.tight_layout()
plt.show()

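[Figure: scatter plots of NatWalkInd vs Poor or fair health raw value; left: California counties, right: states]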

The left graph compares walkability and poor or fair health across California counties. The right graph makes the same comparison for all fifty states. The two look broadly similar, but we do not think either graph shows a downward trend because the points are too widely scattered. We believe that poor or fair health varies with a number of other factors, including access to healthcare, socioeconomic status, education level, environmental impact, public health policies, physical activity rates, smoking and tobacco use, diet and nutrition, chronic disease prevalence, and water and air quality.

health_by_county_sorted = health_by_county.sort_values(by='Physical inactivity raw value', ascending=True)
health_by_county_sorted['Physical inactivity raw value'] = pd.to_numeric(health_by_county_sorted['Physical inactivity raw value'], errors='coerce')
sns.jointplot(x='NatWalkInd', y='Physical inactivity raw value', data=health_by_county_sorted, kind = "reg", dropna = True)
plt.xlabel('NatWalkInd')
plt.ylabel('Physical inactivity raw value')
plt.show()

[Figure: joint plot with regression fit of NatWalkInd vs Physical inactivity raw value, all counties]

Since we determined that poor or fair health is not a reliable measure for this comparison, this graph compares walkability to physical inactivity in counties across all 50 states. We see a slight downward trend, indicating that the more walkable an area is, the lower its rate of physical inactivity tends to be. We say slight because more factors contribute to physical inactivity than just how walkable a county or state is.

health_by_county_sorted = health_by_county.sort_values(by='Physical inactivity raw value', ascending=True)
health_by_county_sorted['Physical inactivity raw value'] = pd.to_numeric(
    health_by_county_sorted['Physical inactivity raw value'], errors='coerce')
sns.jointplot(x='NatWalkInd', y='Physical inactivity raw value', data=health_by_county_sorted, kind='hex', gridsize=20, marginal_kws=dict(bins=30))
plt.xlabel('NatWalkInd')
plt.ylabel('Physical inactivity raw value')
plt.show()

[Figure: hexbin joint plot of NatWalkInd vs Physical inactivity raw value, all counties]

This is the same data shown as a hexbin joint plot, which makes it easier to see where the observations are most concentrated. The data is densest around walkability index scores of 6 to 8, with physical inactivity raw values ranging from 0.23 to 0.32.

Breaking graphs further down

# Strip padding from state names so entries such as 'ARKANSAS   ' still match.
state_names = health_by_county['STATE'].str.strip()
top_health_by_county = health_by_county[state_names.isin(['RHODE ISLAND', 'CALIFORNIA', 'NEW JERSEY'])]
bot_health_by_county = health_by_county[state_names.isin(['MISSISSIPPI', 'WEST VIRGINIA', 'ARKANSAS'])]
plt.figure(figsize=(12, 6))
plt.subplot(1, 2, 1)

top_health_by_county_sorted = top_health_by_county.sort_values(by='Physical inactivity raw value', ascending=True)
top_health_by_county_sorted['Physical inactivity raw value'] = pd.to_numeric(
    top_health_by_county_sorted['Physical inactivity raw value'], errors='coerce')
sns.scatterplot(x='NatWalkInd', y='Physical inactivity raw value', data=top_health_by_county_sorted)
sns.regplot(x='NatWalkInd', y='Physical inactivity raw value', data=top_health_by_county_sorted, scatter=False)
plt.xlabel('NatWalkInd')
plt.ylabel('Physical inactivity raw value')
plt.title('Top Counties: NatWalkInd vs Physical Inactivity')


plt.subplot(1, 2, 2)
bot_health_by_county_sorted = bot_health_by_county.sort_values(by='Physical inactivity raw value', ascending=True)
bot_health_by_county_sorted['Physical inactivity raw value'] = pd.to_numeric(
    bot_health_by_county_sorted['Physical inactivity raw value'], errors='coerce')
sns.scatterplot(x='NatWalkInd', y='Physical inactivity raw value', data=bot_health_by_county_sorted)
sns.regplot(x='NatWalkInd', y='Physical inactivity raw value', data=bot_health_by_county_sorted, scatter=False)
plt.xlabel('NatWalkInd')
plt.ylabel('Physical inactivity raw value')
plt.title('Bottom Counties: NatWalkInd vs Physical Inactivity')
plt.tight_layout()

plt.show()

[Figure: Top Counties and Bottom Counties, NatWalkInd vs Physical Inactivity]

Because it was difficult to discern whether a high walkability score indicated improved physical health, we compared the three most walkable states with the three least walkable states. The left graph compares walkability and physical inactivity for counties in the top states. The right graph makes the same comparison for counties in the bottom states. The two look broadly similar, but we do not believe either graph indicates a downward trend because the points are widely scattered. Given this variety in the data points, it remains debatable whether a county's walkability reduces physical inactivity.
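
To put a number on a comparison like the one above, a least-squares slope can be fitted separately for each group. This is a sketch on made-up rows: the `counties` frame, the `fitted_slope` helper, and all values are hypothetical, and `np.polyfit` is used here only as one simple way to obtain a slope.

```python
import numpy as np
import pandas as pd

# Hypothetical county rows (not real data); note the padded state name.
counties = pd.DataFrame({
    'STATE': ['CALIFORNIA', 'CALIFORNIA', 'CALIFORNIA',
              'ARKANSAS   ', 'ARKANSAS   ', 'ARKANSAS   '],
    'NatWalkInd': [8.0, 10.0, 12.0, 4.0, 6.0, 8.0],
    'Physical inactivity raw value': [0.24, 0.20, 0.16, 0.33, 0.32, 0.31],
})

# Strip stray whitespace so that padded and unpadded names are treated alike.
counties['STATE'] = counties['STATE'].str.strip()


def fitted_slope(group):
    """Least-squares slope of physical inactivity against NatWalkInd."""
    slope, _intercept = np.polyfit(
        group['NatWalkInd'], group['Physical inactivity raw value'], deg=1)
    return slope


slopes = {state: fitted_slope(group) for state, group in counties.groupby('STATE')}
for state, slope in slopes.items():
    print(state, round(slope, 3))
```

A slope near zero in both groups would match the impression that neither panel shows a clear downward trend.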

plt.figure(figsize=(12, 6))
plt.subplot(1, 2, 1)

top_health_by_county_sorted = top_health_by_county.sort_values(by='Poor or fair health raw value', ascending=True)
top_health_by_county_sorted['Poor or fair health raw value'] = pd.to_numeric(
    top_health_by_county_sorted['Poor or fair health raw value'], errors='coerce')
sns.scatterplot(x='NatWalkInd', y='Poor or fair health raw value', data=top_health_by_county_sorted)
sns.regplot(x='NatWalkInd', y='Poor or fair health raw value', data=top_health_by_county_sorted, scatter=False)
plt.xlabel('NatWalkInd')
plt.ylabel('Poor or fair health raw value')
plt.title('Top Counties: NatWalkInd vs Poor or Fair Health')


plt.subplot(1, 2, 2)

bot_health_by_county_sorted = bot_health_by_county.sort_values(by='Poor or fair health raw value', ascending=True)
bot_health_by_county_sorted['Poor or fair health raw value'] = pd.to_numeric(
    bot_health_by_county_sorted['Poor or fair health raw value'], errors='coerce')
sns.scatterplot(x='NatWalkInd', y='Poor or fair health raw value', data=bot_health_by_county_sorted)
sns.regplot(x='NatWalkInd', y='Poor or fair health raw value', data=bot_health_by_county_sorted, scatter=False)
plt.xlabel('NatWalkInd')
plt.ylabel('Poor or fair health raw value')
plt.title('Bottom Counties: NatWalkInd vs Poor or Fair Health')
plt.tight_layout()

plt.show()

[Figure: Top Counties and Bottom Counties, NatWalkInd vs Poor or Fair Health]

Even though we argued that poor or fair health should not be used to assess a county's walkability, the graphs we created were surprising. The left graph compares walkability and poor or fair health for counties in the top states. The right graph makes the same comparison for counties in the bottom states. The three most walkable states show a much steeper downward trend than the three least walkable states, whose data is clumped together in the center. Many factors influence a person's health, but these graphs suggest that walkability is associated with health, at least among the most walkable states.

Ethics & Privacy

Discussion and Conclusion

In this project, our aim was to investigate the impact of a state or county’s walkability on an individual’s health. Specifically, we wanted to answer the question of whether increased walkability correlates with improved health, possibly due to higher levels of physical activity. To address this inquiry, we analyzed three distinct datasets: state walkability data, health data, and a dataset that links cities in the walkability dataset to counties in the health dataset. Our analysis involved comparing walkability scores across all states to people’s physical inactivity, as well as examining the correlation between the walkability scores of California counties and people’s physical inactivity and poor or fair health.

Based on our analysis, we found that poor or fair health alone is not sufficient to ascertain whether walkability leads to better health, because external factors such as income levels, dietary habits, prevalence of chronic diseases, and accessibility of healthcare can also affect the health status of individuals. When comparing physical inactivity to walkability, our results indicated that states with higher walkability tend to have fewer physically inactive people, and those who are active typically experience better health than those who are not. However, as mentioned previously, it is important to recognize that health is influenced by multiple factors beyond physical activity alone, so we refrain from concluding that walkability itself directly causes improved health. Instead, we conclude that our findings suggest a correlation between physical activity and a state’s walkability score, which is consistent with the idea that more walkable environments encourage healthier lifestyles.

Some of the main limitations of our project are the quality of the walkability data and the lack of a specified time frame for when the walkability index data was collected. As noted in our EDA process, walkability scores vary widely within a single city, ranging from 2 to 20. A walkability dataset with more specific columns relating each score to the time frame in which it was taken would allow us to understand the correlation on a more granular scale, instead of collapsing the scores for a city into a median or mean. While this project does not have any immediate impact on society, a more rigorous exploration of the connection between walkability and health could help improve the health of individuals in areas with low walkability. If publicized, this work could also support the establishment of more environmentally conscious cities and reduce carbon emissions while promoting walking and everyday activity.
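
The cost of collapsing many scores into one number per city can be seen in a small sketch. The `scores` frame, the `City` column name, and all values are hypothetical and chosen only to illustrate the point.

```python
import pandas as pd

# Hypothetical walkability scores for two cities (not real data).
scores = pd.DataFrame({
    'City': ['A', 'A', 'A', 'A', 'B', 'B', 'B'],
    'NatWalkInd': [2.0, 8.0, 14.0, 20.0, 9.0, 10.0, 11.0],
})

# Summarize each city by its center and by its spread.
summary = scores.groupby('City')['NatWalkInd'].agg(['mean', 'median', 'min', 'max'])
summary['range'] = summary['max'] - summary['min']
print(summary)
```

In this made-up case the two cities have similar means and medians but very different ranges, which is the kind of within-city variation that a single mean or median hides.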

Team Contributions