Note: This notebook grew out of something that came up in one of my courses. It looks at data from multiple sources over the past 2 years, primarily Covid data. Altogether it is over 100 million data points, and it is a nice indication of how some people choose their own outcomes depending on which side of the US political divide they find themselves on. I have more recent data, but this notebook is written using data up to Feb 23rd 2022. My sources are linked below.
import pandas as pd
import My_Covid_data as cd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
import statsmodels.api as sm
from statsmodels.formula.api import ols
from sodapy import Socrata
%config InlineBackend.figure_format='retina'
#cd.Update_covid()
#cd.Update_vaccine() # you might have to re-download the original if you haven't updated in 2 weeks
dataC = pd.read_csv('Data/USA_covid_data.csv') # Covid data
dataKey = pd.read_csv('Data/USAkey.csv')# this is to access multiple data sets and knit them together
print('Large file with :',len(dataC)*len(dataC.columns), ' points of data (approx 22 mil)')
Large file with : 22758650 points of data (approx 22 mil)
At a basic level I now have data for every US county, keyed by its Federal Information Processing Standards (FIPS) code, going back to Jan 2020. Some data does not show up until March 2020, but it is all there.
dataX = cd.Get_Covid_data('Buchanan County','MO')
dataX['active'] = dataX.Cumulative_confirmed.diff(10) # 10 days of infection according to CDC
dataX['Rolling'] = dataX.active.rolling(7).mean()
plt.figure(figsize=(13,6))
plt.plot(dataX.active, label= "active cases")
plt.plot(dataX.Rolling, label = 'Rolling 7 day average of active')
plt.grid()
plt.legend()
plt.show()
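As a toy illustration of the two steps in the cell above, the sketch below applies `diff(10)` and a 7-day rolling mean to a made-up cumulative series with 5 new cases per day. The numbers are invented; only the pandas calls mirror the notebook.

```python
import pandas as pd

# Made-up cumulative counts: 5 new confirmed cases every day for 30 days.
cumulative = pd.Series(range(0, 150, 5), dtype=float)

# Cases confirmed during the trailing 10 days, treated here as "active".
active = cumulative.diff(10)

# 7-day rolling mean to smooth out day-to-day reporting noise.
rolling = active.rolling(7).mean()

# The first 10 values of `active` are NaN because there is no earlier day to
# subtract; `rolling` needs 7 valid values, so it starts 6 days later.
print(active.iloc[10], rolling.iloc[16])
```

The leading NaN values explain why the plotted curves start a little after the first date in the data.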
dataV = pd.read_csv('Data/COVID-19_Vaccinations_in_the_United_States_County.csv',low_memory=False) # vaccination records for each county
#broken down into various subclass of population
print('Large file with :',len(dataV)*len(dataV.columns), ' points of data (approx 76 mil)')
Large file with : 76547847 points of data (approx 76 mil)
I can get the vaccination rate for different groups of people (old, young, poor, etc.). I am mainly interested in just the people who completed their vaccination series. Too much parsing of vaccination types leads to too much nuance, and you lose the big picture. For an example of what data there is, let us look at the vaccination rate of Buchanan County, Missouri over time. We will also print the data from the last day of this data set.
County='Buchanan County'
State = 'MO'
dataVV = pd.DataFrame(dataV[(dataV.Recip_County==County)&(dataV.Recip_State==State)])
dataVV.Date = pd.to_datetime(dataVV.Date)
dataVV=dataVV.sort_values('Date')
plt.figure(figsize=(13,6))
plt.plot(dataVV.Date,dataVV.Series_Complete_Pop_Pct)
plt.show()
print(dataVV.iloc[-1])
Date    2022-02-23 00:00:00
FIPS    29021
MMWR_week    8
Recip_County    Buchanan County
Recip_State    MO
Completeness_pct    91.3
Administered_Dose1_Recip    38402.0
Administered_Dose1_Pop_Pct    44.0
Administered_Dose1_Recip_5Plus    38401.0
Administered_Dose1_Recip_5PlusPop_Pct    46.8
Administered_Dose1_Recip_12Plus    37481.0
Administered_Dose1_Recip_12PlusPop_Pct    50.4
Administered_Dose1_Recip_18Plus    35557.0
Administered_Dose1_Recip_18PlusPop_Pct    52.4
Administered_Dose1_Recip_65Plus    9445.0
Administered_Dose1_Recip_65PlusPop_Pct    64.7
Series_Complete_Yes    33143
Series_Complete_Pop_Pct    37.9
Series_Complete_5Plus    33143.0
Series_Complete_5PlusPop_Pct    40.4
Series_Complete_12Plus    32485.0
Series_Complete_12PlusPop_Pct    43.7
Series_Complete_18Plus    30852
Series_Complete_18PlusPop_Pct    45.5
Series_Complete_65Plus    8413
Series_Complete_65PlusPop_Pct    57.6
Booster_Doses    13414.0
Booster_Doses_Vax_Pct    40.5
Booster_Doses_12Plus    13399.0
Booster_Doses_12Plus_Vax_Pct    41.2
Booster_Doses_18Plus    13184.0
Booster_Doses_18Plus_Vax_Pct    42.7
Booster_Doses_50Plus    9922.0
Booster_Doses_50Plus_Vax_Pct    59.1
Booster_Doses_65Plus    6015.0
Booster_Doses_65Plus_Vax_Pct    71.5
SVI_CTGY    C
Series_Complete_Pop_Pct_SVI    10.0
Series_Complete_5PlusPop_Pct_SVI    11.0
Series_Complete_12PlusPop_Pct_SVI    11.0
Series_Complete_18PlusPop_Pct_SVI    11.0
Series_Complete_65PlusPop_Pct_SVI    11.0
Metro_status    Metro
Series_Complete_Pop_Pct_UR_Equity    2.0
Series_Complete_5PlusPop_Pct_UR_Equity    3.0
Series_Complete_12PlusPop_Pct_UR_Equity    3.0
Series_Complete_18PlusPop_Pct_UR_Equity    3.0
Series_Complete_65PlusPop_Pct_UR_Equity    3.0
Census2019    87364.0
Census2019_5PlusPop    81993.0
Census2019_12PlusPop    74402.0
Census2019_18PlusPop    67817.0
Census2019_65PlusPop    14600.0
Name: 2587, dtype: object
Note: The booster percentage is the share of people with a completed series who also got a booster, not the share of the whole county.
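The difference between the two denominators can be checked directly from the Buchanan County row printed above. This is just worked arithmetic on three of its fields; the variable names are my own.

```python
# Values copied from the Buchanan County row for 2022-02-23.
series_complete = 33143   # Series_Complete_Yes
booster_doses = 13414     # Booster_Doses
population = 87364        # Census2019

# Share of fully vaccinated people who got a booster (what the CDC column reports).
pct_of_vaccinated = booster_doses / series_complete * 100

# Share of the whole county that got a booster.
pct_of_county = booster_doses / population * 100

print(round(pct_of_vaccinated, 1), round(pct_of_county, 1))
```

The first figure matches `Booster_Doses_Vax_Pct`; the second is far lower because it is measured against the whole county population.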
One could think that this is good until you look at it comparatively: 50 miles south is the KC metro region. We can compare each county in the metro region to Buchanan County.
MO_counties =['Cass County', 'Clay County','Platte County','Jackson County']
KS_counties =['Miami County','Johnson County','Wyandotte County','Leavenworth County']
plt.figure(figsize=(13,6))
State = "MO"
for C in MO_counties:
    dataVV = pd.DataFrame(dataV[(dataV.Recip_County==C)&(dataV.Recip_State==State)])
    dataVV.Date = pd.to_datetime(dataVV.Date)
    dataVV=dataVV.sort_values('Date')
    plt.plot(dataVV.Date,dataVV.Series_Complete_Pop_Pct,linewidth=0.8, label = C)
State = "KS"
for C in KS_counties:
    dataVV = pd.DataFrame(dataV[(dataV.Recip_County==C)&(dataV.Recip_State==State)])
    dataVV.Date = pd.to_datetime(dataVV.Date)
    dataVV=dataVV.sort_values('Date')
    plt.plot(dataVV.Date,dataVV.Series_Complete_Pop_Pct,linewidth=0.8, label = C)
County='Buchanan County'
State = 'MO'
dataVV = pd.DataFrame(dataV[(dataV.Recip_County==County)&(dataV.Recip_State==State)])
dataVV.Date = pd.to_datetime(dataVV.Date)
dataVV=dataVV.sort_values('Date')
plt.plot(dataVV.Date,dataVV.Series_Complete_Pop_Pct,'-.',linewidth=1,label = 'Buchanan County',c='r')
plt.title('Vaccination rate for fully vaccinated in county')
plt.grid()
plt.legend()
plt.show()
An interesting feature is the kink in the graph: those vertical jumps in August 2021 are essentially records being re-classified. As you can see, Buchanan County is not keeping pace with the other counties. My next thought was to figure out whether it matters: do the vaccines actually make a difference, not just individually but as a policy? So I joined those sets together and ran some regressions.
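The per-county selection repeated in the plotting code above could be factored into a small helper. The sketch below is one way to do it; `county_series` is a hypothetical name, and the toy frame uses made-up values with the same column names as the CDC file.

```python
import pandas as pd

def county_series(vax, county, state):
    """Return one county's rows sorted by date (hypothetical helper)."""
    # Filter on state as well: the same county name can occur in several states.
    rows = vax[(vax.Recip_County == county) & (vax.Recip_State == state)].copy()
    rows["Date"] = pd.to_datetime(rows["Date"])
    return rows.sort_values("Date")

# Tiny made-up frame, newest rows first.
toy = pd.DataFrame({
    "Date": ["02/23/2022", "02/22/2022", "02/23/2022"],
    "Recip_County": ["Buchanan County", "Buchanan County", "Clay County"],
    "Recip_State": ["MO", "MO", "MO"],
    "Series_Complete_Pop_Pct": [37.9, 37.8, 55.0],
})

buchanan = county_series(toy, "Buchanan County", "MO")
print(buchanan[["Date", "Series_Complete_Pop_Pct"]])
```

With a helper like this, each plotting loop reduces to one call per county followed by `plt.plot`.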
dataCC = pd.DataFrame(dataC[dataC['date']==dataC['date'].values[-1]]) # this is latest data
dataVV = pd.DataFrame(dataV[dataV['Date']==dataV['Date'].values[0]]) # most recent Vaccine data
dataCC = dataCC[dataCC.location_key.str.count('_')>1] # only look at county data
dataCC['FIPS'] = dataCC.location_key.apply(lambda x: x.split('_',2)[2]) # create an FIPS code
dataCC = dataCC[dataCC.FIPS.str.isdigit()]
dataCC['FIPS'] = dataCC.FIPS.astype(int)
dataCC = dataCC[['FIPS','cumulative_confirmed','cumulative_deceased']] # only take what we need
dataVV = pd.DataFrame(dataVV[dataVV.FIPS.str.isdigit()]) # take county data from Vaccinated
dataVV['FIPS'] = dataVV.FIPS.astype(int)
Valid_FIPS = set(dataCC.FIPS.unique()).intersection(set(dataVV.FIPS.unique())) # make sure the FIPS are the same
dataCC = dataCC[dataCC.FIPS.isin(Valid_FIPS)] # take slicing of sets
dataVV = dataVV[dataVV.FIPS.isin(Valid_FIPS)]
dataCC=dataCC.set_index('FIPS',drop=True)
dataVV=dataVV.set_index('FIPS',drop=True)
dataA = pd.concat([dataCC,dataVV],axis=1) # glue sets together
dataA['FIPS']=dataA.index
dataA = dataA[dataA.FIPS<57000]# gets rid of Guam, Caguas Municipio etc
print("number of counties = ", len(dataA))
dataA.head()
number of counties = 3133
cumulative_confirmed | cumulative_deceased | Date | MMWR_week | Recip_County | Recip_State | Completeness_pct | Administered_Dose1_Recip | Administered_Dose1_Pop_Pct | Administered_Dose1_Recip_5Plus | ... | Series_Complete_5PlusPop_Pct_UR_Equity | Series_Complete_12PlusPop_Pct_UR_Equity | Series_Complete_18PlusPop_Pct_UR_Equity | Series_Complete_65PlusPop_Pct_UR_Equity | Census2019 | Census2019_5PlusPop | Census2019_12PlusPop | Census2019_18PlusPop | Census2019_65PlusPop | FIPS | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
FIPS | |||||||||||||||||||||
1001 | 15451.0 | 184.0 | 2022-02-23 | 8 | Autauga County | AL | 92.4 | 30610.0 | 54.8 | 30609.0 | ... | 3.0 | 3.0 | 4.0 | 4.0 | 55869.0 | 52592.0 | 47574.0 | 42904.0 | 8924.0 | 1001 |
1003 | 54837.0 | 636.0 | 2022-02-23 | 8 | Baldwin County | AL | 92.4 | 141853.0 | 63.5 | 141842.0 | ... | 4.0 | 4.0 | 4.0 | 4.0 | 223234.0 | 211195.0 | 192649.0 | 175680.0 | 46830.0 | 1003 |
1005 | 5433.0 | 92.0 | 2022-02-23 | 8 | Barbour County | AL | 92.4 | 13565.0 | 55.0 | 13563.0 | ... | 7.0 | 8.0 | 8.0 | 8.0 | 24686.0 | 23377.0 | 21404.0 | 19604.0 | 4861.0 | 1005 |
1007 | 6364.0 | 99.0 | 2022-02-23 | 8 | Bibb County | AL | 92.4 | 9439.0 | 42.1 | 9436.0 | ... | 2.0 | 2.0 | 3.0 | 3.0 | 22394.0 | 21148.0 | 19480.0 | 17837.0 | 3733.0 | 1007 |
1009 | 14706.0 | 216.0 | 2022-02-23 | 8 | Blount County | AL | 92.4 | 21922.0 | 37.9 | 21917.0 | ... | 2.0 | 2.0 | 2.0 | 3.0 | 57826.0 | 54388.0 | 49234.0 | 44571.0 | 10814.0 | 1009 |
5 rows × 55 columns
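The FIPS matching done above can also be expressed as an inner merge. The sketch below uses a few made-up rows; the `US_<state>_<fips>` layout of `location_key` is an assumption inferred from how the notebook splits that column, and the vaccination percentages are invented.

```python
import pandas as pd

# Made-up case rows: national, state and county level keys.
cases = pd.DataFrame({
    "location_key": ["US", "US_AL", "US_AL_01001", "US_AL_01003"],
    "cumulative_deceased": [900000, 18000, 184, 636],
})
# Made-up vaccination rows, including one non-numeric FIPS placeholder.
vax = pd.DataFrame({
    "FIPS": ["01001", "01003", "UNK"],
    "Series_Complete_Pop_Pct": [40.0, 50.0, 0.0],
})

# Keep county-level keys only (two underscores) and pull out the FIPS part.
county_cases = cases[cases.location_key.str.count("_") > 1].copy()
county_cases["FIPS"] = county_cases.location_key.str.split("_").str[2].astype(int)

# Keep vaccination rows whose FIPS is numeric.
county_vax = vax[vax.FIPS.str.isdigit()].copy()
county_vax["FIPS"] = county_vax.FIPS.astype(int)

# An inner merge keeps only FIPS codes present in both frames.
joined = county_cases.merge(county_vax, on="FIPS", how="inner")
print(joined[["FIPS", "cumulative_deceased", "Series_Complete_Pop_Pct"]])
```

The inner merge plays the role of the explicit `Valid_FIPS` intersection, at the cost of keeping `FIPS` as a column rather than the index.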
So I have all the raw numbers but not the percentages. I want to see whether, proportionally, the vaccines made a difference in the 3,133 counties that I am tracking in the USA.
dataA['Pop'] = dataA.Census2019 # Population estimates
dataA['Per_infect'] = dataA.cumulative_confirmed/dataA.Pop*100 # create cumulative percentage infected
dataA['Per_death'] = dataA.cumulative_deceased/dataA.Pop*100 # create cumulative percentage deaths
sns.lmplot(data=dataA,x='Series_Complete_Pop_Pct',y='Per_death',height=7, aspect=2,scatter_kws={'alpha':0.3})
plt.xlabel('Vaccination rate for fully Vaccinated')
plt.ylabel('Percentage death of population due to covid')
plt.grid()
plt.show()
model = ols("Per_death ~ Series_Complete_Pop_Pct " , dataA).fit()
print(model.summary())
OLS Regression Results ============================================================================== Dep. Variable: Per_death R-squared: 0.080 Model: OLS Adj. R-squared: 0.080 Method: Least Squares F-statistic: 273.5 Date: Fri, 25 Feb 2022 Prob (F-statistic): 5.69e-59 Time: 11:42:20 Log-Likelihood: 1535.2 No. Observations: 3133 AIC: -3066. Df Residuals: 3131 BIC: -3054. Df Model: 1 Covariance Type: nonrobust =========================================================================================== coef std err t P>|t| [0.025 0.975] ------------------------------------------------------------------------------------------- Intercept 0.5221 0.011 48.377 0.000 0.501 0.543 Series_Complete_Pop_Pct -0.0035 0.000 -16.539 0.000 -0.004 -0.003 ============================================================================== Omnibus: 229.869 Durbin-Watson: 1.425 Prob(Omnibus): 0.000 Jarque-Bera (JB): 482.364 Skew: 0.482 Prob(JB): 1.80e-105 Kurtosis: 4.663 Cond. No. 210. ============================================================================== Notes: [1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
This is clearly statistically significant, although the R-squared of 0.080 means the vaccination rate alone explains only a small part of the county-to-county variation. I would also note that I did not parse the data or choose which subgroup of the data to use. Simply put, the slope of the regression line indicates that counties with higher vaccination rates have lower Covid death rates. One can get into arguments about correlation and causation, but when you are talking about vaccination rates for a disease and deaths from that same disease, I think you can infer causation.
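To put the slope in more concrete terms, here is a short worked example using only the intercept and coefficient from the summary above. The 40% and 60% vaccination rates are arbitrary illustration points, not values from the data.

```python
# Coefficients copied from the OLS summary above.
intercept = 0.5221   # predicted Per_death at 0% fully vaccinated
slope = -0.0035      # change in Per_death per percentage point fully vaccinated

def predicted_death_pct(vax_pct):
    """Fitted cumulative Covid deaths as a percentage of county population."""
    return intercept + slope * vax_pct

low = predicted_death_pct(40)
high = predicted_death_pct(60)

# Convert the gap from percentage points of population to deaths per 100,000.
gap_per_100k = (low - high) / 100 * 100_000

print(round(low, 4), round(high, 4), round(gap_per_100k))
```

On the fitted line, a county at 60% fully vaccinated sits about 70 deaths per 100,000 residents below a county at 40%.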
I decided to look deeper into the data and see if there are any leading indicators of behavior, for example what type of county
sns.lmplot(data=dataA,x='Series_Complete_Pop_Pct',y='Per_death',hue="Metro_status",height=7, aspect=2,scatter_kws={'alpha':0.2}, hue_order=['Metro',"Non-metro"])
plt.xlabel('Vaccination rate for fully Vaccinated')
plt.ylabel('Percentage death of population due to covid')
plt.grid()
plt.show()
So according to the above graph, the link between vaccination and lower death rates is stronger in "metro" counties than in non-metro ones. You can see this explicitly in their linear regressions below: the coefficient for metro is -0.0034 compared to -0.0025 for non-metro.
model = ols("Per_death ~ Series_Complete_Pop_Pct " , dataA[dataA.Metro_status=="Metro"]).fit()
print(model.summary())
model = ols("Per_death ~ Series_Complete_Pop_Pct " , dataA[dataA.Metro_status=="Non-metro"]).fit()
print(model.summary())
OLS Regression Results ============================================================================== Dep. Variable: Per_death R-squared: 0.124 Model: OLS Adj. R-squared: 0.123 Method: Least Squares F-statistic: 164.4 Date: Tue, 15 Mar 2022 Prob (F-statistic): 2.72e-35 Time: 12:16:31 Log-Likelihood: 877.72 No. Observations: 1161 AIC: -1751. Df Residuals: 1159 BIC: -1741. Df Model: 1 Covariance Type: nonrobust =========================================================================================== coef std err t P>|t| [0.025 0.975] ------------------------------------------------------------------------------------------- Intercept 0.4852 0.015 32.515 0.000 0.456 0.514 Series_Complete_Pop_Pct -0.0034 0.000 -12.822 0.000 -0.004 -0.003 ============================================================================== Omnibus: 77.789 Durbin-Watson: 1.492 Prob(Omnibus): 0.000 Jarque-Bera (JB): 130.584 Skew: 0.496 Prob(JB): 4.41e-29 Kurtosis: 4.310 Cond. No. 250. ============================================================================== Notes: [1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
OLS Regression Results ============================================================================== Dep. Variable: Per_death R-squared: 0.033 Model: OLS Adj. R-squared: 0.032 Method: Least Squares F-statistic: 66.35 Date: Tue, 15 Mar 2022 Prob (F-statistic): 6.64e-16 Time: 12:16:31 Log-Likelihood: 799.05 No. Observations: 1971 AIC: -1594. Df Residuals: 1969 BIC: -1583. Df Model: 1 Covariance Type: nonrobust =========================================================================================== coef std err t P>|t| [0.025 0.975] ------------------------------------------------------------------------------------------- Intercept 0.4957 0.015 33.502 0.000 0.467 0.525 Series_Complete_Pop_Pct -0.0025 0.000 -8.145 0.000 -0.003 -0.002 ============================================================================== Omnibus: 85.848 Durbin-Watson: 1.319 Prob(Omnibus): 0.000 Jarque-Bera (JB): 149.342 Skew: 0.345 Prob(JB): 3.72e-33 Kurtosis: 4.158 Cond. No. 199. ============================================================================== Notes: [1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
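Fitting the two groups separately shows two slopes but does not estimate their difference directly. One common way to do that is a single model with an interaction term. The sketch below is not the notebook's method: it uses plain NumPy least squares on synthetic data generated from the two fitted lines above, with noise level and sample sizes picked arbitrarily. With statsmodels, the equivalent formula would be along the lines of `Per_death ~ Series_Complete_Pop_Pct * Metro_status`.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000  # synthetic counties per group (arbitrary)

vax = rng.uniform(20, 80, size=2 * n)
non_metro = np.repeat([0.0, 1.0], n)  # 0 = metro, 1 = non-metro

# Synthetic death rates: the two fitted lines from above plus made-up noise.
death = np.where(
    non_metro == 0,
    0.4852 - 0.0034 * vax,
    0.4957 - 0.0025 * vax,
) + rng.normal(0, 0.02, size=2 * n)

# Design matrix: intercept, vax, non-metro indicator, vax x non-metro interaction.
X = np.column_stack([np.ones_like(vax), vax, non_metro, vax * non_metro])
coef, *_ = np.linalg.lstsq(X, death, rcond=None)

# coef[1] is the metro slope; coef[3] is (non-metro slope) - (metro slope).
print(coef)
```

On the real data, the standard error of the interaction coefficient is what would say whether the gap between -0.0034 and -0.0025 is larger than noise.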
This difference piqued my interest, so I started looking at other classifications. The one that came to mind was the ferocity of people's feelings about vaccines in relation to their political leanings. So I went and got voting data for each county in the USA.
dataVote = pd.read_csv('Data/countypres_2000-2020.csv')
dataVote.head()
year | state | state_po | county_name | county_fips | office | candidate | party | candidatevotes | totalvotes | version | mode | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2000 | ALABAMA | AL | AUTAUGA | 1001.0 | PRESIDENT | AL GORE | DEMOCRAT | 4942.0 | 17208.0 | 20191203 | TOTAL |
1 | 2000 | ALABAMA | AL | AUTAUGA | 1001.0 | PRESIDENT | GEORGE W. BUSH | REPUBLICAN | 11993.0 | 17208.0 | 20191203 | TOTAL |
2 | 2000 | ALABAMA | AL | AUTAUGA | 1001.0 | PRESIDENT | RALPH NADER | GREEN | 160.0 | 17208.0 | 20191203 | TOTAL |
3 | 2000 | ALABAMA | AL | AUTAUGA | 1001.0 | PRESIDENT | OTHER | OTHER | 113.0 | 17208.0 | 20191203 | TOTAL |
4 | 2000 | ALABAMA | AL | BALDWIN | 1003.0 | PRESIDENT | AL GORE | DEMOCRAT | 13997.0 | 56480.0 | 20191203 | TOTAL |
I needed to join the data sets I had, with data on Covid infections, deaths, vaccination rates and now voting records. I created a "Trump margin" value, which is Trump's share of the county vote in the 2020 election minus 50 percentage points. So -10 would mean Trump took 40% of the vote (Biden 60%, in a two-way race).
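The margin definition can be checked on the 60/40 example. In this minimal sketch both function names are hypothetical; the second one spells out the expression used in the cell that follows, to show the two forms agree.

```python
def trump_margin(candidate_votes, total_votes):
    """Republican share of the total vote minus 50, in percentage points."""
    return (candidate_votes / total_votes - 0.5) * 100

def trump_margin_notebook_form(candidate_votes, total_votes):
    """Same quantity, written the way the notebook computes it."""
    return (0.5 - (total_votes - candidate_votes) / total_votes) * 100

# A county where the Republican candidate got 40 of every 100 votes.
print(trump_margin(40, 100), trump_margin_notebook_form(40, 100))
```

Note that this is half of the usual "margin of victory" in a two-way race: a 40/60 split is -10 here, while Trump minus Biden would be -20.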
dataVote = dataVote[dataVote.year==2020]
dataVote=dataVote[dataVote.party=="REPUBLICAN"]
dataVote = dataVote[dataVote['mode']=="TOTAL"] # we only want to look at total votes
dataVote["Trump_margin"] = (0.5 - (dataVote.totalvotes-dataVote.candidatevotes)/dataVote.totalvotes)*100 #The higher the Margin, the more Trump votes
dataVote = dataVote[["county_fips",'totalvotes','Trump_margin']]
dataVote = dataVote[dataVote.county_fips.isin(Valid_FIPS)] # again, matching the FIPS codes
dataVote.county_fips = dataVote.county_fips.astype(int)
dataVote.columns = ['FIPS','totalvote','Trump_margin']
dataVote=dataVote.set_index('FIPS',drop=True)
dataA = pd.concat([dataA,dataVote],axis=1)
dataA.head()
cumulative_confirmed | cumulative_deceased | Date | MMWR_week | Recip_County | Recip_State | Completeness_pct | Administered_Dose1_Recip | Administered_Dose1_Pop_Pct | Administered_Dose1_Recip_5Plus | ... | Census2019_5PlusPop | Census2019_12PlusPop | Census2019_18PlusPop | Census2019_65PlusPop | FIPS | Pop | Per_infect | Per_death | totalvote | Trump_margin | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
FIPS | |||||||||||||||||||||
1001 | 15451.0 | 184.0 | 2022-02-23 | 8 | Autauga County | AL | 92.4 | 30610.0 | 54.8 | 30609.0 | ... | 52592.0 | 47574.0 | 42904.0 | 8924.0 | 1001 | 55869.0 | 27.655766 | 0.329342 | 27770.0 | 21.436802 |
1003 | 54837.0 | 636.0 | 2022-02-23 | 8 | Baldwin County | AL | 92.4 | 141853.0 | 63.5 | 141842.0 | ... | 211195.0 | 192649.0 | 175680.0 | 46830.0 | 1003 | 223234.0 | 24.564806 | 0.284903 | 109679.0 | 26.171373 |
1005 | 5433.0 | 92.0 | 2022-02-23 | 8 | Barbour County | AL | 92.4 | 13565.0 | 55.0 | 13563.0 | ... | 23377.0 | 21404.0 | 19604.0 | 4861.0 | 1005 | 24686.0 | 22.008426 | 0.372681 | 10518.0 | 3.451226 |
1007 | 6364.0 | 99.0 | 2022-02-23 | 8 | Bibb County | AL | 92.4 | 9439.0 | 42.1 | 9436.0 | ... | 21148.0 | 19480.0 | 17837.0 | 3733.0 | 1007 | 22394.0 | 28.418326 | 0.442083 | 9595.0 | 28.426264 |
1009 | 14706.0 | 216.0 | 2022-02-23 | 8 | Blount County | AL | 92.4 | 21922.0 | 37.9 | 21917.0 | ... | 54388.0 | 49234.0 | 44571.0 | 10814.0 | 1009 | 57826.0 | 25.431467 | 0.373534 | 27588.0 | 39.571553 |
5 rows × 60 columns
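A column-wise `pd.concat` on the FIPS index behaves like an outer join, so any county missing from the vote table ends up with NaN in `Trump_margin`. The sketch below shows how one might count those rows; the values are rounded from the table above, and dropping FIPS 1005 from the vote frame is invented purely for illustration.

```python
import pandas as pd

# Three counties with death rates, but vote data for only two of them.
counties = pd.DataFrame({"Per_death": [0.33, 0.28, 0.37]}, index=[1001, 1003, 1005])
votes = pd.DataFrame({"Trump_margin": [21.4, 26.2]}, index=[1001, 1003])

# axis=1 aligns on the index and keeps the union of index values.
merged = pd.concat([counties, votes], axis=1)

n_missing = merged.Trump_margin.isna().sum()
usable = merged.dropna(subset=["Per_death", "Trump_margin"])

print(n_missing, len(usable))
```

Running the same check on `dataA` would show how many counties can actually enter a regression on `Trump_margin`, since the statsmodels formula interface drops rows with missing values by default.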
Then I plotted vaccination rate versus death rate for each county. I color-coded the Trump margin, with more red being more Trump-margin and more blue being more Biden-margin. I also sized each point by the population of the county.
customPalette = sns.color_palette("RdBu_r", as_cmap=True)
sns.relplot(x="Series_Complete_Pop_Pct", y="Per_death", hue="Trump_margin", size="Pop",palette=customPalette,
sizes=(10, 600), alpha=.45,height=8,aspect=2, data=dataA)
plt.xlabel('Percent of county vaccinated')
plt.ylabel('Percent of county death to covid')
plt.grid()
plt.ylim(0,1)
plt.show()
As a mathematician I knew I was on to something from this picture, so I started to dig. The first thing I did was plot Trump margin versus vaccination rate.
sns.relplot(x="Trump_margin",y="Series_Complete_Pop_Pct", hue="Trump_margin",size='Pop',
sizes=(10, 600), alpha=.95,height=8,aspect=2, data=dataA,palette=customPalette)
sns.regplot(x='Trump_margin',y='Series_Complete_Pop_Pct',data=dataA ,scatter_kws={'alpha':0.25},scatter=False)
plt.ylabel('Percentage of the county fully vaccinated')
plt.xlabel('Margin of victory for Trump')
plt.title('All counties, Trump margin Vs Vaccination rate')
plt.grid()
plt.ylim(0,100)
plt.xlim(-50,50)
plt.show()
Looking at all counties and all types of counties, this is impressive. The more your county voted for Trump, the less likely it was that a random person in your county was fully vaccinated. If you look at the linear regression, you can see it is statistically significant and the confidence interval is very tight. Essentially, for every 2 percentage points of extra Trump vote share, a county had a one percentage point drop in vaccination rate (the slope is -0.52). Note that the fit uses 2,258 counties, those with both a vaccination rate and a Trump margin, rather than all 3,133.
model = ols("Series_Complete_Pop_Pct ~ Trump_margin " , dataA).fit()
print(model.summary())
OLS Regression Results =================================================================================== Dep. Variable: Series_Complete_Pop_Pct R-squared: 0.424 Model: OLS Adj. R-squared: 0.423 Method: Least Squares F-statistic: 1659. Date: Tue, 15 Mar 2022 Prob (F-statistic): 2.50e-272 Time: 12:30:31 Log-Likelihood: -8376.4 No. Observations: 2258 AIC: 1.676e+04 Df Residuals: 2256 BIC: 1.677e+04 Df Model: 1 Covariance Type: nonrobust ================================================================================ coef std err t P>|t| [0.025 0.975] -------------------------------------------------------------------------------- Intercept 58.6703 0.289 203.336 0.000 58.105 59.236 Trump_margin -0.5247 0.013 -40.729 0.000 -0.550 -0.499 ============================================================================== Omnibus: 972.246 Durbin-Watson: 1.569 Prob(Omnibus): 0.000 Jarque-Bera (JB): 13470.230 Skew: -1.651 Prob(JB): 0.00 Kurtosis: 14.501 Cond. No. 31.1 ============================================================================== Notes: [1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
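The 2:1 reading of the slope can be checked with a few lines of arithmetic on the coefficients above. The margins of -20, 0 and +20 are arbitrary illustration points.

```python
# Coefficients copied from the OLS summary above.
intercept = 58.6703   # predicted vaccination rate at a Trump margin of 0
slope = -0.5247       # change in vaccination rate per point of Trump margin

def predicted_vax_pct(trump_margin):
    """Fitted percentage of the county that is fully vaccinated."""
    return intercept + slope * trump_margin

for margin in (-20, 0, 20):
    print(margin, round(predicted_vax_pct(margin), 1))

# Points of Trump vote share that go with a one-point drop in vaccination rate.
points_per_one_point_drop = 1 / abs(slope)
print(round(points_per_one_point_drop, 2))
```

The fitted line moves from about 69% vaccinated at a margin of -20 to about 48% at +20, and roughly 1.9 points of vote share correspond to each one-point drop.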
I was worried that this result was skewed by the size of these counties, with more Democratic counties being "metro" and more Republican ones being "non-metro". So I split all 3,133 counties into quartiles by population, plus an extra bin for very large counties (500k+).
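The quantile binning described above can be sketched on a toy series. The populations and labels below are made up, and the example uses plain quartiles without the extra large-county bin. It also shows one detail of `pd.cut`: bins are right-closed by default, so a value equal to the lowest edge is left unbinned unless `include_lowest=True` is passed.

```python
import pandas as pd

# Six made-up county populations.
pop = pd.Series([100, 5_000, 20_000, 60_000, 300_000, 2_000_000])

# Bin edges at the 0%, 25%, 50%, 75% and 100% quantiles.
edges = [pop.quantile(q) for q in (0, 0.25, 0.5, 0.75, 1)]
labels = ["Q1", "Q2", "Q3", "Q4"]

# Default behaviour: the minimum sits on the open left edge and becomes NaN.
default_bins = pd.cut(pop, bins=edges, labels=labels)

# include_lowest=True closes the first bin on the left as well.
with_lowest = pd.cut(pop, bins=edges, labels=labels, include_lowest=True)

print(default_bins.isna().sum(), with_lowest.isna().sum())
```

For a county-level table this affects at most the single smallest county, so it matters for completeness more than for the fitted slopes.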
dataA['County_Population']=pd.cut(
    dataA.Pop,
    bins=[dataA.Pop.quantile(0),dataA.Pop.quantile(0.25),dataA.Pop.quantile(0.50),
          dataA.Pop.quantile(0.75),dataA.Pop.quantile(0.957),dataA.Pop.quantile(1)],
    labels=['0-11k','11k-26k','26k-70k','70K-500k','500k-10 mil'])
# note: bins are right-closed by default, so the single smallest county (the left edge) gets no label
# splitting counties up into Quartiles and very large
Pop_cuts = ['0-11k','11k-26k','26k-70k','70K-500k','500k-10 mil']
for P in Pop_cuts:
    Min_pop=dataA[dataA.County_Population==P].Pop.min()/(100*(Pop_cuts.index(P)**3 + 1))
    Max_pop=dataA[dataA.County_Population==P].Pop.max()/(100*(Pop_cuts.index(P)**3 + 1))
    sns.relplot(x="Trump_margin",y="Series_Complete_Pop_Pct", hue="Trump_margin",size='Pop',
                sizes=(Min_pop,Max_pop), alpha=.95,height=8,aspect=2, data=dataA[dataA.County_Population==P],palette=customPalette)
    sns.regplot(x='Trump_margin',y='Series_Complete_Pop_Pct',data=dataA[dataA.County_Population==P] ,scatter_kws={'alpha':0.25},scatter=False)
    plt.ylabel('Percentage of the county fully vaccinated')
    plt.xlabel('Margin of victory for Trump, for counties with Populations of '+P)
    plt.grid()
    plt.ylim(0,100)
    plt.xlim(-50,50)
    plt.show()
    model = ols("Series_Complete_Pop_Pct ~ Trump_margin " , dataA[dataA.County_Population==P]).fit()
    print(model.summary())
OLS Regression Results =================================================================================== Dep. Variable: Series_Complete_Pop_Pct R-squared: 0.271 Model: OLS Adj. R-squared: 0.269 Method: Least Squares F-statistic: 223.8 Date: Tue, 15 Mar 2022 Prob (F-statistic): 2.89e-43 Time: 12:33:03 Log-Likelihood: -2290.9 No. Observations: 605 AIC: 4586. Df Residuals: 603 BIC: 4595. Df Model: 1 Covariance Type: nonrobust ================================================================================ coef std err t P>|t| [0.025 0.975] -------------------------------------------------------------------------------- Intercept 56.0467 0.841 66.660 0.000 54.395 57.698 Trump_margin -0.4427 0.030 -14.960 0.000 -0.501 -0.385 ============================================================================== Omnibus: 89.592 Durbin-Watson: 1.441 Prob(Omnibus): 0.000 Jarque-Bera (JB): 652.208 Skew: -0.396 Prob(JB): 2.37e-142 Kurtosis: 8.024 Cond. No. 55.0 ============================================================================== Notes: [1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
OLS Regression Results =================================================================================== Dep. Variable: Series_Complete_Pop_Pct R-squared: 0.227 Model: OLS Adj. R-squared: 0.225 Method: Least Squares F-statistic: 147.8 Date: Tue, 15 Mar 2022 Prob (F-statistic): 5.24e-30 Time: 12:33:04 Log-Likelihood: -1932.0 No. Observations: 506 AIC: 3868. Df Residuals: 504 BIC: 3876. Df Model: 1 Covariance Type: nonrobust ================================================================================ coef std err t P>|t| [0.025 0.975] -------------------------------------------------------------------------------- Intercept 55.6433 0.856 65.033 0.000 53.962 57.324 Trump_margin -0.4424 0.036 -12.159 0.000 -0.514 -0.371 ============================================================================== Omnibus: 244.858 Durbin-Watson: 1.261 Prob(Omnibus): 0.000 Jarque-Bera (JB): 2611.782 Skew: -1.837 Prob(JB): 0.00 Kurtosis: 13.506 Cond. No. 41.1 ============================================================================== Notes: [1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
OLS Regression Results =================================================================================== Dep. Variable: Series_Complete_Pop_Pct R-squared: 0.470 Model: OLS Adj. R-squared: 0.469 Method: Least Squares F-statistic: 495.0 Date: Tue, 15 Mar 2022 Prob (F-statistic): 5.10e-79 Time: 12:33:05 Log-Likelihood: -1932.0 No. Observations: 561 AIC: 3868. Df Residuals: 559 BIC: 3877. Df Model: 1 Covariance Type: nonrobust ================================================================================ coef std err t P>|t| [0.025 0.975] -------------------------------------------------------------------------------- Intercept 59.1595 0.516 114.727 0.000 58.147 60.172 Trump_margin -0.5575 0.025 -22.249 0.000 -0.607 -0.508 ============================================================================== Omnibus: 29.363 Durbin-Watson: 1.549 Prob(Omnibus): 0.000 Jarque-Bera (JB): 84.348 Skew: -0.142 Prob(JB): 4.83e-19 Kurtosis: 4.878 Cond. No. 33.2 ============================================================================== Notes: [1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
OLS Regression Results =================================================================================== Dep. Variable: Series_Complete_Pop_Pct R-squared: 0.290 Model: OLS Adj. R-squared: 0.289 Method: Least Squares F-statistic: 192.6 Date: Tue, 15 Mar 2022 Prob (F-statistic): 5.84e-37 Time: 12:33:07 Log-Likelihood: -1738.1 No. Observations: 473 AIC: 3480. Df Residuals: 471 BIC: 3488. Df Model: 1 Covariance Type: nonrobust ================================================================================ coef std err t P>|t| [0.025 0.975] -------------------------------------------------------------------------------- Intercept 60.0152 0.479 125.389 0.000 59.075 60.956 Trump_margin -0.4568 0.033 -13.879 0.000 -0.522 -0.392 ============================================================================== Omnibus: 344.827 Durbin-Watson: 1.261 Prob(Omnibus): 0.000 Jarque-Bera (JB): 8229.878 Skew: -2.838 Prob(JB): 0.00 Kurtosis: 22.631 Cond. No. 15.8 ============================================================================== Notes: [1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
OLS Regression Results =================================================================================== Dep. Variable: Series_Complete_Pop_Pct R-squared: 0.227 Model: OLS Adj. R-squared: 0.220 Method: Least Squares F-statistic: 32.68 Date: Tue, 15 Mar 2022 Prob (F-statistic): 9.30e-08 Time: 12:33:08 Log-Likelihood: -409.60 No. Observations: 113 AIC: 823.2 Df Residuals: 111 BIC: 828.7 Df Model: 1 Covariance Type: nonrobust ================================================================================ coef std err t P>|t| [0.025 0.975] -------------------------------------------------------------------------------- Intercept 62.2957 1.186 52.546 0.000 59.946 64.645 Trump_margin -0.4411 0.077 -5.716 0.000 -0.594 -0.288 ============================================================================== Omnibus: 128.496 Durbin-Watson: 1.839 Prob(Omnibus): 0.000 Jarque-Bera (JB): 3619.776 Skew: -3.745 Prob(JB): 0.00 Kurtosis: 29.697 Cond. No. 21.2 ============================================================================== Notes: [1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
For each of these classifications the results were the same: statistically significant, with approximately a 2:1 ratio of Trump vote share to vaccination decrease.
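The "approximately 2:1" claim can be checked bin by bin. The sketch below copies the five `Trump_margin` coefficients from the summaries above, in loop order, and inverts each one.

```python
# Trump_margin coefficients from the five per-bin regressions above.
slopes = {
    "0-11k": -0.4427,
    "11k-26k": -0.4424,
    "26k-70k": -0.5575,
    "70K-500k": -0.4568,
    "500k-10 mil": -0.4411,
}

# Points of Trump vote share that go with a one-point drop in vaccination rate.
ratios = {label: 1 / abs(slope) for label, slope in slopes.items()}

for label, ratio in ratios.items():
    print(f"{label:>12}: {ratio:.2f} points of Trump share per 1-point drop")
```

Every bin lands between roughly 1.8 and 2.3, so the 2:1 description holds across county sizes.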
So the morbid part of me decided to look at death rates instead of vaccination rates. I already know that they are connected, but I wanted to see the relationship between voting for Trump and chance of death.
Pop_cuts = ['0-11k','11k-26k','26k-70k','70K-500k','500k-10 mil']
for P in Pop_cuts:
    Min_pop=dataA[dataA.County_Population==P].Pop.min()/(100*(Pop_cuts.index(P)**3 + 1))
    Max_pop=dataA[dataA.County_Population==P].Pop.max()/(100*(Pop_cuts.index(P)**3 + 1))
    sns.relplot(x="Trump_margin",y="Per_death", hue="Trump_margin",size='Pop',
                sizes=(Min_pop,Max_pop), alpha=.95,height=8,aspect=2, data=dataA[dataA.County_Population==P],palette=customPalette)
    sns.regplot(x='Trump_margin',y='Per_death',data=dataA[dataA.County_Population==P] ,scatter_kws={'alpha':0.25},scatter=False)
    plt.ylabel('Percentage of the county that died')
    plt.xlabel('Margin of victory for Trump, for counties with Populations of '+P)
    plt.grid()
    plt.ylim(0,1)
    plt.xlim(-50,50)
    plt.show()
    print(dataA[dataA.County_Population==P].Pop.sum(), ' Population in this slicing')
    model = ols("Per_death ~ Trump_margin " , dataA[dataA.County_Population==P]).fit()
    print(model.summary())
4619999.0 Population in this slicing OLS Regression Results ============================================================================== Dep. Variable: Per_death R-squared: 0.000 Model: OLS Adj. R-squared: -0.001 Method: Least Squares F-statistic: 0.1051 Date: Tue, 15 Mar 2022 Prob (F-statistic): 0.746 Time: 12:43:43 Log-Likelihood: 128.57 No. Observations: 605 AIC: -253.1 Df Residuals: 603 BIC: -244.3 Df Model: 1 Covariance Type: nonrobust ================================================================================ coef std err t P>|t| [0.025 0.975] -------------------------------------------------------------------------------- Intercept 0.3737 0.015 24.244 0.000 0.343 0.404 Trump_margin 0.0002 0.001 0.324 0.746 -0.001 0.001 ============================================================================== Omnibus: 19.083 Durbin-Watson: 1.377 Prob(Omnibus): 0.000 Jarque-Bera (JB): 20.015 Skew: 0.432 Prob(JB): 4.51e-05 Kurtosis: 3.217 Cond. No. 55.0 ============================================================================== Notes: [1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
13822303.0 Population in this slicing OLS Regression Results ============================================================================== Dep. Variable: Per_death R-squared: 0.074 Model: OLS Adj. R-squared: 0.072 Method: Least Squares F-statistic: 40.17 Date: Tue, 15 Mar 2022 Prob (F-statistic): 5.17e-10 Time: 12:43:44 Log-Likelihood: 268.23 No. Observations: 506 AIC: -532.5 Df Residuals: 504 BIC: -524.0 Df Model: 1 Covariance Type: nonrobust ================================================================================ coef std err t P>|t| [0.025 0.975] -------------------------------------------------------------------------------- Intercept 0.3202 0.011 28.944 0.000 0.298 0.342 Trump_margin 0.0030 0.000 6.338 0.000 0.002 0.004 ============================================================================== Omnibus: 19.134 Durbin-Watson: 1.306 Prob(Omnibus): 0.000 Jarque-Bera (JB): 20.664 Skew: 0.442 Prob(JB): 3.26e-05 Kurtosis: 3.445 Cond. No. 41.1 ============================================================================== Notes: [1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
32978675.0 Population in this slicing OLS Regression Results ============================================================================== Dep. Variable: Per_death R-squared: 0.216 Model: OLS Adj. R-squared: 0.214 Method: Least Squares F-statistic: 153.8 Date: Tue, 15 Mar 2022 Prob (F-statistic): 2.28e-31 Time: 12:43:45 Log-Likelihood: 433.52 No. Observations: 561 AIC: -863.0 Df Residuals: 559 BIC: -854.4 Df Model: 1 Covariance Type: nonrobust ================================================================================ coef std err t P>|t| [0.025 0.975] -------------------------------------------------------------------------------- Intercept 0.2688 0.008 35.335 0.000 0.254 0.284 Trump_margin 0.0046 0.000 12.401 0.000 0.004 0.005 ============================================================================== Omnibus: 58.504 Durbin-Watson: 1.477 Prob(Omnibus): 0.000 Jarque-Bera (JB): 96.096 Skew: 0.686 Prob(JB): 1.36e-21 Kurtosis: 4.493 Cond. No. 33.2 ============================================================================== Notes: [1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
114006856.0 Population in this slicing
OLS Regression Results
==============================================================================
Dep. Variable:              Per_death   R-squared:                       0.273
Model:                            OLS   Adj. R-squared:                  0.272
Method:                 Least Squares   F-statistic:                     177.0
Date:                Tue, 15 Mar 2022   Prob (F-statistic):           1.65e-34
Time:                        12:43:47   Log-Likelihood:                 422.29
No. Observations:                 473   AIC:                            -840.6
Df Residuals:                     471   BIC:                            -832.3
Df Model:                           1
Covariance Type:            nonrobust
================================================================================
                   coef    std err          t      P>|t|      [0.025      0.975]
--------------------------------------------------------------------------------
Intercept        0.2637      0.005     53.041      0.000       0.254       0.273
Trump_margin     0.0045      0.000     13.303      0.000       0.004       0.005
==============================================================================
Omnibus:                       83.347   Durbin-Watson:                   1.489
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              205.736
Skew:                           0.889   Prob(JB):                     2.11e-45
Kurtosis:                       5.698   Cond. No.                         15.8
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
154469632.0 Population in this slicing
OLS Regression Results
==============================================================================
Dep. Variable:              Per_death   R-squared:                       0.041
Model:                            OLS   Adj. R-squared:                  0.033
Method:                 Least Squares   F-statistic:                     4.796
Date:                Tue, 15 Mar 2022   Prob (F-statistic):             0.0306
Time:                        12:43:48   Log-Likelihood:                 110.58
No. Observations:                 113   AIC:                            -217.2
Df Residuals:                     111   BIC:                            -211.7
Df Model:                           1
Covariance Type:            nonrobust
================================================================================
                   coef    std err          t      P>|t|      [0.025      0.975]
--------------------------------------------------------------------------------
Intercept        0.2774      0.012     23.356      0.000       0.254       0.301
Trump_margin     0.0017      0.001      2.190      0.031       0.000       0.003
==============================================================================
Omnibus:                        3.298   Durbin-Watson:                   1.453
Prob(Omnibus):                  0.192   Jarque-Bera (JB):                2.246
Skew:                           0.155   Prob(JB):                        0.325
Kurtosis:                       2.383   Cond. No.                         21.2
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
At last I saw a change.
When you look at counties with populations below 11,000, there was no statistically significant relationship between Trump's margin and the death rate. The regression line is essentially horizontal, indicating that however the county voted, its residents had about the same chance of dying. Unfortunately, this lack of effect covers only 4.6 million people.
For every other size of county the relationship was statistically significant, and in every case the more a county voted for Trump, the greater the chance of dying.
The other thing I wanted to see was whether this still held after the vaccines became freely available.
dataCMay = pd.DataFrame(dataC[dataC['date']=="2021-05-30"])
dataCMay = dataCMay[dataCMay.location_key.str.count('_')>1]
dataCMay['FIPS'] = dataCMay.location_key.apply(lambda x: x.split('_',2)[2])
dataCMay = dataCMay[dataCMay.FIPS.str.isdigit()]
dataCMay['FIPS'] = dataCMay.FIPS.astype(int)
dataCMay = dataCMay[dataCMay.FIPS.isin(Valid_FIPS)] # making sure same set
dataCMay = dataCMay[['FIPS','cumulative_confirmed','cumulative_deceased']]
dataCMay.columns = ['FIPS','cumulative_confirmed_May','cumulative_deceased_May']
dataCMay=dataCMay.set_index('FIPS',drop=True)
dataCMay.head()
FIPS | cumulative_confirmed_May | cumulative_deceased_May |
---|---|---|
2013 | 367.0 | 3.0 |
2016 | 704.0 | 0.0 |
2020 | 30556.0 | 174.0 |
2050 | 3900.0 | 22.0 |
2068 | 109.0 | 0.0 |
dataA = pd.concat([dataA,dataCMay],axis=1)
dataA.head()
FIPS | cumulative_confirmed | cumulative_deceased | Date | MMWR_week | Recip_County | Recip_State | Completeness_pct | Administered_Dose1_Recip | Administered_Dose1_Pop_Pct | Administered_Dose1_Recip_5Plus | ... | Census2019_65PlusPop | FIPS | Pop | Per_infect | Per_death | totalvote | Trump_margin | County_Population | cumulative_confirmed_May | cumulative_deceased_May |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1001 | 15451.0 | 184.0 | 2022-02-23 | 8.0 | Autauga County | AL | 92.4 | 30610.0 | 54.8 | 30609.0 | ... | 8924.0 | 1001.0 | 55869.0 | 27.655766 | 0.329342 | 27770.0 | 21.436802 | 26k-70k | 7142.0 | 110.0 |
1003 | 54837.0 | 636.0 | 2022-02-23 | 8.0 | Baldwin County | AL | 92.4 | 141853.0 | 63.5 | 141842.0 | ... | 46830.0 | 1003.0 | 223234.0 | 24.564806 | 0.284903 | 109679.0 | 26.171373 | 70K-500k | 21620.0 | 311.0 |
1005 | 5433.0 | 92.0 | 2022-02-23 | 8.0 | Barbour County | AL | 92.4 | 13565.0 | 55.0 | 13563.0 | ... | 4861.0 | 1005.0 | 24686.0 | 22.008426 | 0.372681 | 10518.0 | 3.451226 | 11k-26k | 2334.0 | 59.0 |
1007 | 6364.0 | 99.0 | 2022-02-23 | 8.0 | Bibb County | AL | 92.4 | 9439.0 | 42.1 | 9436.0 | ... | 3733.0 | 1007.0 | 22394.0 | 28.418326 | 0.442083 | 9595.0 | 28.426264 | 11k-26k | 2664.0 | 64.0 |
1009 | 14706.0 | 216.0 | 2022-02-23 | 8.0 | Blount County | AL | 92.4 | 21922.0 | 37.9 | 21917.0 | ... | 10814.0 | 1009.0 | 57826.0 | 25.431467 | 0.373534 | 27588.0 | 39.571553 | 26k-70k | 6864.0 | 139.0 |
5 rows × 63 columns
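As a sanity check on the since-May rates computed in the next cell, the arithmetic can be worked by hand for one county. This is only a sketch that plugs in the Autauga County (FIPS 1001) numbers from the table above; the variable names are mine and are not part of the notebook. Note that the result is a fraction of the population, whereas the existing `Per_death` column is a percentage.

```python
# Autauga County, AL (FIPS 1001): values copied from the dataA.head() output above.
pop = 55869.0
deaths_feb_2022 = 184.0    # cumulative_deceased on 2022-02-23
deaths_may_2021 = 110.0    # cumulative_deceased_May on 2021-05-30
cases_feb_2022 = 15451.0   # cumulative_confirmed on 2022-02-23
cases_may_2021 = 7142.0    # cumulative_confirmed_May on 2021-05-30

# Same formula as the notebook: (total now - total in May) / population.
per_death_recent = (deaths_feb_2022 - deaths_may_2021) / pop
per_infect_recent = (cases_feb_2022 - cases_may_2021) / pop

# For comparison, the cumulative Per_death column is scaled to a percentage.
per_death_cumulative_pct = deaths_feb_2022 / pop * 100

print(f"Deaths since May 30, 2021: {per_death_recent:.6f} of the population")
print(f"Cases since May 30, 2021:  {per_infect_recent:.6f} of the population")
print(f"Cumulative deaths:         {per_death_cumulative_pct:.6f} percent")
```

The last value should agree with the `Per_death` entry for Autauga County in the table, which confirms that the cumulative columns are percentages while the since-May columns are not.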
# Note: these are fractions of the population, not percentages (unlike Per_infect and Per_death)
dataA['Per_infect_recent'] = (dataA.cumulative_confirmed-dataA.cumulative_confirmed_May)/dataA.Pop
dataA['Per_death_recent'] = (dataA.cumulative_deceased-dataA.cumulative_deceased_May)/dataA.Pop
Pop_cuts = ['0-11k','11k-26k','26k-70k','70K-500k','500k-10 mil']
for i, P in enumerate(Pop_cuts):
    subset = dataA[dataA.County_Population==P]  # counties in this population bin
    # shrink the marker sizes for the larger bins so the bubbles stay readable
    Min_pop = subset.Pop.min()/(100*(i**3 + 1))
    Max_pop = subset.Pop.max()/(100*(i**3 + 1))
    sns.relplot(x="Trump_margin",y="Per_death_recent", hue="Trump_margin",size='Pop',
                sizes=(Min_pop,Max_pop), alpha=.95,height=8,aspect=2, data=subset,palette=customPalette)
    # scatter=False draws only the regression line on top of the relplot
    sns.regplot(x='Trump_margin',y='Per_death_recent',data=subset,scatter_kws={'alpha':0.25},scatter=False)
    plt.ylabel('Fraction of the county population that died since May 30, 2021')
    plt.xlabel('Margin of victory for Trump, for counties with Populations of '+P)
    plt.grid()
    plt.ylim(0,.0051)
    plt.xlim(-50,50)
    plt.show()
    model = ols("Per_death_recent ~ Trump_margin", subset).fit()
    print(model.summary())
OLS Regression Results
==============================================================================
Dep. Variable:       Per_death_recent   R-squared:                       0.004
Model:                            OLS   Adj. R-squared:                  0.002
Method:                 Least Squares   F-statistic:                     2.363
Date:                Tue, 15 Mar 2022   Prob (F-statistic):              0.125
Time:                        12:47:15   Log-Likelihood:                 3293.5
No. Observations:                 605   AIC:                            -6583.
Df Residuals:                     603   BIC:                            -6574.
Df Model:                           1
Covariance Type:            nonrobust
================================================================================
                   coef    std err          t      P>|t|      [0.025      0.975]
--------------------------------------------------------------------------------
Intercept        0.0014   8.24e-05     16.512      0.000       0.001       0.002
Trump_margin  4.458e-06    2.9e-06      1.537      0.125   -1.24e-06    1.02e-05
==============================================================================
Omnibus:                       85.632   Durbin-Watson:                   1.491
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              131.909
Skew:                           0.930   Prob(JB):                     2.27e-29
Kurtosis:                       4.332   Cond. No.                         55.0
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
OLS Regression Results
==============================================================================
Dep. Variable:       Per_death_recent   R-squared:                       0.174
Model:                            OLS   Adj. R-squared:                  0.173
Method:                 Least Squares   F-statistic:                     106.5
Date:                Tue, 15 Mar 2022   Prob (F-statistic):           8.74e-23
Time:                        12:47:16   Log-Likelihood:                 2967.1
No. Observations:                 506   AIC:                            -5930.
Df Residuals:                     504   BIC:                            -5922.
Df Model:                           1
Covariance Type:            nonrobust
================================================================================
                   coef    std err          t      P>|t|      [0.025      0.975]
--------------------------------------------------------------------------------
Intercept        0.0012   5.34e-05     22.620      0.000       0.001       0.001
Trump_margin  2.343e-05   2.27e-06     10.321      0.000     1.9e-05    2.79e-05
==============================================================================
Omnibus:                        5.126   Durbin-Watson:                   1.447
Prob(Omnibus):                  0.077   Jarque-Bera (JB):                5.233
Skew:                           0.240   Prob(JB):                       0.0731
Kurtosis:                       2.870   Cond. No.                         41.1
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
OLS Regression Results
==============================================================================
Dep. Variable:       Per_death_recent   R-squared:                       0.292
Model:                            OLS   Adj. R-squared:                  0.290
Method:                 Least Squares   F-statistic:                     230.0
Date:                Tue, 15 Mar 2022   Prob (F-statistic):           9.03e-44
Time:                        12:47:18   Log-Likelihood:                 3440.3
No. Observations:                 561   AIC:                            -6877.
Df Residuals:                     559   BIC:                            -6868.
Df Model:                           1
Covariance Type:            nonrobust
================================================================================
                   coef    std err          t      P>|t|      [0.025      0.975]
--------------------------------------------------------------------------------
Intercept        0.0011   3.58e-05     30.724      0.000       0.001       0.001
Trump_margin  2.636e-05   1.74e-06     15.167      0.000    2.29e-05    2.98e-05
==============================================================================
Omnibus:                        2.357   Durbin-Watson:                   1.487
Prob(Omnibus):                  0.308   Jarque-Bera (JB):                2.150
Skew:                           0.128   Prob(JB):                        0.341
Kurtosis:                       3.163   Cond. No.                         33.2
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
OLS Regression Results
==============================================================================
Dep. Variable:       Per_death_recent   R-squared:                       0.434
Model:                            OLS   Adj. R-squared:                  0.432
Method:                 Least Squares   F-statistic:                     360.5
Date:                Tue, 15 Mar 2022   Prob (F-statistic):           4.12e-60
Time:                        12:47:19   Log-Likelihood:                 3005.0
No. Observations:                 473   AIC:                            -6006.
Df Residuals:                     471   BIC:                            -5998.
Df Model:                           1
Covariance Type:            nonrobust
================================================================================
                   coef    std err          t      P>|t|      [0.025      0.975]
--------------------------------------------------------------------------------
Intercept        0.0011   2.11e-05     51.857      0.000       0.001       0.001
Trump_margin   2.76e-05   1.45e-06     18.987      0.000    2.47e-05    3.05e-05
==============================================================================
Omnibus:                       25.772   Durbin-Watson:                   1.520
Prob(Omnibus):                  0.000   Jarque-Bera (JB):               28.976
Skew:                           0.547   Prob(JB):                     5.10e-07
Kurtosis:                       3.522   Cond. No.                         15.8
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
OLS Regression Results
==============================================================================
Dep. Variable:       Per_death_recent   R-squared:                       0.284
Model:                            OLS   Adj. R-squared:                  0.277
Method:                 Least Squares   F-statistic:                     43.92
Date:                Tue, 15 Mar 2022   Prob (F-statistic):           1.28e-09
Time:                        12:47:20   Log-Likelihood:                 750.51
No. Observations:                 113   AIC:                            -1497.
Df Residuals:                     111   BIC:                            -1492.
Df Model:                           1
Covariance Type:            nonrobust
================================================================================
                   coef    std err          t      P>|t|      [0.025      0.975]
--------------------------------------------------------------------------------
Intercept        0.0010   4.12e-05     25.077      0.000       0.001       0.001
Trump_margin  1.779e-05   2.68e-06      6.627      0.000    1.25e-05    2.31e-05
==============================================================================
Omnibus:                        4.122   Durbin-Watson:                   1.479
Prob(Omnibus):                  0.127   Jarque-Bera (JB):                3.885
Skew:                           0.454   Prob(JB):                        0.143
Kurtosis:                       3.007   Cond. No.                         21.2
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Essentially the same thing happened (counties under 11,000 again showed no significant relationship, p = 0.125). The absolute numbers are smaller, since a smaller proportion of each county died in this shorter window, but relative to that baseline the slope of the regression line was steeper. What this means is that the more Democratic a county voted, the better its residents' chances of not dying, compared with the period before vaccines were available.
I wanted to see whether there was something else going on, so I ran a multiple regression using Trump margin and vaccination rates.
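One rough way to compare the steepness of the two sets of fits is to divide each slope by its intercept, which gives the change in the fitted death rate per point of Trump margin relative to an evenly split county. The sketch below does this with the coefficients printed in the summaries above. The helper name `relative_slope` is my own, and because the printed coefficients are rounded (the since-May intercepts to a single significant digit or two), the results are only approximate.

```python
# (intercept, slope) pairs copied from the OLS summaries above, by population bin.
# Cumulative fits use Per_death (a percentage); since-May fits use Per_death_recent
# (a fraction). Dividing slope by intercept cancels that difference in units.
cumulative = {
    "11k-26k": (0.3202, 0.0030),
    "26k-70k": (0.2688, 0.0046),
    "70K-500k": (0.2637, 0.0045),
    "500k-10 mil": (0.2774, 0.0017),
}
since_may = {
    "11k-26k": (0.0012, 2.343e-05),
    "26k-70k": (0.0011, 2.636e-05),
    "70K-500k": (0.0011, 2.76e-05),
    "500k-10 mil": (0.0010, 1.779e-05),
}

def relative_slope(intercept, slope):
    """Change in fitted death rate per point of margin, relative to a margin of 0."""
    return slope / intercept

for bin_name in cumulative:
    rel_cum = relative_slope(*cumulative[bin_name])
    rel_may = relative_slope(*since_may[bin_name])
    print(f"{bin_name:>12}: cumulative {rel_cum:.2%} per point, "
          f"since May {rel_may:.2%} per point")
```

The 0-11k bin is left out because its cumulative summary is not among the ones copied here and its since-May slope is not statistically significant.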
model = ols("Per_death ~ Trump_margin + Series_Complete_Pop_Pct " , dataA).fit()
print(model.summary())
OLS Regression Results
==============================================================================
Dep. Variable:              Per_death   R-squared:                       0.111
Model:                            OLS   Adj. R-squared:                  0.110
Method:                 Least Squares   F-statistic:                     141.0
Date:                Tue, 15 Mar 2022   Prob (F-statistic):           1.93e-58
Time:                        12:52:42   Log-Likelihood:                 1143.9
No. Observations:                2258   AIC:                            -2282.
Df Residuals:                    2255   BIC:                            -2265.
Df Model:                           2
Covariance Type:            nonrobust
===========================================================================================
                              coef    std err          t      P>|t|      [0.025      0.975]
-------------------------------------------------------------------------------------------
Intercept                   0.3295      0.019     17.603      0.000       0.293       0.366
Trump_margin                0.0029      0.000     11.456      0.000       0.002       0.003
Series_Complete_Pop_Pct    -0.0006      0.000     -1.867      0.062      -0.001    2.93e-05
==============================================================================
Omnibus:                      172.303   Durbin-Watson:                   1.408
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              291.771
Skew:                           0.562   Prob(JB):                     4.39e-64
Kurtosis:                       4.356   Cond. No.                         328.
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
What is strange is that the vaccination rate has less of an effect on dying than voting for Trump: its coefficient is not even significant at the 5% level (p = 0.062), while Trump margin is. Trump margin was a stronger predictor of dying than the vaccination rate. Now this is obviously partially explained by the strong relationship between Trump voting and vaccination rates, but I speculate there must be more to it than vaccination, or else the vaccination rate would have been the stronger predictor. So there is something inherent about counties that voted for Trump and their death rates. One could speculate that it is age in those counties, and that might be the case for some counties, but the pattern holds for each of the subgroups of counties with populations of 11,000 and over.
This was something I came up with while drinking some wine and looking at these graphs. What if it's not about Democrat or Republican? What if it is about civic engagement, measured in part by voter turnout?
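One way to gauge how much the overlap between the two predictors muddies this comparison is the variance inflation factor. With exactly two predictors it reduces to 1 / (1 - r²), where r is their Pearson correlation. The sketch below is not part of the original analysis: the function names are mine, and the data is synthetic, generated only so the example runs on its own.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def vif_two_predictors(xs, ys):
    """Variance inflation factor when a model has exactly these two predictors."""
    r = pearson_r(xs, ys)
    return 1.0 / (1.0 - r * r)

# Synthetic stand-ins (NOT the notebook's data): a margin between -50 and 50,
# and a vaccination percentage that falls as the margin rises, plus noise.
random.seed(0)
margin = [random.uniform(-50, 50) for _ in range(500)]
vaccinated = [55 - 0.3 * m + random.gauss(0, 8) for m in margin]

r = pearson_r(margin, vaccinated)
vif = vif_two_predictors(margin, vaccinated)
print(f"correlation r = {r:.3f}, VIF = {vif:.2f}")
```

With the notebook's data, the two inputs would be `dataA.Trump_margin` and `dataA.Series_Complete_Pop_Pct` after dropping rows with missing values. A VIF near 1 means the predictors barely overlap, while larger values mean the individual coefficients and their p-values become harder to tell apart.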
dataA['Per_vote'] = dataA.totalvote/dataA.Pop*100
The next graph just satisfied my curiosity about something talking heads used to say: the more people turn out, the more people vote Democrat. I wanted to see if this is true.
sns.relplot(x="Trump_margin",y="Per_vote", hue="Trump_margin",size='Pop',
sizes=(10, 600), alpha=.95,height=8,aspect=2, data=dataA,palette=customPalette)
sns.regplot(x='Trump_margin',y='Per_vote',data=dataA ,scatter_kws={'alpha':0.25},scatter=False)
plt.ylabel('Percentage of the county that Voted')
plt.xlabel('Margin of victory for Trump')
plt.title('All counties, Trump margin Vs voter turnout rate')
plt.grid()
plt.ylim(0,100)
plt.xlim(-50,50)
plt.show()
The negative slope indicates this: the higher the voter turnout, the higher the Biden percentage.
What's more interesting is vaccination rates compared with voter turnout.
sns.relplot(x="Per_vote",y="Series_Complete_Pop_Pct", hue="Trump_margin",size='Pop',
sizes=(10, 600), alpha=.95,height=8,aspect=2, data=dataA,palette=customPalette)
sns.regplot(x='Per_vote',y='Series_Complete_Pop_Pct',data=dataA ,scatter_kws={'alpha':0.25},scatter=False)
plt.ylabel('Percentage of the county fully vaccinated')
plt.xlabel('Percentage of county Voting')
plt.title('All counties, Voter turnout Vs Vaccination rate')
plt.grid()
plt.ylim(0,100)
plt.xlim(0,100)  # Per_vote is a percentage (0-100), so (0,1) would hide almost every county
plt.show()
This in itself was interesting, but predictable and statistically significant.
model = ols("Per_vote ~ Series_Complete_Pop_Pct " , dataA).fit()
print(model.summary())
OLS Regression Results
==============================================================================
Dep. Variable:               Per_vote   R-squared:                       0.029
Model:                            OLS   Adj. R-squared:                  0.029
Method:                 Least Squares   F-statistic:                     68.46
Date:                Tue, 15 Mar 2022   Prob (F-statistic):           2.19e-16
Time:                        13:15:56   Log-Likelihood:                -8303.0
No. Observations:                2258   AIC:                         1.661e+04
Df Residuals:                    2256   BIC:                         1.662e+04
Df Model:                           1
Covariance Type:            nonrobust
===========================================================================================
                              coef    std err          t      P>|t|      [0.025      0.975]
-------------------------------------------------------------------------------------------
Intercept                  43.7274      0.807     54.169      0.000      42.144      45.310
Series_Complete_Pop_Pct     0.1280      0.015      8.274      0.000       0.098       0.158
==============================================================================
Omnibus:                     1859.187   Durbin-Watson:                   1.422
Prob(Omnibus):                  0.000   Jarque-Bera (JB):           305270.378
Skew:                           3.104   Prob(JB):                         0.00
Kurtosis:                      59.623   Cond. No.                         209.
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Another thing was just to look at infection rates and voter turnout. In this graph there seems to be a nice uniform distribution of voter types, but the inference is going in the same direction.
sns.relplot(x="Per_vote",y="Per_infect", hue="Trump_margin",size='Pop',
sizes=(10, 600), alpha=.95,height=8,aspect=2, data=dataA,palette=customPalette)
sns.regplot(x='Per_vote',y='Per_infect',data=dataA ,scatter_kws={'alpha':0.25},scatter=False)
plt.ylabel('Cumulative percentage of the county infected')
plt.xlabel('Percentage of county Voting')
plt.title('All counties, Voter Turnout Vs infection rate')
plt.grid()
plt.ylim(0,50)
plt.xlim(0,100)
plt.show()
The higher the voter turnout, or civic awareness, the fewer infections there have been.
This also carries over to death rate versus voter turnout, and to recent death rate (since May 30, 2021) versus voter turnout.
sns.relplot(x="Per_vote",y="Per_death", hue="Trump_margin",size='Pop',
sizes=(10, 600), alpha=.95,height=8,aspect=2, data=dataA,palette=customPalette)
sns.regplot(x='Per_vote',y='Per_death',data=dataA ,scatter_kws={'alpha':0.25},scatter=False)
plt.ylabel('Cumulative percentage of the county that died')
plt.xlabel('Percentage of county Voting')
plt.title('All counties, Voter Turnout Vs death rate')
plt.grid()
plt.ylim(0,1)
plt.xlim(0,100)
plt.show()
sns.relplot(x="Per_vote",y="Per_death_recent", hue="Trump_margin",size='Pop',
sizes=(10, 600), alpha=.95,height=8,aspect=2, data=dataA,palette=customPalette)
sns.regplot(x='Per_vote',y='Per_death_recent',data=dataA ,scatter_kws={'alpha':0.25},scatter=False)
plt.ylabel('Fraction of the county that died since May 30, 2021')
plt.xlabel('Percentage of county Voting')
plt.title('All counties, Voter Turnout Vs death rate since May')
plt.grid()
plt.ylim(0,0.0031)
plt.xlim(0,100)
plt.show()
I hope you enjoyed this notebook.
All of the code can be used by anybody, and the data is available via the links given at the top.