regressions team #13

donbowen · 2023-02-22T21:29:17Z

No description provided.

donbowen · 2023-02-22T21:30:22Z

We didn't talk about this today. Please reply with a link to your file with tables!

@jum223 @XiaozheZhangLehigh

donbowen · 2023-02-22T21:39:21Z

@mromano224 @SebastianStoneham

What about regressions? These would be easy to implement from the bank tract data! We can talk on Monday about how!

$$y = a \cdot \text{TractStat} + b \cdot \text{BOW} + c \cdot \text{TractStat} \cdot \text{BOW} + e$$

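--> A nonzero c would show issues: it measures how the relationship between TractStat and y differs for BOW relative to other banks.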

y1 = Denial Rate
y2 = log(# applications)

TractStat1 = Hispanic%
TractStat2 = 1 if Hispanic% > median Hispanic% (from census level), else 0

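Reg 1: DenialRate = a + b*H + c*{BOW==1} + d*H*{BOW==1} + e, where {BOW==1} is 1 for BOW rows and 0 otherwise. Here d is the interaction coefficient (the role c plays in the general formula above).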
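
To see why the interaction coefficient (d in Reg 1) is the one to watch, here is a minimal sketch on made-up, noise-free data, not the project's bank tract data. It uses plain NumPy least squares rather than the statsmodels calls that appear later in this thread. The names `H` and `bow` and all coefficient values are assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical, noise-free data: 50 rows for other banks, then 50 rows for BOW
rng = np.random.default_rng(0)
H = rng.uniform(0, 1, size=100)          # fraction Hispanic in the tract
bow = np.repeat([0.0, 1.0], 50)          # 1 if the row is about BOW, else 0

# Made-up true coefficients, labeled a, b, c, d as in Reg 1
a, b, c, d = 0.10, 0.05, 0.02, 0.08
y = a + b * H + c * bow + d * H * bow    # no error term, so OLS recovers them

# Design matrix: intercept, H, BOW dummy, and the interaction H x BOW
X = np.column_stack([np.ones_like(H), H, bow, H * bow])
coefs, *_ = np.linalg.lstsq(X, y, rcond=None)

# Slope of y on H within each group, fit separately
slope_other = np.polyfit(H[bow == 0], y[bow == 0], 1)[0]
slope_bow = np.polyfit(H[bow == 1], y[bow == 1], 1)[0]

print(coefs)                      # approximately [a, b, c, d]
print(slope_bow - slope_other)    # approximately d
```

The interaction coefficient equals the gap between the BOW slope and the other-bank slope, which is why it is the number that would flag a difference in how BOW treats tracts with more Hispanic residents.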

jum223 · 2023-02-22T21:59:12Z

You can find the tables we produced in my branch, juan4. The files are in the code folder and are called TablesAZ and TablesCA.

donbowen · 2023-02-23T05:12:54Z

Cool, I see some numbers you should show Matt Monday in your existing tables...

Quintiles_loan_table ... please add denial rates and avg (approved) loan size... useful stats to know
Quintiles_loan_table3.iloc[:,-3:] (we only need to see the last 3 cols)
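
A minimal sketch of how denial rates and average approved loan size could be added to a quintile table, assuming loan-level data. The frame `loans` and its columns (`hisp_quintile`, `denied`, `loan_amount`) are hypothetical stand-ins, not the columns in TablesAZ or TablesCA.

```python
import pandas as pd

# Hypothetical loan-level data; column names are assumptions for illustration
loans = pd.DataFrame({
    "hisp_quintile": [1, 1, 2, 2, 3, 3],
    "denied":        [0, 1, 0, 0, 1, 1],
    "loan_amount":   [100, 200, 300, 500, 250, 150],
})

# Denial rate per quintile: share of applications that were denied
denial_rate = loans.groupby("hisp_quintile")["denied"].mean()

# Average loan size among approved applications only
avg_approved = (loans[loans["denied"] == 0]
                .groupby("hisp_quintile")["loan_amount"].mean())

# Series are aligned on the quintile index; a quintile with no approvals gets NaN
table = pd.DataFrame({"n_apps": loans.groupby("hisp_quintile").size(),
                      "denial_rate": denial_rate,
                      "avg_approved_loan": avg_approved})

# Show only the trailing columns, in the same way as .iloc[:, -3:]
print(table.iloc[:, -2:])
```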

donbowen · 2023-02-27T19:25:16Z

@jum223 @XiaozheZhangLehigh @mromano224 @SebastianStoneham

What's the status here? Please be prepared with which tables (and which numbers specifically in the tables) you want to show. Please reply with a link to the files with the tables you want to show.

donbowen · 2023-02-28T15:03:37Z

donbowen · 2023-02-28T19:28:55Z

You need to update the regression. See the picture above.

Reg 1 uses y1 and x1, Reg 2 uses y1 and x2, and so on.

@mromano224

mromano224 · 2023-02-28T19:39:34Z

Just updated the regressions in regression_1 @donbowen

donbowen · 2023-02-28T19:51:16Z

Better. Still not right.

First: model_names = ['m1', 'm2', 'm3', 'm4'] is not informative when the y variable changes from one column to another. I think if you delete that, it will show the variable name for each column. If not, update the model_names.
Second: Sigh... that's not the regression... look at the bottom of the picture. See it? That's the regression. Or look at this comment further up the thread, where @annakharv46 transcribed the photo for you.
Third: In the plan for the regressions from the comment/whiteboard, I called x1 and x2 H. H is either
- Fraction Hispanic (not high_hispanic)
- Fraction Hispanic > median(fraction Hispanic)

mromano224 · 2023-02-28T20:31:59Z

another update... please lmk, sorry for the confusion @donbowen

donbowen · 2023-02-28T21:31:23Z

Am I looking at the right file?

Good job with the column names... it helped me figure out the key problem.

Obviously still not the formula... it's missing 2 variables in each column. I know why now (next point)
You need to include all rows in your data for the regressions. A single regression is supposed to have a variable indicating whether or not the row is about BOW. You obviously can't do this on your mini datasets that have only BOW or only Competitors.
You didn't ABCD and print your data, but I bet your "hisp_over_med" variable wasn't only 0s and 1s.
No need to type the regressions twice and print them by themselves and then all together at the end.
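
A minimal sketch of the two data points above, assuming the mini datasets are separate frames: stack them with a 0/1 BOW flag, build the above-median dummy as 0/1, and print the result to check it. The frames `bow_only` and `competitors` and the column `is_bow` are hypothetical names. Taking the median over the stacked rows is also an assumption; the plan earlier in the thread mentions a census-level median.

```python
import pandas as pd

# Hypothetical mini datasets standing in for the BOW-only and competitor-only frames
bow_only = pd.DataFrame({"hisp_rate": [10.0, 40.0, 70.0]})
competitors = pd.DataFrame({"hisp_rate": [20.0, 50.0, 80.0, 90.0]})

# Stack them so a single regression sees every row, with a 0/1 flag for BOW
both = pd.concat([bow_only.assign(is_bow=1), competitors.assign(is_bow=0)],
                 ignore_index=True)

# Build the above-median dummy on the stacked data and cast True/False to 1/0
both["hisp_over_med"] = (both["hisp_rate"] > both["hisp_rate"].median()).astype(int)

# Check the data: both dummies should contain only 0s and 1s
print(both)
print(both["hisp_over_med"].value_counts())
```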

Here, just restart the file with this (some issues fixed, others I left pointers to.)

import pandas as pd
import numpy as np
from statsmodels.formula.api import ols as sm_ols
from statsmodels.iolib.summary2 import summary_col 

bank_tract = pd.read_csv('../input_data_clean/bank_tract_clean_WITH_CENSUS.csv')

# adjust this next line to drop the BMO rows
bank_tract = bank_tract.query('which_bank != "BMO"')

# create vars
bank_tract['hisp_rate']     = (bank_tract['HispanicLatinoPop'] / bank_tract['Tot.Pop']) * 100
# cast the True/False comparison to 1/0 so the dummy is only 0s and 1s
bank_tract['hisp_over_med'] = (bank_tract['hisp_rate'] > bank_tract['hisp_rate'].median()).astype(int)
bank_tract['log_num_apps']  = np.log(bank_tract['num_applications'])

# skip all the one-off regressions (just show them all together...)

# regressions

# define the regression models (YOU'LL NEED TO MAKE ONE MORE VARIABLE ABOVE, AND THEN UPDATE THESE TO MATCH FORMULA)
# use the full bank_tract data (all rows) and the sm_ols function imported above
m1 = sm_ols('denial_rate ~ hisp_rate', data=bank_tract).fit()
m2 = sm_ols('denial_rate ~ hisp_over_med', data=bank_tract).fit()
m3 = sm_ols('log_num_apps ~ hisp_rate', data=bank_tract).fit()
m4 = sm_ols('log_num_apps ~ hisp_over_med', data=bank_tract).fit()

# set up the formatting for the table
info_dict = {'No. observations': lambda x: f"{int(x.nobs):d}"}
float_format = '%0.3f'

# UPDATE THIS AS NEEDED: 
regressor_order = ['Intercept', 'hisp_rate', 'hisp_over_med']

# UPDATE THE COLUMN NAMES (just using the y variable in each column works)
table = summary_col(results=[m1, m2, m3, m4],
                    model_names=['|all banks denial rate reg|',
                                 '|BOW denial rate reg|',
                                 '|all banks log num apps reg|',
                                 '|BOW log num apps reg|'],
                    regressor_order=regressor_order,
                    float_format=float_format,
                    info_dict=info_dict,
                    stars=True)  

table.title = 'OLS Regressions'

# print the table
print(table)
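
One way the pointers above might be followed, shown as a sketch on made-up data rather than the real bank_tract file. The column `is_bow` and every number in the data-generating step are assumptions for illustration. In a statsmodels formula, `x * z` expands to `x + z + x:z`, which gives both main effects plus the interaction.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols as sm_ols
from statsmodels.iolib.summary2 import summary_col

# Made-up data standing in for bank_tract; every column here is hypothetical
rng = np.random.default_rng(1)
n = 200
df = pd.DataFrame({
    "hisp_rate": rng.uniform(0, 100, n),
    "is_bow": rng.integers(0, 2, n),      # 1 if the row is about BOW, else 0
})
df["hisp_over_med"] = (df["hisp_rate"] > df["hisp_rate"].median()).astype(int)
df["denial_rate"] = (0.2 + 0.001 * df["hisp_rate"] + 0.05 * df["is_bow"]
                     + 0.002 * df["hisp_rate"] * df["is_bow"]
                     + rng.normal(0, 0.05, n))

# Each model has H, the BOW dummy, and their interaction
m1 = sm_ols("denial_rate ~ hisp_rate * is_bow", data=df).fit()
m2 = sm_ols("denial_rate ~ hisp_over_med * is_bow", data=df).fit()

table = summary_col(results=[m1, m2],
                    model_names=["denial_rate (H = hisp_rate)",
                                 "denial_rate (H = hisp_over_med)"],
                    regressor_order=["Intercept", "hisp_rate", "hisp_over_med",
                                     "is_bow", "hisp_rate:is_bow",
                                     "hisp_over_med:is_bow"],
                    float_format="%0.3f",
                    info_dict={"No. observations": lambda x: f"{int(x.nobs):d}"},
                    stars=True)
print(table)
```

The rows to read are the interaction terms (`hisp_rate:is_bow` and `hisp_over_med:is_bow`). The log_num_apps columns would follow the same pattern with a different y variable.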

vrg223 · 2023-03-01T19:06:45Z

Regression Analysis:
-Based on reg2.ipynb
In the linear regressions for CA and AZ, the models of log(number of applications) return higher R-squareds (in the 70s) than the models of denial rates. The models explain variation in the number of applications better than variation in denial rates.
Models 1 and 2, which regress denial rates on the Hispanic share of the population, return a much lower R-squared. This tells us that the Hispanic share of the population alone explains little of the variation in denial rates.
hisp_rate: -0.00075*** implies a very small but negative relationship between denial rates and Hispanic population in AZ. The same goes for hisp_rate: -0.00115*** in CA.

donbowen mentioned this issue Feb 22, 2023

Open tasks (post baby infrastructure) #11

Open

5 tasks

donbowen changed the title ~~Tables team~~ regressions team Feb 28, 2023
