Commit deca649

wexlergroup authored and committed

Refine terminology and improve clarity in calibration analysis section

1 parent f6c7cfe commit deca649

1 file changed: lecture-08-calibration.md

Lines changed: 25 additions & 22 deletions
@@ -98,13 +98,13 @@ Fitting a calibration curve is not the end of the story. We need to know how con
 ### A Theoretical Interlude

-In OLS, the sum of squared errors (SSE) or residuals (SSR) is key in determining the confidence intervals for the slope and intercept. The SSR is defined as
+In OLS, the **sum of squared errors (SSE)** or **residuals (SSR)** is key in determining the confidence intervals for the slope and intercept. The SSR is defined as

 $$
 SSR = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
 $$

-where $y_i$ is the observed value of the dependent variable, $\hat{y}_i$ is the predicted value of the dependent variable, and $n$ is the number of data points. Looking at the plot above, this would correspond to summing the squares of the vertical distances (gray lines) between the observed data points and the line. The SSR is related to the variance of the residuals, which is defined as
+where $y_i$ is the observed value of the dependent variable, $\hat{y}_i$ is the predicted value of the dependent variable, and $n$ is the number of data points. Looking at the plot above, this would correspond to summing the squares of the vertical distances (gray lines) between the observed data points and the line. The SSR is related to the **standard error of the regression**, which is defined as

 ````{margin}
 ```{note}
@@ -113,27 +113,27 @@ We divide SSR by $n-2$ (not $n$) because estimating the slope and intercept uses
 ````

 $$
-\sigma^2 = \frac{SSR}{n-2}
+s_{y/x} = \sqrt{\frac{SSR}{n-2}}
 $$

-where $n$ is the number of data points. The variance of the residuals is used to calculate the standard errors of the slope and intercept, which are then used to calculate the confidence intervals. The standard errors of the slope and intercept are defined as
+where $n$ is the number of data points. The standard error of the regression is used to calculate the standard errors of the slope and intercept, which are then used to calculate the confidence intervals. The **standard errors of the slope and intercept** are defined as

 $$
-SE(\hat{\beta}_1) = \sqrt{\frac{\sigma^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}}
+s_{\hat{\beta}_1} = \frac{s_{y/x}}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}}
 $$

 $$
-SE(\hat{\beta}_0) = \sqrt{\sigma^2 \left( \frac{1}{n} + \frac{\bar{x}^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \right)}
+s_{\hat{\beta}_0} = s_{y/x} \sqrt{\frac{\sum_{i=1}^{n} x_i^2}{n \sum_{i=1}^{n} (x_i - \bar{x})^2}}
 $$

-where $\hat{\beta}_1$ is the estimated slope, $\hat{\beta}_0$ is the estimated intercept, $x_i$ is the value of the independent variable, $\bar{x}$ is the mean of the independent variable, and $n$ is the number of data points. The confidence intervals for the slope and intercept are then calculated as
+where $\hat{\beta}_1$ is the estimated slope, $\hat{\beta}_0$ is the estimated intercept, $x_i$ is the value of the independent variable, $\bar{x}$ is the mean of the independent variable, and $n$ is the number of data points. The **confidence intervals for the slope and intercept** are then calculated as

 $$
-CI(\hat{\beta}_1) = \hat{\beta}_1 \pm t_{\alpha/2} SE(\hat{\beta}_1)
+CI_{\hat{\beta}_1} = \hat{\beta}_1 \pm t_{\alpha/2} s_{\hat{\beta}_1}
 $$

 $$
-CI(\hat{\beta}_0) = \hat{\beta}_0 \pm t_{\alpha/2} SE(\hat{\beta}_0)
+CI_{\hat{\beta}_0} = \hat{\beta}_0 \pm t_{\alpha/2} s_{\hat{\beta}_0}
 $$

 where $t_{\alpha/2}$ is the critical value of the $t$-distribution with $n-2$ degrees of freedom and a significance level of $\alpha/2$. The confidence intervals give us a range of values likely to contain the true value of the slope and intercept with a certain level of confidence.
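[Editor's note, not part of the commit] To make $t_{\alpha/2}$ concrete: it is the inverse CDF of the $t$-distribution evaluated at $1 - \alpha/2$. A minimal standalone sketch, where $n = 10$ and the slope numbers are made-up values for illustration:

```python
from scipy import stats

n = 10            # number of calibration points (illustrative)
alpha = 1 - 0.95  # for a 95% confidence level

# Two-sided critical value with n - 2 degrees of freedom
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)
print(f"t_(alpha/2) for n = {n}: {t_crit:.3f}")

# A confidence interval is then: estimate +/- t_crit * standard error,
# e.g. a slope of 0.100 with standard error 0.004 (made-up numbers)
half_width = t_crit * 0.004
print(f"slope: 0.100 +/- {half_width:.4f}")
```

Note that the divisor in the degrees of freedom matches the $n-2$ used for $s_{y/x}$.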
@@ -147,22 +147,22 @@ Let's calculate the confidence intervals for the calibration curve's slope and i
 residuals = absorbance - line

 # Calculate the sum of the squared residuals
-def sse(residuals):
+def ssr(residuals):
     return np.sum(residuals ** 2)

 # Test the function
-print(sse(residuals))
+print(ssr(residuals))
 ```

-Now, let us write a function to compute the variance of the residuals.
+Now, let us write a function to compute the standard error of the regression.

 ```{code-cell} ipython3
-# Calculate the variance of the residuals
-def variance(residuals):
-    return sse(residuals) / (len(residuals) - 2)
+# Calculate the standard error of the regression
+def se_regression(residuals):
+    return np.sqrt(ssr(residuals) / (len(residuals) - 2))

 # Test the function
-print(variance(residuals))
+print(se_regression(residuals))
 ```

 OK, now we can calculate the standard errors of the slope and intercept.
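[Editor's note, not part of the commit] The margin note's $n-2$ divisor can be illustrated numerically: an OLS fit imposes two constraints on the residuals, $\sum_i r_i = 0$ and $\sum_i x_i r_i = 0$ (the normal equations), which is why two degrees of freedom are used up. A sketch with made-up data:

```python
import numpy as np

# Made-up calibration-style data (not the lecture's measurements)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0.03, 0.19, 0.41, 0.58, 0.82, 0.99])

slope, intercept = np.polyfit(x, y, 1)
residuals = y - (slope * x + intercept)

# The two OLS normal equations constrain the residuals:
print(np.sum(residuals))      # ~0, up to floating-point error
print(np.sum(x * residuals))  # ~0, up to floating-point error
```

Only $n-2$ residuals can vary freely once these two constraints hold, so dividing SSR by $n-2$ gives an unbiased estimate.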
@@ -171,7 +171,7 @@ OK, now we can calculate the standard errors of the slope and intercept.
 # Calculate the standard error of the slope
 def se_slope(x, residuals):
     # numerator
-    numerator = variance(residuals)
+    numerator = se_regression(residuals) ** 2  # s_{y/x} squared; the square root is taken in the return
     # denominator
     x_mean = np.mean(x)
     denominator = np.sum((x - x_mean) ** 2)
@@ -182,12 +182,15 @@ print(se_slope(concentration, residuals))
 
 # Calculate the standard error of the intercept
 def se_intercept(x, residuals):
+    # prefactor
+    prefactor = se_regression(residuals)
     # numerator
-    numerator = variance(residuals)
+    numerator = np.sum(x ** 2)
     # denominator
+    n = len(x)
     x_mean = np.mean(x)
-    denominator = len(x) * np.sum((x - x_mean) ** 2)
-    return np.sqrt(numerator / denominator)
+    denominator = n * np.sum((x - x_mean) ** 2)
+    return prefactor * np.sqrt(numerator / denominator)

 # Test the function
 print(se_intercept(concentration, residuals))
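[Editor's note, not part of the commit] The hand-rolled standard errors can be cross-checked against `scipy.stats.linregress`, which reports the same two quantities as `stderr` and `intercept_stderr`. The arrays below are placeholders, since the lecture's actual `concentration`/`absorbance` data are not shown in this hunk:

```python
import numpy as np
from scipy import stats

# Placeholder data standing in for the lecture's concentration/absorbance arrays
concentration = np.array([0.0, 2.0, 4.0, 6.0, 8.0, 10.0])
absorbance = np.array([0.01, 0.20, 0.42, 0.60, 0.79, 1.01])

n = len(concentration)
slope, intercept = np.polyfit(concentration, absorbance, 1)
residuals = absorbance - (slope * concentration + intercept)

# Same quantities as the lecture's functions
s_yx = np.sqrt(np.sum(residuals ** 2) / (n - 2))    # standard error of the regression
sxx = np.sum((concentration - np.mean(concentration)) ** 2)
se_b1 = s_yx / np.sqrt(sxx)                         # standard error of the slope
se_b0 = s_yx * np.sqrt(np.sum(concentration ** 2) / (n * sxx))  # standard error of the intercept

# scipy computes the same standard errors internally
result = stats.linregress(concentration, absorbance)
print(se_b1, result.stderr)
print(se_b0, result.intercept_stderr)
```

Agreement to floating-point precision is a good sign the formulas were transcribed correctly.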
@@ -234,15 +237,15 @@ def confidence_interval_intercept(x, residuals, confidence_level):
     return critical_t_value * se

 # Calculate the 95% confidence interval for the intercept
-print(f"intercept: {intercept:.3f} +/- {confidence_interval_intercept(concentration, residuals, 0.95):.3f}")
+print(f"intercept: {intercept:.6f} +/- {confidence_interval_intercept(concentration, residuals, 0.95):.6f}")
 ```

 ## Correlation Analysis

 The last step in analyzing calibration data is to perform correlation analysis. Correlation analysis assesses the strength of the relationship between the two variables. In this case, we are interested in the correlation between the diacetyl concentration and the absorbance value. The correlation coefficient measures the strength and direction of the relationship between two variables. It ranges from -1 to 1, with 1 indicating a perfect positive relationship, -1 indicating a perfect negative relationship, and 0 indicating no relationship. The correlation coefficient is calculated as

 $$
-r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2}}
+r = \frac{\sum_{i=1}^{n} \left[ (x_i - \bar{x})(y_i - \bar{y}) \right]}{\sqrt{\sum_{i=1}^{n} \left[ (x_i - \bar{x})^2 \right] \sum_{i=1}^{n} \left[ (y_i - \bar{y})^2 \right]}}
 $$

 where $x_i$ is the value of the independent variable, $\bar{x}$ is the mean of the independent variable, $y_i$ is the value of the dependent variable, and $\bar{y}$ is the mean of the dependent variable. The correlation coefficient gives us an indication of how well the two variables are related. A correlation coefficient close to 1 or -1 indicates a strong relationship, while a correlation coefficient close to 0 indicates a weak relationship.
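[Editor's note, not part of the commit] The formula for $r$ translates line by line into NumPy, and the result can be checked against `np.corrcoef`; a standalone sketch with made-up data:

```python
import numpy as np

# Made-up x/y data (not the lecture's diacetyl measurements)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.05, 0.18, 0.42, 0.61, 0.78])

# Deviations from the means, as in the formula for r
dx = x - np.mean(x)
dy = y - np.mean(y)
r = np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))

# np.corrcoef returns the full correlation matrix; entry [0, 1] is r(x, y)
print(r, np.corrcoef(x, y)[0, 1])
```

For data this close to a straight line, $r$ comes out just under 1, i.e. a strong positive relationship.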
