KintorModel/gee.Rmd at master · the8thday/KintorModel · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
---
title: "gee"
author: "liuc"
date: "10/28/2021"
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

# gee

广义估计方程. 在医学研究中每个受试者有多次随访的数值型观测值，则每个受试者的每次观测值是相关的，不同受试者之间可以认为是独立的，但是模型误差是不相等的。
以基线为协变量，

```{r}
library(geepack)
library(tidyverse)
library(ggeffects)

```

在对重复测量资料展开广义估计方程分析时，首先要确定需要选择的作业相关矩阵；其次在建模时 基于对具体问题的了解考虑变量间的交互效应，主效应分析等。

### 首先是考虑主效应的结果

```{r}
jisu_sheet <- 'Ddimer'
sex <- 'males'

mt_ESR <- readxl::read_excel(glue::glue('/Users/congliu/OneDrive/kintor/Daily_Work/{sex}_treated.xlsx'),
                             sheet = jisu_sheet
                             ) %>%
  select(-BMI) %>%
  drop_na()
mn_ESR <- readxl::read_excel(glue::glue('/Users/congliu/OneDrive/kintor/Daily_Work/{sex}_untreated.xlsx'),
                             sheet = jisu_sheet
                             ) %>%
  select(-BMI) %>%
  drop_na()

df2 <- mt_ESR %>%
  mutate(class = 'Treated') %>%
  bind_rows(
    mn_ESR %>% mutate(class = 'Untreated')
  ) %>%
  pivot_longer(cols = -c(`Patient ID`, class, AGE), names_to = 'time', values_to = 'score') %>%
  arrange(`Patient ID`) %>% # data should be ordered
  rename('PatientID'='Patient ID') %>%
  mutate(class=if_else(class=='Treated',1,0)) %>%
  rstatix::convert_as_factor(PatientID, class, time) %>%
  drop_na()
df2$class <- fct_relevel(df2$class, '0')

df2
```

```{r}
geefit_main <- geeglm(
  score ~ time + class,
  id=`PatientID`,
  corstr='exchangeable', # 对于作业相关矩阵的选择依据Y变量的特征知识或是按照相关性矩阵进行选择
  family="gaussian",
  data=df2,
  std.err = 'san.se'
)
summary(geefit_main)

broom::tidy(geefit_main)

```

*结果解读：*对于主效应分析而言，其是基于所有人（不分组）的结果，上表中(Intercept)为timeDay0的数据，time作为我们within的变量结果也是主要的within分类结果；class1为组间差值，即是基于所有人的数据class1组比class0组降低了457；timeDay1为相对于timeDay0的估计值差值，即是在第1周比timeDay0降低了147，以此类推。 \_\_以上结果\_\_总体而言表明class分组对于Y值是具有统计学意义的影响因素；随着时间的变化Y值也是在变化的，虽则在timeDay7 p值为0.458，但以上结果都是在*整体*层面的分析。

#### some functions for gee object

```{r}
#
coef(geefit_main)
#
vcov(geefit_main)

# anova analysis tests whether the model terms are significant
anova(geefit_main)
```

### 加入交互效应

```{r}
# 加入交互效应，并考虑age作为协变量
geefit3 <- geeglm(score ~ time * class + AGE,
                  id=`PatientID`,
                  corstr='exchangeable',
                  family="gaussian",
                  data=df2,
                  std.err = 'san.se')
broom::tidy(geefit3)
anova(geefit3)
```

**结果解读**在加入交互效应之后，组别与时间点反应的不再是主效应而是单独效应，即是每一个具体分组的效应值。

*对于Y为分类变量的情况，OR=exp(estimate)*

```{r}
QIC(geefit_main)
QIC(geefit3)

anova(geefit_main, geefit3)

```

plot effect by emmeans

```{r}
plot(ggemmeans(geefit3, terms = c("time", "class"),
               condition = c(diagnose = "severe"))) +
  ggplot2::ggtitle("GEE Effect plot")
```
plot effects
```{r}
# ggpredict(geefit3,
#           terms = c('time', 'class')
#           )

(geefit3_effect <- ggeffect(geefit3,
         terms = c('time', 'class')
         ))

plot(geefit3_effect)
```


## 一些补充信息

```{r}
library(gee)

geefit_gee <- gee(
  formula = score ~ time * class + AGE,
  data = df2,
  id = PatientID,
  family = 'gaussian',
  contrasts = 'exchangeable'
)


summary(geefit_gee)
```