I have a question that concerns both technical solutions in R and statistics. I hope it is not too statistics-heavy and is still on topic for this site. I have a large dataset with 2,400 respondents. I ran a logistic regression to analyze views on corruption in local government across different socioeconomic groups. Respondents could say either that not a lot/hardly any officials are corrupt, or that most/all officials are corrupt.
I am now looking for a way to calculate the change in the odds of believing that local officials are mostly corrupt, so that I could say, for example, that the odds of believing corruption is widespread are x percent lower among men.
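One common way to get such a statement, sketched here under the assumption that the usual odds-ratio reading of logit coefficients is what is wanted: exponentiate each coefficient and convert it to a percent change. The coefficients are copied by hand from the summary output further down in this question; the names `coefs`, `odds_ratio`, and `pct_change` are only illustrative.

```r
# Coefficients copied from the summary output in this question (log-odds scale)
coefs <- c(
  genderMale = 0.169807,
  age = -0.018510,
  `education_catSecondary education` = -0.217526,
  `education_catHigher education` = -0.402557
)

# Exponentiating a logit coefficient gives an odds ratio
odds_ratio <- exp(coefs)

# Percent change in the odds for a one-unit increase
# (for a factor level: relative to its reference category)
pct_change <- (odds_ratio - 1) * 100

print(round(cbind(odds_ratio, pct_change), 2))

# With the fitted object at hand (called `model` here, an assumed name),
# the same odds ratios with Wald confidence intervals would be:
# exp(cbind(OR = coef(model), confint.default(model)))
```

On these numbers the odds for men come out roughly 18.5 percent higher than for women, and each additional year of age lowers the odds by about 1.8 percent, holding the other predictors fixed.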
In addition, I would like to calculate a pseudo-R-squared for each predictor variable while controlling for all the other variables. I know how to do this in SPSS, but computing it by hand in R seems more complicated.
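A possible sketch of this, assuming McFadden's pseudo-R-squared (1 minus residual deviance over null deviance) is an acceptable measure; SPSS reports other variants, so the numbers would not match it directly. The idea is to drop each term in turn and see how much of the pseudo-R-squared is lost. The example runs on R's built-in `infert` data purely for illustration, not on the survey data, and `fit`, `mcfadden_full`, and `partial_r2` are made-up names.

```r
# Illustration on the built-in `infert` data, not the survey data from the question
fit <- glm(case ~ age + education + spontaneous,
           family = binomial(link = "logit"), data = infert)

# McFadden's pseudo R-squared for the full model
mcfadden_full <- 1 - deviance(fit) / fit$null.deviance

# drop1() refits the model with each term removed in turn, on the same rows
dropped <- drop1(fit, test = "Chisq")
reduced <- dropped[-1, , drop = FALSE]  # first row is the full model ("<none>")

# Loss in McFadden's R-squared when a term is dropped, i.e. that term's
# contribution after controlling for the remaining predictors
partial_r2 <- (reduced$Deviance - deviance(fit)) / fit$null.deviance
names(partial_r2) <- rownames(reduced)

print(round(c(full = mcfadden_full, partial_r2), 4))
```

For the model in this question the same call would be `drop1()` on the fitted `glm` object. From the deviances in its summary output, the full-model McFadden value would be 1 - 3095/3139, about 0.014.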
This is my model, which uses "Not a lot/hardly any corrupt official" as the reference category for the dependent variable. The reference category for gender is female; for education it is basic education.
glm(formula = corruption_local_recoded ~ gender + age + education_cat,
family = binomial(link = "logit"), data = lebanon, subset = (corruption_local_recoded !=
"Don't know" & education_cat != "No formal education"))
Deviance Residuals:
   Min      1Q  Median      3Q     Max
-1.671  -1.290   0.896   1.017   1.468
Coefficients:
                                  Estimate Std. Error z value Pr(>|z|)
(Intercept)                       1.274611   0.169750   7.509 5.97e-14 ***
genderMale                        0.169807   0.085740   1.980 0.047650 *
age                              -0.018510   0.002972  -6.228 4.74e-10 ***
education_catSecondary education -0.217526   0.107645  -2.021 0.043302 *
education_catHigher education    -0.402557   0.121817  -3.305 0.000951 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 3139 on 2327 degrees of freedom
Residual deviance: 3095 on 2323 degrees of freedom
(42 observations deleted due to missingness)
AIC: 3105
Number of Fisher Scoring iterations: 4
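As a side check on the reference categories stated before the model output, here is a small sketch of how R picks them. The vectors are toy values that only reuse the labels from my variables, and `kept` is an illustrative name.

```r
# Toy vectors with the same labels as the question's variables (values are made up)
gender <- c("Male", "Female", "Female", "Male")
education_cat <- factor(
  c("Basic education", "Higher education",
    "Secondary education", "No formal education"),
  levels = c("No formal education", "Basic education",
             "Secondary education", "Higher education")
)

# A character predictor is converted to a factor with sorted levels,
# so "Female" becomes the reference level
print(levels(factor(gender))[1])

# Once the excluded category is dropped, "Basic education" is the first
# (reference) level
kept <- droplevels(education_cat[education_cat != "No formal education"])
print(levels(kept)[1])

# relevel() would make a different category the reference, if ever needed
print(levels(relevel(kept, ref = "Higher education"))[1])
```

For a factor response in a binomial `glm`, the first level is treated as the "failure" category, which is why "Not a lot/hardly any corrupt official" acts as the reference for the dependent variable.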
This is a sample of the first 250 rows of my dataset:
structure(list(age = c(41L, 36L, 33L, 26L, 28L, 33L, 31L, 45L,
70L, 18L, 23L, 20L, 24L, 44L, 38L, 39L, 23L, 45L, 26L, 54L, 26L,
22L, 33L, 62L, 18L, 67L, 28L, 28L, 26L, 40L, 53L, 36L, 58L, 52L,
43L, 24L, 28L, 29L, 21L, 41L, 33L, 37L, 23L, 21L, 48L, 20L, 65L,
26L, 38L, 24L, 59L, 48L, 26L, 33L, 36L, 39L, 24L, 28L, 75L, 26L,
38L, 32L, 43L, 28L, 63L, 68L, 28L, 32L, 18L, 34L, 20L, 21L, 56L,
31L, 52L, 30L, 26L, 40L, 28L, 38L, 36L, 60L, 56L, 53L, 25L, 66L,
29L, 19L, 33L, 55L, 20L, 40L, 49L, 24L, 47L, 25L, 58L, 31L, 20L,
41L, 71L, 27L, 34L, 19L, 40L, 55L, 36L, 25L, 55L, 38L, 27L, 52L,
21L, 19L, 70L, 38L, 53L, 70L, 22L, 22L, 18L, 18L, 30L, 38L, 45L,
21L, 53L, 48L, 19L, 72L, 35L, 25L, 30L, 58L, 25L, 53L, 47L, 19L,
27L, 28L, 37L, 25L, 48L, 60L, 20L, 21L, 26L, 43L, 38L, 24L, 48L,
26L, 52L, 22L, 21L, 38L, 41L, 30L, 40L, 19L, 55L, 24L, 18L, 18L,
56L, 70L, 43L, 24L, 24L, 18L, 55L, 48L, 36L, 27L, 32L, 28L, 50L,
60L, 27L, 57L, 36L, 31L, 18L, 22L, 45L, 25L, 24L, 29L, 35L, 36L,
48L, 31L, 35L, 30L, 44L, 45L, 37L, 31L, 61L, 58L, 25L, 39L, 18L,
34L, 30L, 36L, 48L, 20L, 21L, 24L, 49L, 61L, 52L, 33L, 45L, 21L,
42L, 28L, 35L, 33L, 25L, 21L, 46L, 52L, 45L, 24L, 34L, 56L, 60L,
36L, 69L, 23L, 63L, 40L, 70L, 70L, 23L, 29L, 29L, 60L, 38L, 65L,
38L, 52L, 28L, 29L, 22L, 26L, 28L, 48L), gender = c("Male", "Female",
"Female", "Male", "Male", "Male", "Male", "Male", "Male", "Male",
"Female", "Male", "Male", "Female", "Male", "Female", "Female",
"Male", "Female", "Female", "Male", "Male", "Male", "Female",
"Male", "Male", "Male", "Male", "Male", "Female", "Male", "Female",
"Male", "Female", "Male", "Male", "Female", "Male", "Male", "Male",
"Male", "Female", "Male", "Male", "Male", "Male", "Female", "Female",
"Female", "Male", "Male", "Female", "Male", "Male", "Female",
"Female", "Male", "Male", "Male", "Female", "Male", "Female",
"Female", "Male", "Male", "Female", "Female", "Male", "Male",
"Female", "Female", "Male", "Male", "Female", "Female", "Male",
"Male", "Male", "Male", "Male", "Female", "Female", "Female",
"Male", "Male", "Male", "Female", "Female", "Male", "Male", "Male",
"Female", "Female", "Female", "Male", "Male", "Female", "Female",
"Female", "Female", "Male", "Male", "Male", "Female", "Female",
"Male", "Female", "Male", "Female", "Male", "Female", "Female",
"Female", "Female", "Male", "Male", "Female", "Female", "Male",
"Female", "Female", "Female", "Male", "Female", "Male", "Female",
"Female", "Female", "Male", "Female", "Female", "Male", "Female",
"Male", "Female", "Female", "Female", "Female", "Female", "Female",
"Male", "Female", "Male", "Female", "Female", "Male", "Male",
"Female", "Female", "Female", "Male", "Male", "Female", "Male",
"Male", "Female", "Female", "Male", "Female", "Female", "Female",
"Female", "Female", "Female", "Female", "Female", "Female", "Male",
"Female", "Male", "Male", "Female", "Male", "Male", "Male", "Female",
"Female", "Male", "Female", "Female", "Female", "Female", "Male",
"Female", "Female", "Male", "Male", "Female", "Male", "Female",
"Female", "Male", "Female", "Female", "Female", "Male", "Female",
"Male", "Female", "Female", "Female", "Female", "Male", "Female",
"Male", "Female", "Female", "Male", "Female", "Male", "Female",
"Male", "Female", "Male", "Female", "Male", "Female", "Female",
"Female", "Male", "Female", "Female", "Male", "Male", "Female",
"Female", "Female", "Male", "Male", "Male", "Female", "Male",
"Female", "Male", "Male", "Female", "Female", "Male", "Male",
"Male", "Female", "Male", "Male", "Female", "Female", "Female",
"Male", "Male", "Male", "Female"), education_cat = structure(c(2L,
2L, 3L, 2L, 3L, 3L, 2L, 2L, 2L, 3L, 4L, 3L, 4L, 2L, 2L, 4L, 3L,
4L, 2L, 3L, 3L, 2L, 2L, 2L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 3L,
2L, 2L, 3L, 3L, 3L, 2L, 2L, 2L, 4L, 3L, 4L, 2L, 2L, 1L, 4L, 3L,
3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 3L, 2L, 3L, 4L, 3L, 2L, 2L, 2L,
2L, 4L, 2L, 2L, 3L, 3L, 2L, 3L, 3L, 3L, 3L, 3L, 2L, 3L, 3L, 2L,
3L, 2L, 2L, 4L, 2L, 2L, 2L, 3L, 3L, 4L, 2L, 2L, 3L, 3L, 2L, 3L,
4L, 4L, 2L, 2L, 2L, 3L, 4L, 4L, 3L, 3L, 4L, 2L, 3L, 2L, 2L, 2L,
4L, 3L, 3L, 4L, 2L, 3L, 2L, 4L, 2L, 3L, 4L, 2L, 2L, 2L, 3L, 4L,
2L, 3L, 4L, 4L, 2L, 2L, 1L, 2L, 4L, 3L, 2L, 2L, 3L, 2L, 2L, 3L,
2L, 3L, 2L, 2L, 4L, 3L, 3L, 2L, 2L, 3L, 3L, 4L, 2L, 3L, 4L, 3L,
4L, 2L, 2L, 2L, 2L, 2L, 4L, 4L, 3L, 2L, 2L, 4L, 2L, 2L, 3L, 3L,
2L, 4L, 2L, 4L, 3L, 2L, 4L, 2L, 3L, 2L, 4L, 4L, 2L, 1L, 3L, 2L,
2L, 3L, 2L, 3L, 2L, 2L, 2L, 4L, 3L, 2L, 3L, 4L, 3L, 2L, 4L, 4L,
3L, 2L, 4L, 3L, 3L, 2L, 2L, 2L, 3L, 2L, 2L, 4L, 3L, 3L, 2L, 3L,
2L, 2L, 3L, 2L, 3L, 2L, 4L, 2L, 3L, 2L, 2L, 3L, 1L, 4L, 2L, 3L,
2L, 2L, 2L, 4L, 3L, 3L, 3L, 4L, 2L), .Label = c("No formal education",
"Basic education", "Secondary education", "Higher education"), class = "factor"),
corruption_local_recoded = structure(c(1L, 1L, 1L, 1L, NA,
NA, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L,
2L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L,
2L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L,
2L, 2L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 2L, 1L,
1L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
1L, 1L, 2L, 1L, 2L, 2L, 2L, 1L, 1L, 2L, NA, 1L, 2L, 2L, 2L,
2L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 1L, 1L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 2L,
2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L,
1L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 1L, 1L, 1L, 1L,
1L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 2L,
2L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 1L,
2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L,
2L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 1L,
NA, 2L, 2L, 2L, 2L), .Label = c("Not a lot/hardly any corrupt official",
"Most/every official is corrupt", "Don't know", "Refused to answer"
), class = "factor")), row.names = c(NA, 250L), class = "data.frame")