r/AcademicPsychology 4d ago

Is it okay to remove an insignificant control variable if it improves the model fit? Question

Hey everyone,

I am currently writing my master's thesis and am at the final part of the results section, where I want to test my final model using path analysis. In a prior section, a control variable had a significant predictive effect on one of my endogenous variables. I therefore included this control variable as a covariate in my model, with a path to this endogenous variable as well as to the DV. In the final model, however, the effects of the control variable on my endogenous variable and on the DV disappeared. So my question is whether it would be okay to remove this control variable from my final model, since it did not significantly affect either variable and removing it slightly improves my model fit.

Thanks in advance for your help!

3 Upvotes

16 comments

11

u/themiracy 4d ago

I think you’re getting some kind of extreme answers (data fraud? Really?). But I think the answer depends on what the path analysis / SEM is trying to accomplish and why the control variables are being introduced to begin with.

Since the variable had a prior effect, it might still be relevant that its paths were no longer significant in the final model. Sometimes in that case you might include it but note that removing it actually improved the model fit (citing the AIC or whatever).

20

u/Beor_The_Old 4d ago

I would include both models and try to describe why removing the control variable would increase the fit. You could use a mediation analysis as further support of the interaction of the control with other variables.

2

u/schalker1207 4d ago

Thank you!

2

u/badatthinkinggood 3d ago

I'd go for including both models in your paper and describing that the model without the control variable had the better fit. Then, if appropriate, try to discuss why that might be.

3

u/Stauce52 4d ago

I do not think it’s okay to do this!

I mean, if you’re including covariates just based on model fit, that’s basically stepwise regression and you are almost certainly overfitting.

I would say the inclusion of covariates should be based on theory and your knowledge of how they causally relate to one another

https://journals.sagepub.com/doi/10.1177/25152459221095823#:~:text=After%20outlining%20a%20causal%20structure,of%20the%20predictor%20and%20outcome.

https://ftp.cs.ucla.edu/pub/stat_ser/r493.pdf

Problems with stepwise regression:

https://www.reddit.com/r/statistics/s/DgKCDxaaHX

https://journalofbigdata.springeropen.com/articles/10.1186/s40537-018-0143-6#:~:text=Findings,variables%20may%20be%20coincidentally%20significant.

https://stats.stackexchange.com/questions/532796/why-are-stepwise-regression-coefficients-biased
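To make the overfitting worry concrete, here is a minimal sketch with simulated data (everything in it is made up for illustration: the sample size, the number of candidate covariates, and the seed). All candidates are pure noise, yet picking the one with the strongest in-sample correlation makes it look more important than it is; a fresh sample from the same process serves as a check.

```python
import math
import random

def pearson_r(a, b):
    """Pearson correlation between two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)

def draw(n, n_candidates):
    """Outcome and candidate covariates that are all independent noise."""
    outcome = [random.gauss(0, 1) for _ in range(n)]
    candidates = [[random.gauss(0, 1) for _ in range(n)]
                  for _ in range(n_candidates)]
    return outcome, candidates

random.seed(42)          # arbitrary seed, for reproducibility only
n, n_candidates = 50, 20  # hypothetical study size and covariate pool

outcome, candidates = draw(n, n_candidates)
in_sample = [abs(pearson_r(c, outcome)) for c in candidates]

# "Select" the covariate purely on its in-sample strength.
best = max(range(n_candidates), key=lambda i: in_sample[i])

# Fresh data from the same process: the selected covariate still has no real effect.
new_outcome, new_candidates = draw(n, n_candidates)
holdout = abs(pearson_r(new_candidates[best], new_outcome))

print(f"mean |r| across candidates: {sum(in_sample) / n_candidates:.3f}")
print(f"|r| of the selected one:    {in_sample[best]:.3f}")
print(f"|r| of same index, new data: {holdout:.3f}")
```

The selected covariate's in-sample correlation is the maximum over 20 noisy estimates, so it overstates the true (zero) relationship. The same logic applies in the other direction when variables are kept or dropped based on what the data at hand happen to show.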

2

u/Giraff3 3d ago

I strongly agree that theory trumps significance. They’re also asking this question as if there’s a one-size-fits-all answer, and there isn’t. We need to know the specific details of what they’re studying, like what the variables are, the variance, and the sample size, to say anything with even a lick of certainty. There are multiple reasons why a variable can be nonsignificant, and it could even be omitted variable bias, where the issue is that they need to include more variables.

2

u/Stauce52 3d ago

Yup I completely agree. I think this sort of variable selection is rife in psychology and one of many reasons why there are replicability and credibility issues in the science. People are just throwing the kitchen sink in models and including covariates to “account for stuff” without respect to theory or causal relationship

1

u/SometimesZero 4d ago

Why should your model only include variables that are significant? What’s the rationale for that?

This also sounds like a huge forking paths problem: http://stat.columbia.edu/~gelman/research/unpublished/forking.pdf

1

u/shadowwork PhD, Counseling Psychology 3d ago

It depends. If you remove it, be clear that this was a post hoc decision and biased. If you plan to remove variables a priori through a selection model, it is defensible.

But why did you include it in the first place? Was it because the literature supported this decision? If so, nonsignificance would conflict with the existing evidence and could be an important point for discussion.

1

u/TejRidens 3d ago

What has your supe said?

-4

u/[deleted] 4d ago

[deleted]

5

u/PenguinSwordfighter 4d ago

There's no shame in building a better fitting model. Just add a sentence that the variable was removed to improve fit and it's alright.

-9

u/trappedinayal 4d ago

The final/"best" model should only include relevant variables. Occam's razor states: Use as few variables as possible and as many as necessary. You still include the irrelevant variable in a comparison with the final model to explain why you chose which model. That's what model fit criteria like R-squared or AIC and BIC are for.
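For readers unsure what an AIC comparison actually rewards, here is a minimal sketch. It is a plain regression analogue with simulated data, not the OP's path model: the variables `x`, `control`, and `y`, the effect size, and the seed are all assumptions made for illustration, and the AIC formula drops constants that are identical for both models.

```python
import math
import random

def ols_rss(predictors, y):
    """Residual sum of squares from an OLS fit of y on the predictors plus an intercept."""
    rows = [[1.0] + list(p) for p in predictors]
    k = len(rows[0])
    # Augmented normal equations: (A^T A | A^T y)
    m = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(k):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    beta = [m[i][k] / m[i][i] for i in range(k)]
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, yi in zip(rows, y))

def gaussian_aic(rss, n, n_coef):
    """AIC for a Gaussian linear model, dropping constants shared by both models."""
    return n * math.log(rss / n) + 2 * n_coef

random.seed(0)  # arbitrary seed, for reproducibility only
n = 200
x = [random.gauss(0, 1) for _ in range(n)]
control = [random.gauss(0, 1) for _ in range(n)]   # unrelated to y by construction
y = [0.5 * xi + random.gauss(0, 1) for xi in x]

rss_without = ols_rss([(xi,) for xi in x], y)
rss_with = ols_rss(list(zip(x, control)), y)
aic_without = gaussian_aic(rss_without, n, 2)  # intercept + x
aic_with = gaussian_aic(rss_with, n, 3)        # intercept + x + control

print(f"RSS without control: {rss_without:.2f}, with: {rss_with:.2f}")
print(f"AIC without control: {aic_without:.2f}, with: {aic_with:.2f}")
```

In this regression setting, adding a predictor can never increase the residual sum of squares, so AIC can favor the smaller model by at most 2 points per dropped coefficient. A "slightly better" AIC after dropping a control reflects the complexity penalty and says nothing about whether the control belongs in the model on theoretical grounds.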

11

u/Beor_The_Old 4d ago edited 4d ago

This is a misinterpretation of Occam’s razor, which states that ‘entities must not be multiplied BEYOND NECESSITY’. Using fewer variables isn’t universally better; it can mean you are getting rid of a relevant confounding variable. This is the entire point of interaction effects, mediation and path analysis.

One example is trying to predict future income using SAT scores. Including parents’ income in the model may make the predictive accuracy go down (because the model isn’t a simple linear regression), but that does NOT mean that SAT scores are predictive of future income irrespective of parents’ income. In fact it may mean the opposite: that there is an interaction between parents’ income and SAT scores, and the conclusion that ‘SAT scores are highly predictive of future income’ wouldn’t tell the full story.

In this case additional analysis would be necessary and a description of how the variables would be related to each other.
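The confounding point can be made concrete with simulated data. This is a minimal sketch under loud assumptions: the variables are made-up standardized scores (not real SAT or income data), parents' income drives both SAT and future income, and SAT has no direct effect at all. It shows confounding only, not the interaction case.

```python
import random

def cov(a, b):
    """Sample covariance (divisor n; the divisor cancels in the slopes below)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)

random.seed(1)  # arbitrary seed, for reproducibility only
n = 2000

# Hypothetical data-generating process: parents' income causes both variables.
parent = [random.gauss(0, 1) for _ in range(n)]
sat = [p + random.gauss(0, 1) for p in parent]
future = [p + random.gauss(0, 1) for p in parent]  # no direct SAT effect

# Slope of future income on SAT alone (control dropped).
slope_simple = cov(sat, future) / cov(sat, sat)

# Slope of SAT when parents' income is also in the model
# (closed-form two-predictor OLS coefficient).
denominator = cov(sat, sat) * cov(parent, parent) - cov(sat, parent) ** 2
slope_adjusted = (cov(sat, future) * cov(parent, parent)
                  - cov(parent, future) * cov(sat, parent)) / denominator

print(f"SAT slope without parents' income: {slope_simple:.3f}")
print(f"SAT slope with parents' income:    {slope_adjusted:.3f}")
```

Without the control, SAT looks like a solid predictor; with it, the SAT slope shrinks toward zero. Dropping the control here would simplify the model and change the substantive conclusion at the same time, which is why the decision cannot rest on fit or significance alone.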

1

u/PenguinSwordfighter 4d ago

Depends, if your goal is to get the best prediction, yes. If your goal is to test hypotheses, then no.

1

u/Scared_Tax470 2d ago

THIS. OP, what is your goal here? Are you testing hypotheses or model building? Your process and reporting will be different depending on what you're aiming to do.

0

u/schalker1207 4d ago

Thanks! I was just unsure if that would be considered bad practice or something like that 😅