[Article]
Abstract.
The sign change problem in quantitative structure–activity relationship
(QSAR), quantitative structure–property relationship (QSPR) and related
studies is the disagreement among the signs of a descriptor's
correlation coefficients and regression coefficients in univariate and
multivariate regressions, before and after the data split. Among 50
investigated regression models with 227 descriptors extracted from the
literature, the sign change problem was shown to have a very high
frequency, according to four new criteria proposed in this work for its
assessment. The sign change problem can be substantially reduced and
even eliminated for a given dataset by statistically based variable
selection and by checking for the sign change problem before model
validation and interpretation. Knowledge of the statistical fundamentals
underlying the sign change problem, together with its identification
and understanding, aids in finding effective means to remedy regression
models with this deficiency.
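As an illustration of the kind of disagreement described above, the following minimal sketch builds a synthetic two-descriptor dataset in which one descriptor correlates positively with the response yet receives a negative coefficient in the multivariate regression. The data, the helper names (`cov`, `pearson_r`, `two_descriptor_ols`) and the single sign comparison are assumptions made for this example only; they do not reproduce the four criteria proposed in the work or any of the 50 literature models.

```python
import random

def mean(v):
    return sum(v) / len(v)

def cov(u, v):
    """Sample covariance of two equal-length sequences."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)

def pearson_r(u, v):
    """Pearson correlation coefficient."""
    return cov(u, v) / (cov(u, u) * cov(v, v)) ** 0.5

def two_descriptor_ols(x1, x2, y):
    """Slopes of y = b0 + b1*x1 + b2*x2 from the centred normal equations."""
    s11, s22, s12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
    s1y, s2y = cov(x1, y), cov(x2, y)
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2

# Synthetic data (hypothetical): x2 is strongly collinear with x1,
# and the response depends on x2 with a negative coefficient.
random.seed(0)
n = 200
x1 = [random.gauss(0.0, 1.0) for _ in range(n)]
x2 = [a + random.gauss(0.0, 0.3) for a in x1]
y = [2.0 * a - 1.0 * b + random.gauss(0.0, 0.1) for a, b in zip(x1, x2)]

r1, r2 = pearson_r(x1, y), pearson_r(x2, y)   # univariate view
b1, b2 = two_descriptor_ols(x1, x2, y)        # multivariate view

# A descriptor is flagged when the two views disagree in sign.
sign_changed = [(r1 > 0) != (b1 > 0), (r2 > 0) != (b2 > 0)]

print("correlation with y:      ", round(r1, 3), round(r2, 3))
print("multivariate coefficient:", round(b1, 3), round(b2, 3))
print("sign change flagged:     ", sign_changed)
```

In this sketch the second descriptor is flagged because its collinearity with the first makes its univariate and multivariate signs differ, which is the situation a check of this kind is meant to catch before a model is interpreted.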
Keywords.
Correlation; Univariate Regression; Multivariate Regression;
Descriptors.