-------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------
This directory contains data published in the QSAR article:

Kiralj R., Ferreira M. M. C., "A priori molecular descriptors in QSAR: a case of HIV-1 protease inhibitors. I. The Chemometric approach". J. Mol. Graph. Mod., 21, 435-448 (2003).
-------------------------------------------------------------------------------------------------------------------------------
The data are suitable for validation and testing of various chemometric methodologies.

The data are in Tables 2 and 3 in the article:
-1 dependent variable Y (biological activity, pIC50=-log(IC50), where IC50 is the inhibitor's molar concentration at 50% viral inhibition);
-14 independent variables (molecular descriptors of different nature, made by the a priori approach): X1, X2, X3, X4, X5, X6, X7, X8, X9, X10, X11, X12, X13 and X14;
-48 samples (peptidic HIV-1 protease inhibitors).

The meaning and definition of all the variables can be found in Table 1 in the paper.
-------------------------------------------------------------------------------------------------------------------------------
The data are reorganized into 3 datasets (in .txt format), with matrices of dimensions 48x16.

The columns in each .txt file are:
-the first column: the sample's position in the original dataset, which is equal to the name of the sample as published in the article;
-columns 2-15: the independent variables (molecular descriptors) X1-X14, respectively;
-the last column (column 16): the dependent variable (biological activity) Y.

HIV1-QSAR-DATASET-A
-------------------
-Used in the article: Kiralj R., Ferreira M. M. C., "A priori molecular descriptors in QSAR: a case of HIV-1 protease inhibitors. I. The Chemometric approach". J. Mol. Graph. Mod., 21, 435-448 (2003).
-Two models were built:
 a priori I: all 48 samples (1-48) are training samples; there are no external validation samples;
 a priori II: the first 32 samples (1-32) are training samples, the rest (33-48) are external validation samples.

HIV1-QSAR-DATASET-B
-------------------
-Used in the article: Teófilo R. F., Martins J. P. A., Ferreira M. M. C. "Sorting variables by using informative vectors as a strategy for feature selection in multivariate regression". J. Chemometr., 23, 32-48 (2009).
-One model was built:
 the QSAR model: the first 32 samples (1-32) are training samples, the rest (33-48) are external validation samples; the training samples are scrambled prior to any analysis.

HIV1-QSAR-DATASET-C
-------------------
-Used in the article: Hernández N., Kiralj R., Ferreira M. M. C., Talavera I., "Critical comparative analysis, validation and interpretation of SVM and PLS regression models in a QSAR study on HIV-1 protease inhibitors". Chemom. Intell. Lab. Syst., 98, 65-77 (2009).
-Two models were built:
 ordinary PLS model: the first 32 rows are training samples, the rest are external validation samples; the data split differs from that of the previous two datasets; the training samples are scrambled prior to any analysis.
 OPS-PLS model: the same data split and scrambling as for the ordinary PLS model; only five descriptors were used (X3, X6, X9, X10 and X13).
-------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------
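
The column layout and the 32/16 split described above can be read programmatically. The following is a minimal Python sketch, not part of the published data. It assumes each row holds 16 whitespace-separated numeric fields (sample position, X1-X14, Y) with no header line; the delimiter and the function names parse_dataset, split_rows and select_descriptors are illustrative assumptions and should be adapted to the actual files.

```python
from typing import List, Sequence, Tuple

# One row: (sample position, [X1..X14], Y)
Row = Tuple[int, List[float], float]


def parse_dataset(text: str) -> List[Row]:
    """Parse rows of the form: sample position, X1..X14, Y.

    Assumption: fields are separated by whitespace and there is no header.
    """
    rows: List[Row] = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 16:
            raise ValueError(
                f"line {line_no}: expected 16 fields, found {len(fields)}"
            )
        sample = int(float(fields[0]))            # column 1: sample position
        x = [float(v) for v in fields[1:15]]      # columns 2-15: X1..X14
        y = float(fields[15])                     # column 16: Y (pIC50)
        rows.append((sample, x, y))
    return rows


def split_rows(rows: Sequence[Row], n_train: int = 32):
    """Split in file order: first n_train rows for training, the rest for
    external validation."""
    return list(rows[:n_train]), list(rows[n_train:])


def select_descriptors(rows: Sequence[Row], numbers: Sequence[int]) -> List[Row]:
    """Keep only the descriptors with the given 1-based numbers,
    e.g. (3, 6, 9, 10, 13) for the five descriptors of the OPS-PLS model."""
    return [(s, [x[i - 1] for i in numbers], y) for s, x, y in rows]
```

A typical use would be to read one dataset file into a string, call parse_dataset on it, and pass the result to split_rows. Because the split is taken in file order, it relies on the rows already being arranged as described for each dataset. No scrambling is performed here.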