Commit 09e01d64 authored by Samuel Simko

f

parent 606770d4
For the RNN, we found that using an LSTM cell instead of an RNN cell greatly increased performance.
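For illustration only, a minimal sketch of this cell swap could look as follows; we assume a PyTorch implementation here, and the vocabulary size, hidden size, and single-output head are placeholder choices rather than the ones used in the report.
\begin{verbatim}
import torch.nn as nn

class EnergyRNN(nn.Module):
    # Sequence regressor over encoded SMILES tokens (illustrative sizes).
    def __init__(self, vocab_size=64, hidden_size=128, use_lstm=True):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        # Swapping nn.RNN for nn.LSTM is the change discussed in the text.
        rnn_cls = nn.LSTM if use_lstm else nn.RNN
        self.rnn = rnn_cls(hidden_size, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, tokens):
        x = self.embed(tokens)            # (batch, seq_len, hidden)
        out, _ = self.rnn(x)              # same call works for RNN and LSTM
        return self.head(out[:, -1])      # predict energy from last hidden state
\end{verbatim}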
\begin{figure}
\centering
\begin{tabular}{|c|c|}
\hline
Algorithm & Testing loss \\
\hline
Linear Regression (Baseline) & 2.64e-06 \\
\hline
SVR & 4.78e-05 \\
\hline
MLP & 7.09e-04 \\
\hline
RNN & 6.39e-04 \\
\hline
\end{tabular}
\caption{Testing losses for each algorithm used}
\label{fig:tabloss}
\end{figure}
In Figure \ref{fig:tabloss}, we report the testing loss achieved by each algorithm.
For each algorithm, the hyperparameters were those selected by the Optuna optimization
during cross-validation.
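As an illustration of this procedure, the sketch below shows how Optuna can minimize the mean KFold validation loss; the SVR search space, the five folds, and the names X and y are placeholder assumptions rather than the exact settings used in the report.
\begin{verbatim}
import numpy as np
import optuna
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold
from sklearn.svm import SVR

def objective(trial, X, y):
    # Hypothetical search space; the ranges actually used may differ.
    params = {
        "C": trial.suggest_float("C", 1e-2, 1e2, log=True),
        "epsilon": trial.suggest_float("epsilon", 1e-4, 1e-1, log=True),
    }
    losses = []
    for train_idx, val_idx in KFold(n_splits=5, shuffle=True,
                                    random_state=0).split(X):
        model = SVR(**params).fit(X[train_idx], y[train_idx])
        losses.append(mean_squared_error(y[val_idx],
                                         model.predict(X[val_idx])))
    return float(np.mean(losses))          # mean validation MSE over folds

# X, y: encoded SMILES features and energies (not shown in this excerpt).
# study = optuna.create_study(direction="minimize")
# study.optimize(lambda t: objective(t, X, y), n_trials=100)
# best_params = study.best_params
\end{verbatim}
The parameters of the best trial would then be used to refit the model before the comparison on the testing dataset.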
We see in Figure \ref{fig:tabloss} that Linear Regression performed the best out of all our models.
We will perform a hypothesis test in order to determine whether the other models perform the same as the
baseline.
We denote by $H_0$ the null hypothesis (the model and Linear Regression are equal in performance),
and by $H_1$ the alternative hypothesis.
We will do a paired sample t-test. In order to do so, we make the following assumption:
\begin{itemize}
    \item The differences between the paired losses of the two models are approximately normally
    distributed. This seems to be the case if we plot the histogram (Figure \ref{hist});
\end{itemize}
We use a level of significance of $\alpha = 0.05$. The paired t-test will tell us whether the mean losses
of the two models are the same.
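For illustration, the sketch below carries out this paired t-test with scipy.stats.ttest\_rel, assuming the per-sample test losses of the baseline and of the compared model are available as arrays; the histogram of the paired differences corresponds to the normality check mentioned above.
\begin{verbatim}
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import ttest_rel

def compare_to_baseline(baseline_errors, model_errors, alpha=0.05):
    # Paired t-test on per-sample test losses of two models.
    diffs = np.asarray(model_errors) - np.asarray(baseline_errors)

    # Normality check on the paired differences (cf. the histogram figure).
    plt.hist(diffs, bins=30)
    plt.xlabel("loss difference (model - baseline)")
    plt.show()

    t_stat, p_value = ttest_rel(model_errors, baseline_errors)
    if p_value < alpha:
        print(f"p = {p_value:.3g} < {alpha}: reject H0 (performances differ)")
    else:
        print(f"p = {p_value:.3g} >= {alpha}: cannot reject H0")
    return t_stat, p_value
\end{verbatim}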
For the SVR test predictions, we get a p-value of 2.178e-05 for the Energy\_ attribute,
and a p-value of 0.1766 for the Energy\_DG attribute. The first p-value is below the level
of significance, so we can reject the null hypothesis for that attribute; the second is above it, so we cannot.
For the lasso test predictions, we get a p-value of 0.040. As the p-value is below the level
of significance, we can reject the null hypothesis.
For the MLP, we get a p-value of 0.13 for both dependent variables. As both p-values are above the level
of significance, we cannot reject the null hypothesis.
For the Recurrent Neural Network, we get a p-value of 0.857 for the Energy\_ attribute,
and a p-value of 0.598 for the Energy\_DG attribute. As both are above the level of significance, we cannot reject the null hypothesis.
\ldots
As we did not find evidence pointing to the hypothesis that the relationship between the features and the label is more than linear,
we apply Occam's razor and conclude that the simplest model is to be preferred.
In more complex databases, we would use linear algorithms such as Linear Regression or Support Vector Machines
We tuned the hyperparameters for each of the algorithms using KFold cross-validation and Optuna to perform an automated search.
The baseline algorithm, a linear regression, performs extremely well with the right SMILES encoding.
We compared the best models for each algorithm on the testing dataset. We found that linear regression
is competitive with the other methods we used.
\ldots
We use Occam's razor to determine that a simple linear regression with an encoding based on the number of occurrences
of the different symbols of the SMILES string is to be preferred for practical applications
of molecular energy prediction.
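As an illustrative sketch of this preferred pipeline (the exact SMILES symbol tokenization is not reproduced in this excerpt), character occurrences can be counted with scikit-learn's CountVectorizer and fed to an ordinary least-squares model; the molecules and target values below are purely hypothetical.
\begin{verbatim}
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Character-level counts stand in for the symbol-count encoding; multi-character
# SMILES tokens such as "Cl" or "Br" would need a custom analyzer.
encoder = CountVectorizer(analyzer="char", lowercase=False)
model = make_pipeline(encoder, LinearRegression())

smiles = ["CCO", "c1ccccc1", "CC(=O)O"]   # hypothetical molecules
energies = [-0.12, -0.85, -0.43]          # hypothetical target values
model.fit(smiles, energies)
print(model.predict(["CCN"]))
\end{verbatim}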
\section{Acknowledgments}