First, it has been well established that a variety of classification models yield incredibly good results on Iris (Iris is very predictable); see here, for example.
Secondly, we can observe that there are relatively few features in the Iris dataset. Moreover, if you look at the dataset description you can see that two of the features are very highly correlated with the class outcomes.
These correlation values are linear, single-feature correlations, which indicates that one can most likely apply a linear model and observe good results. Neural nets are highly nonlinear; they become more and more complex and capture greater and greater nonlinear feature combinations as the number of hidden nodes and hidden layers is increased.
Taking these facts into account, that (a) there are few features to begin with and (b) that there are high linear correlations with class, would all point to a less complex, linear function as being the appropriate predictive model-- by using a single hidden node, you are very nearly using a linear model.
It can also be noted that, in the absence of any hidden layer (i.e., just input and output nodes), and when the logistic transfer function is used, this is equivalent to logistic regression.
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…