This problem came up in a previous Kaggle competition (this thread references the paper I mentioned in the comments).
The idea is this: say you have K = 5 ordered age groups, 0 < 1 < 2 < 3 < 4. Instead of one-hot encoding them and using a softmax objective, you can encode each label as K-1 = 4 binary targets and use a sigmoid objective (one binary cross-entropy term per target). For example, your encodings would be:
[0] -> [0, 0, 0, 0]
[1] -> [1, 0, 0, 0]
[2] -> [1, 1, 0, 0]
[3] -> [1, 1, 1, 0]
[4] -> [1, 1, 1, 1]
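Here is a minimal NumPy sketch of the encoding, assuming integer labels in 0..K-1. The function names `encode_ordinal`, `sigmoid_bce` and `decode_ordinal` are hypothetical, not taken from any library. Target column k is 1 when the label is greater than k, which reproduces the table above. The loss is plain mean binary cross-entropy over the K-1 sigmoid outputs. Decoding counts how many outputs exceed 0.5, which is one simple choice among several.

```python
import numpy as np

def encode_ordinal(labels, num_classes):
    """Map integer labels in [0, num_classes) to (num_classes - 1) cumulative binary targets.

    Column k is 1 when label > k, e.g. label 2 with 5 classes -> [1, 1, 0, 0].
    """
    labels = np.asarray(labels)
    thresholds = np.arange(num_classes - 1)  # k = 0 .. K-2
    return (labels[:, None] > thresholds[None, :]).astype(np.float32)

def sigmoid_bce(logits, targets, eps=1e-7):
    """Mean binary cross-entropy over all K-1 sigmoid outputs (the 'sigmoid objective')."""
    probs = 1.0 / (1.0 + np.exp(-logits))
    probs = np.clip(probs, eps, 1.0 - eps)  # avoid log(0)
    return float(-np.mean(targets * np.log(probs) + (1 - targets) * np.log(1 - probs)))

def decode_ordinal(probs, threshold=0.5):
    """Predict a class per row by counting how many of the K-1 outputs exceed the threshold."""
    return (np.asarray(probs) > threshold).sum(axis=1)

targets = encode_ordinal([0, 1, 2, 3, 4], num_classes=5)
print(targets.astype(int))      # the 5 x 4 table above
print(decode_ordinal(targets))  # [0 1 2 3 4]
```

In a real network, the last layer would have K-1 units with a sigmoid each, trained with `sigmoid_bce` (or your framework's equivalent) against `encode_ordinal` targets.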
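With targets encoded this way, the net has K-1 sigmoid outputs, and output k is trained to predict whether the label is greater than k. Because the targets are cumulative, the ordering of the classes is built into the encoding, so the net can learn it. At prediction time you can recover a class by counting how many outputs exceed 0.5. Hope this helps.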