python - How is scikit-learn cross_val_predict accuracy score calculated?

asked Oct 24, 2021 in Technique by 深蓝 (71.8m points)

Does cross_val_predict (see the docs, v0.18), used with k-fold as in the code below, calculate the accuracy for each fold and then average them?

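from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.svm import SVC

cv = KFold(n_splits=20)  # td: feature matrix, labels: true labels
clf = SVC()
ypred = cross_val_predict(clf, td, labels, cv=cv)
accuracy = accuracy_score(labels, ypred)
print(accuracy)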

1 Answer

深蓝 · Answer 1 · 2021-10-23T18:29:33+0000

No, it does not!

According to the cross-validation documentation page, cross_val_predict does not return any scores; it returns only the predicted labels, obtained with the following strategy:

The function cross_val_predict has a similar interface to cross_val_score, but returns, for each element in the input, the prediction that was obtained for that element when it was in the test set. Only cross-validation strategies that assign all elements to a test set exactly once can be used (otherwise, an exception is raised).

Therefore, calling accuracy_score(labels, ypred) computes a single accuracy score by comparing the labels predicted with that strategy against the true labels. The same documentation page says as much:

These prediction can then be used to evaluate the classifier:
predicted = cross_val_predict(clf, iris.data, iris.target, cv=10) 
metrics.accuracy_score(iris.target, predicted)
Note that the result of this computation may be slightly different from those obtained using cross_val_score as the elements are grouped in different ways.
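The restriction quoted earlier, that every element must land in a test set exactly once, can be checked with a small sketch. The iris data, the 5-fold KFold and the 3-split ShuffleSplit settings are assumptions chosen for illustration; they are not part of the original question.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, ShuffleSplit, cross_val_predict
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
clf = SVC()

# KFold partitions the data: each sample is in exactly one test fold,
# so there is exactly one prediction per sample.
ypred = cross_val_predict(clf, X, y, cv=KFold(n_splits=5))
print("one prediction per sample:", ypred.shape == y.shape)

# ShuffleSplit draws independent random test sets. Here 3 splits of
# 30 samples cover at most 90 of the 150 samples, so the test sets
# do not form a partition and cross_val_predict rejects the splitter.
try:
    cross_val_predict(
        clf, X, y, cv=ShuffleSplit(n_splits=3, test_size=0.2, random_state=0)
    )
    raised = False
except ValueError as exc:
    raised = True
    print("ValueError:", exc)
```

Splitters such as KFold and StratifiedKFold satisfy the requirement; a splitter that resamples does not.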

If you need the accuracy score of each fold, use cross_val_score instead:

>>> scores = cross_val_score(clf, X, y, cv=cv)
>>> scores                                              
array([ 0.96...,  1.  ...,  0.96...,  0.96...,  1.        ])

and then for the mean accuracy of all folds use scores.mean():

>>> print("Accuracy: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() * 2))
Accuracy: 0.98 (+/- 0.03)
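To make the difference between the two numbers concrete, here is a minimal sketch that computes both on the same folds. The dataset (iris trimmed to 149 samples so the folds are unequal), the 4-fold shuffled KFold and the variable names are assumptions for illustration only.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold, cross_val_predict, cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X, y = X[:149], y[:149]  # 149 samples, so the 4 folds have sizes 38, 37, 37, 37

cv = KFold(n_splits=4, shuffle=True, random_state=0)
clf = SVC()

# One accuracy value per fold, then their unweighted mean.
fold_scores = cross_val_score(clf, X, y, cv=cv, scoring="accuracy")
mean_of_folds = fold_scores.mean()

# One prediction per sample, then a single accuracy over all samples.
ypred = cross_val_predict(clf, X, y, cv=cv)
pooled = accuracy_score(y, ypred)

# The pooled accuracy equals the fold accuracies weighted by fold size.
fold_sizes = np.array([len(test) for _, test in cv.split(X)])
weighted = np.average(fold_scores, weights=fold_sizes)

print("per-fold accuracy:", fold_scores)
print("mean of folds    :", mean_of_folds)
print("pooled accuracy  :", pooled)
print("size-weighted    :", weighted)
```

When all folds have the same size the two values coincide for accuracy; they can differ when fold sizes differ, and for metrics that are not simple per-sample averages.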

How to calculate Cohen kappa coefficient and confusion matrix for each fold?

I assume you mean the kappa coefficient and the confusion matrix between the true labels and the predicted labels of each fold:

from sklearn.model_selection import KFold
from sklearn.svm import SVC
from sklearn.metrics import cohen_kappa_score, confusion_matrix

# X and labels are assumed to be NumPy arrays
cv = KFold(n_splits=20)
clf = SVC()
for train_index, test_index in cv.split(X):
    clf.fit(X[train_index], labels[train_index])
    ypred = clf.predict(X[test_index])
    # one kappa score and one confusion matrix per fold
    kappa_score = cohen_kappa_score(labels[test_index], ypred)
    cm = confusion_matrix(labels[test_index], ypred)
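If a single overall confusion matrix is wanted rather than one per fold, the per-fold matrices can be summed, or the pooled predictions from cross_val_predict can be used directly. The sketch below assumes the iris data and a 5-fold shuffled KFold purely for illustration, and passes labels= so that every per-fold matrix has the same shape.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import cohen_kappa_score, confusion_matrix
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
classes = np.unique(y)
cv = KFold(n_splits=5, shuffle=True, random_state=0)
clf = SVC()

# Sum the per-fold confusion matrices; labels=classes keeps every
# matrix the same shape even if a fold is missing a class.
summed = np.zeros((len(classes), len(classes)), dtype=int)
for train_index, test_index in cv.split(X):
    clf.fit(X[train_index], y[train_index])
    fold_pred = clf.predict(X[test_index])
    summed += confusion_matrix(y[test_index], fold_pred, labels=classes)

# Pooled alternative: one confusion matrix and one kappa over all samples.
ypred = cross_val_predict(clf, X, y, cv=cv)
pooled = confusion_matrix(y, ypred, labels=classes)
pooled_kappa = cohen_kappa_score(y, ypred)

print(summed)
print(pooled)
print("pooled kappa:", pooled_kappa)
```

The two matrices agree here because the splitter is seeded and the classifier is deterministic; the pooled kappa, unlike the matrix, is generally not the average of the per-fold kappa scores.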

What does `cross_val_predict` return?

It uses KFold to split the data into k parts and then, for each iteration i = 1..k:

1. takes the i-th part as the test data and all other parts as the training data
2. trains the model on the training data (all parts except the i-th)
3. uses the trained model to predict the labels of the i-th part (the test data)

In each iteration, the labels of the i-th part of the data are predicted. At the end, cross_val_predict merges the predictions from all iterations and returns them as the final result.

This code shows this process step by step:

import numpy as np
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.svm import SVC

X = np.array([[0], [1], [2], [3], [4], [5]])
labels = np.array(['a', 'a', 'a', 'b', 'b', 'b'])

cv = KFold(n_splits=3)
clf = SVC()
# start with empty strings and fill them in fold by fold
ypred_all = np.full(labels.shape, '', dtype=labels.dtype)
for i, (train_index, test_index) in enumerate(cv.split(X), start=1):
    print("iteration", i, ":")
    print("train indices:", train_index)
    print("train data:", X[train_index])
    print("test indices:", test_index)
    print("test data:", X[test_index])
    clf.fit(X[train_index], labels[train_index])
    ypred = clf.predict(X[test_index])
    print("predicted labels for data of indices", test_index, "are:", ypred)
    ypred_all[test_index] = ypred
    print("merged predicted labels:", ypred_all)
    print("=====================================")
y_cross_val_predict = cross_val_predict(clf, X, labels, cv=cv)
print("predicted labels by cross_val_predict:", y_cross_val_predict)

The result is:

iteration 1 :
train indices: [2 3 4 5]
train data: [[2] [3] [4] [5]]
test indices: [0 1]
test data: [[0] [1]]
predicted labels for data of indices [0 1] are: ['b' 'b']
merged predicted labels: ['b' 'b' '' '' '' '']
=====================================
iteration 2 :
train indices: [0 1 4 5]
train data: [[0] [1] [4] [5]]
test indices: [2 3]
test data: [[2] [3]]
predicted labels for data of indices [2 3] are: ['a' 'b']
merged predicted labels: ['b' 'b' 'a' 'b' '' '']
=====================================
iteration 3 :
train indices: [0 1 2 3]
train data: [[0] [1] [2] [3]]
test indices: [4 5]
test data: [[4] [5]]
predicted labels for data of indices [4 5] are: ['a' 'a']
merged predicted labels: ['b' 'b' 'a' 'b' 'a' 'a']
=====================================
predicted labels by cross_val_predict: ['b' 'b' 'a' 'b' 'a' 'a']
