Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - What is Pool in CatBoost? When should Pool be used instead of a numpy array?

I use this code to test CatBoostClassifier.

import numpy as np
from catboost import CatBoostClassifier, Pool

# initialize data
train_data = np.random.randint(0, 100, size=(100, 10))
train_labels = np.random.randint(0, 2, size=(100))
test_data = Pool(train_data, train_labels)  # What is Pool? When should it be used?
# test_data = np.random.randint(0, 100, size=(20, 10))  # Usually we would use a numpy array, not a Pool

model = CatBoostClassifier(iterations=2,
                           depth=2,
                           learning_rate=1,
                           loss_function='Logloss',
                           verbose=True)
# train the model
model.fit(train_data, train_labels)
# make the prediction using the resulting model
preds_class = model.predict(test_data)
preds_proba = model.predict_proba(test_data)
print("class = ", preds_class)
print("proba = ", preds_proba)

The documentation describes Pool like this:

Pool used in CatBoost as a data structure to train model from.

I think we would usually use a numpy array rather than a Pool.

For example, we would write:

test_data = np.random.randint(0,100, size=(20, 10))

I did not find any other usage of Pool, so I want to know: when should Pool be used instead of a numpy array?

Question from: https://stackoverflow.com/questions/65838830/what-is-pool-in-catboostwhen-to-use-pool-instead-of-numpy-array


1 Answer


My understanding is that a Pool is just a convenience wrapper that combines features, labels, and further metadata such as categorical features or a baseline.
It makes little difference whether you first construct a Pool and then fit your model with it or pass the arrays directly, but it does make a difference when it comes to saving your training data. If you save all the pieces separately, they might get out of sync or you might forget one, and when loading you need a couple of lines to load everything again. The Pool comes in very handy here.
Note that when fitting you can also specify an evaluation dataset as a Pool. If you want to try multiple evaluation datasets, it is quite handy to have each of them wrapped up in a single object; that is what Pools are for.

