vecxoz/vecstack: Python package for stacking (machine learning technique)

Open-source project name:

vecxoz/vecstack

Open-source project URL:

https://github.com/vecxoz/vecstack

Programming language:

Python 100.0%

Introduction:

vecstack

Python package for stacking (stacked generalization) featuring lightweight functional API and fully compatible scikit-learn API
Convenient way to automate OOF computation, prediction and bagging using any number of models

Functional API:
- Minimalistic. Get your stacked features in a single line
- RAM-friendly. The lowest possible memory consumption
- Kaggle-ready. Stacked features and hyperparameters from each run can be automatically saved in files. No more mess at the end of the competition. Log example
Scikit-learn API:
- Standardized. Fully scikit-learn compatible transformer class exposing fit and transform methods
- Pipeline-certified. Implement and deploy multilevel stacking like it's no big deal using sklearn.pipeline.Pipeline
- And of course FeatureUnion is also invited to the party
Overall specs:
- Use any sklearn-like estimators
- Perform classification and regression tasks
- Predict class labels or probabilities in classification task
- Apply any user-defined metric
- Apply any user-defined transformations for target and prediction
- Python 3.5 and higher, unofficial support for Python 2.7 and 3.4
- Win, Linux, Mac
- MIT license
- Depends on numpy, scipy, scikit-learn>=0.18

Get started

FAQ
Installation guide
Usage:
- Functional API
- Scikit-learn API
Tutorials:
- Stacking concept + Pictures + Stacking implementation from scratch
Examples (all examples are valid for both APIs, with minor differences in parameters):
- Functional API:
- Scikit-learn API:
  - Regression + Multilevel stacking using Pipeline
Documentation:
- Functional API or type >>> help(stacking)
- Scikit-learn API or type >>> help(StackingTransformer)

Installation

Note: Python 3.5 or higher is required. If you’re still using Python 2.7 or 3.4 see installation details here

Classic 1st time installation (recommended):
- pip install vecstack
Install for the current user only (if you have trouble with write permissions):
- pip install --user vecstack
If your PATH doesn't work:
- /usr/bin/python -m pip install vecstack
- C:/Python36/python -m pip install vecstack
Upgrade vecstack and all dependencies:
- pip install --upgrade vecstack
Upgrade vecstack WITHOUT upgrading dependencies:
- pip install --upgrade --no-deps vecstack
Upgrade directly from GitHub WITHOUT upgrading dependencies:
- pip install --upgrade --no-deps https://github.com/vecxoz/vecstack/archive/master.zip
Uninstall:
- pip uninstall vecstack

Usage. Functional API

from sklearn.linear_model import LinearRegression, Ridge
from vecstack import stacking

# Get your data

# Initialize 1st level estimators
models = [LinearRegression(),
          Ridge(random_state=0)]

# Get your stacked features in a single line
S_train, S_test = stacking(models, X_train, y_train, X_test, regression=True, verbose=2)

# Use 2nd level estimator with stacked features

Usage. Scikit-learn API

from sklearn.linear_model import LinearRegression, Ridge
from vecstack import StackingTransformer

# Get your data

# Initialize 1st level estimators
estimators = [('lr', LinearRegression()),
              ('ridge', Ridge(random_state=0))]
              
# Initialize StackingTransformer
stack = StackingTransformer(estimators, regression=True, verbose=2)

# Fit
stack = stack.fit(X_train, y_train)

# Get your stacked features
S_train = stack.transform(X_train)
S_test = stack.transform(X_test)

# Use 2nd level estimator with stacked features

Stacking FAQ

1. How can I report an issue? How can I ask a question about stacking or vecstack package?

Just open an issue here.
Ask me anything on the topic.
I'm a bit busy, so typically I answer on the next day.

2. How can I say thanks?

Just give me a star in the top right corner of the repository page.

3. How to cite vecstack?

@misc{vecstack2016,
       author = {Igor Ivanov},
       title = {Vecstack},
       year = {2016},
       publisher = {GitHub},
       howpublished = {\url{https://github.com/vecxoz/vecstack}},
}

4. What is stacking?

Stacking (stacked generalization) is a machine learning ensembling technique.
The main idea is to use predictions as features.
More specifically, we predict the train set (in a CV-like fashion) and the test set using one or more 1st level models, and then use these predictions as features for a 2nd level model. You can find more details (concept, pictures, code) in the stacking tutorial.
Also make sure to check out:

- Ensemble Learning (Stacking) in Wikipedia
- Classical Kaggle Ensembling Guide
- Stacked Generalization paper by David H. Wolpert

5. What about stacking name?

Often it is also called stacked generalization. The term is derived from the verb to stack (to put together, to put on top of each other). It implies that we put some models on top of other models, i.e. train some models on predictions of other models. From another point of view we can say that we stack predictions in order to use them as features.

6. Do I need stacking at all?

It depends on the specific business case. The main thing to know about stacking is that it requires significant computing resources. The No Free Lunch Theorem applies as always. Stacking can give you an improvement, but at a certain price (deployment, computation, maintenance). Only an experiment for the given business case will tell you whether it is worth the effort and money.

At present, a large share of stacking users are participants in machine learning competitions. On Kaggle you can't go too far without ensembling. I can secretly tell you that at least the top half of the leaderboard in pretty much any competition uses ensembling (stacking) in some way. Stacking is less popular in production due to time and resource constraints, but I think it is gaining popularity.

7. Can you explain stacking (stacked generalization) in 10 lines of code?

Of course

8. Why do I need complicated inner procedure for stacking?

I can just do the following. Why not?

from sklearn.linear_model import LinearRegression
from xgboost import XGBRegressor

model_L1 = XGBRegressor()
model_L1 = model_L1.fit(X_train, y_train)
S_train = model_L1.predict(X_train).reshape(-1, 1)  # <- DOES NOT work due to overfitting. Must be CV
S_test = model_L1.predict(X_test).reshape(-1, 1)
model_L2 = LinearRegression()
model_L2 = model_L2.fit(S_train, y_train)
final_prediction = model_L2.predict(S_test)

The code above gives a meaningless result. If we fit on X_train we can't just predict X_train, because our 1st level model has already seen X_train, and its prediction will be overfitted. To avoid overfitting we perform a cross-validation procedure and in each fold we predict the out-of-fold (OOF) part of X_train. You can find more details (concept, pictures, code) in the stacking tutorial.
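To make the cross-validation procedure concrete, here is a minimal sketch of computing OOF features by hand with scikit-learn's KFold. It is an illustration under assumptions, not vecstack's internal code: the synthetic data, the helper name `oof_features`, and the choice to average the per-fold test predictions are all made up for this example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import KFold

# Synthetic regression data (illustrative only)
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.1, size=200)
X_train, y_train, X_test = X[:150], y[:150], X[150:]


def oof_features(model, X_train, y_train, X_test, n_folds=4, seed=0):
    """Return (OOF predictions for train, fold-averaged predictions for test)."""
    kf = KFold(n_splits=n_folds, shuffle=True, random_state=seed)
    s_train = np.zeros(len(X_train))
    s_test_folds = np.zeros((len(X_test), n_folds))
    for i, (fit_idx, oof_idx) in enumerate(kf.split(X_train)):
        model.fit(X_train[fit_idx], y_train[fit_idx])
        # Predict only the rows the model has NOT seen in this fold
        s_train[oof_idx] = model.predict(X_train[oof_idx])
        # Each fold model also predicts the test set; results are averaged below
        s_test_folds[:, i] = model.predict(X_test)
    return s_train, s_test_folds.mean(axis=1)


models = [LinearRegression(), Ridge(random_state=0)]
cols = [oof_features(m, X_train, y_train, X_test) for m in models]
S_train = np.column_stack([c[0] for c in cols])
S_test = np.column_stack([c[1] for c in cols])

# The 2nd level model is trained on the stacked features
model_L2 = LinearRegression().fit(S_train, y_train)
final_prediction = model_L2.predict(S_test)
```

Every value in S_train comes from a model that did not see that row during fitting, which is exactly what the naive snippet above fails to guarantee.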

9. I want to implement stacking (stacked generalization) from scratch. Can you help me?

Not a problem

10. What is OOF?

OOF is an abbreviation for out-of-fold prediction. It's also known as OOF features, stacked features, stacking features, etc. Basically it means predictions for the part of the train data that the model hasn't seen during training.

11. What are estimator, learner, model?

Basically they are the same thing: a machine learning algorithm. Often these terms are used interchangeably.
Speaking about inner stacking mechanics, you should remember that when you have a single 1st level model, at least n_folds separate models will be trained, one in each CV fold on a different subset of the data. See Q23 for more details.

12. What is blending? How is it related to stacking?

Basically they are the same thing. Both approaches use predictions as features.
Often these terms are used interchangeably.
The difference is how we generate features (predictions) for the next level:

- stacking: perform a cross-validation procedure and predict each part of the train set (OOF)
- blending: predict a fixed holdout set

The vecstack package supports only stacking, i.e. the cross-validation approach. For a given random_state value (e.g. 42), folds (splits) will be the same across all estimators. See also Q30.
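Since vecstack covers only the cross-validation approach, blending has to be done by hand if you want it. The sketch below shows the holdout idea with plain scikit-learn; the synthetic data, the 25% holdout size, and all variable names are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split

# Synthetic regression data (illustrative only)
rng = np.random.RandomState(42)
X = rng.normal(size=(300, 4))
y = X @ np.array([2.0, -1.0, 0.0, 0.5]) + rng.normal(scale=0.1, size=300)
X_train, y_train, X_test = X[:200], y[:200], X[200:]

# Blending: split off one fixed holdout part of the train set
X_fit, X_hold, y_fit, y_hold = train_test_split(
    X_train, y_train, test_size=0.25, random_state=42)

# 1st level models are fitted once, on the non-holdout part only
models = [LinearRegression(), Ridge(random_state=0)]
for m in models:
    m.fit(X_fit, y_fit)

# Features for the next level exist only for the holdout rows
S_hold = np.column_stack([m.predict(X_hold) for m in models])
S_test = np.column_stack([m.predict(X_test) for m in models])

# The 2nd level model is trained on holdout predictions
model_L2 = LinearRegression().fit(S_hold, y_hold)
final_prediction = model_L2.predict(S_test)
```

Note the trade-off: blending trains each 1st level model only once, but the 2nd level model sees far fewer rows (50 here) than it would with OOF features for the whole train set.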

13. How to optimize weights for weighted average?

You can use, for example:

- scipy.optimize.minimize
- scipy.optimize.differential_evolution
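As a rough sketch of the first option, the snippet below fits non-negative weights that sum to 1 by minimizing MSE with scipy.optimize.minimize. The predictions are simulated, and the SLSQP method, the bounds, and the sum-to-one constraint are my assumptions for this example rather than a required setup.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.RandomState(0)
y_true = rng.normal(size=500)
# Hypothetical OOF predictions of three models with different noise levels
preds = np.column_stack(
    [y_true + rng.normal(scale=s, size=500) for s in (0.3, 0.5, 0.8)])


def mse(weights):
    """Mean squared error of the weighted average of model predictions."""
    return np.mean((preds @ weights - y_true) ** 2)


n = preds.shape[1]
result = minimize(
    mse,
    x0=np.full(n, 1.0 / n),             # start from a plain average
    method="SLSQP",
    bounds=[(0.0, 1.0)] * n,            # each weight stays in [0, 1]
    constraints={"type": "eq", "fun": lambda w: w.sum() - 1.0},
)
weights = result.x
```

Fit the weights on OOF predictions (not on predictions for data the models were trained on), otherwise the weights will favor the most overfitted model.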

14. What is better: weighted average for current level or additional level?

By default you can start with a weighted average. It is easier to apply and more likely to give a good result. Then you can try an additional level, which can potentially outperform the weighted average (but not always and not easily). Experiment is your friend.

15. What is bagging? How is it related to stacking?

Bagging, or bootstrap aggregating, works as follows: generate subsets of the training set, train models on these subsets, and then average the predictions.
The term bagging is also often used to describe the following approaches:

- train several different models on the same data and average predictions
- train the same model with different random seeds on the same data and average predictions

So if we run stacking and just average the predictions, it is bagging.

16. How many models should I use on a given stacking level?

Note 1: The best architecture can be found only by experiment.
Note 2: Always remember that a higher number of levels or models does NOT guarantee a better result. The key to success in stacking (and ensembling in general) is diversity, i.e. low correlation between models.

It depends on many factors like the type of problem, type of data, quality of models, correlation of models, expected result, etc.
Some example configurations are listed below.

Reasonable starting point:
- L1: 2-10 models -> L2: weighted (rank) average or single model
Then try to add more 1st level models and additional level:
- L1: 10-50 models -> L2: 2-10 models -> L3: weighted (rank) average
If you're crunching numbers at Kaggle and decided to go wild:
- L1: 100-inf models -> L2: 10-50 models -> L3: 2-10 models -> L4: weighted (rank) average

You can also find some winning stacking architectures on the Kaggle blog, e.g. 1st place in Homesite Quote Conversion.

17. How many stacking levels should I use?

For some example configurations see Q16.

18. How do I choose models for stacking?

Based on experiments and correlation (e.g. Pearson). Less correlated models give a better result. It means that we should never judge our models by accuracy only. We should also consider correlation (how different a given model is from the others). Sometimes an inaccurate but very different model can add substantial value to the resulting ensemble.
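Checking correlation takes one call once you have OOF predictions as columns. The sketch below uses simulated predictions (two models sharing the same error and one model with independent error) purely as an assumption to show what the matrix looks like.

```python
import numpy as np

rng = np.random.RandomState(1)
y = rng.normal(size=1000)

# Hypothetical OOF predictions: columns 0 and 1 share the same error,
# column 2 has its own independent error
shared_err = rng.normal(scale=0.5, size=1000)
S_train = np.column_stack([
    y + shared_err,
    y + shared_err + rng.normal(scale=0.05, size=1000),
    y + rng.normal(scale=0.5, size=1000),
])

# Pearson correlation between model predictions (one column per model)
corr = np.corrcoef(S_train, rowvar=False)
```

In this setup models 0 and 1 are near-duplicates, so adding model 1 to the ensemble brings little; model 2 is no more accurate than model 0 but is less correlated with it, which is what makes it a useful addition.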

19. I am trying hard but still can't beat my best single model with stacking. What is wrong?

Nothing is wrong. Stacking is an advanced, complicated technique. It's hard to make it work. Solution: make sure to try a weighted (rank) average first instead of an additional level with some advanced models. An average is much easier to apply and in most cases it will outperform your best model. If still no luck, your models are probably highly correlated.
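For reference, a weighted rank average can be sketched in a few lines. Each model's predictions are replaced by their ranks (scaled to the (0, 1] range) before averaging, which helps when models output scores on different scales and the metric depends only on ordering. The helper name `weighted_rank_average` and the toy numbers are hypothetical.

```python
import numpy as np
from scipy.stats import rankdata


def weighted_rank_average(preds, weights):
    """Average per-model ranks; preds has shape (n_samples, n_models)."""
    # Rank each model's predictions separately: 1 = lowest, n = highest
    ranks = np.column_stack([rankdata(col) for col in preds.T])
    ranks /= preds.shape[0]          # scale ranks to (0, 1]
    w = np.asarray(weights, dtype=float)
    return ranks @ (w / w.sum())     # weights are normalized to sum to 1


# Two models scoring four samples on very different scales
preds = np.array([[0.10, 10.0],
                  [0.40, 20.0],
                  [0.35, 30.0],
                  [0.80, 40.0]])
scores = weighted_rank_average(preds, weights=[2, 1])
```

The resulting scores are only meaningful as an ordering, so this fits ranking metrics such as AUC rather than metrics that need calibrated values.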

20. What should I choose: functional API (`stacking` function) or Scikit-learn API (`StackingTransformer`)?

Quick guide:

- By default, start with StackingTransformer, which has the familiar scikit-learn interface and logic
- If you need low RAM consumption, try the stacking function, but remember that it does not store models and does not have scikit-learn capabilities

Stacking API comparison:

| Property | stacking function | StackingTransformer |
| --- | --- | --- |
| Execution time | Same | Same |
| RAM | Consumes the smallest possible amount of RAM. Does not store models. At any point in time only one model is alive. Logic: train model -> predict -> delete -> etc. When execution ends all RAM is released. | Consumes much more RAM. It stores all models built in each fold. This price is paid for standard scikit-learn capabilities like `Pipeline` and `FeatureUnion`. |
| Access to models after training | No | Yes |
| Compatibility with `Pipeline` and `FeatureUnion` | No | Yes |
| Estimator implementation restrictions | Must have only `fit` and `predict` (`predict_proba`) methods | Must be fully scikit-learn compatible |
| `NaN` and `inf` in input data | Allowed | Not allowed |
| Can automatically save OOF and log in files | Yes | No |
| Input dimensionality (`X_train`, `X_test`) | Arbitrary | 2-D |

21. How do parameters of `stacking` function and `StackingTransformer` correspond?

| stacking function | StackingTransformer |
| --- | --- |
| `models=[Ridge()]` | `estimators=[('ridge', Ridge())]` |
| `mode='oof_pred_bag'` (alias `'A'`) | `variant='A'` |
| `mode='oof_pred'` (alias `'B'`) | `variant='B'` |

22. Why was the Scikit-learn API implemented as a transformer and not a predictor?

By nature the stacking procedure is a predictor, but by application it is definitely a transformer.
The transformer architecture was chosen because, first of all, the user needs direct access to OOF, i.e. the ability to compute correlations, weighted averages, etc.
If you need a predictor based on StackingTransformer, you can easily create it via Pipeline by adding a regressor or classifier on top of StackingTransformer.
A transformer makes it easy to create any number of stacking levels. Using Pipeline we can easily create multilevel stacking by just putting several StackingTransformers on top of each other, followed by a final regressor or classifier.

23. How to estimate stacking training time and number of models which will be built?

Note: Stacking usually takes a long time. It's expected (inevitable) behavior.

We can compute the total number of models which will be built during the stacking procedure using the following formulas:

- Variant A: n_models_total = n_estimators * n_folds
- Variant B: n_models_total = n_estimators * n_folds + n_estimators

Let's look at an example. Say we define our stacking procedure as follows:

from sklearn.linear_model import LinearRegression, Ridge
from vecstack import StackingTransformer

estimators_L1 = [('lr', LinearRegression()),
                 ('ridge', Ridge())]
stack = StackingTransformer(estimators_L1, n_folds=4)

So we have two 1st level estimators and 4 folds. It means the stacking procedure will build the following number of models:

- Variant A: 8 models total. Each model is trained on 3/4 of X_train.
- Variant B: 10 models total. 8 models are trained on 3/4 of X_train and 2 models on full X_train.

Compute time:

- If estimators have relatively similar training times, we can roughly compute the total training time as: time_total = n_models_total * time_of_one_model
- If estimators have different training times, we should compute the number of models and the time for each estimator separately (set n_estimators=1 in the formulas above) and then sum up the times.
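The formulas above translate directly into a small helper. The function names `n_models_total` and `estimate_time` are hypothetical (they are not part of vecstack), and the per-model times are whatever you measured for a single fit of each estimator.

```python
def n_models_total(n_estimators, n_folds, variant="A"):
    """Total number of models built during one stacking run."""
    if variant not in ("A", "B"):
        raise ValueError("variant must be 'A' or 'B'")
    total = n_estimators * n_folds
    if variant == "B":
        # Variant B additionally fits each estimator on the full train set
        total += n_estimators
    return total


def estimate_time(times_per_model, n_folds, variant="A"):
    """Rough total time when estimators have different training times.

    times_per_model: measured time of a single fit for each estimator.
    """
    return sum(t * n_models_total(1, n_folds, variant) for t in times_per_model)


# Example from the text: two estimators, 4 folds
models_a = n_models_total(2, 4, "A")
models_b = n_models_total(2, 4, "B")
```

This is only a rough estimate: it ignores prediction time and assumes that fitting on 3/4 of the data costs about the same as the single fit you measured.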

24. Which stacking variant should I use: 'A' ('oof_pred_bag') or 'B' ('oof_pred')?

You can find out only by experiment. The default choice is variant A, because it takes less time and there should be no significant difference in the result. But of course you may also try variant B. For more details see the stacking tutorial.

25. How to choose number of folds?

Note: Remember that a higher number of folds substantially increases training time (and RAM consumption for StackingTransformer). See Q23.

- Standard approach: 4 or 5 folds.
- If data is big: 3 folds.
- If data is small: you can try more folds, like 10 or so.

26. When I transform train set I see 'Train set was detected'. What does it mean?

Note 1: It is NOT allowed to change the train set between calls to the fit and transform methods. Due to the nature of stacking, the transformation is different for the train set and any other set. If the train set is changed after training, the stacking procedure will not be able to identify it correctly and the transformation will be wrong.

Note 2: To be correctly detected, the train set does not necessarily have to be identical (exactly the same). It must have the same shape and all values must be close (np.isclose is used for checking). So if you somehow regenerate your train set, you should not worry about numerical precision.

If you transform X_train and see 'Train set was detected', everything is OK. If you transform X_train but you don't see this message, then something went wrong. Probably your train set was changed (it is not allowed). In this case you have to retrain StackingTransformer. For more details see the stacking tutorial or Q8.
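The detection rule from Note 2 (same shape, all values close) can be sketched as follows. This is not vecstack's actual code, just an illustration of the described check; the function name `looks_like_train_set` is hypothetical and the default tolerances of np.isclose are assumed.

```python
import numpy as np


def looks_like_train_set(X, X_train_seen):
    """Return True if X has the same shape as the train set and all values are close."""
    X = np.asarray(X)
    X_train_seen = np.asarray(X_train_seen)
    if X.shape != X_train_seen.shape:
        return False
    # Element-wise closeness check with np.isclose default tolerances
    return bool(np.isclose(X, X_train_seen).all())


X_train = np.arange(6.0).reshape(3, 2)
same_up_to_precision = looks_like_train_set(X_train + 1e-12, X_train)
different_shape = looks_like_train_set(X_train[:2], X_train)
different_values = looks_like_train_set(X_train + 1.0, X_train)
```

Tiny floating-point differences pass the check, while a changed shape or genuinely different values do not, which matches the behavior described in the notes above.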

27. How is the very first stacking level called: L0 or L1? Where does counting start?

Common convention: The very first group of models, which are trained on the initial raw data, is called L1. On top of L1 we have the so-called stacker level, meta level, or L2, i.e. models which are trained on predictions of L1 models. Counting continues in the same fashion up to an arbitrary number of levels.

I use this convention in my code and docs. But of course your Kaggle teammates may use some other naming approach, so you should clarify this for your specific case.

28. Can I use `(Randomized)GridSearchCV` to tune the whole stacking Pipeline?

Yes, technically you can, but it is not recommended because this approach leads to redundant computations. The general practical advice is to tune each estimator separately and then use the tuned estimators on the 1st level of stacking. Higher level estimators should be tuned in the same fashion using OOF from the previous level. For manual tuning you can use the stacking function or StackingTransformer with a single 1st level estimator.

29. How to define custom metric, especially AUC?

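from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import OneHotEncoder

def auc(y_true, y_pred):
    """ROC AUC metric for both binary and multiclass classification.

    y_true: 1d array of class labels; y_pred: 2d array of class probabilities.
    """
    # The source listing is cut off after the docstring's first line; the body
    # below is a reconstruction sketch, not necessarily the original code.
    y_true_ohe = OneHotEncoder().fit_transform(y_true.reshape(-1, 1)).toarray()
    return roc_auc_score(y_true_ohe, y_pred)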