Project name: HCDM/BanditLib
Repository: https://github.com/HCDM/BanditLib
Language: Python 90.2%

BanditLib

This repo contains implementations of several contextual bandit algorithms, including CoLin, hLinUCB, factorUCB, GOB.Lin, LinUCB, HybridLinUCB, PTS, and UCBPMF. CoLin, hLinUCB and factorUCB are our proposed algorithms, published in [1], [2] and [3] respectively. We are updating the library, and the current version may have bugs. A stable release is available here: https://github.com/huazhengwang/BanditLib/releases/tag/1.0

Usage

Run the simulator:

Run with different parameters:

Algorithms' details

LinUCB: A state-of-the-art contextual bandit algorithm. It selects arms based on an upper confidence bound of the estimated reward given the context vectors. LinUCB assumes that the parameters of different users/bandits are independent of each other. It works only with the observed features and does not consider hidden features.

CoLin: A collaborative contextual bandit algorithm that explicitly models the underlying dependency among users/bandits. In CoLin, a weighted adjacency graph is constructed, where each node represents a contextual bandit deployed for a single user and the weight on each edge indicates the influence between a pair of users. Based on this dependency structure, the observed payoffs on each user are assumed to be determined by a mixture of neighboring users in the graph. Bandit parameters over all the users are estimated in a collaborative manner: both the context and the received payoffs from one user are propagated across the whole graph during online updating. CoLin establishes a bridge to share information among heterogeneous users and thus reduces the sample complexity of preference learning. We rigorously prove that CoLin achieves a remarkable reduction in the upper regret bound with high probability, compared to the regret that grows linearly with the number of users if one simply runs independent bandits on them (LinUCB).
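As a rough illustration of the LinUCB selection rule described above (an estimated reward plus an upper confidence width, with each user's parameters kept independent), here is a minimal single-user sketch. It is not BanditLib's implementation: the class name `LinUCBArmSelector` and its methods are hypothetical, the defaults `alpha=0.3` and `lambda_=0.1` are simply borrowed from the example config, and the inverse matrix is maintained with a Sherman-Morrison rank-one update so that only the standard library is needed.

```python
import math


class LinUCBArmSelector:
    """Single-user LinUCB sketch; names are illustrative, not BanditLib's API."""

    def __init__(self, dim, alpha=0.3, lambda_=0.1):
        self.dim = dim
        self.alpha = alpha
        # Inverse of A = lambda_ * I, kept up to date with Sherman-Morrison.
        self.A_inv = [[(1.0 / lambda_ if i == j else 0.0) for j in range(dim)]
                      for i in range(dim)]
        self.b = [0.0] * dim

    @staticmethod
    def _matvec(M, v):
        return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

    def theta(self):
        # Ridge-regression estimate theta = A^{-1} b.
        return self._matvec(self.A_inv, self.b)

    def ucb(self, x):
        # Estimated reward plus alpha times the confidence width sqrt(x^T A^{-1} x).
        mean = sum(t * xi for t, xi in zip(self.theta(), x))
        A_inv_x = self._matvec(self.A_inv, x)
        width = math.sqrt(sum(xi * ai for xi, ai in zip(x, A_inv_x)))
        return mean + self.alpha * width

    def decide(self, pool):
        # Index of the context vector with the largest upper confidence bound.
        return max(range(len(pool)), key=lambda i: self.ucb(pool[i]))

    def update(self, x, reward):
        # Rank-one update: A <- A + x x^T, b <- b + reward * x.
        A_inv_x = self._matvec(self.A_inv, x)
        denom = 1.0 + sum(xi * ai for xi, ai in zip(x, A_inv_x))
        for i in range(self.dim):
            for j in range(self.dim):
                self.A_inv[i][j] -= A_inv_x[i] * A_inv_x[j] / denom
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]
```

One selector instance per user corresponds to the independence assumption; a collaborative method such as CoLin would instead couple the users' statistics through the graph.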
hLinUCB: A contextual bandit algorithm with hidden feature learning, in which hidden features are explicitly introduced in our reward generation assumption, in addition to the observable contextual features. Coordinate descent with a provable exploration bound is used to iteratively estimate the hidden features and unknown model parameters on the fly. At each iteration, closed-form solutions exist and can be computed efficiently. Most importantly, we rigorously prove that, with proper initialization, hLinUCB obtains a sublinear upper regret bound with high probability, and that linear regret is inevitable in the worst case if one fails to model such hidden features.

FactorUCB: A factorization-based bandit algorithm, in which low-rank matrix completion is performed over an incrementally constructed user-item preference matrix, and an item selection strategy based on an upper confidence bound balances the exploit/explore trade-off in online learning. Observable contextual features and dependency among users (e.g., social influence) are leveraged to improve the algorithm's convergence rate and help conquer cold-start in recommendation. A high-probability sublinear upper regret bound is proved for the algorithm, with considerable regret reduction achieved on both the user and item sides.

Result

The results will be written under folder
Redesign

Configuration Files

A configuration YAML file can now be used to specify system-level parameters for the simulation. The simulator will use a config file using the following command:
An example config file can be found below:

    general:
      testing_iterations: 1000
      context_dimension: 16
      pool_article_size: 10
      plot: True
    user:
      number: 10
      collaborative: yes
    article:
      number: 1000
    reward:
      type: SocialLinear
    alg:
      general:
        alpha: 0.3
        lambda_: 0.1
        parameters:
          Theta: True
          CoTheta: False
          W: False
          V: False
      specific:
        CoLinUCB:
          parameters:
            Theta: False
            CoTheta: True

Each section defines parameters for different modules in the system. In the alg section, two sub-headers are present, general and specific, as shown in the example above.

Adding New Algorithms

A new algorithm can be defined by extending the BaseAlg class and implementing the following methods:

    class ExampleAlgorithm(BaseAlg):
        def decide(self, pool_articles, userID, k=1):
            # Choose articles from pool_articles for this user and return them as a list.
            articles = []
            return articles

        def updateParameters(self, articlePicked, click, userID):
            # Update the algorithm's state from the observed click on articlePicked.
            pass
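To make the two-method interface above more concrete, the following is a small self-contained sketch of an algorithm that fills in both methods. Everything beyond the method names and signatures is an assumption: the `BaseAlg` stub and the `Article` tuple are hypothetical stand-ins so the code runs on its own (BanditLib's real classes may differ), and epsilon-greedy is used only because it is short, not because it is one of the library's algorithms.

```python
import random
from collections import defaultdict, namedtuple

# Hypothetical stand-ins so the sketch runs on its own; BanditLib's real
# BaseAlg and article classes may differ.
Article = namedtuple("Article", ["id"])


class BaseAlg:
    def __init__(self, arg_dict=None):
        self.arg_dict = arg_dict or {}


class EpsilonGreedyAlgorithm(BaseAlg):
    def __init__(self, arg_dict=None, epsilon=0.1, seed=0):
        super().__init__(arg_dict)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.clicks = defaultdict(float)  # (userID, article id) -> summed clicks
        self.counts = defaultdict(int)    # (userID, article id) -> times shown

    def _mean(self, userID, article):
        # Average observed click rate; 0.0 for articles never shown to this user.
        key = (userID, article.id)
        return self.clicks[key] / self.counts[key] if self.counts[key] else 0.0

    def decide(self, pool_articles, userID, k=1):
        # With probability epsilon explore at random, otherwise pick the top k.
        if self.rng.random() < self.epsilon:
            return self.rng.sample(list(pool_articles), k)
        ranked = sorted(pool_articles,
                        key=lambda a: self._mean(userID, a), reverse=True)
        return ranked[:k]

    def updateParameters(self, articlePicked, click, userID):
        # Record the feedback for the (user, article) pair.
        key = (userID, articlePicked.id)
        self.clicks[key] += click
        self.counts[key] += 1
```

A simulator loop would call `decide` to get the recommended articles for a user and then pass the observed click back through `updateParameters`.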
To define the default parameters for a new algorithm, a dictionary function can be defined as follows:

    def create<Example>Dict(specific, general, W, system_params):
        base_dict = {
            'alpha': 0.3,
            'lambda_': 0.1,
            'parameters': {
                'Theta': True,
            }
        }
        return createSpecificAlgDict(specific, general, W, system_params, base_dict)

Where variables like

References

[1] Qingyun Wu, Huazheng Wang, Quanquan Gu and Hongning Wang. Contextual Bandits in A Collaborative Environment. The 39th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2016), pp. 529-538, 2016.
[2] Huazheng Wang, Qingyun Wu and Hongning Wang. Learning Hidden Features for Contextual Bandits. The 25th ACM International Conference on Information and Knowledge Management (CIKM 2016), pp. 1633-1642, 2016.
[3] Huazheng Wang, Qingyun Wu and Hongning Wang. Factorization Bandits for Interactive Recommendation. The Thirty-First AAAI Conference on Artificial Intelligence (AAAI 2017). (to appear)