I'm trying to implement an environment with multiple dials, each of which can be turned to a position from 0 to 100.
Only one dial can be turned at each step.
So far I have done this with a discrete action space that provides 100 actions for each dial, so the agent picks a dial and a position.
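The discrete setup described here could look roughly like the sketch below. It assumes positions are indexed 0 to 99 (100 actions per dial) and uses a hypothetical dial count of 3; the names `encode_action` and `decode_action` are illustrative and not taken from the actual implementation.

```python
N_POSITIONS = 100  # actions per dial, as stated in the question (assumed to be positions 0..99)


def encode_action(dial: int, position: int) -> int:
    """Map a (dial, position) pair to a single flat discrete action index."""
    return dial * N_POSITIONS + position


def decode_action(action: int) -> tuple:
    """Recover (dial, position) from a flat discrete action index."""
    return divmod(action, N_POSITIONS)


n_dials = 3  # hypothetical number of dials
n_actions = n_dials * N_POSITIONS  # size of the flat discrete action space

dial, position = decode_action(257)
print(dial, position)  # 2 57
```

With this layout the action space grows linearly with the number of dials, and every position of every dial is a separate output of the policy.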
I would like to convert it to a continuous action space.
To do so, I looked at A2C implementations for BipedalWalker-v2.
The problem I am facing now is that these implementations return an action for every actuator at each step.
In my case, the agent is only allowed to choose one actuator per step and set it to a value between 0 and 100.
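To make the constraint concrete: a per-actuator policy emits one continuous value for each actuator at every step, whereas this environment needs a single (dial, position) pair. The sketch below is a toy stand-in, assuming a hypothetical `DialEnv` class with 3 dials; rewards and termination are left out, and it only illustrates the action format, not a recommended solution.

```python
class DialEnv:
    """Hypothetical toy environment: several dials, each holding a value in [0, 100]."""

    def __init__(self, n_dials=3):
        self.dials = [0.0] * n_dials

    def step(self, action):
        # The action is a pair: which dial to turn, and the continuous target position.
        dial, position = action
        if not 0 <= dial < len(self.dials):
            raise ValueError("dial index out of range")
        # Clip the position to the valid range; only the chosen dial changes.
        self.dials[dial] = min(100.0, max(0.0, float(position)))
        return list(self.dials)


env = DialEnv(n_dials=3)
state = env.step((1, 42.5))
print(state)  # [0.0, 42.5, 0.0]
```

The action here mixes a discrete choice (the dial index) with a continuous value (the position), which is what a policy that outputs one value per actuator does not provide directly.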
What would be the best approach to do this?
Asked by Chaos on Stack Overflow.