This article compiles typical usage examples of the Java class burlap.behavior.policy.Policy. If you are wondering what the Policy class does, how to use it, or where to find examples of it, the curated class examples below may help.
The Policy class belongs to the burlap.behavior.policy package. 20 code examples of the class are presented below, sorted by popularity by default. You can upvote the examples you like or find useful; your ratings help the system recommend better Java code examples.
Example 1: DeepQLearner
import burlap.behavior.policy.Policy; // import the required package/class
public DeepQLearner(SADomain domain, double gamma, int replayStartSize, Policy policy, DQN vfa, StateMapping stateMapping) {
super(domain, gamma, vfa, stateMapping);
if (replayStartSize > 0) {
System.out.println(String.format("Starting with random policy for %d frames", replayStartSize));
this.replayStartSize = replayStartSize;
this.trainingPolicy = policy;
setLearningPolicy(new RandomPolicy(domain));
runningRandomPolicy = true;
} else {
setLearningPolicy(policy);
runningRandomPolicy = false;
}
}
Author: h2r, Project: burlap_caffe, Lines of code: 17, Source file: DeepQLearner.java
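The constructor above falls back to a uniform random policy until replayStartSize frames have been collected, and only then is the provided training policy meant to take over. A minimal sketch of that switch-over is shown below; the method updateFrameCount and its exact trigger are assumptions for illustration, not code from the burlap_caffe source.
// Hypothetical sketch (not burlap_caffe code): hand control back to the training
// policy once enough frames have been gathered under the random policy.
public void updateFrameCount(int totalFrames) {
    if (runningRandomPolicy && totalFrames >= replayStartSize) {
        System.out.println("Replay start size reached; switching to the training policy");
        setLearningPolicy(trainingPolicy);
        runningRandomPolicy = false;
    }
}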
Example 2: IPSS
import burlap.behavior.policy.Policy; // import the required package/class
public static void IPSS(){
InvertedPendulum ip = new InvertedPendulum();
ip.physParams.actionNoise = 0.;
Domain domain = ip.generateDomain();
RewardFunction rf = new InvertedPendulum.InvertedPendulumRewardFunction(Math.PI/8.);
TerminalFunction tf = new InvertedPendulum.InvertedPendulumTerminalFunction(Math.PI/8.);
State initialState = InvertedPendulum.getInitialState(domain);
SparseSampling ss = new SparseSampling(domain, rf, tf, 1, new SimpleHashableStateFactory(), 10, 1);
ss.setForgetPreviousPlanResults(true);
ss.toggleDebugPrinting(false);
Policy p = new GreedyQPolicy(ss);
EpisodeAnalysis ea = p.evaluateBehavior(initialState, rf, tf, 500);
System.out.println("Num steps: " + ea.maxTimeStep());
Visualizer v = InvertedPendulumVisualizer.getInvertedPendulumVisualizer();
new EpisodeSequenceVisualizer(v, domain, Arrays.asList(ea));
}
Author: f-leno, Project: DOO-Q_BRACIS2016, Lines of code: 21, Source file: ContinuousDomainTutorial.java
Example 3: oneStep
import burlap.behavior.policy.Policy; // import the required package/class
/**
* Performs one step of execution of the option. This method assumes that the {@link #initiateInState(burlap.oomdp.core.states.State, burlap.oomdp.singleagent.GroundedAction)}
* method was called previously for the state in which this option was initiated.
* @param s the state in which a single step of the option is to be taken.
* @param groundedAction the parameters in which this option was initiated
* @return the resulting state from a single step of the option being performed.
*/
public State oneStep(State s, GroundedAction groundedAction){
GroundedAction ga = this.oneStepActionSelection(s, groundedAction);
State sprime = ga.executeIn(s);
lastNumSteps++;
double r = 0.;
if(keepTrackOfReward){
r = rf.reward(s, ga, sprime);
lastCumulativeReward += cumulativeDiscount*r;
cumulativeDiscount *= discountFactor;
}
if(shouldRecordResults){
GroundedAction recordAction = ga;
if(shouldAnnotateExecution){
recordAction = new Policy.GroundedAnnotatedAction(groundedAction.toString() + "(" + (lastNumSteps-1) + ")", ga);
}
lastOptionExecutionResults.recordTransitionTo(recordAction, sprime, r);
}
return sprime;
}
Author: f-leno, Project: DOO-Q_BRACIS2016, Lines of code: 31, Source file: Option.java
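For context, a hedged sketch of driving such an option by hand follows. In BURLAP the option normally runs this loop itself when executed as an action; the termination check via probabilityOfTermination, and the option, s, groundedOption, and rng variables, are assumptions here rather than code from this example.
// Illustrative only: initiate the option, then step it until it signals termination.
java.util.Random rng = new java.util.Random();
option.initiateInState(s, groundedOption);
State cur = s;
do {
    cur = option.oneStep(cur, groundedOption);
} while (rng.nextDouble() >= option.probabilityOfTermination(cur, groundedOption));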
Example 4: DeterministicTerminationOption
import burlap.behavior.policy.Policy; // import the required package/class
/**
* Initializes the option by creating its policy from a provided planner. The planner is run from each state in
* the list <code>seedStatesForPlanning</code>, and this option's policy is then
* set to the provided planner-derived policy.
* @param name the name of the option
* @param init the initiation conditions of the option
* @param terminationStates the termination states of the option
* @param seedStatesForPlanning the states that should be used as initial planning states for the planner
* @param planner the planner that is used to create the policy for this option
* @param p the planner-derived policy to use after planning from each seed state is performed.
*/
public DeterministicTerminationOption(String name, StateConditionTest init, StateConditionTest terminationStates, List<State> seedStatesForPlanning,
Planner planner, SolverDerivedPolicy p){
if(!(p instanceof Policy)){
throw new RuntimeErrorException(new Error("SolverDerivedPolicy p is not an instance of Policy"));
}
this.name = name;
this.initiationTest = init;
this.terminationStates = terminationStates;
//now construct the policy using the valueFunction from each possible initiation state
for(State si : seedStatesForPlanning){
planner.planFromState(si);
}
p.setSolver(planner);
this.policy = (Policy)p;
}
Author: f-leno, Project: DOO-Q_BRACIS2016, Lines of code: 34, Source file: DeterministicTerminationOption.java
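A hedged usage sketch of this constructor follows; the variables initTest, termTest, seedStates, and planner are placeholders assumed to be defined elsewhere, and GreedyQPolicy is chosen on the assumption that it implements SolverDerivedPolicy, as it does in standard BURLAP.
// Sketch: build an option whose policy comes from planning over the seed states.
// initTest, termTest, seedStates, and planner are placeholders defined elsewhere.
DeterministicTerminationOption toDoorway = new DeterministicTerminationOption(
        "toDoorway", initTest, termTest, seedStates, planner, new GreedyQPolicy());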
Example 5: getActionDistributionForState
import burlap.behavior.policy.Policy; // import the required package/class
@Override
public List<ActionProb> getActionDistributionForState(State s) {
if(policy == null){
this.computePolicyFromTree();
}
GroundedAction ga = policy.get(planner.stateHash(s));
if(ga == null){
throw new PolicyUndefinedException();
}
List <ActionProb> res = new ArrayList<Policy.ActionProb>();
res.add(new ActionProb(ga, 1.)); //greedy policy so only need to supply the mapped action
return res;
}
Author: f-leno, Project: DOO-Q_BRACIS2016, Lines of code: 18, Source file: UCTTreeWalkPolicy.java
Example 6: logPolicyGrad
import burlap.behavior.policy.Policy; // import the required package/class
/**
* Computes and returns the gradient of the log of the Boltzmann policy for the given state and action.
* @param s the state in which the policy is queried
* @param ga the action for which the policy is queried.
* @return the gradient of the log of the Boltzmann policy for the given state and action.
*/
public double [] logPolicyGrad(State s, GroundedAction ga){
Policy p = new BoltzmannQPolicy((QFunction)this.request.getPlanner(), 1./this.request.getBoltzmannBeta());
double invActProb = 1./p.getProbOfAction(s, ga);
double [] gradient = BoltzmannPolicyGradient.computeBoltzmannPolicyGradient(s, ga, (QGradientPlanner)this.request.getPlanner(), this.request.getBoltzmannBeta());
for(int f = 0; f < gradient.length; f++){
gradient[f] *= invActProb;
}
return gradient;
}
Author: f-leno, Project: DOO-Q_BRACIS2016, Lines of code: 18, Source file: MLIRL.java
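The division by the action probability applies the log-derivative identity ∇ log π(a|s) = ∇π(a|s) / π(a|s): computeBoltzmannPolicyGradient supplies the gradient of the Boltzmann policy itself, and scaling each component by 1/π(a|s) yields the gradient of its logarithm, which is the quantity the MLIRL objective requires.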
Example 7: main
import burlap.behavior.policy.Policy; // import the required package/class
public static void main(String[] args) {
MountainCar mcGen = new MountainCar();
Domain domain = mcGen.generateDomain();
TerminalFunction tf = new MountainCar.ClassicMCTF();
RewardFunction rf = new GoalBasedRF(tf, 100);
StateGenerator rStateGen = new MCRandomStateGenerator(domain);
SARSCollector collector = new SARSCollector.UniformRandomSARSCollector(domain);
SARSData dataset = collector.collectNInstances(rStateGen, rf, 5000, 20, tf, null);
ConcatenatedObjectFeatureVectorGenerator fvGen = new ConcatenatedObjectFeatureVectorGenerator(true,
MountainCar.CLASSAGENT);
FourierBasis fb = new FourierBasis(fvGen, 4);
LSPI lspi = new LSPI(domain, 0.99, fb, dataset);
Policy p = lspi.runPolicyIteration(30, 1e-6);
Visualizer v = MountainCarVisualizer.getVisualizer(mcGen);
VisualActionObserver vob = new VisualActionObserver(domain, v);
vob.initGUI();
SimulatedEnvironment env = new SimulatedEnvironment(domain, rf, tf,
MountainCar.getCleanState(domain, mcGen.physParams));
EnvironmentServer envServ = new EnvironmentServer(env, vob);
for(int i = 0; i < 100; i++){
p.evaluateBehavior(envServ);
envServ.resetEnvironment();
}
System.out.println("Finished");
}
Author: f-leno, Project: DOO-Q_BRACIS2016, Lines of code: 35, Source file: MCVideo.java
Example 8: logLikelihoodOfTrajectory
import burlap.behavior.policy.Policy; // import the required package/class
/**
* Computes and returns the log-likelihood of the given trajectory under the current reward function parameters, weighted by the given weight.
* @param ea the trajectory
* @param weight the weight to assign the trajectory
* @return the log-likelihood of the given trajectory under the current reward function parameters, weighted by the given weight.
*/
public double logLikelihoodOfTrajectory(EpisodeAnalysis ea, double weight){
double logLike = 0.;
Policy p = new BoltzmannQPolicy((QFunction)this.request.getPlanner(), 1./this.request.getBoltzmannBeta());
for(int i = 0; i < ea.numTimeSteps()-1; i++){
this.request.getPlanner().planFromState(ea.getState(i));
double actProb = p.getProbOfAction(ea.getState(i), ea.getAction(i));
logLike += Math.log(actProb);
}
logLike *= weight;
return logLike;
}
Author: f-leno, Project: DOO-Q_BRACIS2016, Lines of code: 18, Source file: MLIRL.java
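In equation form, the returned value is weight · Σ_t log π_B(a_t | s_t), summed over every time step of the episode that has an action, where π_B(a|s) ∝ exp(β·Q(s,a)) is the Boltzmann policy over the planner's Q-values with β = request.getBoltzmannBeta().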
Example 9: learnPolicy
import burlap.behavior.policy.Policy; // import the required package/class
@Override
public Policy learnPolicy(SADomain domain, List<Episode> episodes, int numberOfStates, int numberOfSamplesToUse) {
//create reward function features to use
LocationFeatures features = new LocationFeatures(numberOfStates);
//create a reward function that is linear with respect to those features and has small random
//parameter values to start
LinearStateDifferentiableRF rf = new LinearStateDifferentiableRF(features, numberOfStates);
for (int i = 0; i < rf.numParameters() - 1; i++) {
rf.setParameter(i, RandomFactory.getMapped(0).nextDouble() * 0.2 - 0.1);
}
//set last "dummy state" to large negative number as we do not want to go there
rf.setParameter(rf.numParameters() - 1, MLIRLWithGuard.minReward);
//use either DifferentiableVI or DifferentiableSparseSampling for planning. The latter enables receding horizon IRL,
//but you will probably want to use a fairly large horizon for this kind of reward function.
HashableStateFactory hashingFactory = new SimpleHashableStateFactory();
// DifferentiableVI dplanner = new DifferentiableVI(domain, rf, 0.99, beta, hashingFactory, 0.01, 100);
DifferentiableSparseSampling dplanner = new DifferentiableSparseSampling(domain, rf, 0.99, hashingFactory, (int) Math.sqrt(numberOfStates), numberOfSamplesToUse, beta);
dplanner.toggleDebugPrinting(doNotPrintDebug);
//define the IRL problem
MLIRLRequest request = new MLIRLRequest(domain, dplanner, episodes, rf);
request.setBoltzmannBeta(beta);
//run MLIRL on it
MLIRL irl = new MLIRLWithGuard(request, 0.1, 0.1, steps);
irl.performIRL();
return new GreedyQPolicy((QProvider) request.getPlanner());
}
Author: honzaMaly, Project: kusanagi, Lines of code: 34, Source file: PolicyLearningServiceImpl.java
Example 10: getActionDistributionForState
import burlap.behavior.policy.Policy; // import the required package/class
@Override
public List<ActionProb> getActionDistributionForState(State s) {
GroundedAction selectedAction = (GroundedAction)this.getAction(s);
if(selectedAction == null){
throw new PolicyUndefinedException();
}
List <ActionProb> res = new ArrayList<Policy.ActionProb>();
ActionProb ap = new ActionProb(selectedAction, 1.);
res.add(ap);
return res;
}
Author: f-leno, Project: DOO-Q_BRACIS2016, Lines of code: 12, Source file: SDPlannerPolicy.java
Example 11: getActionDistributionForState
import burlap.behavior.policy.Policy; // import the required package/class
@Override
public List<ActionProb> getActionDistributionForState(State s) {
GroundedAction selectedAction = (GroundedAction)this.getAction(s);
List <ActionProb> res = new ArrayList<Policy.ActionProb>();
ActionProb ap = new ActionProb(selectedAction, 1.);
res.add(ap);
return res;
}
Author: f-leno, Project: DOO-Q_BRACIS2016, Lines of code: 9, Source file: DDPlannerPolicy.java
Example 12: main
import burlap.behavior.policy.Policy; // import the required package/class
public static void main(String [] args){
GridWorldDomain gwd = new GridWorldDomain(11, 11);
gwd.setMapToFourRooms();
//only go in intended direction 80% of the time
gwd.setProbSucceedTransitionDynamics(0.8);
Domain domain = gwd.generateDomain();
//get initial state with agent in 0,0
State s = GridWorldDomain.getOneAgentNoLocationState(domain);
GridWorldDomain.setAgent(s, 0, 0);
//all transitions return -1
RewardFunction rf = new UniformCostRF();
//terminate in top right corner
TerminalFunction tf = new GridWorldTerminalFunction(10, 10);
//setup vi with 0.99 discount factor, a value
//function initialization that initializes all states to value 0, and which will
//run for 30 iterations over the state space
VITutorial vi = new VITutorial(domain, rf, tf, 0.99, new SimpleHashableStateFactory(),
new ValueFunctionInitialization.ConstantValueFunctionInitialization(0.0), 30);
//run planning from our initial state
Policy p = vi.planFromState(s);
//evaluate the policy with one rollout and visualize the trajectory
EpisodeAnalysis ea = p.evaluateBehavior(s, rf, tf);
Visualizer v = GridWorldVisualizer.getVisualizer(gwd.getMap());
new EpisodeSequenceVisualizer(v, domain, Arrays.asList(ea));
}
Author: f-leno, Project: DOO-Q_BRACIS2016, Lines of code: 37, Source file: VITutorial.java
Example 13: MCLSPIFB
import burlap.behavior.policy.Policy; // import the required package/class
public static void MCLSPIFB(){
MountainCar mcGen = new MountainCar();
Domain domain = mcGen.generateDomain();
TerminalFunction tf = new MountainCar.ClassicMCTF();
RewardFunction rf = new GoalBasedRF(tf, 100);
StateGenerator rStateGen = new MCRandomStateGenerator(domain);
SARSCollector collector = new SARSCollector.UniformRandomSARSCollector(domain);
SARSData dataset = collector.collectNInstances(rStateGen, rf, 5000, 20, tf, null);
ConcatenatedObjectFeatureVectorGenerator featureVectorGenerator = new ConcatenatedObjectFeatureVectorGenerator(true, MountainCar.CLASSAGENT);
FourierBasis fb = new FourierBasis(featureVectorGenerator, 4);
LSPI lspi = new LSPI(domain, 0.99, fb, dataset);
Policy p = lspi.runPolicyIteration(30, 1e-6);
Visualizer v = MountainCarVisualizer.getVisualizer(mcGen);
VisualActionObserver vob = new VisualActionObserver(domain, v);
vob.initGUI();
SimulatedEnvironment env = new SimulatedEnvironment(domain, rf, tf, MountainCar.getCleanState(domain, mcGen.physParams));
EnvironmentServer envServ = new EnvironmentServer(env, vob);
for(int i = 0; i < 5; i++){
p.evaluateBehavior(envServ);
envServ.resetEnvironment();
}
System.out.println("Finished");
}
Author: f-leno, Project: DOO-Q_BRACIS2016, Lines of code: 34, Source file: ContinuousDomainTutorial.java
Example 14: getPolicyValue
import burlap.behavior.policy.Policy; // import the required package/class
/**
* Returns the state value under a given policy for a state and {@link QFunction}.
* The value is the expected Q-value under the input policy action distribution. If no actions are permissible in the input state, then zero is returned.
* @param qSource the {@link QFunction} capable of producing Q-values.
* @param s the query {@link burlap.oomdp.core.states.State} for which the value should be returned.
* @param p the policy defining the action distribution.
* @return the expected Q-value under the input policy action distribution
*/
public static double getPolicyValue(QFunction qSource, State s, Policy p){
double expectedValue = 0.;
List <Policy.ActionProb> aps = p.getActionDistributionForState(s);
if(aps.size() == 0){
return 0.;
}
for(Policy.ActionProb ap : aps){
double q = qSource.getQ(s, ap.ga).q;
expectedValue += q * ap.pSelection;
}
return expectedValue;
}
Author: f-leno, Project: DOO-Q_BRACIS2016, Lines of code: 22, Source file: QFunction.java
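In other words, the method computes V^π(s) = Σ_a π(a|s) · Q(s,a). For example, if the policy assigns probability 0.7 to an action with Q-value 2.0 and probability 0.3 to an action with Q-value -1.0, the returned value is 0.7·2.0 + 0.3·(-1.0) = 1.1.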
Example 15: oneStep
import burlap.behavior.policy.Policy; // import the required package/class
/**
* Performs one step of execution of the option in the provided {@link burlap.oomdp.singleagent.environment.Environment}.
* This method assumes that the {@link #initiateInState(burlap.oomdp.core.states.State, burlap.oomdp.singleagent.GroundedAction)} method
* was called previously for the state in which this option was initiated.
* @param env The {@link burlap.oomdp.singleagent.environment.Environment} in which this option is to be applied
* @param groundedAction the parameters in which this option was initiated
* @return the {@link burlap.oomdp.singleagent.environment.EnvironmentOutcome} of the one step of interaction.
*/
public EnvironmentOutcome oneStep(Environment env, GroundedAction groundedAction){
GroundedAction ga = this.oneStepActionSelection(env.getCurrentObservation(), groundedAction);
EnvironmentOutcome eo = ga.executeIn(env);
if(eo instanceof EnvironmentOptionOutcome){
EnvironmentOptionOutcome eoo = (EnvironmentOptionOutcome)eo;
lastNumSteps += eoo.numSteps;
lastCumulativeReward += cumulativeDiscount*eoo.r;
cumulativeDiscount *= eoo.discount;
}
else{
lastNumSteps++;
lastCumulativeReward += cumulativeDiscount*eo.r;
cumulativeDiscount *= discountFactor;
}
if(shouldRecordResults){
GroundedAction recordAction = ga;
if(shouldAnnotateExecution){
recordAction = new Policy.GroundedAnnotatedAction(groundedAction.toString() + "(" + (lastNumSteps-1) + ")", ga);
}
lastOptionExecutionResults.recordTransitionTo(recordAction, eo.op, eo.r);
}
return eo;
}
Author: f-leno, Project: DOO-Q_BRACIS2016, Lines of code: 36, Source file: Option.java
Example 16: DeterministicTerminationOption
import burlap.behavior.policy.Policy; // import the required package/class
/**
* Initializes.
* @param name the name of the option
* @param p the option's policy
* @param init the initiation states of the option
* @param terminationStates the deterministic termination states of the option.
*/
public DeterministicTerminationOption(String name, Policy p, StateConditionTest init, StateConditionTest terminationStates){
this.name = name;
this.policy = p;
this.initiationTest = init;
this.terminationStates = terminationStates;
}
Author: f-leno, Project: DOO-Q_BRACIS2016, Lines of code: 15, Source file: DeterministicTerminationOption.java
Example 17: SimpleTester
import burlap.behavior.policy.Policy; // import the required package/class
public SimpleTester(Policy policy) {
this.policy = policy;
}
Author: h2r, Project: burlap_caffe, Lines of code: 4, Source file: SimpleTester.java
Example 18: DeepQTester
import burlap.behavior.policy.Policy; // import the required package/class
public DeepQTester(Policy policy, ExperienceMemory memory, StateMapping stateMapping) {
this.policy = policy;
this.memory = memory;
this.stateMapping = stateMapping;
}
Author: h2r, Project: burlap_caffe, Lines of code: 6, Source file: DeepQTester.java
Example 19: modelPlannedPolicy
import burlap.behavior.policy.Policy; // import the required package/class
@Override
public Policy modelPlannedPolicy() {
return modelPolicy;
}
Author: f-leno, Project: DOO-Q_BRACIS2016, Lines of code: 5, Source file: VIModelLearningPlanner.java
Example 20: MCLSPIRBF
import burlap.behavior.policy.Policy; // import the required package/class
public static void MCLSPIRBF(){
MountainCar mcGen = new MountainCar();
Domain domain = mcGen.generateDomain();
TerminalFunction tf = new MountainCar.ClassicMCTF();
RewardFunction rf = new GoalBasedRF(tf, 100);
State s = MountainCar.getCleanState(domain, mcGen.physParams);
StateGenerator rStateGen = new MCRandomStateGenerator(domain);
SARSCollector collector = new SARSCollector.UniformRandomSARSCollector(domain);
SARSData dataset = collector.collectNInstances(rStateGen, rf, 5000, 20, tf, null);
RBFFeatureDatabase rbf = new RBFFeatureDatabase(true);
StateGridder gridder = new StateGridder();
gridder.gridEntireDomainSpace(domain, 5);
List<State> griddedStates = gridder.gridInputState(s);
DistanceMetric metric = new EuclideanDistance(
new ConcatenatedObjectFeatureVectorGenerator(true, MountainCar.CLASSAGENT));
for(State g : griddedStates){
rbf.addRBF(new GaussianRBF(g, metric, .2));
}
LSPI lspi = new LSPI(domain, 0.99, rbf, dataset);
Policy p = lspi.runPolicyIteration(30, 1e-6);
Visualizer v = MountainCarVisualizer.getVisualizer(mcGen);
VisualActionObserver vob = new VisualActionObserver(domain, v);
vob.initGUI();
SimulatedEnvironment env = new SimulatedEnvironment(domain, rf, tf, s);
EnvironmentServer envServ = new EnvironmentServer(env, vob);
for(int i = 0; i < 5; i++){
p.evaluateBehavior(envServ);
envServ.resetEnvironment();
}
System.out.println("Finished");
}
Author: f-leno, Project: DOO-Q_BRACIS2016, Lines of code: 43, Source file: ContinuousDomainTutorial.java
Note: The burlap.behavior.policy.Policy class examples in this article were compiled from GitHub, MSDocs, and other source-code and documentation hosting platforms. The code snippets were selected from open-source projects contributed by many developers, and copyright of the source code remains with the original authors; please consult each project's license before redistributing or using it. Do not reproduce without permission.