This article collects typical usage examples of the mdp.getStates function in Python. If you have been wondering how getStates is used in practice, or are looking for working examples of it, the selected code samples below may help.
A total of 20 code examples of the getStates function are shown below, sorted by popularity by default.
Example 1: __init__
def __init__(self, mdp, discount = 0.9, iterations = 100):
    """
      Your value iteration agent should take an mdp on
      construction, run the indicated number of iterations
      and then act according to the resulting policy.

      Some useful mdp methods you will use:
          mdp.getStates()
          mdp.getPossibleActions(state)
          mdp.getTransitionStatesAndProbs(state, action)
          mdp.getReward(state, action, nextState)
    """
    self.mdp = mdp
    self.discount = discount
    self.iterations = iterations
    self.values = util.Counter() # A Counter is a dict with default 0
    "*** YOUR CODE HERE ***"
    # Seed every state with its self-transition reward. The counter starts at 1,
    # so this seeding takes the place of the first iteration.
    currentIterationCounter = 1
    for state in mdp.getStates():
        self.values[state] = mdp.getReward(state, 'Stop', state)
    # `<` rather than `!=`, so that iterations = 0 cannot loop forever.
    while currentIterationCounter < self.iterations:
        newValues = util.Counter()
        for state in mdp.getStates():
            # Q-value of each action, computed from the previous iteration's values.
            tempValues = util.Counter()
            for action in mdp.getPossibleActions(state):
                for newState, prob in mdp.getTransitionStatesAndProbs(state, action):
                    tempValues[action] += prob*(mdp.getReward(state, action, newState)+self.discount*self.values[newState])
            newValues[state] = tempValues[tempValues.argMax()]
        currentIterationCounter += 1
        for state in mdp.getStates():
            self.values[state] = newValues[state]
Developer ID: StonyBrookUniversity, Project: reinforcement, Lines of code: 34, Source: valueIterationAgents.py
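All of the examples on this page receive an `mdp` object whose class is not shown. As a rough sketch only, the block below defines a hypothetical two-state `TinyMDP` that exposes the methods named in the docstring, and runs a batch Bellman update over `mdp.getStates()`. The class, its states, its rewards, and the helper `value_iteration` are assumptions made for illustration; none of them come from the quoted projects.

```python
class TinyMDP:
    """Hypothetical stand-in for the project's MDP interface."""

    def getStates(self):
        return ["A", "END"]

    def isTerminal(self, state):
        return state == "END"

    def getPossibleActions(self, state):
        # Terminal states offer no actions.
        return [] if self.isTerminal(state) else ["stay", "exit"]

    def getTransitionStatesAndProbs(self, state, action):
        # Deterministic toy dynamics, returned as (nextState, probability) pairs.
        return [("A", 1.0)] if action == "stay" else [("END", 1.0)]

    def getReward(self, state, action, nextState):
        return 1.0 if action == "stay" else 10.0


def value_iteration(mdp, discount=0.9, iterations=100):
    """Batch value iteration: each sweep reads only the previous sweep's values."""
    values = {s: 0.0 for s in mdp.getStates()}
    for _ in range(iterations):
        new_values = {}
        for s in mdp.getStates():
            q_values = [
                sum(p * (mdp.getReward(s, a, s2) + discount * values[s2])
                    for s2, p in mdp.getTransitionStatesAndProbs(s, a))
                for a in mdp.getPossibleActions(s)
            ]
            # States without actions (terminal states) keep the value 0.
            new_values[s] = max(q_values, default=0.0)
        values = new_values
    return values


values = value_iteration(TinyMDP(), discount=0.5, iterations=50)
```

With a discount of 0.5, exiting immediately is worth more than staying, so the value of state `A` settles at the exit reward.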
Example 2: __init__
def __init__(self, mdp, discount = 0.9, iterations = 100):
    """
      Your value iteration agent should take an mdp on
      construction, run the indicated number of iterations
      and then act according to the resulting policy.

      Some useful mdp methods you will use:
          mdp.getStates()
          mdp.getPossibleActions(state)
          mdp.getTransitionStatesAndProbs(state, action)
          mdp.getReward(state, action, nextState)
          mdp.isTerminal(state)
    """
    self.mdp = mdp
    self.discount = discount
    self.iterations = iterations
    self.values = util.Counter() # A Counter is a dict with default 0
    # Write value iteration code here
    "*** YOUR CODE HERE ***"
    state = mdp.getStartState()
    for i in range(0,iterations):
        #print "iteration: ", i
        # Iterate once through all states and actions, saving Q-values
        # under (state, action) keys in the same counter.
        for state in mdp.getStates():
            for action in mdp.getPossibleActions(state):
                # Compute the Q-value for each action (getQValue is defined elsewhere in the class).
                qValue = self.getQValue(state, action)
                self.values[(state,action)] = qValue
        # After all Q-values are computed, iterate again through the states and save
        # the value of the best action. These values serve as V for the next iteration.
        for state in mdp.getStates():
            action = self.getAction(state)
            self.values[state] = self.values[(state, action)]
Developer ID: lucasosouza, Project: berkeleyAI, Lines of code: 35, Source: valueIterationAgents.py
Example 3: __init__
def __init__(self, mdp, discount = 0.9, iterations = 100):
    """
      Your value iteration agent should take an mdp on
      construction, run the indicated number of iterations
      and then act according to the resulting policy.

      Some useful mdp methods you will use:
          mdp.getStates()
          mdp.getPossibleActions(state)
          mdp.getTransitionStatesAndProbs(state, action)
          mdp.getReward(state, action, nextState)
    """
    # Requires `import copy` at module level. self.T(s, a, s2) is a helper defined
    # elsewhere in the class (not shown), presumably the transition probability.
    self.mdp = mdp
    self.discount = discount
    self.iterations = iterations
    self.values = util.Counter() # A Counter is a dict with default 0
    for i in range(iterations):
        # Snapshot of the previous iteration's values.
        lastValues = copy.deepcopy(self.values)
        for s in mdp.getStates():
            aCounter = util.Counter()
            for a in mdp.getPossibleActions(s):
                # Sum over every state as a possible successor.
                for s2 in mdp.getStates():
                    aCounter[a] += self.T(s,a,s2) * (mdp.getReward(s,a,s2) + discount*lastValues[s2])
            self.values[s] = aCounter[aCounter.argMax()]
Developer ID: danielrich, Project: pacman-rl, Lines of code: 25, Source: valueIterationAgents.py
Example 4: __init__
def __init__(self, mdp, discount = 0.9, iterations = 100):
    """
      Your value iteration agent should take an mdp on
      construction, run the indicated number of iterations
      and then act according to the resulting policy.

      Some useful mdp methods you will use:
          mdp.getStates()
          mdp.getPossibleActions(state)
          mdp.getTransitionStatesAndProbs(state, action)
          mdp.getReward(state, action, nextState)
          mdp.isTerminal(state)
    """
    self.mdp = mdp
    self.discount = discount
    self.iterations = iterations
    self.values = util.Counter() # A Counter is a dict with default 0
    self.newvalues = util.Counter()
    # Write value iteration code here
    "*** YOUR CODE HERE ***"
    iterationsRun = 0
    while iterationsRun < iterations:
        iterationsRun += 1
        # computeActionFromValues is defined elsewhere in the class (not shown);
        # presumably it stores each state's best Q-value in self.newvalues.
        for state in mdp.getStates():
            self.computeActionFromValues(state)
        # Copy the new values over only after the full sweep.
        for state in mdp.getStates():
            self.values[state] = self.newvalues[state]
Developer ID: Jwonsever, Project: AI, Lines of code: 29, Source: valueIterationAgents.py
Example 5: __init__
def __init__(self, mdp, discount = 0.9, iterations = 100):
    """
      Your value iteration agent should take an mdp on
      construction, run the indicated number of iterations
      and then act according to the resulting policy.

      Some useful mdp methods you will use:
          mdp.getStates()
          mdp.getPossibleActions(state)
          mdp.getTransitionStatesAndProbs(state, action)
          mdp.getReward(state, action, nextState)
          mdp.isTerminal(state)
    """
    self.mdp = mdp
    self.discount = discount
    self.iterations = iterations
    self.values = util.Counter() # A Counter is a dict with default 0
    # Write value iteration code here
    # Add all states to the dictionary and initialize their values to 0.
    for state in mdp.getStates():
        self.values[state] = 0
    # Run the evaluation a specified number of times.
    for index in range(self.iterations):
        # Keep self.values static during an iteration
        iterationValues = util.Counter()
        for state in mdp.getStates():
            QValues = util.Counter()
            for action in self.mdp.getPossibleActions(state):
                QValues[action] = self.computeQValueFromValues(state, action)
            # States without actions keep the counter's default value of 0.
            if len(QValues) > 0:
                iterationValues[state] = QValues[QValues.sortedKeys()[0]]
        # Only update self.values at the end of an iteration
        self.values = iterationValues
Developer ID: gscalvary, Project: PacMan-QLearning, Lines of code: 35, Source: valueIterationAgents.py
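Several examples rely on `util.Counter`, whose source is not reproduced on this page. The sketch below is a hypothetical stand-in, written on the assumption (taken from the inline comment "A Counter is a dict with default 0" and from how `argMax` and `sortedKeys` are used in the examples) that missing keys read as 0, that `argMax()` returns the key with the largest value, and that `sortedKeys()` lists keys from largest to smallest value. The real class may differ in details such as tie-breaking.

```python
class Counter(dict):
    """Hypothetical stand-in for util.Counter: a dict whose missing keys read as 0."""

    def __getitem__(self, key):
        # Reading a missing key stores and returns 0.
        self.setdefault(key, 0)
        return dict.__getitem__(self, key)

    def argMax(self):
        """Return the key with the largest value, or None if the counter is empty."""
        if not self:
            return None
        return max(self, key=lambda k: dict.__getitem__(self, k))

    def sortedKeys(self):
        """Return the keys ordered from largest to smallest value."""
        return sorted(self, key=lambda k: dict.__getitem__(self, k), reverse=True)


q_values = Counter()
q_values["north"] += 0.5
q_values["east"] += 2.0
q_values["west"] += -1.0

best_by_argmax = q_values.argMax()
best_by_sort = q_values.sortedKeys()[0]
```

Under these assumptions, `QValues[QValues.sortedKeys()[0]]` and `QValues[QValues.argMax()]` select the same value. On an empty counter, `argMax()` yields `None` and the lookup then falls back to 0, whereas `sortedKeys()[0]` would raise `IndexError`, which is presumably why the example above guards with `len(QValues) > 0`.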
Example 6: __init__
def __init__(self, mdp, discount = 0.9, iterations = 100):
    """
      Your value iteration agent should take an mdp on
      construction, run the indicated number of iterations
      and then act according to the resulting policy.

      Some useful mdp methods you will use:
          mdp.getStates()
          mdp.getPossibleActions(state)
          mdp.getTransitionStatesAndProbs(state, action)
          mdp.getReward(state, action, nextState)
    """
    self.mdp = mdp
    self.discount = discount
    self.iterations = iterations
    self.values = util.Counter() # A Counter is a dict with default 0
    "*** YOUR CODE HERE ***"
    for s in mdp.getStates():
        self.values[s] = 0
    # Disabled debugging code from the original:
    # for a in mdp.getPossibleActions(s):
    #     for ac in mdp.getTransitionStatesAndProbs(s,a):
    #         print ac[0]
    #         print ac[1]
    # copy_value = self.values.copy()
    # for c in mdp.getStates():
    #     print copy_value[c]
    # self.states = mdp.getStates()
    i = 0
    while i < iterations:
        # Values from the previous iteration.
        copy_value = self.values.copy()
        for s in mdp.getStates():
            if not mdp.isTerminal(s):
                # Computes R(s) + discount * max_a sum_s' T(s,a,s') * V(s'). This matches
                # the general update only when the reward depends on s alone.
                self.values[s] = mdp.getReward(s,'north',s) + discount * max(
                    [sum([copy_value[s1] * p for (s1,p) in mdp.getTransitionStatesAndProbs(s,a)])
                     for a in mdp.getPossibleActions(s)])
        i = i + 1
Developer ID: NivethaThiru, Project: Final_project_ml, Lines of code: 35, Source: valueIterationAgents.py
Example 7: __init__
def __init__(self, mdp, discount = 0.9, iterations = 100):
    """
      Your value iteration agent should take an mdp on
      construction, run the indicated number of iterations
      and then act according to the resulting policy.

      Some useful mdp methods you will use:
          mdp.getStates()
          mdp.getPossibleActions(state)
          mdp.getTransitionStatesAndProbs(state, action)
          mdp.getReward(state, action, nextState)
          mdp.isTerminal(state)
    """
    self.mdp = mdp
    self.discount = discount
    self.iterations = iterations
    self.values = util.Counter() # A Counter is a dict with default 0
    # Write value iteration code here
    "*** YOUR CODE HERE ***"
    self.depth = 1
    self.qTable = {}
    self.vTable = {}
    # Initialize every state value and every Q-value to 0.
    for state in mdp.getStates():
        self.vTable[state] = 0
        self.qTable[state] = {}
        for action in mdp.getPossibleActions(state):
            self.qTable[state][action] = 0
    while self.depth < self.iterations + 1:
        # Values for this iteration are built in tempTable and swapped in at the end.
        self.tempTable = {}
        for state in mdp.getStates():
            self.stateValue = 0
            if not mdp.isTerminal(state):
                self.stateValue = -9999
                for action in mdp.getPossibleActions(state):
                    self.Qtotal = 0
                    for nextState,prob in mdp.getTransitionStatesAndProbs(state,action):
                        self.reward = mdp.getReward(state, action, nextState)
                        self.Qtotal += prob * (self.reward + self.discount * self.vTable[nextState])
                        #print "###state:",state,"Next",nextState,"reward:",self.reward,"Qtotal",self.Qtotal,"Value:",self.vTable[nextState]
                    self.qTable[state][action] = self.Qtotal
                    #print self.qTable[state][action]
                    self.stateValue = max(self.stateValue,self.qTable[state][action])
            else:
                self.tempTable[state] = 0
            self.tempTable[state] = self.stateValue
        self.vTable = self.tempTable
        self.depth += 1
    # Final pass: recompute the Q-table from the converged state values.
    for state in mdp.getStates():
        self.stateValue = -9999
        for action in mdp.getPossibleActions(state):
            self.Qtotal = 0
            for nextState,prob in mdp.getTransitionStatesAndProbs(state,action):
                self.reward = mdp.getReward(state, action, nextState)
                self.Qtotal += prob * (self.reward + self.discount * self.vTable[nextState])
            self.qTable[state][action] = self.Qtotal
Developer ID: ChristopherKai, Project: ai, Lines of code: 59, Source: valueIterationAgents.py
Example 8: __init__
def __init__(self, mdp, discount = 0.9, iterations = 100):
    """
      Your value iteration agent should take an mdp on
      construction, run the indicated number of iterations
      and then act according to the resulting policy.

      Some useful mdp methods you will use:
          mdp.getStates()
          mdp.getPossibleActions(state)
          mdp.getTransitionStatesAndProbs(state, action)
          mdp.getReward(state, action, nextState)
    """
    self.mdp = mdp
    self.discount = discount
    self.iterations = iterations
    self.values = util.Counter() # A Counter is a dict with default 0
    self.delta = 0
    # Loop over a range instead of decrementing self.iterations,
    # so that the attribute keeps its original value.
    for _ in range(self.iterations):
        # self.delta = 0
        batchValues = util.Counter()
        for state in mdp.getStates():
            maxM = -10000
            if mdp.isTerminal(state):
                continue
            for action in mdp.getPossibleActions(state):
                statesProbs = mdp.getTransitionStatesAndProbs(state, action)
                sumU = 0
                Rs = 0
                for stateProb in statesProbs:
                    # if stateProb[0] == 'TERMINAL_STATE':
                    #     continue
                    # Expected next-state value and expected reward, accumulated separately.
                    sumU = sumU + self.values[stateProb[0]]*stateProb[1]
                    Rs = Rs + mdp.getReward(state, action, stateProb[0]) * stateProb[1]
                # if sumU > maxM:
                #     maxM = sumU
                v = Rs + sumU * discount
                if (v > maxM):
                    maxM = v
            batchValues[state] = maxM
        self.values = batchValues
    # Extract the greedy policy from the final values (getQValue is defined elsewhere in the class).
    self.policy = {}
    for state in mdp.getStates():
        if mdp.isTerminal(state):
            self.policy[state] = None
            continue
        QValues = []
        for action in mdp.getPossibleActions(state):
            QValues.append(self.getQValue(state, action))
        self.policy[state] = mdp.getPossibleActions(state)[QValues.index(max (QValues))]
Developer ID: zxcsvd, Project: CSSE413, Lines of code: 51, Source: valueIterationAgents.py
Example 9: __init__
def __init__(self, mdp, discount = 0.9, iterations = 100):
    """
      Your value iteration agent should take an mdp on
      construction, run the indicated number of iterations
      and then act according to the resulting policy.

      Some useful mdp methods you will use:
          mdp.getStates()
          mdp.getPossibleActions(state)
          mdp.getTransitionStatesAndProbs(state, action)
          mdp.getReward(state, action, nextState)
          mdp.isTerminal(state)
    """
    self.mdp = mdp
    self.discount = discount
    self.iterations = iterations
    self.values = util.Counter() # A Counter is a dict with default 0
    # Write value iteration code here
    "*** YOUR CODE HERE ***"
    # fill every state with some action.
    self.actions = dict()
    for state in mdp.getStates():
        stateActions = mdp.getPossibleActions(state)
        if len(stateActions) > 0:
            action = stateActions[0]
            self.actions[state] = action
    # range works in both Python 2 and Python 3 (the original used xrange).
    for i in range(iterations):
        # make a copy of all the values.
        # this copy will get modified in the for-loop,
        # and at the end of the loop,
        # the new values will become the real values.
        nextValues = self.values.copy()
        # for every state, and if it isn't a terminal state
        # (you can't do any action on a terminal state):
        for state in mdp.getStates():
            if not mdp.isTerminal(state):
                # get the best action.
                action = self.computeActionFromValues(state)
                self.actions[state] = action
                # get the value for doing the currently stored action.
                nextValues[state] = self.computeQValueFromValues(state, action)
        # copy the new values over the old values.
        self.values.update(nextValues)
Developer ID: Pava1n3, Project: learntoplay, Lines of code: 49, Source: valueIterationAgents.py
Example 10: __init__
def __init__(self, mdp, discount = 0.9, iterations = 100):
    """
      Your value iteration agent should take an mdp on
      construction, run the indicated number of iterations
      and then act according to the resulting policy.

      Some useful mdp methods you will use:
          mdp.getStates()
          mdp.getPossibleActions(state)
          mdp.getTransitionStatesAndProbs(state, action)
          mdp.getReward(state, action, nextState)
    """
    self.mdp = mdp
    self.discount = discount
    self.iterations = iterations
    self.values = util.Counter() # A Counter is a dict with default 0
    "*** YOUR CODE HERE ***"
    # Disabled debugging code from the original:
    # i=1
    # for state in mdp.getStates():
    #     print "state ", i, ": ", state
    #     print "possible action: ", mdp.getPossibleActions(state)
    #     i+=1
    self.policy = util.Counter()
    self.nextStateValue = util.Counter()
    states = mdp.getStates()
    for state in mdp.getStates():
        self.values[state] = 0
    i=0
    while i < self.iterations:
        # Snapshot of the previous iteration's values; getQValue (defined elsewhere
        # in the class, not shown) presumably reads from this snapshot.
        self.currentStateValue = self.values.copy()
        for state in states:
            actions = mdp.getPossibleActions(state)
            max_qvalue = -99999999
            take_action = None
            for action in actions:
                qvalue = self.getQValue(state, action)
                if qvalue > max_qvalue:
                    max_qvalue = qvalue
                    take_action = action
            # Update only the states that had at least one action.
            if max_qvalue != -99999999:
                self.values[state] = max_qvalue
                self.policy[state] = take_action
        i+=1
Developer ID: PMX10, Project: projects, Lines of code: 48, Source: valueIterationAgents.py
Example 11: __init__
def __init__(self, mdp, discount = 0.9, iterations = 100):
    """
      Your value iteration agent should take an mdp on
      construction, run the indicated number of iterations
      and then act according to the resulting policy.

      Some useful mdp methods you will use:
          mdp.getStates()
          mdp.getPossibleActions(state)
          mdp.getTransitionStatesAndProbs(state, action)
          mdp.getReward(state, action, nextState)
    """
    self.mdp = mdp
    self.discount = discount
    self.iterations = iterations
    self.values = util.Counter() # A Counter is a dict with default 0
    "*** YOUR CODE HERE ***"
    # OUR CODE HERE
    #Note: I think we should use the util.Counter thing?
    for times in range(0, iterations):
        #values from previous iteration so we don't update over them while iterating
        prevVals = self.values.copy()
        #iterate through all states
        for state in mdp.getStates():
            #will store the action-value for the iteration
            value = util.Counter()
            for action in mdp.getPossibleActions(state):
                for transitionState, probability in mdp.getTransitionStatesAndProbs(state, action):
                    #expected value: probability * (reward for the transition + discount * previous value of the next state)
                    value[action] += probability * (mdp.getReward( state, action, transitionState) + discount * prevVals[transitionState])
            #update the values to the new value from the iteration
            #the .argMax() function returns the key (action) with the largest value
            self.values[state] = value[value.argMax()]
Developer ID: lyeechong, Project: ai, Lines of code: 34, Source: valueIterationAgents.py
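The nested loops in the example above accumulate, for each action, a probability-weighted sum and then keep the largest one. Written as a formula, this is the standard Bellman update for value iteration. The symbols are notation chosen here and do not appear in the quoted code: $T(s, a, s')$ is the probability returned by `getTransitionStatesAndProbs`, $R(s, a, s')$ is `getReward`, $\gamma$ is `discount`, and $V_k$ is the previous iteration's value table.

```latex
V_{k+1}(s) = \max_{a} \sum_{s'} T(s, a, s') \left[ R(s, a, s') + \gamma \, V_k(s') \right]
```

For a state with no available actions the maximum is taken over an empty set, so implementations have to pick a convention; the examples on this page generally leave such states at 0.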
Example 12: __init__
def __init__(self, mdp, discount = 0.9, iterations = 100):
    """
      Your value iteration agent should take an mdp on
      construction, run the indicated number of iterations
      and then act according to the resulting policy.

      Some useful mdp methods you will use:
          mdp.getStates()
          mdp.getPossibleActions(state)
          mdp.getTransitionStatesAndProbs(state, action)
          mdp.getReward(state, action, nextState)
    """
    self.mdp = mdp
    self.discount = discount
    self.iterations = iterations
    self.values = util.Counter() # A Counter is a dict with default 0
    self.tmpValues = util.Counter()
    iterationsCompleted = 0
    startState = mdp.getStartState()
    while iterationsCompleted < iterations:
        # computeValue is defined elsewhere in the class (not shown);
        # presumably it writes each state's new value into self.tmpValues.
        for state in mdp.getStates():
            self.computeValue(mdp,state,discount)
        # Commit the new values only after the full sweep.
        for key in self.tmpValues:
            self.values[key] = self.tmpValues[key]
        iterationsCompleted += 1
Developer ID: yifeng96, Project: 188searchproject, Lines of code: 25, Source: valueIterationAgents.py
Example 13: __init__
def __init__(self, mdp, discount=0.9, iterations=100):
    """
      Your value iteration agent should take an mdp on
      construction, run the indicated number of iterations
      and then act according to the resulting policy.

      Some useful mdp methods you will use:
          mdp.getStates()
          mdp.getPossibleActions(state)
          mdp.getTransitionStatesAndProbs(state, action)
          mdp.getReward(state, action, nextState)
          mdp.isTerminal(state)
    """
    self.mdp = mdp
    self.discount = discount
    self.iterations = iterations
    self.values = util.Counter() # A Counter is a dict with default 0
    # Write value iteration code here
    "*** YOUR CODE HERE ***"
    # Requires `import collections` at module level. Note that defaultdict(None)
    # has no default factory, so missing keys still raise KeyError.
    self.optimalActionInState = collections.defaultdict(None)
    for k in range(iterations):
        # Values from the previous iteration.
        lastValues = self.values.copy()
        for state in mdp.getStates():
            if self.mdp.isTerminal(state):
                continue
            # Start from -inf when there are actions to compare, otherwise keep 0.
            maxValue = float("-inf") if mdp.getPossibleActions(state) else 0
            for action in mdp.getPossibleActions(state):
                theSum = 0
                for nextState, prob in self.mdp.getTransitionStatesAndProbs(state, action):
                    R = self.mdp.getReward(state, action, nextState)
                    theSum += prob * (R + self.discount * lastValues[nextState])
                maxValue = max(maxValue,theSum)
            self.values[state] = maxValue
Developer ID: CatcherGG, Project: Small-Projects, Lines of code: 34, Source: valueIterationAgents.py
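The example above creates `collections.defaultdict(None)`. As a small aside, the sketch below shows what that call does in standard Python: passing `None` as the factory means no default is ever produced, so the object behaves like a plain dict on missing keys. If the intent was "missing state maps to `None`", a factory such as `lambda: None` would be needed; that intent is a guess, since the rest of the class is not shown. The variable names are illustrative only.

```python
import collections

# default_factory=None: no default is created for missing keys.
optimal_action = collections.defaultdict(None)

try:
    optimal_action["unseen"]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# A factory that returns None gives "missing means None" behaviour instead.
optimal_action_or_none = collections.defaultdict(lambda: None)
fallback = optimal_action_or_none["unseen"]
```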
Example 14: __init__
def __init__(self, mdp, discount = 0.9, iterations = 100):
    """
      Your value iteration agent should take an mdp on
      construction, run the indicated number of iterations
      and then act according to the resulting policy.

      Some useful mdp methods you will use:
          mdp.getStates()
          mdp.getPossibleActions(state)
          mdp.getTransitionStatesAndProbs(state, action)
          mdp.getReward(state, action, nextState)
          mdp.isTerminal(state)
    """
    self.mdp = mdp
    self.discount = discount
    self.iterations = iterations
    self.values = util.Counter() # A Counter is a dict with default 0
    states = mdp.getStates()
    for i in range(0,self.iterations):
        # New values are collected in V and swapped in after the sweep.
        V = util.Counter()
        for state in states:
            action = self.computeActionFromValues(state)
            if action is None:
                # No action available (for example, a terminal state).
                V[state] = 0
            else:
                V[state] = self.computeQValueFromValues(state,action)
        self.values = V
Developer ID: Andy-Au, Project: artificial-intelligence, Lines of code: 28, Source: valueIterationAgents.py
Example 15: __init__
def __init__(self, mdp, discount = 0.9, iterations = 100):
    """
      Your value iteration agent should take an mdp on
      construction, run the indicated number of iterations
      and then act according to the resulting policy.

      Some useful mdp methods you will use:
          mdp.getStates()
          mdp.getPossibleActions(state)
          mdp.getTransitionStatesAndProbs(state, action)
          mdp.getReward(state, action, nextState)
          mdp.isTerminal(state)
    """
    self.mdp = mdp
    self.discount = discount
    self.iterations = iterations
    self.values = util.Counter() # A Counter is a dict with default 0
    # Write value iteration code here
    "*** YOUR CODE HERE ***"
    for i in range(iterations):
        valuesNew = util.Counter()
        for state in mdp.getStates():
            maxVal = -1
            if not mdp.isTerminal(state):
                vals = util.Counter()
                for possact in mdp.getPossibleActions(state):
                    #value = self.computeQValueFromValues(state, possact)
                    #if value > maxVal:
                    #    maxVal = value
                    vals[possact] = self.computeQValueFromValues(state, possact)
                #valuesNew[state] = maxVal
                # Assumes every non-terminal state has at least one action;
                # max() raises ValueError on an empty sequence.
                valuesNew[state] = max(vals.values())
        # Commit the new values only after the full sweep.
        for st2 in valuesNew:
            self.values[st2] = valuesNew[st2]
Developer ID: erikm0111, Project: AIclass, Lines of code: 35, Source: valueIterationAgents.py
Example 16: __init__
def __init__(self, mdp, discount = 0.9, iterations = 100):
    """
      Your value iteration agent should take an mdp on
      construction, run the indicated number of iterations
      and then act according to the resulting policy.

      Some useful mdp methods you will use:
          mdp.getStates()
          mdp.getPossibleActions(state)
          mdp.getTransitionStatesAndProbs(state, action)
          mdp.getReward(state, action, nextState)
          mdp.isTerminal(state)
    """
    self.mdp = mdp
    self.discount = discount
    self.iterations = iterations
    self.values = util.Counter()
    for i in range(iterations): # run the algorithm for the indicated number of iterations
        y = self.values.copy() # V sub k-1
        for state in mdp.getStates():
            # Q-value of each action for this state.
            actions = util.Counter()
            if not mdp.isTerminal(state):
                for possibleAction in mdp.getPossibleActions(state):
                    for transitionState, prob in mdp.getTransitionStatesAndProbs(state, possibleAction):
                        value_iteration = prob * (mdp.getReward(state, possibleAction, transitionState) + (discount * y[transitionState]))
                        actions[possibleAction] += value_iteration
            self.values[state] = actions[actions.argMax()]
Developer ID: opalkale, Project: pacman-reinforcementlearning, Lines of code: 31, Source: valueIterationAgents.py
Example 17: __init__
def __init__(self, mdp, discount = 0.9, iterations = 100):
    """
      Your value iteration agent should take an mdp on
      construction, run the indicated number of iterations
      and then act according to the resulting policy.

      Some useful mdp methods you will use:
          mdp.getStates()
          mdp.getPossibleActions(state)
          mdp.getTransitionStatesAndProbs(state, action)
          mdp.getReward(state, action, nextState)
          mdp.isTerminal(state)
    """
    self.mdp = mdp
    self.discount = discount
    self.iterations = iterations
    self.values = util.Counter() # A Counter is a dict with default 0
    mdpStates = mdp.getStates()
    # range and float("-inf") replace the Python 2-only xrange and sys.maxint.
    for iteration in range(iterations):
        newValues = util.Counter()
        for state in mdpStates:
            if self.mdp.isTerminal(state):
                continue
            actionValues = float("-inf")
            for action in mdp.getPossibleActions(state):
                # `total` avoids shadowing the built-in sum().
                total = 0
                for transitionState, prob in mdp.getTransitionStatesAndProbs(state, action):
                    total += prob*(mdp.getReward(state, action, transitionState) + discount * self.values[transitionState])
                if total > actionValues:
                    actionValues = total
            newValues[state] = actionValues
        self.values = newValues
Developer ID: yami280, Project: ReinforcementAgent, Lines of code: 33, Source: valueIterationAgents.py
Example 18: __init__
def __init__(self, mdp, discount = 0.9, iterations = 100):
    """
      Your value iteration agent should take an mdp on
      construction, run the indicated number of iterations
      and then act according to the resulting policy.

      Some useful mdp methods you will use:
          mdp.getStates()
          mdp.getPossibleActions(state)
          mdp.getTransitionStatesAndProbs(state, action)
          mdp.getReward(state, action, nextState)
          mdp.isTerminal(state)
    """
    self.mdp = mdp
    self.discount = discount
    self.iterations = iterations
    self.values = util.Counter() # A Counter is a dict with default 0
    # Write value iteration code here
    "*** YOUR CODE HERE ***"
    states = mdp.getStates()
    for k in range(iterations):
        # A Counter (rather than the original plain dict) keeps the
        # default-0 behaviour of self.values after the swap below.
        newValues = util.Counter()
        for state in states:
            actions = mdp.getPossibleActions(state)
            v = util.Counter()
            for action in actions:
                v[action] = self.computeQValueFromValues(state, action)
            newValues[state] = v[v.argMax()]
        self.values = newValues
Developer ID: jmanalus, Project: CS-188, Lines of code: 30, Source: valueIterationAgents.py
Example 19: __init__
def __init__(self, mdp, discount = 0.9, iterations = 100):
    """
      Your value iteration agent should take an mdp on
      construction, run the indicated number of iterations
      and then act according to the resulting policy.

      Some useful mdp methods you will use:
          mdp.getStates()
          mdp.getPossibleActions(state)
          mdp.getTransitionStatesAndProbs(state, action)
          mdp.getReward(state, action, nextState)
    """
    self.mdp = mdp
    self.discount = discount
    self.iterations = iterations
    self.values = util.Counter() # A Counter is a dict with default 0
    "*** YOUR CODE HERE ***"
    for n in range(iterations):
        # Values from the previous iteration.
        V = self.values.copy()
        for s in mdp.getStates():
            action_values = []
            for a in mdp.getPossibleActions(s):
                action_value = 0
                for s_, P in mdp.getTransitionStatesAndProbs(s, a):
                    action_value += P * (mdp.getReward(s, a, s_) + discount * V[s_])
                action_values.append(action_value)
            # States without actions fall back to 0.
            self.values[s] = max(action_values or [0])
Developer ID: HaiYiMao, Project: cs188, Lines of code: 28, Source: valueIterationAgents.py
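The last line of the example above guards against states that have no actions with `max(action_values or [0])`. As a brief illustration, the sketch below contrasts that idiom with the `default` keyword that the built-in `max` accepts in Python 3.4 and later. The two helper function names are made up for this illustration.

```python
def best_value(action_values):
    # Python 3.4+: `default` is returned when the iterable is empty.
    return max(action_values, default=0)


def best_value_portable(action_values):
    # The idiom from the example: an empty list is falsy, so `or` swaps in [0].
    return max(action_values or [0])


terminal_case = best_value([])            # no actions available
normal_case = best_value([-1.5, 0.25])    # two candidate Q-values
```

Both helpers return 0 for an empty list and the true maximum otherwise, including when every Q-value is negative. A bare `max([])` raises `ValueError`, which is what examples without such a guard would hit on a non-terminal state that has no actions.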
Example 20: __init__
def __init__(self, mdp, discount = 0.9, iterations = 100):
    """
      Your value iteration agent should take an mdp on
      construction, run the indicated number of iterations
      and then act according to the resulting policy.

      Some useful mdp methods you will use:
          mdp.getStates()
          mdp.getPossibleActions(state)
          mdp.getTransitionStatesAndProbs(state, action)
          mdp.getReward(state, action, nextState)
          mdp.isTerminal(state)
    """
    self.mdp = mdp
    self.discount = discount
    self.iterations = iterations
    self.values = util.Counter() # A Counter is a dict with default 0
    self.ValuesDup = util.Counter()
    # Write value iteration code here
    "*** YOUR CODE HERE ***"
    iterations = self.iterations
    while iterations > 0:
        for astate in mdp.getStates():
            if not mdp.isTerminal(astate):
                QVallist = []
                for action in mdp.getPossibleActions(astate):
                    QVallist.append(self.computeQValueFromValues(astate, action))
                self.values[astate] = max(QVallist)
        # Copy the finished sweep into ValuesDup; computeQValueFromValues (defined
        # elsewhere in the class, not shown) presumably reads from this copy.
        for states, value in self.values.items():
            self.ValuesDup[states] = value
        iterations -= 1
Developer ID: boxgf, Project: transfer-learning, Lines of code: 32, Source: valueIterationAgents.py
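The examples on this page differ in whether a sweep reads from a frozen copy of the previous values (a batch update) or from the table it is currently writing (an in-place update). Which one the example above performs depends on the table that `computeQValueFromValues` reads, and that method is not shown. As a rough sketch of why the distinction matters, the block below runs one sweep of each kind on a hypothetical three-state chain; all names, the chain itself, and the reward of 1 per step are assumptions made for illustration.

```python
# Hypothetical chain a -> b -> end with a single action and reward 1 per step.
NEXT = {"a": "b", "b": "end"}
STATES = ["b", "a", "end"]   # visiting order matters for the in-place variant
GAMMA = 1.0


def backup(state, values):
    """One Bellman backup for a non-terminal state of the chain."""
    return 1.0 + GAMMA * values[NEXT[state]]


def sweep_batch(values):
    # Read only from the previous iteration's values.
    new_values = dict(values)
    for s in STATES:
        if s in NEXT:
            new_values[s] = backup(s, values)
    return new_values


def sweep_in_place(values):
    # Read from the table being written, so later states see fresh values.
    values = dict(values)
    for s in STATES:
        if s in NEXT:
            values[s] = backup(s, values)
    return values


start = {"a": 0.0, "b": 0.0, "end": 0.0}
after_batch = sweep_batch(start)
after_in_place = sweep_in_place(start)
```

After one sweep the batch version gives state `a` a value of 1, while the in-place version already gives it 2 because it sees the freshly updated value of `b`. Both variants approach the same fixed point, but they can disagree after a fixed number of iterations, which matters when results are compared against a reference that assumes batch updates.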
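Note: The mdp.getStates function examples in this article were compiled by 纯净天空 from source-code and documentation hosting platforms such as GitHub and MSDocs. The code snippets were selected from open-source projects contributed by their respective developers. Copyright in the source code belongs to the original authors; for distribution and use, refer to the License of the corresponding project. Do not reproduce without permission.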