Principles of Reinforcement Learning, Part 2: Basic Concepts
- State
- State space
- Action
- State transition
- Policy
- A policy can be represented as an array or a matrix.
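- Reward
- When the reward is uncertain, a table no longer applies, and the reward has to be expressed mathematically as a conditional probability, for example: p(r=-1|s1,a1)=1 and p(r!=-1|s1,a1)=0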
- Trajectory: a trajectory is a state-action-reward chain.
- Return: can be used to evaluate whether a policy is good or bad.
- Discount rate
- Discounted return
- Episode
- Terminal states
- MDP (Markov Decision Process)
- Sets
- State
- Action
- Reward
- Policy
- Probability distribution
- Markov property
- Sets
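
Several of the concepts above can be made concrete with a small sketch. The Python below is only an illustration under assumed values: the three states, two actions, the policy probabilities, and every reward entry other than p(r=-1|s1,a1)=1 are hypothetical and not taken from these notes. It shows a policy stored as a matrix, a reward given as a conditional probability, and a discounted return computed as the sum of gamma**t * r_t over a finite trajectory.

```python
import random

# Hypothetical toy problem (illustrative names and numbers only).
STATES = ["s1", "s2", "s3"]
ACTIONS = ["a1", "a2"]

# Policy as a matrix: POLICY[i][j] = pi(ACTIONS[j] | STATES[i]).
# Each row is a probability distribution over actions, so it sums to 1.
POLICY = [
    [1.0, 0.0],  # in s1, always take a1 (deterministic)
    [0.5, 0.5],  # in s2, pick either action with equal probability
    [0.0, 1.0],  # in s3, always take a2
]

# Reward model as conditional probabilities p(r | s, a).
# The (s1, a1) entry encodes p(r=-1 | s1, a1) = 1 from the notes;
# the (s2, a1) entry is an assumed example of a truly random reward.
REWARD_PROB = {
    ("s1", "a1"): {-1: 1.0},
    ("s2", "a1"): {0: 0.5, 1: 0.5},
}


def reward_probability(r, s, a):
    """Return p(r | s, a); rewards that are not listed have probability 0."""
    return REWARD_PROB[(s, a)].get(r, 0.0)


def sample_action(state, rng):
    """Draw one action from the policy row that belongs to `state`."""
    row = POLICY[STATES.index(state)]
    return rng.choices(ACTIONS, weights=row, k=1)[0]


def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over a finite reward sequence r_0, r_1, ..."""
    g = 0.0
    # Accumulate from the last reward backwards: G_t = r_t + gamma * G_{t+1}.
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```

With these assumed values, `discounted_return([0, 0, 0, 1], 0.9)` comes out at about 0.729, since only the last reward is nonzero and it is discounted three times. Setting `gamma` to 1 gives the plain, undiscounted return.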