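1. What is reinforcement learning?
- Agent: the entity that makes decisions.
- Environment: the outside world that the agent interacts with.
- State (S): a description of the environment at a given moment.
- Action (A): a behavior the agent can take in a given state.
- Reward (R): the feedback the agent receives after taking an action in a given state.
- Policy (π): the rule that determines which action the agent takes in each state.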
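To make these terms concrete, here is a minimal sketch built on a hypothetical 1-D corridor environment. The `CorridorEnv` class and its reward scheme are illustrative assumptions, not part of the original tutorial. In this example:

- the state is the agent's position;
- the actions are "move left" (-1) and "move right" (+1);
- the reward is +1 for reaching the rightmost cell;
- the policy is a plain dictionary mapping each state to an action.

```python
class CorridorEnv:
    """Hypothetical environment: positions 0..length-1, the goal is the last cell."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0  # state S: the agent's current position
        return self.state

    def step(self, action):
        # action A: -1 moves left, +1 moves right (the walls clamp the position)
        self.state = min(max(self.state + action, 0), self.length - 1)
        done = self.state == self.length - 1
        reward = 1.0 if done else 0.0  # reward R: feedback for this transition
        return self.state, reward, done


# Policy π: a rule that picks an action in every state (here: always move right)
policy = {s: +1 for s in range(5)}

env = CorridorEnv()
state, total_reward, steps, done = env.reset(), 0.0, 0, False
while not done:
    action = policy[state]                  # the agent decides
    state, reward, done = env.step(action)  # the environment responds
    total_reward += reward
    steps += 1
print(steps, total_reward)  # 4 1.0
```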
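2. The basic framework of reinforcement learning
In reinforcement learning, the agent learns, through repeated trial and error, an optimal policy that maximizes its cumulative reward. The process usually consists of the following steps:
- Initialize the policy.
- Observe the current state.
- Select an action and execute it.
- Receive the reward and observe the new state.
- Update the policy.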
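One classic instance of this loop is tabular Q-learning. Steps 2 to 5 repeat until the episode ends, and episodes repeat until the policy stops improving. The sketch below is a hypothetical example and is not taken from the original tutorial. It makes these assumptions:

- a 5-cell corridor environment;
- the policy is an ε-greedy choice over a Q-table;
- the update rule is Q(s,a) ← Q(s,a) + α[r + γ·max Q(s',·) - Q(s,a)];
- α=0.5, γ=0.9 and ε=0.2 are arbitrary values.

```python
import random

N_STATES = 5                             # corridor cells 0..4; cell 4 is the goal
ACTIONS = (-1, +1)                       # move left / move right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2    # arbitrary illustrative hyperparameters


def step(state, action):
    """Hypothetical environment: reward 1 for reaching the last cell, else 0."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def greedy(Q, state):
    """Best-valued action in `state`, breaking ties at random."""
    best = max(Q[(state, a)] for a in ACTIONS)
    return random.choice([a for a in ACTIONS if Q[(state, a)] == best])


random.seed(0)
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}  # 1. initialize the policy (Q-table)

for episode in range(500):
    state, done = 0, False                                   # 2. observe the current state
    while not done:
        # 3. select an action (explore with probability EPSILON) and execute it
        action = random.choice(ACTIONS) if random.random() < EPSILON else greedy(Q, state)
        next_state, reward, done = step(state, action)       # 4. receive reward and new state
        # 5. update: move Q(s, a) toward r + GAMMA * max_a' Q(s', a')
        best_next = 0.0 if done else max(Q[(next_state, a)] for a in ACTIONS)
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
        state = next_state

policy = [greedy(Q, s) for s in range(N_STATES - 1)]
print(policy)
```

After training, the greedy policy should move right (+1) in every non-goal cell.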
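3. Deep reinforcement learning
Deep reinforcement learning (DRL) combines deep learning with reinforcement learning. It uses deep neural networks to approximate the policy function or the value function, which lets the agent learn in high-dimensional, complex environments. Common approaches include:
- Deep Q-Network (DQN): approximates the Q-value function with a deep neural network.
- Policy gradient methods: optimize the policy function directly.
- Actor-Critic methods: combine policy gradients with a learned value function.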
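Of these families, policy gradient is the easiest to show without a neural network, because "optimizing the policy directly" means following the gradient of log π(a). The sketch below makes these assumptions (none of them come from the original tutorial):

- a hypothetical two-armed bandit whose arms pay fixed rewards of 0 and 1;
- a softmax policy over two logits;
- the REINFORCE update θ ← θ + α·r·∇θ log π(a).

```python
import numpy as np

rng = np.random.default_rng(0)
rewards = np.array([0.0, 1.0])   # hypothetical 2-armed bandit: arm 1 is better
theta = np.zeros(2)              # policy parameters (softmax logits)
lr = 0.1


def softmax(x):
    z = np.exp(x - x.max())      # subtract the max for numerical stability
    return z / z.sum()


for _ in range(1000):
    probs = softmax(theta)
    action = rng.choice(2, p=probs)        # sample an action from the current policy
    reward = rewards[action]
    grad_log_pi = -probs                   # ∇θ log π(a) = onehot(a) - π
    grad_log_pi[action] += 1.0
    theta += lr * reward * grad_log_pi     # REINFORCE: ascend r · ∇θ log π(a)

print(softmax(theta))
```

The probability of the better arm should approach 1. DQN, by contrast, learns Q-values and derives the policy from them with an argmax.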
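The DQN algorithm proceeds as follows:
- Initialize the experience replay buffer.
- Initialize the Q-network and the target Q-network.
- Get the current state s from the environment.
- Choose an action a according to the current policy.
- Execute the action and obtain the new state s' and the reward r.
- Store the experience (s, a, r, s') in the replay buffer.
- Randomly sample a mini-batch of experiences from the replay buffer.
- Use the mini-batch to update the Q-network.
- Update the target Q-network every fixed number of steps.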
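The replay-buffer and target-network steps can be sketched without a deep-learning framework. This is a hypothetical NumPy version: the names `ReplayBuffer` and `td_targets` are illustrative and do not come from the original. How it works:

- The target network's Q-values for the next states are passed in as an array.
- The TD target is y = r + γ·max_a' Q_target(s', a').
- The bootstrap term is dropped for terminal transitions.
- The buffer also stores a `done` flag, as the tutorial's code does.

```python
import random
from collections import deque

import numpy as np


class ReplayBuffer:
    """Fixed-size FIFO store of transitions; old entries are evicted automatically."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random mini-batch, regrouped into one array per field
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = map(np.array, zip(*batch))
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)


def td_targets(rewards, next_q_target, dones, gamma=0.95):
    """y = r + gamma * max_a' Q_target(s', a'), without bootstrapping on terminal steps."""
    not_done = 1.0 - np.asarray(dones, dtype=np.float64)
    return rewards + gamma * next_q_target.max(axis=1) * not_done


buffer = ReplayBuffer(capacity=3)
for i in range(5):
    buffer.push([float(i)], 0, 1.0, [float(i + 1)], i == 4)
print(len(buffer))  # 3: the two oldest transitions were evicted

states, actions, rewards, next_states, dones = buffer.sample(2)
y = td_targets(np.array([1.0, 1.0]), np.array([[0.0, 2.0], [5.0, 1.0]]),
               np.array([False, True]), gamma=0.5)
print(y)  # [2. 1.]
```

Computing the targets for the whole mini-batch at once, as here, is a common alternative to the per-sample loop used in the tutorial's `replay()` method.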
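3. Hands-on practice
Tools used:
- Python: the main programming language.
- TensorFlow or PyTorch: a deep learning framework (the code below uses PyTorch).
- OpenAI Gym: a library that provides reinforcement learning environments.
Install the dependencies:
```bash
conda install gym
conda install pytorch
```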
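Let's look at the final result first. Below is the result after more than 400 training episodes. The better the agent performs, the longer it keeps the pole balanced, and so the longer the output video.
The code below implements a reinforcement learning agent based on a Deep Q-Network (DQN) that solves the pole-balancing problem in the CartPole-v1 environment. Its main parts are as follows: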
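1. Import libraries
```python
import os
import base64
import random
from collections import deque

import gym
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from IPython.display import HTML, display  # only needed to embed the video in a Jupyter notebook
```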
2. Define the Q-network
```python
class QNetwork(nn.Module):
    """A small MLP that maps a state vector to one Q-value per action."""

    def __init__(self, state_size, action_size):
        super(QNetwork, self).__init__()
        self.fc1 = nn.Linear(state_size, 24)
        self.fc2 = nn.Linear(24, 24)
        self.fc3 = nn.Linear(24, action_size)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        x = self.fc3(x)  # raw Q-values: no activation on the output layer
        return x
```
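As a quick check on how small this network is: in CartPole-v1 the state has 4 dimensions and there are 2 actions. These are the values `env.observation_space.shape[0]` and `env.action_space.n` provide later. An `nn.Linear(in, out)` layer holds `in*out` weights plus `out` biases. The helper `count_parameters` below is hypothetical and only illustrates that arithmetic.

```python
def count_parameters(layer_sizes):
    """Weights + biases of a stack of fully connected (nn.Linear-style) layers."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


# CartPole-v1: 4-dimensional state, 2 actions; two hidden layers of 24 units as in QNetwork
print(count_parameters([4, 24, 24, 2]))  # 770
```

The total breaks down as (4·24+24) + (24·24+24) + (24·2+2) = 120 + 600 + 50 = 770. That is tiny by deep-learning standards, which is enough for CartPole's low-dimensional state.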
3. Define the DQN agent
```python
class DQNAgent:
    def __init__(self, state_size, action_size):
        self.state_size = state_size
        self.action_size = action_size
        self.memory = deque(maxlen=2000)   # experience replay buffer
        self.gamma = 0.95                  # discount factor
        self.epsilon = 1.0                 # initial exploration rate
        self.epsilon_decay = 0.995
        self.epsilon_min = 0.01
        self.learning_rate = 0.001
        self.batch_size = 64
        self.model = QNetwork(state_size, action_size)         # online Q-network
        self.target_model = QNetwork(state_size, action_size)  # target Q-network
        self.optimizer = optim.Adam(self.model.parameters(), lr=self.learning_rate)
        self.update_target_model()  # start both networks with identical weights
```
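4. Update the target network
```python
    def update_target_model(self):
        # Copy the online network's weights into the target network
        self.target_model.load_state_dict(self.model.state_dict())
```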
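`load_state_dict` copies parameter values, so the target network stays frozen until the next call while the online network keeps training. The NumPy sketch below mimics this with dictionaries of arrays. `hard_update` and `soft_update` are hypothetical names. The soft (Polyak) update θ_target ← τ·θ + (1-τ)·θ_target is a common alternative, used for example in DDPG. It appears here only for comparison; the tutorial code uses the hard copy.

```python
import numpy as np


def hard_update(target, online):
    """Copy every online parameter into the target (like load_state_dict)."""
    for name, value in online.items():
        target[name] = value.copy()


def soft_update(target, online, tau):
    """Polyak averaging: move the target a fraction tau toward the online weights."""
    for name, value in online.items():
        target[name] = tau * value + (1.0 - tau) * target[name]


online = {"w": np.array([1.0, 2.0])}
target = {"w": np.zeros(2)}

hard_update(target, online)
online["w"] += 10.0            # the online network keeps training...
print(target["w"])             # ...but the target copy is unchanged: [1. 2.]

soft_update(target, online, tau=0.1)
print(target["w"])             # 0.1 * [11, 12] + 0.9 * [1, 2] = [2. 3.]
```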
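5. Store experiences
```python
    def remember(self, state, action, reward, next_state, done):
        # Append one transition to the replay buffer (the oldest is dropped when full)
        self.memory.append((state, action, reward, next_state, done))
```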
6. Choose an action
```python
    def act(self, state):
        # Epsilon-greedy: explore with probability epsilon, otherwise exploit
        if np.random.rand() <= self.epsilon:
            return random.randrange(self.action_size)
        state = torch.FloatTensor(state)
        with torch.no_grad():
            act_values = self.model(state)
        return int(np.argmax(act_values.numpy()))
```
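In this code ε shrinks once per `replay()` call, which is roughly once per environment step rather than once per episode. The hypothetical helpers below (`epsilon_greedy` and `steps_until_min`, not part of the tutorial) show two things: the same selection rule, and how many decays it takes to go from 1.0 down to the 0.01 floor with a decay factor of 0.995.

```python
import numpy as np


def epsilon_greedy(q_values, epsilon, rng):
    """Explore with probability epsilon, otherwise take the greedy action."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))


def steps_until_min(epsilon=1.0, decay=0.995, epsilon_min=0.01):
    """Count how many multiplicative decays bring epsilon down to epsilon_min."""
    steps = 0
    while epsilon > epsilon_min:
        epsilon *= decay
        steps += 1
    return steps


rng = np.random.default_rng(0)
print(epsilon_greedy(np.array([0.1, 0.7]), epsilon=0.0, rng=rng))  # 1 (pure exploitation)
print(steps_until_min())  # 919
```

This matches ln(0.01)/ln(0.995) ≈ 918.7, rounded up.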
7. Experience replay and training
```python
    def replay(self):
        if len(self.memory) < self.batch_size:
            return  # not enough experience collected yet
        minibatch = random.sample(self.memory, self.batch_size)
        for state, action, reward, next_state, done in minibatch:
            state = torch.FloatTensor(state)
            next_state = torch.FloatTensor(next_state)
            target = reward
            if not done:
                with torch.no_grad():
                    target = reward + self.gamma * torch.max(self.target_model(next_state)).item()
            target_f = self.model(state)
            target_f = target_f.clone().detach()  # make sure target_f can be modified and carries no gradient
            target_f[0][action] = target
            loss = nn.MSELoss()(self.model(state), target_f)
            # Gradient step on the online network
            self.optimizer.zero_grad()
            loss.backward()
            self.optimizer.step()
        if self.epsilon > self.epsilon_min:
            self.epsilon *= self.epsilon_decay
```
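The `target_f` trick works as follows. `target_f` starts as a copy of the network's own predictions, and only the entry for the taken action is overwritten. As a result, the MSE error is zero for every other action, and only Q(s, a) is pushed toward the TD target (up to the constant factor from averaging). The NumPy sketch below uses hypothetical numbers to show that effect.

```python
import numpy as np

q_pred = np.array([[1.0, 2.0]])   # hypothetical Q(s, ·) from the online network, shape [1, 2]
action, target = 0, 3.0           # action taken and its TD target r + γ·max Q_target(s', ·)

target_f = q_pred.copy()          # start from the network's own predictions...
target_f[0][action] = target      # ...and overwrite only the taken action

error = q_pred - target_f
loss = np.mean(error ** 2)        # what nn.MSELoss() computes with its default mean reduction
print(error)  # [[-2.  0.]]
print(loss)   # 2.0
```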
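8. Create the environment and train the agent
```python
env = gym.make('CartPole-v1')
state_size = env.observation_space.shape[0]
action_size = env.action_space.n
agent = DQNAgent(state_size, action_size)
n_episodes = 1000

for e in range(n_episodes):
    state = env.reset()[0]  # extract the actual state (gym >= 0.26 returns (obs, info))
    state = np.reshape(state, [1, state_size])
    for time in range(500):
        action = agent.act(state)
        result = env.step(action)
        if len(result) == 5:
            # gym >= 0.26: (obs, reward, terminated, truncated, info); done = terminated
            next_state, reward, done, _, _ = result
        else:
            # older gym: (obs, reward, done, info)
            next_state, reward, done, _ = result
        reward = reward if not done else -10  # penalize letting the pole fall
        next_state = np.reshape(next_state, [1, state_size])
        agent.remember(state, action, reward, next_state, done)
        state = next_state
        if done:
            agent.update_target_model()
            print(f"episode: {e}/{n_episodes}, score: {time}, e: {agent.epsilon:.2f}")
            break
        agent.replay()

# Save the model (the file name is a placeholder)
torch.save(agent.model.state_dict(), 'dqn_cartpole.pth')

# Load the model for evaluation
agent.model.load_state_dict(torch.load('dqn_cartpole.pth'))
agent.model.eval()
```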
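With the newer 5-tuple `step()` API, the training loop keeps only `terminated` as `done`. The 500-step time limit is already enforced by `range(500)`, so hitting it is neither penalized with -10 nor treated as a terminal state. The sketch below shows one possible way to make both flags explicit. `unpack_step` is a hypothetical helper, not a gym function.

```python
def unpack_step(result):
    """Normalize env.step() output across gym versions.

    gym >= 0.26 returns (obs, reward, terminated, truncated, info);
    older versions return (obs, reward, done, info), reported here as terminated.
    """
    if len(result) == 5:
        obs, reward, terminated, truncated, info = result
        return obs, reward, terminated, truncated, info
    obs, reward, done, info = result
    return obs, reward, done, False, info


obs, reward, terminated, truncated, info = unpack_step(([0.1, 0.0, 0.02, 0.0], 1.0, False, True, {}))
done = terminated                        # what would be stored in replay memory
episode_over = terminated or truncated   # what would end the episode loop
print(done, episode_over)  # False True
```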
```python
# Wrap the environment with RecordVideo to record the evaluation episode
video_folder = './video'
env = gym.make('CartPole-v1', render_mode='rgb_array')  # RecordVideo needs the rgb_array render mode
env = gym.wrappers.RecordVideo(env, video_folder, episode_trigger=lambda episode_id: True)
# env = gym.wrappers.RecordVideo(env, video_folder, episode_trigger=lambda x: x == 0)
# env = gym.wrappers.RecordVideo(env, video_folder)

def show_video():
    """Embed the first recorded .mp4 file in a Jupyter notebook."""
    video_files = [f for f in os.listdir(video_folder) if f.endswith('.mp4')]
    if video_files:
        video_path = os.path.join(video_folder, video_files[0])
        with open(video_path, "rb") as f:
            encoded = base64.b64encode(f.read())
        display(HTML(data='''<video width="400" height="300" controls>
            <source src="data:video/mp4;base64,{0}" type="video/mp4" />
        </video>'''.format(encoded.decode('ascii'))))
    else:
        print("No video found")

agent.epsilon = 0.0  # evaluate greedily, without random exploration
state = env.reset()[0]
state = np.reshape(state, [1, state_size])
total_reward = 0
for time in range(500):
    action = agent.act(state)
    result = env.step(action)
    if len(result) == 5:
        next_state, reward, done, _, _ = result
    else:
        next_state, reward, done, _ = result
    total_reward += reward
    next_state = np.reshape(next_state, [1, state_size])
    state = next_state
    if done:
        break
print(f"Total reward: {total_reward}")

# Close the environment (this finalizes the recorded video), then display it
env.close()
show_video()
```
4. Next steps: improvements and extensions
- Double Deep Q-Network (Double DQN, DDQN).
- Advantage Actor-Critic (A2C).
- Deep Deterministic Policy Gradient (DDPG).
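Double DQN changes only how the bootstrap target is computed:

- Standard DQN lets the target network both choose and evaluate the next action, which tends to overestimate Q-values.
- Double DQN lets the online network choose the action and the target network evaluate it.

The function names and Q-values below are hypothetical. γ=0.5 is chosen only so that the results are exact.

```python
import numpy as np


def dqn_target(reward, q_target_next, gamma):
    # Standard DQN: the target network both selects and evaluates a'
    return reward + gamma * q_target_next.max()


def double_dqn_target(reward, q_online_next, q_target_next, gamma):
    # Double DQN: the online network selects a', the target network evaluates it
    best_action = int(np.argmax(q_online_next))
    return reward + gamma * q_target_next[best_action]


q_online_next = np.array([1.0, 3.0])   # hypothetical Q_online(s', ·)
q_target_next = np.array([2.0, 0.5])   # hypothetical Q_target(s', ·)
print(dqn_target(1.0, q_target_next, gamma=0.5))                        # 2.0
print(double_dqn_target(1.0, q_online_next, q_target_next, gamma=0.5))  # 1.25
```

In the tutorial's `replay()` method, this would mean replacing `torch.max(self.target_model(next_state))` with the target network's value at the online network's argmax.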