鱼一的博客 ◡̈

yuyi

A First Look at Reinforcement Learning

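1. What is reinforcement learning?

Reinforcement learning is a machine learning approach in which an agent learns a policy by interacting with an environment, with the goal of maximizing the cumulative reward it collects. Its core concepts are:

  • Agent: the entity that makes decisions.
  • Environment: the outside world the agent interacts with.
  • State (S): a description of the environment at a given moment.
  • Action (A): a behavior the agent can take in a given state.
  • Reward (R): the feedback the agent receives after taking an action in a state.
  • Policy (π): the rule by which the agent chooses an action in each state.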

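2. The basic framework of reinforcement learning

In reinforcement learning, the agent learns by trial and error a policy that maximizes its cumulative reward. The process usually loops through the following steps, repeating steps 2 to 5 many times:

  1. Initialize the policy.
  2. Observe the current state.
  3. Choose an action and execute it.
  4. Receive a reward and observe the new state.
  5. Update the policy.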
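
The five steps above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: `TwoButtonEnv` is a hypothetical toy environment invented for this example (it is not part of Gym), and the "policy" is just one value estimate per action, updated with a running average.

```python
import random

class TwoButtonEnv:
    """Hypothetical toy environment: pressing button 1 pays 1.0, button 0 pays nothing."""

    def reset(self):
        self.steps = 0
        return 0  # a single dummy state

    def step(self, action):
        self.steps += 1
        reward = 1.0 if action == 1 else 0.0
        done = self.steps >= 10  # each episode lasts 10 steps
        return 0, reward, done

def train(n_episodes=50, lr=0.1, seed=0):
    rng = random.Random(seed)
    env = TwoButtonEnv()
    values = [0.0, 0.0]                    # 1. initialize (one value estimate per action)
    for _ in range(n_episodes):
        state = env.reset()                # 2. observe the current state
        done = False
        while not done:
            action = rng.randrange(2)      # 3. choose an action (pure exploration here)
            state, reward, done = env.step(action)  # 4. receive reward, observe new state
            values[action] += lr * (reward - values[action])  # 5. update the estimate
    return values

values = train()
greedy_action = max(range(2), key=lambda a: values[a])
print(values, greedy_action)
```

After training, acting greedily on `values` picks the button that pays, which is the simplest possible case of "learning a policy from rewards".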

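3. Deep reinforcement learning

Deep reinforcement learning (DRL) combines deep learning with reinforcement learning: it uses deep neural networks to approximate the policy function or the value function, which lets an agent learn in high-dimensional, complex environments.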


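Basic concepts and terminology

To work with deep reinforcement learning, you need to understand the following concepts:

  • Deep Q-Network (DQN): an algorithm that uses a deep neural network to approximate the Q-value function.
  • Policy gradient methods: methods that optimize the policy function directly.
  • Actor-Critic methods: methods that combine policy gradients with a value function.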

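Deep Q-Network (DQN)

DQN is one of the foundational, classic algorithms in deep reinforcement learning, so it is a good place to start.

Basic steps of the DQN algorithm

  1. Initialize the experience replay buffer.
  2. Initialize the Q-network and the target Q-network.
  3. At each time step:

    • Get the current state s from the environment.
    • Select an action a according to the current policy.
    • Execute the action and obtain the new state s' and the reward r.
    • Store the experience (s, a, r, s') in the replay buffer.
    • Randomly sample a mini-batch of experiences from the replay buffer.
    • Update the Q-network using the mini-batch.
    • Every fixed number of steps, update the target Q-network.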
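
As a rough illustration of the replay buffer used in these steps, here is a minimal sketch built on the standard library. The class name `ReplayBuffer` and its method names are assumptions made for this example; the agent later in this article uses a bare `deque` for the same purpose.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity store of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest item once it is full
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement breaks up temporal correlation
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)

buffer = ReplayBuffer(capacity=3)
for t in range(5):
    buffer.push(t, 0, 1.0, t + 1, False)

stored_states = [transition[0] for transition in buffer.buffer]
batch = buffer.sample(2)
print(len(buffer), stored_states)  # 3 [2, 3, 4]
```

Only the three most recent transitions survive, which is the behavior the capacity limit is meant to give.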

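3. Hands-on practice

Next, we can implement the DQN algorithm in code. You will need the following tools and libraries:

  • Python: the main programming language.
  • TensorFlow or PyTorch: deep learning frameworks.
  • OpenAI Gym: a library that provides reinforcement learning environments.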

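Implementing DQN

We can start from a simple environment such as CartPole and implement a basic version of DQN. First, install the required libraries:

Environment setup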

conda install gym
conda install pytorch

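Code walkthrough

Let's look at the final result first. The result below comes after a little over 400 training episodes. The better the agent performs, the longer it keeps the pole balanced, and so the longer the output video runs.

The code below implements a reinforcement learning agent based on a Deep Q-Network (DQN) for the pole-balancing task in the CartPole-v1 environment. Its main parts are described in turn: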

1. Import the libraries

import gym
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from collections import deque
import random

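This imports the required libraries: Gym for creating and managing reinforcement learning environments, NumPy for data handling, and PyTorch for building and optimizing the neural network. `deque` and `random` back the replay buffer and its sampling.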

2. Define the Q-network

class QNetwork(nn.Module):
    def __init__(self, state_size, action_size):
        super(QNetwork, self).__init__()
        self.fc1 = nn.Linear(state_size, 24)
        self.fc2 = nn.Linear(24, 24)
        self.fc3 = nn.Linear(24, action_size)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        x = self.fc3(x)
        return x

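The Q-network is a simple feed-forward neural network with three fully connected layers. Its input is the state, and its output is one Q-value per action.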
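
To get a feel for how small this network is, the parameter count can be worked out by hand. The sketch below assumes the CartPole sizes (4 observation values, 2 actions) and the hidden width of 24 from the listing above; `linear_params` is a helper name made up for this example.

```python
def linear_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out

state_size, action_size, hidden = 4, 2, 24  # CartPole: 4 observations, 2 actions
layers = [(state_size, hidden), (hidden, hidden), (hidden, action_size)]
per_layer = [linear_params(n_in, n_out) for n_in, n_out in layers]
total = sum(per_layer)
print(per_layer, total)  # [120, 600, 50] 770
```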

3. Define the DQN agent

class DQNAgent:
    def __init__(self, state_size, action_size):
        self.state_size = state_size
        self.action_size = action_size
        self.memory = deque(maxlen=2000)
        self.gamma = 0.95
        self.epsilon = 1.0
        self.epsilon_decay = 0.995
        self.epsilon_min = 0.01
        self.learning_rate = 0.001
        self.batch_size = 64
        self.model = QNetwork(state_size, action_size)
        self.target_model = QNetwork(state_size, action_size)
        self.update_target_model()
        self.optimizer = optim.Adam(self.model.parameters(), lr=self.learning_rate)

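The constructor sets up the agent: the state and action sizes, the replay buffer, the discount factor, the exploration rate with its decay and lower bound, the learning rate, the batch size, the Q-network and target Q-network, and the optimizer.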
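
It can help to check how quickly exploration fades with these settings. The sketch below simply replays the multiplicative decay with the values from the constructor (start 1.0, factor 0.995, floor 0.01); the function name is an assumption for this example. Note that the `replay` method in this article applies the decay once per call, so these are training steps, not episodes.

```python
def decays_until_floor(epsilon=1.0, decay=0.995, epsilon_min=0.01):
    """Count how many multiplicative decays it takes to reach the floor."""
    count = 0
    while epsilon > epsilon_min:
        epsilon *= decay
        count += 1
    return count, epsilon

count, final_epsilon = decays_until_floor()
print(count)  # 919
```

So after roughly 900 replay calls the agent acts almost entirely greedily. If that feels too fast for a given environment, a decay factor closer to 1 is the knob to turn.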

4. Update the target network

    def update_target_model(self):
        self.target_model.load_state_dict(self.model.state_dict())

This copies the weights of the current Q-network into the target Q-network.

5. Store experiences

    def remember(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

This stores one experience (state, action, reward, next state, and whether the episode ended) in the replay buffer.

6. Select an action

    def act(self, state):
        if np.random.rand() <= self.epsilon:
            return random.randrange(self.action_size)
        state = torch.FloatTensor(state)
        with torch.no_grad():
            act_values = self.model(state)
        return np.argmax(act_values.numpy())

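The agent selects an action with an epsilon-greedy policy: with probability epsilon it picks a random action (exploration), and otherwise it picks the action with the highest predicted Q-value (exploitation).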
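
The same selection rule can be written without PyTorch, which makes its two branches easy to test in isolation. This is a sketch, not the article's method: `epsilon_greedy` is a name chosen for this example, and it takes a plain list of Q-values instead of a network.

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Return a random action with probability epsilon, else the best-valued one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploit

greedy_choice = epsilon_greedy([0.1, 0.9], epsilon=0.0)
random_choices = {epsilon_greedy([0.1, 0.9], epsilon=1.0) for _ in range(200)}
print(greedy_choice)  # 1
```

With `epsilon=0.0` the function always returns the argmax; with `epsilon=1.0` it ignores the Q-values entirely.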

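7. Experience replay and training

    def replay(self):
        # Wait until the buffer holds at least one full mini-batch
        if len(self.memory) < self.batch_size:
            return
        minibatch = random.sample(self.memory, self.batch_size)
        for state, action, reward, next_state, done in minibatch:
            state = torch.FloatTensor(state)
            next_state = torch.FloatTensor(next_state)
            # TD target: r for terminal transitions, else r + gamma * max Q_target(s', a')
            target = reward
            if not done:
                with torch.no_grad():
                    target = reward + self.gamma * torch.max(self.target_model(next_state)).item()
            # Copy the current predictions so only the taken action's value changes
            target_f = self.model(state)
            target_f = target_f.clone().detach()
            target_f[0][action] = target
            self.optimizer.zero_grad()
            loss = nn.MSELoss()(self.model(state), target_f)
            loss.backward()
            self.optimizer.step()
        # Decay the exploration rate once per replay call
        if self.epsilon > self.epsilon_min:
            self.epsilon *= self.epsilon_decay

This samples a mini-batch of experiences from the replay buffer, computes a target Q-value for each one, and updates the network weights by backpropagating a mean squared error loss between the predicted and target Q-values.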
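
The target value computed inside `replay` is the core of the update, so here it is on its own with plain numbers. This is a sketch under the same discount factor (0.95); `td_target` is a name chosen for this example, and `next_q_values` stands in for the target network's output on the next state.

```python
def td_target(reward, next_q_values, done, gamma=0.95):
    """One-step TD target: bootstrap from the best next action unless the episode ended."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)

ongoing = td_target(1.0, [0.5, 2.0], done=False)   # 1.0 + 0.95 * 2.0, about 2.9
terminal = td_target(-10.0, [0.5, 2.0], done=True)  # just the reward
print(ongoing, terminal)
```

For a terminal transition there is no future to bootstrap from, which is why the next-state values are ignored in that branch.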

8. Create the environment and train the agent

env = gym.make('CartPole-v1')
state_size = env.observation_space.shape[0]
action_size = env.action_space.n

agent = DQNAgent(state_size, action_size)
n_episodes = 1000

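for e in range(n_episodes):
    state = env.reset()[0]  # Gym 0.26+: reset() returns (observation, info)
    state = np.reshape(state, [1, state_size])
    for time in range(500):
        action = agent.act(state)
        result = env.step(action)
        if len(result) == 5:
            # Newer Gym API: (obs, reward, terminated, truncated, info)
            next_state, reward, terminated, truncated, _ = result
            done = terminated or truncated
        else:
            # Older Gym API: (obs, reward, done, info)
            next_state, reward, done, _ = result
            terminated = done
        reward = reward if not terminated else -10  # penalize dropping the pole
        next_state = np.reshape(next_state, [1, state_size])
        agent.remember(state, action, reward, next_state, terminated)
        state = next_state
        if done:
            # Sync the target network at the end of every episode
            agent.update_target_model()
            print(f"episode: {e}/{n_episodes}, score: {time}, e: {agent.epsilon:.2f}")
            break
        agent.replay()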

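This creates the CartPole-v1 environment, initializes the agent, and trains it. In each episode the agent interacts with the environment and improves its policy through experience replay and target-network updates.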

Visualizing the model and the task

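# Save the model (DQNAgent defines no save()/load(), so store the Q-network's state_dict)
torch.save(agent.model.state_dict(), "dqn_cartpole_model.pth")

# Load the model for evaluation
agent.model.load_state_dict(torch.load("dqn_cartpole_model.pth"))
# Wrap the environment with RecordVideo
video_folder = './video'
env = gym.make('CartPole-v1', render_mode='rgb_array')  # RecordVideo needs rgb_array rendering
env = gym.wrappers.RecordVideo(env, video_folder, episode_trigger=lambda episode_id: True)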

# env = gym.wrappers.RecordVideo(env, video_folder, episode_trigger=lambda x: x == 0)

# env = gym.wrappers.RecordVideo(env, video_folder)

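# Imports needed by show_video (assumes a Jupyter/IPython notebook)
import base64
import os
from IPython import display
from IPython.display import HTML

def show_video():
    # Embed the first recorded .mp4 file in the notebook output
    video_files = sorted(f for f in os.listdir(video_folder) if f.endswith('.mp4'))
    if video_files:
        video_path = os.path.join(video_folder, video_files[0])
        with open(video_path, "rb") as fh:
            video_file = fh.read()
        encoded = base64.b64encode(video_file)
        display.display(HTML(data='''
            <video width="400" height="300" controls>
                <source src="data:video/mp4;base64,{0}" type="video/mp4" />
            </video>'''.format(encoded.decode('ascii'))))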

state = env.reset()[0]
state = np.reshape(state, [1, state_size])
total_reward = 0
for time in range(500):
    action = agent.act(state)
    result = env.step(action)
    if len(result) == 5:
        # Newer Gym API: (obs, reward, terminated, truncated, info)
        next_state, reward, terminated, truncated, _ = result
        done = terminated or truncated
    else:
        # Older Gym API: (obs, reward, done, info)
        next_state, reward, done, _ = result
    total_reward += reward
    next_state = np.reshape(next_state, [1, state_size])
    state = next_state
    if done:
        break

print(f"Total reward: {total_reward}")
# Close the environment
env.close()

show_video()

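4. Next steps and extensions

Once you understand and have implemented basic DQN, you can go on to study and implement the following:

  • Double DQN (DDQN)
  • Advantage Actor-Critic (A2C)
  • Deep Deterministic Policy Gradient (DDPG)

You can also try applying these algorithms to more complex environments, tuning and refining your models as you go.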
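
As a small preview of the first item, the sketch below contrasts the plain DQN target with the Double DQN target on made-up numbers. It assumes the standard Double DQN formulation, in which the online network picks the next action and the target network evaluates it; the function names and the example Q-values are hypothetical.

```python
def dqn_target(reward, target_next_q, done, gamma=0.95):
    """Plain DQN: the target network both selects and evaluates the next action."""
    if done:
        return reward
    return reward + gamma * max(target_next_q)

def double_dqn_target(reward, online_next_q, target_next_q, done, gamma=0.95):
    """Double DQN: the online network selects, the target network evaluates."""
    if done:
        return reward
    best_action = max(range(len(online_next_q)), key=lambda a: online_next_q[a])
    return reward + gamma * target_next_q[best_action]

online_next_q = [1.0, 3.0]  # online network prefers action 1
target_next_q = [2.0, 0.5]  # target network rates action 1 much lower
plain = dqn_target(1.0, target_next_q, done=False)
double = double_dqn_target(1.0, online_next_q, target_next_q, done=False)
print(plain, double)  # about 2.9 and 1.475
```

Decoupling selection from evaluation tends to give a lower, less optimistic target when the two networks disagree, which is the motivation usually given for Double DQN.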

Copyright:

鱼一的博客 ◡̈

Article link:

https://yuyi.monster/archives/256/ (when reposting, please credit the source and include the article link)
