verl Installation
Verl Repository and Version
- Repository: https://github.com/volcengine/verl
- Version: 0.5.0.dev
Installation
- For installing conda itself, see: conda installation
- Create a new conda environment, as sketched below.
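A minimal sketch; the environment name `verl` and Python 3.10 are assumptions (3.10 matches the cp310 flash-attention wheel installed later):

```bash
# Env name and Python version are assumptions; 3.10 matches the cp310 wheels below
conda create -n verl python=3.10 -y
conda activate verl
```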
- Install CUDA 12.4 (run all commands with sudo):

```bash
wget https://developer.download.nvidia.com/compute/cuda/12.4.1/local_installers/cuda-repo-ubuntu2204-12-4-local_12.4.1-550.54.15-1_amd64.deb
dpkg -i cuda-repo-ubuntu2204-12-4-local_12.4.1-550.54.15-1_amd64.deb
cp /var/cuda-repo-ubuntu2204-12-4-local/cuda-*-keyring.gpg /usr/share/keyrings/
apt-get update
apt-get -y install cuda-toolkit-12-4
update-alternatives --set cuda /usr/local/cuda-12.4
```
- Install cuDNN 9.8.0 (run all commands with sudo); if your Ubuntu version differs, adjust the URL and package names accordingly:

```bash
wget https://developer.download.nvidia.com/compute/cudnn/9.8.0/local_installers/cudnn-local-repo-ubuntu2204-9.8.0_1.0-1_amd64.deb
dpkg -i cudnn-local-repo-ubuntu2204-9.8.0_1.0-1_amd64.deb
cp /var/cudnn-local-repo-ubuntu2204-9.8.0/cudnn-*-keyring.gpg /usr/share/keyrings/
apt-get update
apt-get -y install cudnn-cuda-12
```
- Install NVIDIA Apex, as sketched below.
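The original gives no command here; one common approach, following the NVIDIA Apex README, is to build the C++/CUDA extensions from source (this needs the CUDA toolkit installed above):

```bash
git clone https://github.com/NVIDIA/apex
cd apex
# Build with C++ and CUDA extensions (per the Apex README; requires pip >= 23.1)
pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation \
    --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" ./
```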
- Install the remaining dependencies (flash-attention, vllm, flashinfer; flashinfer only ships a cp38-abi3 wheel, which installs on Python >= 3.8 via the stable ABI):

```bash
wget https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.4.post1/flash_attn-2.7.4.post1+cu12torch2.6cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
pip install flash_attn-2.7.4.post1+cu12torch2.6cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
pip install vllm==0.8.3
wget -nv https://github.com/flashinfer-ai/flashinfer/releases/download/v0.2.2.post1/flashinfer_python-0.2.2.post1+cu124torch2.6-cp38-abi3-linux_x86_64.whl && \
    pip install --no-cache-dir flashinfer_python-0.2.2.post1+cu124torch2.6-cp38-abi3-linux_x86_64.whl
```
- Install verl, as sketched below.
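A sketch of an editable install from source; `--no-deps` is used here on the assumption that you want to keep the manually pinned vllm/flash-attention versions above (drop it to let pip resolve dependencies itself):

```bash
git clone https://github.com/volcengine/verl
cd verl
# --no-deps keeps the manually installed vllm/flash-attn pins intact
pip install --no-deps -e .
```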
Usage
Training and testing with GSM8K:
- Dataset format:

```json
{
    "question": "Natalia sold clips to 48 of her friends in April, and then sold half as many in May. How many clips did she sell in April and May combined?",
    "answer": "Clips sold in May: 48/2 = <<48/2=24>>24\nTotal sold: 48+24 = <<48+24=72>>72\n#### 72"
}
```
- Data preparation: run examples/data_preprocess/gsm8k.py, changing the data path to your own (invocation sketched below).
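A sketch of the invocation; `--local_dir` matches the script's argparse option in the verl repo, but check `--help` on your version:

```bash
# Output directory for the train/test parquet files (path is an example)
python examples/data_preprocess/gsm8k.py --local_dir ~/data/gsm8k
```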
- Start training; note that the model path and data paths must be changed to your own. This run used 4x RTX 4090 GPUs, and 15 epochs finished in roughly an hour:

```bash
PYTHONUNBUFFERED=1 CUDA_VISIBLE_DEVICES=0,1,2,3 python -m verl.trainer.main_ppo \
    data.train_files=/home/hzc/data4/llm_related/verl/data/gsm8k/train.parquet \
    data.val_files=/home/hzc/data4/llm_related/verl/data/gsm8k/test.parquet \
    data.train_batch_size=512 \
    data.max_prompt_length=512 \
    data.max_response_length=256 \
    actor_rollout_ref.model.path=/home/hzc/data4/llm_related/models/Qwen2.5-0.5B-Instruct \
    actor_rollout_ref.actor.optim.lr=1e-6 \
    actor_rollout_ref.actor.ppo_mini_batch_size=128 \
    actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=8 \
    actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=8 \
    actor_rollout_ref.rollout.tensor_model_parallel_size=2 \
    actor_rollout_ref.rollout.gpu_memory_utilization=0.4 \
    actor_rollout_ref.rollout.name=vllm \
    actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=8 \
    critic.optim.lr=1e-5 \
    critic.model.path=Qwen/Qwen2.5-0.5B-Instruct \
    critic.ppo_micro_batch_size_per_gpu=8 \
    algorithm.kl_ctrl.kl_coef=0.001 \
    trainer.logger=['console'] \
    trainer.val_before_train=False \
    trainer.default_hdfs_dir=null \
    trainer.n_gpus_per_node=4 \
    trainer.nnodes=1 \
    trainer.save_freq=10 \
    trainer.test_freq=10 \
    trainer.project_name='verl_demo' \
    trainer.total_epochs=15 2>&1 | tee verl_demo.log
```
- To score the individual checkpoints, first convert the FSDP-format checkpoints into standard Hugging Face format, as sketched below.
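verl provides a model-merger utility for this conversion; a sketch of a typical invocation (the checkpoint paths are placeholders, and the entry point and flags have shifted across verl versions, so check the docs for your version):

```bash
# Paths are placeholders; point --local_dir at the FSDP actor checkpoint
python -m verl.model_merger merge \
    --backend fsdp \
    --local_dir checkpoints/verl_demo/global_step_10/actor \
    --target_dir checkpoints/verl_demo/global_step_10/actor/huggingface
```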
- The tokenizer and model can now be loaded with AutoTokenizer and AutoModelForCausalLM. By mimicking the scoring function in verl/verl/utils/reward_score/gsm8k.py, you can have an LLM write a script that scores each checkpoint; a sketch follows.
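A minimal evaluation sketch. The checkpoint path, prompt suffix, and greedy decoding settings are assumptions; the `####`-based answer extraction mirrors the idea in verl/verl/utils/reward_score/gsm8k.py rather than reusing it verbatim:

```python
import re

import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder: a checkpoint already converted to Hugging Face format
CKPT = "checkpoints/verl_demo/global_step_10/actor/huggingface"


def extract_answer(text):
    """Take the number after '####' if present, else the last number in the
    text (flexible extraction, in the spirit of verl's gsm8k reward score)."""
    m = re.search(r"####\s*(-?[0-9.,]+)", text)
    candidates = [m.group(1)] if m else re.findall(r"-?[0-9.,]+", text)
    if not candidates:
        return None
    return candidates[-1].replace(",", "").rstrip(".")


tokenizer = AutoTokenizer.from_pretrained(CKPT)
model = AutoModelForCausalLM.from_pretrained(
    CKPT, torch_dtype=torch.bfloat16, device_map="cuda"
)

test = load_dataset("openai/gsm8k", "main", split="test")
correct = 0
for sample in test:
    # Prompt suffix is an assumption; match whatever format training used.
    messages = [{
        "role": "user",
        "content": sample["question"]
        + ' Let\'s think step by step and output the final answer after "####".',
    }]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(input_ids, max_new_tokens=256, do_sample=False)
    response = tokenizer.decode(
        output[0][input_ids.shape[1]:], skip_special_tokens=True
    )
    correct += int(extract_answer(response) == extract_answer(sample["answer"]))

print(f"accuracy: {correct / len(test):.4f}")
```

Run it once per converted checkpoint (changing CKPT) to compare scores across save steps.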