
Installing verl

verl repository and version

  1. Repository: https://github.com/volcengine/verl
  2. Version: 0.5.0.dev

Installation

  1. For installing conda itself, see the conda installation guide.
  2. Create a new environment:

    conda create -n verl python=3.10
    conda activate verl
    pip install torch==2.6.0
    

  3. Install CUDA 12.4 (run all commands with sudo):

    wget https://developer.download.nvidia.com/compute/cuda/12.4.1/local_installers/cuda-repo-ubuntu2204-12-4-local_12.4.1-550.54.15-1_amd64.deb
    dpkg -i cuda-repo-ubuntu2204-12-4-local_12.4.1-550.54.15-1_amd64.deb
    cp /var/cuda-repo-ubuntu2204-12-4-local/cuda-*-keyring.gpg /usr/share/keyrings/
    apt-get update
    apt-get -y install cuda-toolkit-12-4
    update-alternatives --set cuda /usr/local/cuda-12.4
    

  4. Install cuDNN 9.8.0 (run all commands with sudo); adjust the URLs if your Ubuntu version differs:

    wget https://developer.download.nvidia.com/compute/cudnn/9.8.0/local_installers/cudnn-local-repo-ubuntu2204-9.8.0_1.0-1_amd64.deb
    dpkg -i cudnn-local-repo-ubuntu2204-9.8.0_1.0-1_amd64.deb
    cp /var/cudnn-local-repo-ubuntu2204-9.8.0/cudnn-*-keyring.gpg /usr/share/keyrings/
    apt-get update
    apt-get -y install cudnn-cuda-12
    

  5. Install NVIDIA Apex:

    git clone https://github.com/NVIDIA/apex.git && \
    cd apex && \
    MAX_JOBS=32 pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" ./
    cd ..
    

  6. Install the remaining dependencies (flash-attention, vllm, flashinfer; flashinfer only publishes a cp38-abi3 wheel, which covers Python 3.8+):

    wget https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.4.post1/flash_attn-2.7.4.post1+cu12torch2.6cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
    pip install flash_attn-2.7.4.post1+cu12torch2.6cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
    pip install vllm==0.8.3
    wget -nv https://github.com/flashinfer-ai/flashinfer/releases/download/v0.2.2.post1/flashinfer_python-0.2.2.post1+cu124torch2.6-cp38-abi3-linux_x86_64.whl && \
    pip install --no-cache-dir flashinfer_python-0.2.2.post1+cu124torch2.6-cp38-abi3-linux_x86_64.whl
    

  7. Install verl:

    git clone https://github.com/volcengine/verl.git
    cd verl
    pip install -e .
    

Usage

Train and test on GSM8K:

  1. Dataset format:

{
    "question": "Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?",
    "answer": "Natalia sold 48/2 = <<48/2=24>>24 clips in May.\nNatalia sold 48+24 = <<48+24=72>>72 clips altogether in April and May.\n#### 72"
}
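The one structural detail worth noting is that the final answer in the `answer` field always follows a `#### ` marker. A minimal sketch of pulling it out (the helper name is mine; verl's own version lives in verl/utils/reward_score/gsm8k.py):

```python
import re

def extract_solution(answer: str) -> str:
    # GSM8K encodes the final answer after a "#### " marker;
    # strip thousands separators so "1,234" compares as "1234".
    match = re.search(r"#### (\-?[0-9\.\,]+)", answer)
    assert match is not None, "no '#### ' marker found"
    return match.group(1).replace(",", "")

answer = ("Natalia sold 48/2 = <<48/2=24>>24 clips in May.\n"
          "Natalia sold 48+24 = <<48+24=72>>72 clips altogether in April and May.\n"
          "#### 72")
print(extract_solution(answer))  # -> 72
```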

  2. Data preparation: run examples/data_preprocess/gsm8k.py, changing the data output path to your own.
  3. Start training. Be sure to change the model path and data paths to your own:

    PYTHONUNBUFFERED=1 \
    CUDA_VISIBLE_DEVICES=0,1,2,3 \
    python -m verl.trainer.main_ppo \
     data.train_files=/home/hzc/data4/llm_related/verl/data/gsm8k/train.parquet \
     data.val_files=/home/hzc/data4/llm_related/verl/data/gsm8k/test.parquet \
     data.train_batch_size=512 \
     data.max_prompt_length=512 \
     data.max_response_length=256 \
     actor_rollout_ref.model.path=/home/hzc/data4/llm_related/models/Qwen2.5-0.5B-Instruct \
     actor_rollout_ref.actor.optim.lr=1e-6 \
     actor_rollout_ref.actor.ppo_mini_batch_size=128 \
     actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=8 \
     actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=8 \
     actor_rollout_ref.rollout.tensor_model_parallel_size=2 \
     actor_rollout_ref.rollout.gpu_memory_utilization=0.4 \
     actor_rollout_ref.rollout.name=vllm \
     actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=8 \
     critic.optim.lr=1e-5 \
     critic.model.path=Qwen/Qwen2.5-0.5B-Instruct \
     critic.ppo_micro_batch_size_per_gpu=8 \
     algorithm.kl_ctrl.kl_coef=0.001 \
     trainer.logger=['console'] \
     trainer.val_before_train=False \
     trainer.default_hdfs_dir=null \
     trainer.n_gpus_per_node=4 \
     trainer.nnodes=1 \
     trainer.save_freq=10 \
     trainer.test_freq=10 \
     trainer.project_name='verl_demo' \
     trainer.total_epochs=15 2>&1 | tee verl_demo.log
    
This run used 4 RTX 4090 GPUs; the 15 epochs finish in a little over an hour.
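As a sanity check on the batch-size knobs above: train_batch_size must divide evenly into ppo_mini_batch_size minibatches, and each minibatch is split across GPUs into micro-batches, with the remainder covered by gradient accumulation. A quick arithmetic sketch, assuming verl's convention that ppo_mini_batch_size is global while the *_micro_batch_size_per_gpu values are per device:

```python
train_batch_size = 512        # prompts sampled per PPO iteration
ppo_mini_batch_size = 128     # prompts per optimizer update (global)
micro_batch_size_per_gpu = 8  # prompts per forward/backward pass on each GPU
n_gpus = 4

assert train_batch_size % ppo_mini_batch_size == 0
minibatches_per_step = train_batch_size // ppo_mini_batch_size
grad_accum = ppo_mini_batch_size // (micro_batch_size_per_gpu * n_gpus)
print(minibatches_per_step, grad_accum)  # -> 4 4
```

If you shrink n_gpus or micro_batch_size_per_gpu, grad_accum grows to compensate, so the effective minibatch size stays at 128.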

  4. To score each checkpoint, first convert the FSDP-sharded checkpoint into a standard Hugging Face checkpoint:

    python -m verl.model_merger merge \
        --backend fsdp \
        --local_dir checkpoints/verl_demo/gsm8k/global_step_100/actor \
        --target_dir checkpoints/verl_demo/gsm8k/global_step_100/actor_hf
    

  5. The tokenizer and model can now be loaded with AutoTokenizer and AutoModelForCausalLM. Following the scoring function in verl/verl/utils/reward_score/gsm8k.py, you can have an LLM write a script that evaluates each checkpoint.
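A rough sketch of such an evaluation script. The scoring rule is modeled on verl's gsm8k reward (last `#### <number>` must match the ground truth); the `evaluate` driver, its prompting, and its generation settings are my assumptions, not verl's:

```python
import re

def score(model_output: str, ground_truth: str) -> float:
    # Correct only if the last "#### <number>" in the generation
    # matches the ground-truth answer (thousands separators stripped).
    matches = re.findall(r"#### (\-?[0-9\.\,]+)", model_output)
    if not matches:
        return 0.0
    return 1.0 if matches[-1].replace(",", "") == ground_truth else 0.0

def evaluate(checkpoint_dir: str, samples) -> float:
    """Hypothetical driver: `samples` is a list of (question, ground_truth) pairs."""
    from transformers import AutoModelForCausalLM, AutoTokenizer  # assumed installed
    tok = AutoTokenizer.from_pretrained(checkpoint_dir)
    model = AutoModelForCausalLM.from_pretrained(checkpoint_dir, device_map="auto")
    hits = 0.0
    for question, truth in samples:
        inputs = tok(question, return_tensors="pt").to(model.device)
        out = model.generate(**inputs, max_new_tokens=256)
        reply = tok.decode(out[0][inputs["input_ids"].shape[1]:],
                           skip_special_tokens=True)
        hits += score(reply, truth)
    return hits / len(samples)

print(score("48 + 24 = 72 clips in total.\n#### 72", "72"))  # -> 1.0
```

Point `evaluate` at each actor_hf directory produced in the previous step (e.g. the global_step_100 one) to compare checkpoints.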