LLM推理部署（四）：一个用于训练、部署和评估基于大型语言模型的聊天机器人的开放平台FastChat

本文介绍: FastChat是用于对话机器人模型训练、部署、评估的开放平台。体验地址为：https://chat.lmsys.org/，该体验平台主要是为了收集人类的真实反馈，目前已经支持30多种大模型，已经收到500万的请求，收集了10万调人类对比大模型的数据，可以在排行榜（https://huggingface.co/space s/lmsys/chatbot–ar ena–leader b oa r d）进行查看。

FastCha t是用于对话机器人模型训练、部署、评估的开放平台。体验地址为：https://chat.lmsys.o rg/，该体验平台主要是为了收集人类的真实反馈，目前已经支持30多种大模型，已经收到500万的请求，收集了10万调人类对比大模型的数据，可以在排行榜（http s://huggingface.co/space s/lmsys/chatbot–ar ena–leaderbo ar d）进行查看。

FastCha t 核心特性包括：

pip3 install "fschat[model_worker,webui]"

St e p1 克隆源码并切换到对应的目录下

git clone https://github.com/lm-sys/FastChat.gitcd FastChat

如果是mac，还需要执行如下代码

brew install rust cmake

Step2 安装相关的包

pip3 install --upgrade pip  # enable PEP 660 supportpip3 install -e ".[model_worker,webui]"

下面展示一下不同模型以及不同大小启用聊天功能

模型大小	聊天命令	Hug gin g Face
7B	`python3 -m fastchat.serve.cli --model-path lmsys/vicuna-7b-v1.5`	lmsys/vi cuna-7b-v1.5
7B-16k	`python3 -m fastchat.serve.cli --model-path lmsys/vicuna-7b-v1.5-16k`	lmsys/vi cuna-7b-v1.5-16k
13B	`python3 -m fastchat.serve.cli --model-path lmsys/vicuna-13b-v1.5`	lms ys/vi cuna-13b-v1.5
13B-16k	`python3 -m fastchat.serve.cli --model-path lmsys/vicuna-13b-v1.5-16k`	lmsys/vi cuna-13b-v1.5-16k
33B	`python3 -m fastchat.serve.cli --model-path lmsys/vicuna-33b-v1.3`	lmsys/vi cuna-33b-v1.3

模型大小	聊天命令	Hug gin g Face
7B	`python3 -m fastchat.serve.cli --model-path lmsys/longchat-7b-32k-v1.5`	lmsys/long chat-7b-32k

模型大小	聊天命令	Hug ging Face
3B	`python3 -m fastchat.serve.cli --model-path lmsys/fastchat-t5-3b-v1.0`	lmsys/fast chat-t5-3b-v1.0

python3 -m fastchat.serve.cli --model-path lmsys/vicuna-7b-v1.5

python3 -m fastchat.serve.cli --model-path lmsys/vicuna-7b-v1.5 --num-gpus 2

python3 -m fastchat.serve.cli --model-path lmsys/vicuna-7b-v1.5 --num-gpus 2 --max-gpu-memory 8GiB

python3 -m fastchat.serve.cli --model-path lmsys/vicuna-7b-v1.5 --device cpu

CPU_ISA=amx python3 -m fastchat.serve.cli --model-path lmsys/vicuna-7b-v1.5 --device cpu

python3 -m fastchat.serve.cli --model-path lmsys/vicuna-7b-v1.5 --device mps --load-8bit

source /opt/intel/oneapi/setvars.sh

使用 --device xpu 启用XPU/GPU加速。

python3 -m fastchat.serve.cli --model-path lmsys/vicuna-7b-v1.3 --device xpu

python3 -m fastchat.serve.cli --model-path lmsys/vicuna-7b-v1.5 --load-8bit

python3 -m fastchat.serve.controller

python3 -m fastchat.serve.model_worker --model-path lmsys/vicuna-7b-v1.5

python3 -m fastchat.serve.test_message --model-name vicuna-7b-v1.5

python3 -m fastchat.serve.gradio_web_server

Hyper parameter	Global Batch Size	Learning rate	Epochs	Max length	Weight decay
Vicuna-13B	128	2e-5	3	2048	0

pip3 install -e ".[train]"

torchrun --nproc_per_node=4 --master_port=20001 fastchat/train/train_mem.py     --model_name_or_path meta-llama/Llama-2-7b-hf     --data_path data/dummy_conversation.json     --bf16 True     --output_dir output_vicuna     --num_train_epochs 3     --per_device_train_batch_size 2     --per_device_eval_batch_size 2     --gradient_accumulation_steps 16     --evaluation_strategy "no"     --save_strategy "steps"     --save_steps 1200     --save_total_limit 10     --learning_rate 2e-5     --weight_decay 0.     --warmup_ratio 0.03     --lr_scheduler_type "cosine"     --logging_steps 1     --fsdp "full_shard auto_wrap"     --fsdp_transformer_layer_cls_to_wrap 'LlamaDecoderLayer'     --tf32 True     --model_max_length 2048     --gradient_checkpointing True     --lazy_preprocess True