Error when loading HunYuan-A52B-Instruct-Int8 for inference

Environment: 8 x A100 80G,
torch=2.5.1, cuda121
Running run_server_int8.py fails with the following error:
(VllmWorkerProcess pid=145185) ERROR 11-25 22:31:33 multiproc_worker_utils.py:226] Exception in worker VllmWorkerProcess while processing method determine_num_available_blocks: Could not run '_C::rms_norm' with arguments from the 'CUDA' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. '_C::rms_norm' is only available for these backends: [CPU, Meta, BackendSelect, Python, FuncTorchDynamicLayerBackMode, Functionalize, Named, Conjugate, Negative, ZeroTensor, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradMPS, AutogradXPU, AutogradHPU, AutogradLazy, AutogradMeta, Tracer, AutocastCPU, AutocastXPU, AutocastMPS, AutocastCUDA, FuncTorchBatched, BatchedNestedTensor, FuncTorchVmapMode, Batched, VmapMode, FuncTorchGradWrapper, PythonTLSSnapshot, FuncTorchDynamicLayerFrontMode, PreDispatch, PythonDispatcher].
How can I resolve this?
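
For anyone trying to narrow this down, here is a minimal diagnostic sketch in Python, not a confirmed fix. It assumes the error text has the same shape as the log above and only uses the standard library. The helper names (`installed_version`, `registered_backends`, `has_cuda_kernel`) are hypothetical and made up for illustration. The idea is to confirm that plain `CUDA` is missing from the backend list for `_C::rms_norm` (entries such as `AutogradCUDA` do not count as a CUDA kernel), and to print the installed torch and vllm versions so they can be compared. One possible cause, stated as an assumption, is that the compiled vLLM extension does not match the installed torch/CUDA build.

```python
import re
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def registered_backends(error_text):
    """Extract the backend names listed in a PyTorch 'Could not run' dispatcher error."""
    match = re.search(r"only available for these backends: \[([^\]]*)\]", error_text)
    if match is None:
        return []
    return [name.strip() for name in match.group(1).split(",") if name.strip()]


def has_cuda_kernel(error_text):
    """True only if the plain 'CUDA' backend is listed (AutogradCUDA etc. do not count)."""
    return "CUDA" in registered_backends(error_text)


if __name__ == "__main__":
    # Print the versions that are actually installed in this environment.
    for package in ("torch", "vllm"):
        print(package, installed_version(package))
```

If `has_cuda_kernel` returns False for the logged message, the operator was registered without a CUDA implementation in this environment, which is consistent with the error reported above.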

Tencent Open Source/Tencent-Hunyuan-Large
