Multi-Node Distributed Deployment of Qwen3.5-397B-A17B on Ascend 910B

vLLM commonly relies on Ray for multi-node inference, but cross-node coordination is also possible without an external scheduler by combining data parallelism (DP) with tensor parallelism (TP). This article walks through a concrete deployment on two Atlas 800I A2 servers (each with 8× Ascend 910B 64 GB) using the quantized ...
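
To make the DP and TP combination concrete, here is a minimal sketch of how the 16 devices could be grouped. The hardware counts come from the setup described above; the split into DP=2 and TP=8 (one tensor-parallel group per server) is an assumption for illustration, not necessarily the configuration used in this deployment. The names `Placement` and `layout` are hypothetical helpers and are not part of vLLM.

```python
from dataclasses import dataclass

NODES = 2             # two Atlas 800I A2 servers (from the article)
DEVICES_PER_NODE = 8  # 8x Ascend 910B per server (from the article)


@dataclass(frozen=True)
class Placement:
    global_rank: int   # index of the device across both servers
    node: int          # which server the device sits in
    local_device: int  # device index within that server
    dp_rank: int       # which model replica the device belongs to
    tp_rank: int       # position inside that replica's tensor-parallel group


def layout(dp_size: int, tp_size: int) -> list[Placement]:
    """Map every device to a (DP, TP) coordinate, TP ranks varying fastest."""
    world = NODES * DEVICES_PER_NODE
    if dp_size * tp_size != world:
        raise ValueError(
            f"dp_size * tp_size must equal {world}, got {dp_size * tp_size}"
        )
    return [
        Placement(
            global_rank=rank,
            node=rank // DEVICES_PER_NODE,
            local_device=rank % DEVICES_PER_NODE,
            dp_rank=rank // tp_size,
            tp_rank=rank % tp_size,
        )
        for rank in range(world)
    ]


if __name__ == "__main__":
    # Assumed split: 2 replicas (DP), each sharded over 8 devices (TP).
    for p in layout(dp_size=2, tp_size=8):
        print(p)
```

Under this assumed split, each tensor-parallel group stays inside one server, so the communication-heavy TP collectives stay on the intra-node interconnect, while the two DP replicas only need lightweight coordination across nodes.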

Posted on Thu, 11 Jun 2026 18:34:50 +0000 by djp120