Deploying and Testing Gemma 4 Locally with GPUStack: A Multimodal Agent Capability Guide
The recent release of Gemma 4 introduces models that compete with Qwen3.5, offering enhanced reasoning, native multimodal understanding, and agentic features such as tool calling and structured output. The model family supports text, image, video, and audio inputs, with a context window of 128K to 256K tokens depending on the variant.
This walkthrough covers ...
Posted on Sat, 13 Jun 2026 17:26:06 +0000 by teongkia
Multi-Node Distributed Deployment of Qwen3.5-397B-A17B on Ascend 910B
While vLLM commonly relies on Ray for distributed multi-node inference, it is possible to achieve cross-node coordination without an external scheduler by combining data parallelism (DP) and tensor parallelism (TP). This article walks through a concrete deployment on two Atlas 800I A2 servers (each with 8× Ascend 910B 64 GB) using the quantized ...
Posted on Thu, 11 Jun 2026 18:34:50 +0000 by djp120
Building a Production-Ready Qwen3 Model Service Platform from Scratch
System Requirements
This guide covers deploying Qwen3 models on an Ubuntu 22.04 cloud instance equipped with an NVIDIA A10 GPU (24GB VRAM). The setup requires network connectivity for downloading container images and model files.
Environment Verification
Confirm that the GPU is visible on the PCI bus and check the installed GCC version:
lspci | grep -i nvidia
gcc --version
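The same two checks can be scripted so they run unattended on a fresh instance. The following is a minimal sketch, not part of the original guide: the helper names `nvidia_devices` and `tool_available` are hypothetical, and the script only mirrors the case-insensitive `grep -i nvidia` filter and a PATH lookup for `gcc`.

```python
import shutil
import subprocess


def nvidia_devices(lspci_output: str) -> list[str]:
    """Return the lspci lines that mention NVIDIA (case-insensitive, like grep -i)."""
    return [line for line in lspci_output.splitlines() if "nvidia" in line.lower()]


def tool_available(name: str) -> bool:
    """Report whether an executable such as gcc or lspci is on PATH."""
    return shutil.which(name) is not None


if __name__ == "__main__":
    if tool_available("lspci"):
        # Run lspci without a shell and filter its output in Python.
        result = subprocess.run(["lspci"], capture_output=True, text=True, check=False)
        devices = nvidia_devices(result.stdout)
        print(f"NVIDIA devices found: {len(devices)}")
        for line in devices:
            print("  " + line)
    else:
        print("lspci not found; it is provided by the pciutils package")
    print("gcc available:", tool_available("gcc"))
```

An empty device list here suggests the instance has no NVIDIA GPU attached or the device is not exposed to the guest, which is worth resolving before continuing.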
NVIDIA Driver Installation
...
Posted on Thu, 14 May 2026 21:11:23 +0000 by phyzar
Accelerated Multi-node Inference with Ascend: Simplified Deployment of Large-scale Models Using GPUStack
Deploying large-scale models on Ascend NPUs is often difficult because configuring distributed inference with the standard MindIE engine is complex. Although its performance is acceptable, the setup process involves intricate steps such as environment preparation, initialization, and parameter tuning. Even mi ...
Posted on Fri, 08 May 2026 09:15:03 +0000 by spasme