QNAP Unveils QAI-h1290FX Edge AI Storage Server

Built around a server-grade AMD EPYC processor, with support for NVIDIA RTX GPU acceleration and twelve U.2 NVMe/SATA SSD slots, the QAI-h1290FX delivers high-performance, on-prem AI infrastructure for organizations that demand low-latency inference, full data privacy, and operational control, without relying on the cloud.
Powered by QNAP’s ZFS-based QuTS hero operating system, the QAI-h1290FX provides enterprise-grade data integrity, near-limitless snapshots, and inline deduplication. It supports native GPU access in containers through Container Station and GPU passthrough for virtual machines via Virtualization Station. IT teams, developers, and research groups can efficiently run inference models, generative AI applications, and RAG pipelines with full control over performance and resource allocation.
The QAI-h1290FX includes a curated selection of preloaded AI tools such as AnythingLLM, OpenWebUI, and Ollama, allowing fast deployment of private LLM workflows. Additional AI applications like Stable Diffusion, ComfyUI, n8n, and vLLM are also being integrated to expand functionality. This enables users to rapidly build on-prem AI platforms and automate workflows in a secure, scalable, and fully controlled environment.
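To illustrate what a private LLM stack like this looks like under the hood, here is a minimal Docker Compose sketch pairing Ollama (with one GPU reserved) and OpenWebUI. This is illustrative only, not QNAP's actual Container Station deployment definition; image names, ports, and the `OLLAMA_BASE_URL` variable follow the upstream projects' published defaults and should be adapted to your environment.

```yaml
# Illustrative sketch: Ollama with GPU access, fronted by OpenWebUI.
services:
  ollama:
    image: ollama/ollama:latest
    ports:
      - "11434:11434"          # Ollama's default API port
    volumes:
      - ollama_data:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia   # reserve one NVIDIA GPU for inference
              count: 1
              capabilities: [gpu]
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    ports:
      - "3000:8080"            # web chat UI exposed on host port 3000
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    depends_on:
      - ollama
volumes:
  ollama_data:
```

On the QAI-h1290FX itself, GPU assignment is handled through Container Station's interface rather than a hand-written Compose file, but the moving parts are the same.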
“The QAI-h1290FX meets the growing demand for on-prem AI infrastructure,” said Oliver Lam, Product Manager at QNAP. “We wanted to eliminate the friction in building a GPU workstation, installing tools, and configuring complex environments. With the QAI-h1290FX, users can deploy and run their AI models right out of the box—with full control over their data and zero reliance on the cloud.”
Key Features of the QAI-h1290FX
- All-Flash Storage Architecture: Twelve U.2 NVMe/SATA SSD slots enable ultra-fast I/O for high-frequency AI model execution and data streaming.
- 16-core AMD EPYC 7302P Processor: Provides 32 threads of server-class compute power—ideal for AI inference, virtualization, and heavy parallel workloads.
- GPU-ready Architecture: Supports optional NVIDIA RTX PRO 6000 Blackwell Max-Q Workstation GPU, featuring up to 96 GB of GPU memory and support for CUDA, TensorRT, and Transformer Engine acceleration—significantly boosting performance for on-prem LLM inference, image generation, and deep learning workloads.
- Containerized AI Environment & GPU Resource Management: Supports Docker and LXD with intuitive GPU allocation. Users can quickly launch AI tools via the built-in AI app center and assign GPU resources without command-line configuration.
- Fully Local Deployment with No Cloud Dependency: Run AI-powered chat assistants, document search engines, or knowledge bases fully on-premises. Keep sensitive data in-house while accelerating AI workflows.
- High-speed Networking and Scalable Architecture: Comes with dual 25GbE and dual 2.5GbE ports. PCIe slots support optional 100GbE upgrades. Compatible with QNAP JBOD expansion enclosures for large-scale AI data storage.
Use Case Highlights
- Internal AI Assistants / On-Prem Chat Interfaces: Deploy conversational AI interfaces for knowledge lookup, employee training, and policy Q&A—fully under your control.
- Enterprise RAG Search: Leverage private RAG pipelines to perform fast, contextual search across contracts, reports, and internal documents.
- Image Generation for Creative Teams: Run Stable Diffusion or ComfyUI for AI-powered design workflows and visual content generation.
- AI-Driven IT Automation: Use n8n to automate inference tasks, content generation, or alerts—integrating AI seamlessly into business processes.
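The RAG search use case above boils down to two steps: retrieve the most relevant internal document, then ground a local model's answer in it. The sketch below is a deliberately naive illustration, assuming documents are plain strings and ranking by keyword overlap; a production pipeline would use vector embeddings, and the final prompt would be sent to an on-prem model (for example, an Ollama instance listening on its default port) rather than printed.

```python
# Minimal, illustrative RAG sketch: rank documents by keyword overlap
# with the query, then assemble a prompt that grounds a local LLM in the
# retrieved text. Generation itself is omitted; in practice the prompt
# would go to a locally hosted model.

def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of word tokens."""
    return set(text.lower().split())

def retrieve(query: str, documents: list[str]) -> str:
    """Return the document sharing the most tokens with the query."""
    query_tokens = tokenize(query)
    return max(documents, key=lambda doc: len(query_tokens & tokenize(doc)))

def build_prompt(query: str, context: str) -> str:
    """Instruct the model to answer only from the retrieved context."""
    return (
        "Answer using only the context below.\n"
        f"Context: {context}\n"
        f"Question: {query}"
    )

if __name__ == "__main__":
    docs = [
        "Vacation policy: employees accrue 1.5 days per month.",
        "Expense reports are due by the 5th of each month.",
        "The VPN requires multi-factor authentication.",
    ]
    question = "How many vacation days do employees accrue?"
    context = retrieve(question, docs)
    print(build_prompt(question, context))
```

Swapping the overlap score for embedding similarity and wiring the prompt into a local inference endpoint turns this skeleton into the kind of private document Q&A workflow the QAI-h1290FX is built to host.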
With the QAI-h1290FX, QNAP delivers a practical, high-performance path for deploying generative AI within enterprise boundaries. Whether used in legal, HR, creative, or IT operations, it helps teams move faster, stay compliant, and maintain full control over their AI strategy—right at the edge.