NVIDIA A800 40GB Active

SKU: VCNA800-PB

Description

High-Performance Data Science and AI Platform

Rapid growth in workload complexity and data size, along with the proliferation of emerging workloads like generative AI, is ushering in a new era of computing, accelerating scientific discovery, improving productivity, and revolutionizing content creation. As models continue to grow in size and complexity to take on next-level challenges, an increasing number of workloads will need to run on local devices. Next-generation workstation platforms will need to deliver high-performance computing capabilities to support these complex workloads.

The NVIDIA A800 40GB Active GPU accelerates data science, AI, and HPC workflows with 432 third-generation Tensor Cores that maximize AI performance and deliver ultra-fast, efficient inference. With third-generation NVIDIA NVLink technology, A800 40GB Active offers scalable performance for heavy AI workloads, doubling the effective memory footprint and enabling GPU-to-GPU data transfers at up to 400 gigabytes per second (GB/s) of bidirectional bandwidth. Paired with NVIDIA AI Enterprise, this board is an AI-ready development platform for workstations suited to the needs of skilled AI developers and data scientists.

Supercharge AI Development Out of the Box With NVIDIA AI Enterprise

Each NVIDIA A800 40GB Active GPU comes with a three-year subscription to NVIDIA AI Enterprise, an end-to-end software platform with enterprise security, stability, manageability, and support.

Performance Highlights
Architecture	NVIDIA Ampere
CUDA^® Cores	6912
Tensor Cores \| Gen 3	432
FP64 Performance	9.7 TFLOPS
FP32 Performance	19.5 TFLOPS
TF32 Tensor Core	311.8 TFLOPS^*
INT8 Tensor Core	1247.4 TOPS^*
NVLink	2-way low profile (2-slot and 3-slot bridges), 400 GB/s bidirectional
GPU Memory	40GB HBM2
Memory Interface	5120-bit
Memory Bandwidth	1555.2 GB/s
Multi-Instance GPU Support	Up to 7 MIG Instances
Thermal Solution	Active
Maximum Power Consumption	240W
^* Structural sparsity enabled

NVIDIA A800 40GB ACTIVE USE CASES

Amplified Graphics Performance When Paired with the Power of NVIDIA RTX

To support display functionality and deliver high-performance graphics for visual applications, the computing capabilities of NVIDIA A800 40GB Active are designed to be paired with NVIDIA RTX™-accelerated GPUs. NVIDIA RTX 4000 Ada Generation, RTX A4000, and T1000 8GB GPUs are certified to run in tandem with A800 40GB Active, delivering powerful real-time ray tracing and AI-accelerated graphics performance in a single-slot form factor.

Data Science and Data Analytics

Accelerate end-to-end data science and analytics workflows with powerful performance to extract meaningful insights from large-scale datasets quickly. By combining the high-performance computing capabilities of the A800 40GB Active with NVIDIA AI Enterprise, data practitioners can leverage a large collection of libraries, tools, and technologies to accelerate data science workflows - from data prep and training to inference.

Training and Development

With 40GB of HBM2 memory and powerful third-generation Tensor Cores that deliver up to 2x the performance of the previous generation, the A800 40GB Active GPU delivers incredible performance to conquer demanding AI development and training workflows on workstation platforms, including data preparation and processing, model optimization and tuning, and early-stage training.

The NVIDIA AI Enterprise software platform accelerates and simplifies deploying AI at scale, allowing organizations to develop once and deploy anywhere. Coupling this powerful software platform with the A800 40GB Active GPU enables AI developers to build, iterate, and refine AI models on workstations using included frameworks, simplifying the scaling process and reserving costly data center computing resources for more expensive, large-scale computations.

Inference

Inference is where AI delivers results, providing actionable insights by operationalizing trained models. With 432 third-generation Tensor Cores and 6,912 CUDA^® Cores, A800 40GB Active delivers 2X the inference operation performance versus the previous generation with support for structural sparsity and a broad range of precisions, including TF32, INT8, and FP64. AI developers can use NVIDIA inference software including NVIDIA TensorRT™ and NVIDIA Triton™ Inference Server that are part of NVIDIA AI Enterprise to simplify and optimize the deployment of AI models at scale.

Generative AI

Using neural networks to identify patterns and structures within existing data, generative AI applications enable users to generate new and original content from a wide variety of inputs and outputs, including images, sounds, animation, and 3D models. Leverage the NVIDIA generative AI solution, NeMo™ Framework, included in NVIDIA AI Enterprise along with NVIDIA A800 40GB Active GPU for easy, fast, and customizable generative AI model development.

High-Performance Computing

The A800 40GB Active GPU delivers incredible performance for GPU-accelerated computer-aided engineering (CAE) applications. Engineering and product development professionals can run large-scale simulations for finite element analysis (FEA), computational fluid dynamics (CFD), construction engineering management (CEM), and other engineering analysis codes in full FP64 precision with incredible speed, shortening development timelines and accelerating time to value. With the addition of RTX-accelerated GPUs providing display capabilities, scientists and engineers can visualize large-scale simulations and models in full design fidelity.

Energy and Geosciences

With 9.7 TFLOPS of FP64 compute performance, the A800 40GB Active GPU enables geoscience professionals to power the latest AI-augmented exploration and production software workflows and accelerate simulation processes to gain faster insight into subsurface data. For large-scale datasets, two A800 40GB Active GPUs can be connected with NVLink to provide 80GB of memory and twice the processing power.

Life Sciences

With the A800 40GB Active, professionals across life science disciplines can accelerate complex data processing tasks, enable faster discovery, and improve decision-making. AI-accelerated life science applications like genomics sequencing, medical imaging, and personalized medicine can benefit from faster training and inference performance to accelerate the analysis of large datasets. For complex simulations and data processing tasks requiring high accuracy, FP64 capabilities allow scientific applications like molecular dynamics, drug discovery, and genomic analysis to run with higher accuracy and precision, yielding more reliable results.

Warranty

3-Year Limited Warranty

Free dedicated phone and email technical support
(1-800-230-0130)

Dedicated NVIDIA professional products Field Application Engineers

Contact gopny@pny.com for additional information.

Features

NVIDIA A800 40GB ACTIVE

PERFORMANCE AND USEABILITY FEATURES

NVIDIA Ampere Architecture

NVIDIA A800 40GB Active is one of the world's most powerful data center GPUs for AI, data analytics, and high-performance computing (HPC) applications. Building upon the major SM enhancements from the Turing GPU, the NVIDIA Ampere architecture enhances tensor matrix operations and concurrent executions of FP32 and INT32 operations.

More Efficient CUDA Cores

The NVIDIA Ampere architecture's CUDA^® cores bring up to 2.5x the single-precision floating point (FP32) throughput compared to the previous generation, providing significant performance improvements for any class of algorithm or application that can benefit from embarrassingly parallel acceleration techniques.

Third-Generation Tensor Cores

Purpose-built for deep learning matrix arithmetic at the heart of neural network training and inferencing functions, the NVIDIA A800 40GB Active includes enhanced Tensor Cores that accelerate more datatypes (TF32 and BF16) and includes a new Fine-Grained Structured Sparsity feature that delivers up to 2x throughput for tensor matrix operations compared to the previous generation.
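As a rough illustration of what structured sparsity means, the sketch below prunes a weight row so that each group of four values keeps only its two largest-magnitude entries. The 2-of-4 pattern is an assumption taken from NVIDIA's general description of the Ampere architecture, not from this page, and the function name `prune_2_of_4` and the sample values are hypothetical. It runs on the CPU and only shows the pattern, not how the Tensor Cores exploit it.

```python
def prune_2_of_4(weights):
    """Zero the two smallest-magnitude values in every group of four.

    Assumption: a 2-of-4 pattern, as NVIDIA describes for Ampere
    fine-grained structured sparsity. Illustrative only.
    """
    if len(weights) % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    pruned = []
    for start in range(0, len(weights), 4):
        group = weights[start:start + 4]
        # Indices of the two largest-magnitude entries in this group.
        keep = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)[:2]
        pruned.extend(v if i in keep else 0.0 for i, v in enumerate(group))
    return pruned


# Hypothetical weight row with two groups of four values.
row = [0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.4, 0.01]
sparse_row = prune_2_of_4(row)
print(sparse_row)
```

Half of the values in each group become zero, which is the regularity that lets sparse-aware hardware skip work. In practice a pruned model is usually fine-tuned afterwards to recover accuracy.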

PCIe Gen 4

The NVIDIA A800 40GB Active supports PCI Express Gen 4, which provides double the bandwidth of PCIe Gen 3, improving data-transfer speeds from CPU memory for data-intensive tasks like AI and data science.
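The "double the bandwidth" claim can be checked with a short calculation. The sketch below assumes the per-lane transfer rates from the PCI Express specifications (8 GT/s for Gen 3, 16 GT/s for Gen 4) and 128b/130b line encoding; these figures are not stated on this page, and the result is a theoretical peak per direction rather than a measured throughput.

```python
def pcie_bandwidth_gbs(transfer_rate_gt_s, lanes=16):
    """Theoretical peak bandwidth per direction, in GB/s.

    Assumes 128b/130b encoding (used by PCIe Gen 3 and Gen 4):
    130 bits on the wire carry 128 bits of payload.
    """
    return transfer_rate_gt_s * lanes * (128 / 130) / 8


gen3 = pcie_bandwidth_gbs(8.0)    # Gen 3: 8 GT/s per lane (assumed from the spec)
gen4 = pcie_bandwidth_gbs(16.0)   # Gen 4: 16 GT/s per lane (assumed from the spec)

print(f"PCIe Gen 3 x16: {gen3:.1f} GB/s per direction")
print(f"PCIe Gen 4 x16: {gen4:.1f} GB/s per direction")
print(f"Ratio: {gen4 / gen3:.1f}x")
```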

Multi-Instance GPU (MIG): Secure, Isolated Multi-Tenancy

Every AI and HPC application can benefit from acceleration, but not every application needs the performance of a full A800 40GB Active GPU. Multi-Instance GPU (MIG) maximizes the utilization of GPU-accelerated infrastructure by allowing an A800 40GB Active GPU to be partitioned into as many as seven independent instances, fully isolated at the hardware level. This provides multiple users access to GPU acceleration with their own high-bandwidth memory, cache, and compute cores. Now, developers can access breakthrough acceleration for all their applications, big and small, and get guaranteed quality of service. And IT administrators can offer right-sized GPU acceleration for optimal utilization and expand access to every user and application.

Ultra-Fast HBM2 Memory

To feed its massive computational throughput, the NVIDIA A800 40GB Active GPU has 40GB of high-speed HBM2 memory with a class-leading 1,555GB/s of memory bandwidth, a 79 percent increase compared to NVIDIA Quadro GV100. In addition to 40GB of HBM2 memory, A800 40GB Active has significantly more on-chip memory, including a 40 megabyte (MB) level 2 cache, which is nearly 7x larger than the previous generation. This provides the right combination of extreme-bandwidth on-chip cache and large on-package high-bandwidth memory to accelerate the most compute-intensive AI models.
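The memory figures can be related to each other with simple arithmetic. The sketch below uses the 5120-bit interface and 1555.2 GB/s bandwidth from the specifications to derive the implied per-pin data rate, and checks the 79 percent comparison against a Quadro GV100 bandwidth of 870 GB/s. That GV100 figure is an assumption not stated on this page.

```python
BUS_WIDTH_BITS = 5120          # Memory Interface, from the spec table
BANDWIDTH_GBS = 1555.2         # Memory Bandwidth, from the spec table
GV100_BANDWIDTH_GBS = 870.0    # assumption: Quadro GV100 figure, not given in this document

# Bytes moved across the whole interface per transfer.
bytes_per_transfer = BUS_WIDTH_BITS / 8

# Implied effective data rate per pin, in Gbit/s.
per_pin_gbps = BANDWIDTH_GBS / bytes_per_transfer

# Relative increase over the assumed GV100 bandwidth.
increase_pct = (BANDWIDTH_GBS / GV100_BANDWIDTH_GBS - 1) * 100

print(f"Per-pin data rate: {per_pin_gbps:.2f} Gbit/s")
print(f"Increase over GV100: {increase_pct:.0f}%")
```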

Compute Preemption

Preemption at the instruction-level provides finer grain control over compute and tasks to prevent longer-running applications from either monopolizing system resources or timing out.

MULTI-GPU TECHNOLOGY SUPPORT

Third-Generation NVLink

Connect a pair of NVIDIA A800 40GB Active cards with NVLink to increase the effective memory footprint and scale application performance. Scaling applications across multiple GPUs requires extremely fast movement of data, and the third generation of NVLink in A800 40GB Active provides up to 400GB/s of bidirectional GPU-to-GPU bandwidth.
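To put the link speed in perspective, the sketch below estimates how long a one-way GPU-to-GPU copy would take at peak rate. It takes the 400 GB/s bidirectional figure from the specifications and assumes it splits evenly into 200 GB/s per direction. The 10 GB payload and the PCIe Gen 4 x16 figure of about 31.5 GB/s per direction are illustrative assumptions, and real transfers will be slower than these theoretical peaks.

```python
NVLINK_BIDIRECTIONAL_GBS = 400.0   # from the spec table
PCIE_GEN4_X16_GBS = 31.5           # assumption: theoretical peak per direction


def transfer_time_ms(payload_gb, bandwidth_gbs):
    """Ideal transfer time in milliseconds at a given peak bandwidth."""
    return payload_gb / bandwidth_gbs * 1000.0


# Assumption: the bidirectional figure is split evenly between directions.
nvlink_per_direction = NVLINK_BIDIRECTIONAL_GBS / 2

payload_gb = 10.0  # hypothetical tensor or dataset shard
nvlink_ms = transfer_time_ms(payload_gb, nvlink_per_direction)
pcie_ms = transfer_time_ms(payload_gb, PCIE_GEN4_X16_GBS)

print(f"NVLink (one direction): {nvlink_ms:.0f} ms")
print(f"PCIe Gen 4 x16:         {pcie_ms:.0f} ms")
```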

SOFTWARE SUPPORT

Software Optimized for AI

Deep learning frameworks such as Caffe2, MXNet, CNTK, TensorFlow, and others deliver dramatically faster training times and higher multi-node training performance. GPU-accelerated libraries such as cuDNN, cuBLAS, and TensorRT deliver higher performance for both deep learning inference and High-Performance Computing (HPC) applications.

NVIDIA CUDA Parallel Computing Platform

Natively execute standard programming languages like C/C++ and Fortran, and APIs such as OpenCL, OpenACC, and DirectCompute to accelerate techniques such as ray tracing, video and image processing, and computational fluid dynamics.

Unified Memory

A single, seamless 49-bit virtual address space allows for the transparent migration of data between the full allocation of CPU and GPU memory.
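For a sense of scale, the short calculation below works out how much memory a 49-bit address can cover and compares it with the card's 40GB of HBM2. Treating "40GB" as 40 GiB is an assumption made here for simplicity.

```python
ADDRESS_BITS = 49

# Number of distinct byte addresses in a 49-bit space.
addressable_bytes = 2 ** ADDRESS_BITS
addressable_tib = addressable_bytes / 2 ** 40

# Assumption: treat the card's 40GB as 40 GiB for this comparison.
gpu_memory_bytes = 40 * 2 ** 30
ratio = addressable_bytes / gpu_memory_bytes

print(f"Addressable: {addressable_tib:.0f} TiB")
print(f"About {ratio:,.0f}x the GPU's own memory")
```

The address space is far larger than any single GPU's memory, which is what allows CPU and GPU allocations to share one range of pointers.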

NVIDIA AI Enterprise

Enterprise adoption of AI is now mainstream and leading to an increased demand for skilled AI developers and data scientists. Organizations require a flexible, high-performance platform consisting of optimized hardware and software to maximize productivity and accelerate AI development. NVIDIA A800 40GB Active and NVIDIA AI Enterprise provide an ideal foundation for these vital initiatives.

Specifications

NVIDIA A800 40GB ACTIVE

SPECIFICATIONS

PNY Part Number	VCNA800-PB
Product	NVIDIA A800 40GB Active
Architecture	NVIDIA Ampere
Foundry	TSMC
Process Size	7 nm NVIDIA Custom Process
Die Size	826 mm²
CUDA^® Cores	6912
Streaming Multiprocessors	108
Tensor Cores \| Gen 3	432
FP64 Performance	9.7 TFLOPS
FP32 Performance	19.5 TFLOPS
TF32 Tensor Core	311.8 TFLOPS^*
BFLOAT16 Tensor Core	312 TFLOPS \| 624 TFLOPS^*
FP16 Tensor Core	312 TFLOPS \| 624 TFLOPS^*
INT8 Tensor Core	1247.4 TOPS^*
INT4 Tensor Core	1248 TOPS \| 2496 TOPS^*
NVLink	2-way low profile (2-slot and 3-slot bridges), 400 GB/s bidirectional
NVLink Bandwidth	400 GB/s
GPU Memory	40GB HBM2
Memory Interface	5120-bit
Memory Bandwidth	1555.2 GB/s
Multi-Instance GPU Support	Up to 7 MIG Instances
System Interface	PCIe 4.0 x16
Display Support	None Provided, use companion NVIDIA T1000 or RTX A4000 board for video output
Thermal Solution	Active
Form Factor	4.4" H x 10.5" L, Dual-Slot
Power Connector	CEM5 16-pin
Maximum Power Consumption	240W
^* Structural sparsity enabled. ¹ 3-year software subscription and enterprise support for NVIDIA AI Enterprise license. Activation required.
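Several of the listed figures can be cross-checked against each other. The sketch below derives the per-SM core counts and the GPU clock implied by the FP32 rating, assuming each CUDA core retires one fused multiply-add (counted as 2 floating-point operations) per cycle. That counting convention and the resulting clock are assumptions. The page itself does not list a clock speed.

```python
CUDA_CORES = 6912
STREAMING_MULTIPROCESSORS = 108
TENSOR_CORES = 432
FP32_TFLOPS = 19.5
FP64_TFLOPS = 9.7

cuda_cores_per_sm = CUDA_CORES // STREAMING_MULTIPROCESSORS
tensor_cores_per_sm = TENSOR_CORES // STREAMING_MULTIPROCESSORS

# Assumption: peak FP32 = cores x 2 FLOPs (one FMA) x clock.
FLOPS_PER_CORE_PER_CYCLE = 2
implied_clock_ghz = FP32_TFLOPS * 1e12 / (CUDA_CORES * FLOPS_PER_CORE_PER_CYCLE) / 1e9

fp32_to_fp64 = FP32_TFLOPS / FP64_TFLOPS

print(f"CUDA cores per SM:   {cuda_cores_per_sm}")
print(f"Tensor Cores per SM: {tensor_cores_per_sm}")
print(f"Implied clock:       {implied_clock_ghz:.2f} GHz")
print(f"FP32 : FP64 ratio:   {fp32_to_fp64:.2f}")
```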

AVAILABLE ACCESSORIES

RTXA6000NVLINK-KIT provides an NVLink connector for A800 40GB Active suitable for standard PCIe slot spacing motherboards. Each pair of NVIDIA A800 40GB Active boards requires three (3) NVLink bridges for proper NVLink operation. Application software support is required.

RTXA6000NVLINK-3S-KIT provides an NVLink connector for the A800 40GB Active for motherboards implementing wider PCIe slot spacing. All other features, benefits, and application support requirements are the same, and three (3) NVLink kits are required per pair of NVIDIA A800 40GB Active boards.

SUPPORTED OPERATING SYSTEMS

Microsoft Windows 11 (64-bit)
Microsoft Windows 10 (64-bit)
Red Hat Enterprise Linux 7.x
SUSE Linux Enterprise Desktop 15.x
openSUSE 15
Fedora 31
Ubuntu 18.04
FreeBSD 11.x
Solaris 11

PACKAGE CONTAINS

NVIDIA A800 40GB Active

CEM5 16-pin to dual 8-pin PCIe auxiliary power cable
