PNY Technologies Inc.

NVIDIA A100


  • SKU: NVA100TCGPU-KIT
  • Description

    NVIDIA A100

    Unprecedented Acceleration for World’s Highest-Performing Elastic Data Centers

    The NVIDIA A100 Tensor Core GPU delivers unprecedented acceleration—at every scale—to power the world’s highest performing elastic data centers for AI, data analytics, and high-performance computing (HPC) applications. As the engine of the NVIDIA data center platform, A100 provides up to 20x higher performance over the prior NVIDIA Volta generation. A100 can efficiently scale up or be partitioned into seven isolated GPU instances, with Multi-Instance GPU (MIG) providing a unified platform that enables elastic data centers to dynamically adjust to shifting workload demands.

    A100 is part of the complete NVIDIA data center solution that incorporates building blocks across hardware, networking, software, libraries, and optimized AI models and applications from NGC. Representing the most powerful end-to-end AI and HPC platform for data centers, it allows researchers to deliver real-world results and deploy solutions into production at scale, while allowing IT to optimize the utilization of every available A100 GPU.


    Highlights

    CUDA Cores 6912
    Streaming Multiprocessors 108
    Tensor Cores | Gen 3 432
    GPU Memory 40 GB HBM2e, ECC on by default
    Memory Interface 5120-bit
    Memory Bandwidth 1555 GB/s
    NVLink 2-Way, 2-Slot, 600 GB/s Bidirectional
    MIG (Multi-Instance GPU) Support Yes, up to 7 GPU Instances
    FP64 9.7 TFLOPS
    FP64 Tensor Core 19.5 TFLOPS
    FP32 19.5 TFLOPS
    TF32 Tensor Core 156 TFLOPS | 312 TFLOPS*
    BFLOAT16 Tensor Core 312 TFLOPS | 624 TFLOPS*
    FP16 Tensor Core 312 TFLOPS | 624 TFLOPS*
    INT8 Tensor Core 624 TOPS | 1248 TOPS*
    INT4 Tensor Core 1248 TOPS | 2496 TOPS*
    Thermal Solutions Passive
    vGPU Support NVIDIA Virtual Compute Server (vCS)
    System Interface PCIe 4.0 x16
    Maximum Power Consumption 250 W
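
    The peak-rate entries above follow from the shader count and clock. As a rough sketch, assuming the A100's published boost clock of about 1,410 MHz (a figure not listed in this table), peak FP32 throughput is CUDA cores × 2 FLOPs per fused multiply-add × clock:

```python
# Back-of-the-envelope peak FP32 arithmetic for the A100.
# The 1410 MHz boost clock is an assumption; it does not appear in the table.
cuda_cores = 6912
boost_clock_hz = 1410e6
flops_per_core_cycle = 2            # one fused multiply-add counts as 2 FLOPs

peak_fp32_tflops = cuda_cores * flops_per_core_cycle * boost_clock_hz / 1e12
print(round(peak_fp32_tflops, 1))   # matches the 19.5 TFLOPS FP32 entry
```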

    NVIDIA Ampere-Based Architecture

    • A100 accelerates workloads big and small. Whether using MIG to partition an A100 GPU into smaller instances, or NVLink to connect multiple GPUs to accelerate large-scale workloads, the A100 easily handles different-sized application needs, from the smallest job to the biggest multi-node workload.

    Third-Generation Tensor Cores

    • First introduced in the NVIDIA Volta architecture, NVIDIA Tensor Core technology has brought dramatic speedups to AI training and inference operations, bringing down training times from weeks to hours and providing massive acceleration to inference. The NVIDIA Ampere architecture builds upon these innovations by providing up to 20x higher FLOPS for AI. It does so by improving the performance of existing precisions and bringing new precisions—TF32, INT8, and FP64—that accelerate and simplify AI adoption and extend the power of NVIDIA Tensor Cores to HPC.

    TF32 for AI: 20x Higher Performance, Zero Code Change

    • As AI networks and datasets continue to expand exponentially, their computing appetite is similarly growing. Lower precision math has brought huge performance speedups, but they’ve historically required some code changes. A100 brings a new precision, TF32, which works just like FP32 while providing 20x higher FLOPS for AI without requiring any code change. And NVIDIA’s automatic mixed precision feature enables a further 2x boost to performance with just one additional line of code using FP16 precision. A100 Tensor Cores also include support for BFLOAT16, INT8, and INT4 precision, making A100 an incredibly versatile accelerator for both AI training and inference.
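
    TF32 keeps FP32's 8-bit exponent and range but stores only 10 mantissa bits, which is why it works as a drop-in for FP32 code. A minimal illustration of the precision effect (simple truncation of the low 13 mantissa bits; the hardware's actual rounding behavior may differ):

```python
import struct

def tf32_truncate(x: float) -> float:
    """Illustrative only: zero the low 13 of FP32's 23 mantissa bits,
    leaving the 10-bit mantissa that TF32 retains."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits &= ~((1 << 13) - 1)
    return struct.unpack("<f", struct.pack("<I", bits))[0]

print(tf32_truncate(1.0))            # 1.0 (exactly representable)
print(tf32_truncate(1.0 + 2**-12))   # 1.0 (below TF32's mantissa resolution)
print(tf32_truncate(1.0 + 2**-10))   # 1.0009765625 (still representable)
```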

    Double-Precision Tensor Cores: The Biggest Milestone Since FP64 for HPC

    • A100 brings the power of Tensor Cores to HPC, providing the biggest milestone since the introduction of double-precision GPU computing for HPC. The third generation of Tensor Cores in A100 enables matrix operations in full, IEEE-compliant, FP64 precision. Through enhancements in NVIDIA CUDA-X math libraries, a range of HPC applications that need double-precision math can now see a boost of up to 2.5x in performance and efficiency compared to prior generations of GPUs.

    Multi-Instance GPU (MIG)

    • Every AI and HPC application can benefit from acceleration, but not every application needs the performance of a full A100. With Multi-Instance GPU (MIG), each A100 can be partitioned into as many as seven GPU instances, fully isolated at the hardware level with their own high-bandwidth memory, cache, and compute cores. Now, developers can access breakthrough acceleration for all their applications, big and small, and get guaranteed quality of service. And IT administrators can offer right-sized GPU acceleration for optimal utilization and expand access to every user and application.

      MIG is available across both bare metal and virtualized environments and is supported by NVIDIA Container Runtime which supports all major runtimes such as LXC, Docker, CRI-O, Containerd, Podman, and Singularity. Each MIG instance is a new GPU type in Kubernetes and will be available across all Kubernetes distributions such as Red Hat OpenShift, VMware Project Pacific, and others on-premises and on public clouds via NVIDIA Device Plugin for Kubernetes. Administrators can also benefit from hypervisor-based virtualization, including KVM based hypervisors such as Red Hat RHEL/RHV, and VMware ESXi, on MIG instances through NVIDIA vComputeServer.
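
    On a bare-metal Linux host, the partitioning described above is driven through nvidia-smi. A sketch of the typical flow (run as root; the profile ID for a given slice such as 3g.20gb varies by product, so list the profiles first rather than trusting the ID shown here):

```shell
# Enable MIG mode on GPU 0 (takes effect after the GPU is reset)
sudo nvidia-smi -i 0 -mig 1

# List the GPU instance profiles this A100 supports
sudo nvidia-smi mig -lgip

# Create two GPU instances (profile ID 9 is assumed here to be 3g.20gb),
# each with a default compute instance (-C)
sudo nvidia-smi mig -cgi 9,9 -C

# Confirm the MIG devices are now enumerated
nvidia-smi -L
```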

    HBM2e

    • With 40 gigabytes (GB) of high-bandwidth memory (HBM2e), A100 delivers improved raw bandwidth of 1.55 TB/sec, as well as higher dynamic random access memory (DRAM) utilization efficiency at 95 percent. A100 delivers 1.7x higher memory bandwidth over the previous generation.

    Structural Sparsity

    • AI networks are big, having millions to billions of parameters. Not all of these parameters are needed for accurate predictions, and some can be converted to zeros to make the models “sparse” without compromising accuracy. Tensor Cores in A100 can provide up to 2x higher performance for sparse models. While the sparsity feature more readily benefits AI inference, it can also improve the performance of model training.
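
    The A100's sparsity support targets a fine-grained 2:4 pattern: in every group of four weights, two are zeroed. A hedged sketch of the pruning step, keeping the two largest-magnitude weights per group (real frameworks typically fine-tune the model after pruning to recover accuracy):

```python
def prune_2_of_4(weights):
    """Zero the two smallest-magnitude values in each group of four."""
    pruned = list(weights)
    for start in range(0, len(weights), 4):
        group = range(start, min(start + 4, len(weights)))
        keep = sorted(group, key=lambda i: abs(weights[i]), reverse=True)[:2]
        for i in group:
            if i not in keep:
                pruned[i] = 0.0
    return pruned

print(prune_2_of_4([0.9, -0.1, 0.5, 0.05]))  # [0.9, 0.0, 0.5, 0.0]
```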

    Next Generation NVLink

    • NVIDIA NVLink in A100 delivers 2x higher throughput compared to the previous generation, at up to 600 GB/s, to unleash the highest application performance possible on a single server. Two NVIDIA A100 PCIe boards can be bridged via NVLink, and multiple pairs of NVLink-connected boards can reside in a single server (the number varies based on server enclosure, thermals, and power supply capacity).

    Every Deep Learning Framework, 700+ GPU-Accelerated Applications

    • The NVIDIA A100 Tensor Core GPU is the flagship product of the NVIDIA data center platform for deep learning, HPC, and data analytics. It accelerates every major deep learning framework and accelerates over 700 HPC applications. It’s available everywhere, from desktops to servers to cloud services, delivering both dramatic performance gains and cost-saving opportunities.

    Virtualization Capabilities

    • Virtualize compute workloads such as AI, deep learning, and high-performance computing (HPC) with NVIDIA Virtual Compute Server (vCS). The NVIDIA A100 PCIe is an ideal upgrade path for existing V100/V100S Tensor Core GPU infrastructure.




    Warranty


    3-Year Limited Warranty

    Free dedicated phone and email technical support
    (1-800-230-0130)

    Dedicated NVIDIA professional products Field Application Engineers

    Contact gopny@pny.com for additional information.

  • Features


    PERFORMANCE AND USABILITY FEATURES

    Data Center Class Reliability

    Designed for 24 x 7 data center operations and driven by power-efficient hardware and components selected for optimum performance, durability, and longevity. Every NVIDIA A100 board is designed, built and tested by NVIDIA to the most rigorous quality and performance standards, ensuring that leading OEMs and systems integrators can meet or exceed the most demanding real-world conditions.

    Secure and measured boot with hardware root of trust technology within the GPU provides an additional layer of security for data centers. A100 meets the latest data center standards and is NEBS Level 3 compliant. The NVIDIA A100 includes a CEC 1712 security chip that enables secure and measured boot with hardware root of trust, ensuring that firmware has not been tampered with or corrupted.

    NVIDIA Ampere Architecture

    NVIDIA A100 is the world's most powerful data center GPU for AI, data analytics, and high-performance computing (HPC) applications. Building upon the major SM enhancements of the Turing GPU, the NVIDIA Ampere architecture enhances tensor matrix operations and the concurrent execution of FP32 and INT32 operations.

    More Efficient CUDA Cores

    The NVIDIA Ampere architecture’s CUDA cores bring up to 2.5x the single-precision floating point (FP32) throughput compared to the previous generation, providing significant performance improvements for any class of algorithm or application that can benefit from embarrassingly parallel acceleration techniques.

    Third Generation Tensor Cores

    Purpose-built for the deep learning matrix arithmetic at the heart of neural network training and inferencing, the NVIDIA A100 includes enhanced Tensor Cores that accelerate more data types (TF32 and BF16) and a new Fine-Grained Structured Sparsity feature that delivers up to 2x throughput for tensor matrix operations compared to the previous generation.

    PCIe Gen 4

    The NVIDIA A100 supports PCI Express Gen 4, which provides double the bandwidth of PCIe Gen 3, improving data-transfer speeds from CPU memory for data-intensive tasks like AI and data science.
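
    As a back-of-the-envelope check on "double the bandwidth": PCIe 4.0 signals at 16 GT/s per lane with 128b/130b encoding, so a x16 link carries roughly 31.5 GB/s per direction, versus about 15.8 GB/s for Gen 3 at 8 GT/s:

```python
def pcie_x16_gb_s(gt_per_s):
    """Per-direction throughput of a x16 link, in GB/s."""
    lanes = 16
    encoding = 128 / 130                     # 128b/130b line-encoding overhead
    return gt_per_s * lanes * encoding / 8   # bits -> bytes

gen3 = pcie_x16_gb_s(8)    # PCIe 3.0: 8 GT/s per lane
gen4 = pcie_x16_gb_s(16)   # PCIe 4.0: 16 GT/s per lane
print(round(gen3, 1), round(gen4, 1), round(gen4 / gen3, 1))  # 15.8 31.5 2.0
```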

    High Speed HBM2e Memory

    With 40 gigabytes (GB) of high-bandwidth memory (HBM2e), the NVIDIA A100 PCIe delivers improved raw bandwidth of 1.55TB/sec, as well as higher dynamic random access memory (DRAM) utilization efficiency at 95 percent. A100 PCIe delivers 1.7x higher memory bandwidth over the previous generation.
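
    The bandwidth figure follows from the interface width: dividing 1,555 GB/s by the 5,120-bit bus gives an effective per-pin data rate of about 2.43 Gb/s, a quick sanity check consistent with the specifications table:

```python
bandwidth_gb_s = 1555        # from the specifications table
bus_width_bits = 5120        # memory interface width

per_pin_gbps = bandwidth_gb_s * 8 / bus_width_bits
print(round(per_pin_gbps, 2))   # ~2.43 Gb/s per pin
```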

    Error Correction Without a Performance or Capacity Hit

    HBM2e memory implements error correction without any performance (bandwidth) or capacity hit, unlike competing technologies like GDDR6 or GDDR6X.

    Compute Preemption

    Instruction-level preemption provides finer-grained control over compute tasks, preventing longer-running applications from either monopolizing system resources or timing out.

    MULTI-GPU TECHNOLOGY SUPPORT

    Third Generation NVLink

    Connect two NVIDIA A100 PCIe cards with NVLink to double the effective memory footprint and scale application performance by enabling GPU-to-GPU data transfers at rates up to 600 GB/s of bidirectional bandwidth. NVLink bridges are available for motherboards with standard or wide slot spacing.

    SOFTWARE SUPPORT

    Virtual GPU Software for Virtualization

    Support for NVIDIA Virtual Compute Server (vCS) accelerates virtualized compute workloads such as AI, data science, big-data analytics, and high-performance computing (HPC) applications.

    Software Optimized for AI

    Deep learning frameworks such as Caffe2, MXNet, CNTK, TensorFlow, and others deliver dramatically faster training times and higher multi-node training performance. GPU-accelerated libraries such as cuDNN, cuBLAS, and TensorRT deliver higher performance for both deep learning inference and high-performance computing (HPC) applications.

    NVIDIA CUDA Parallel Computing Platform

    Natively execute standard programming languages like C/C++ and Fortran, and APIs such as OpenCL, OpenACC, and DirectCompute, to accelerate techniques such as ray tracing, video and image processing, and computational fluid dynamics.

    Unified Memory

    A single, seamless 49-bit virtual address space allows for the transparent migration of data between the full allocation of CPU and GPU memory.
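
    For a sense of scale, a 49-bit virtual address space spans 2^49 bytes, i.e. 512 TiB shared across CPU and GPU allocations:

```python
address_bits = 49
tib = 2**address_bits / 2**40   # bytes -> TiB
print(int(tib))                 # 512
```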

  • Specifications


    SPECIFICATIONS

    Compatible in all systems that accept an NVIDIA A100

    Architecture Ampere
    Process Size 7nm | TSMC
    Transistors 54 Billion
    Die Size 826 mm2
    CUDA Cores 6912
    Streaming Multiprocessors 108
    Tensor Cores | Gen 3 432
    Multi-Instance GPU (MIG) Support Yes, up to seven instances per GPU
    FP64 9.7 TFLOPS
    FP64 Tensor Core 19.5 TFLOPS
    FP32 19.5 TFLOPS
    TF32 Tensor Core 156 TFLOPS | 312 TFLOPS*
    BFLOAT16 Tensor Core 312 TFLOPS | 624 TFLOPS*
    FP16 Tensor Core 312 TFLOPS | 624 TFLOPS*
    INT8 Tensor Core 624 TOPS | 1248 TOPS*
    INT4 Tensor Core 1248 TOPS | 2496 TOPS*
    NVLink 2-Way Low Profile, 2-Slot
    NVLink Interconnect 600 GB/s Bidirectional
    GPU Memory 40 GB HBM2e
    Memory Interface 5120-bit
    Memory Bandwidth 1555 GB/s
    System Interface PCIe 4.0 x16
    Thermal Solution Passive
    vGPU Support NVIDIA Virtual Compute Server with MIG support
    Secure and Measured Boot Hardware Root of Trust CEC 1712
    NEBS Ready Level 3
    Power Connector 8-pin CPU
    Maximum Power Consumption 250 W

    *With Sparsity


    AVAILABLE ACCESSORIES

    • RTXA6000NVLINK-KIT provides an NVLink connector for the A100 suitable for standard PCIe slot spacing motherboards, effectively fusing two physical boards into one logical entity with 13824 CUDA Cores, 864 Tensor Cores, and 80 GB of HBM2e ECC memory, connected at up to 600 GB/s of bidirectional bandwidth. Application support is required.
    • RTXA6000NVLINK-3S-KIT provides an NVLink connector for the NVIDIA A100 PCIe for motherboards implementing wider PCIe slot spacing. All other features, benefits, and application support are identical to the standard slot spacing version; three (3) NVLink kits are required per pair of A100 boards.

    SUPPORTED OPERATING SYSTEMS

    • Windows Server 2012 R2
    • Windows Server 2016 1607, 1709
    • Windows Server 2019
    • Red Hat CoreOS 4.7
    • Red Hat Enterprise Linux 8.1-8.3
    • Red Hat Enterprise Linux 7.7-7.9
    • Red Hat Linux 6.6+
    • SUSE Linux Enterprise Server 15 SP2
    • SUSE Linux Enterprise Server 12 SP3+
    • Ubuntu 14.04 LTS / 16.04 LTS / 18.04 LTS / 20.04 LTS

    vGPU SOFTWARE SUPPORT

    • NVIDIA Virtual Compute Server (vCS)

    PACKAGE CONTAINS

    • NVIDIA A100 PCIe
    • 8-pin CPU auxiliary power cable