NVIDIA® T4

SKU: TCSCT4-KIT

Description

Next-Level Acceleration Has Arrived

The artificial intelligence revolution surges forward, igniting opportunities for businesses to reimagine how they solve their customers’ challenges. We’re racing toward a future where every customer interaction, every product, every service offering will be touched and improved by AI. And making that future a reality requires a computing platform that can accelerate the full diversity of modern AI, enabling businesses to re-envision how they meet—and exceed—customer demands and cost-effectively scale their AI-based products and services.

The NVIDIA T4 GPU is among the world’s most powerful universal inference accelerators. Powered by NVIDIA Turing Tensor Cores, T4 provides revolutionary multi-precision inference performance to accelerate the diverse applications of modern AI. T4 is a part of the NVIDIA AI inference platform that supports all AI frameworks and provides comprehensive tooling and integrations to drastically simplify the development and deployment of advanced AI.

Highlights

GPU Architecture	NVIDIA Turing
Turing Tensor Cores	320
NVIDIA CUDA Cores	2560
Peak FP32	8.1 TFLOPS
Mixed Precision (FP16/FP32)	65 TFLOPS
INT8	130 TOPS
INT4	260 TOPS
GPU Memory	16 GB GDDR6
Memory Bandwidth	300 GB/s
Thermal Solution	Passive
Maximum Power Consumption	70 W
System Interface	PCIe Gen 3.0 x16
Compute APIs	CUDA, NVIDIA TensorRT, ONNX
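To set the 70 W power ceiling alongside the peak-throughput rows, the short sketch below divides each peak figure from the table by maximum power consumption. These are theoretical peaks at maximum board power, not measured efficiency on a real workload; the dictionary keys and the `per_watt` helper are illustrative names, not part of any NVIDIA tool.

```python
# Peak figures and power limit copied from the highlights table above.
MAX_POWER_W = 70
PEAK = {
    "FP32 (TFLOPS)": 8.1,
    "FP16/FP32 mixed (TFLOPS)": 65,
    "INT8 (TOPS)": 130,
    "INT4 (TOPS)": 260,
}


def per_watt(peak: dict, power_w: float = MAX_POWER_W) -> dict:
    """Divide each theoretical peak by board power (hypothetical helper)."""
    return {name: value / power_w for name, value in peak.items()}


for name, efficiency in per_watt(PEAK).items():
    print(f"{name}: {efficiency:.2f} per watt")
```

Because every row shares the same 70 W denominator, the ratios simply preserve the table's 1:8:16:32 relationship; the exercise mainly shows why lower-precision modes matter for power-limited servers.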

Turing Tensor Cores: The Heart of Universal Inference Acceleration

AI is evolving rapidly. In the past few years alone, a Cambrian explosion of neural network types has seen the emergence of convolutional neural networks (CNNs), recurrent neural networks (RNNs), generative adversarial networks (GANs), reinforcement learning (RL), and hybrid network architectures. Accelerating these diverse models requires both high performance and programmability.

NVIDIA T4 introduces the revolutionary Turing Tensor Core technology with multi-precision computing for AI inference. Powering breakthrough performance from FP32 to FP16 to INT8, as well as INT4 and binary precisions, T4 delivers dramatically higher performance than CPUs.

Developers can unleash the power of Turing Tensor Cores directly through NVIDIA TensorRT, software libraries and integrations with all AI frameworks. These tools let developers target optimal precision for different AI applications, achieving dramatic performance gains without compromising accuracy of results.
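To make "target optimal precision" concrete, here is a minimal sketch of symmetric per-tensor INT8 quantization in plain Python. It illustrates the general idea behind running FP32-trained weights at INT8; it is not TensorRT's API or its calibration procedure, and the function names and sample weights are hypothetical.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: map [-max|x|, +max|x|] onto [-127, 127]."""
    max_abs = max((abs(v) for v in values), default=0.0)
    # One shared scale for the whole tensor; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the signed 8-bit range.
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize_int8(q, scale):
    """Map INT8 codes back to approximate real values."""
    return [x * scale for x in q]


weights = [0.5, -1.27, 0.003, 1.0]
q, scale = quantize_int8(weights)
recovered = dequantize_int8(q, scale)
print(q, scale)
```

Each recovered value differs from the original by at most half a quantization step (`scale / 2`), which is why well-chosen scales keep the accuracy cost of INT8 small for many models.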

State-of-the-art Inference in Real-Time

Responsiveness is key to user engagement for services such as conversational AI, recommender systems, and visual search. As models increase in accuracy and complexity, delivering the right answer right now requires exponentially larger compute capability.

NVIDIA T4 supports Multi-Process Service (MPS) with hardware-accelerated work distribution. MPS reduces latency for processing requests and enables multiple independent requests to be processed simultaneously, resulting in higher throughput and more efficient GPU utilization.

Twice the Video Decode Performance

Video continues on its explosive growth trajectory and now makes up over two-thirds of all Internet traffic. Accurate video interpretation through AI is driving the most relevant content recommendations, measuring the impact of brand placements in sports events, and delivering perception capabilities to autonomous vehicles, among other uses.

NVIDIA T4 delivers breakthrough performance for AI video applications, with dedicated hardware transcoding engines that bring twice the decoding performance of prior-generation GPUs. T4 can decode up to 38 full-HD video streams, making it easy to integrate scalable deep learning into the video pipeline to deliver innovative, smart video services. It features performance and efficiency modes to enable either fast encoding or the lowest bit-rate encoding without losing video quality.
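The 38-stream figure lends itself to a first-pass capacity estimate. The sketch below assumes decode is the only bottleneck and that 38 concurrent full-HD streams per card holds for your codec, bitrate, and frame rate; in practice that figure is a best-case ceiling. The function and constant names are hypothetical.

```python
import math

# Figure from the text: up to 38 full-HD decode streams per T4.
# Treated as a best-case ceiling; real capacity depends on codec, bitrate, and frame rate.
T4_MAX_FHD_DECODE_STREAMS = 38


def t4_cards_for_decode(streams: int, per_card: int = T4_MAX_FHD_DECODE_STREAMS) -> int:
    """Minimum number of T4 cards whose decoders can cover `streams` full-HD streams."""
    if streams < 0:
        raise ValueError("stream count must be non-negative")
    if per_card <= 0:
        raise ValueError("per_card must be positive")
    # Round up: a partially used card still has to be installed.
    return math.ceil(streams / per_card)


print(t4_cards_for_decode(100))  # 3 cards: 2 cards cover only 76 streams
```

Passing a smaller `per_card` value (for example, planning for 30 streams per card) is a simple way to leave headroom for inference work sharing the same GPU.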

Industry’s Most Comprehensive AI Inference Platform

AI has crossed the chasm and is rapidly moving from early adoption by pioneers to broader use across industries and large-scale production deployments. Powered by the flexible NVIDIA CUDA development environment and a mature ecosystem of over 1M developers, the NVIDIA AI platform has been evolving for over a decade to offer comprehensive tooling and integrations that simplify the development and deployment of AI.

NVIDIA TensorRT optimizes trained models to run inference efficiently on GPUs. NVIDIA TensorRT Inference Server and Kubernetes on NVIDIA GPUs streamline the deployment and scaling of AI-powered inference applications on GPU-accelerated infrastructure. Libraries such as cuDNN, cuSPARSE, CUTLASS, and DeepStream accelerate key neural network functions and use cases such as video transcoding. And workflow integrations with all AI frameworks, freely available as NVIDIA GPU Cloud containers, let developers transparently harness the innovations in GPU computing for end-to-end AI workflows, from training neural networks to running inference in production applications.

Warranty

3-Year Limited Warranty

Free dedicated phone and email technical support
(1-800-230-0130)

Dedicated NVIDIA Quadro Field Application Engineers

Resources

Product Brochure
Product Brief
Video Encode and Decode GPU Support Matrix

Contact gopny@pny.com for additional information.

Features

Key Benefits

Data Scientists

GPU inference enables you to bring state-of-the-art AI to your products and services by removing the computing bottleneck to innovation.

Every AI framework is supported on the NVIDIA inference platform, which drastically simplifies optimization and deployment of your AI models from training to inference.

IT Managers and Data Center Directors

AI will increasingly be built into products and services, and inference will make up a growing share of data center workloads.

NGC (NVIDIA GPU Cloud) simplifies deployment by providing a comprehensive catalog of performance-engineered containers for both training and inference.

With multi-precision support, T4 GPUs allow standardization on a single architecture for all AI inference workloads.

T4 GPUs provide the most efficient platform for both real-time inference and large-batch inference.

NVIDIA GPUs are designed for the scalability, uptime, and serviceability needs of data centers.

GPU inference saves money by providing a dramatic boost in throughput and power efficiency.

Lower TCO and Broad Industry and Vendor Support

GPU inference dramatically improves total cost of ownership (TCO) by delivering the same throughput with fewer, more powerful servers that require a fraction of the power and floor space.

GPU inference servers are widely available through PNY’s ecosystem of leading OEMs and ODMs, with enterprise-class support.

Specifications

Compatible with all systems that accept an NVIDIA T4

GPU Architecture	NVIDIA Turing
Use Case	Universal Deep Learning Accelerator
NVIDIA GPU	TU104-895
GPU Clocks	585 MHz base, 1590 MHz maximum boost
Turing Tensor Cores	320
NVIDIA CUDA Cores	2560
Peak FP32	8.1 TFLOPS
Mixed Precision (FP16/FP32)	65 TFLOPS
INT8	130 TOPS
INT4	260 TOPS
GPU Memory	16 GB GDDR6
ECC	Yes (enabled by default)
Memory Interface	256-bit
Maximum Memory Clock	5001 MHz
Memory Bandwidth	300 GB/s
Codecs Supported	H.264, H.265
720p Encoding Streams	22 simultaneous (HQ mode)
1080p Encoding Streams	10 simultaneous
Ultra HD (2160p) Streams	2-3 simultaneous
Thermal Solution	Passive
Operating Temperature	0 °C to 50 °C
Operating Humidity	5% to 90% relative humidity
Maximum Power Consumption	70 W
System Interface	PCIe Gen 3.0 x16 (32 GB/s); PCIe Gen 3.0 x8 also supported
Form Factor	Low-Profile PCIe \| Single Slot
Physical Dimensions	6.61” L x 2.71” H
Compute APIs	CUDA, NVIDIA TensorRT, OpenCL, ONNX
Graphics APIs	DirectX 12, OpenGL 4.6, Vulkan 1.2
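The peak-throughput rows can be approximately reproduced from the core counts and the 1590 MHz maximum boost clock. This is a sketch under stated assumptions: each CUDA core retires one fused multiply-add (2 FLOPs) per clock, each Turing Tensor Core performs 64 FP16 multiply-adds per clock, and INT8 and INT4 run at 2x and 4x the FP16 rate. Those per-clock rates are assumptions drawn from general Turing architecture descriptions, not figures stated on this page, and the helper names are hypothetical.

```python
# Values from the specification table above.
CUDA_CORES = 2560
TENSOR_CORES = 320
BOOST_CLOCK_HZ = 1590e6

# Assumptions (not stated in the table): 64 FP16 FMAs per Tensor Core per clock,
# with INT8 and INT4 at 2x and 4x that rate.
FP16_FMA_PER_TENSOR_CORE_PER_CLOCK = 64
PRECISION_SPEEDUP = {"FP16": 1, "INT8": 2, "INT4": 4}


def peak_fp32_tflops(cores: int = CUDA_CORES, clock_hz: float = BOOST_CLOCK_HZ) -> float:
    # One FMA per CUDA core per clock counts as 2 floating-point operations.
    return cores * 2 * clock_hz / 1e12


def peak_tensor_tops(precision: str, tensor_cores: int = TENSOR_CORES,
                     clock_hz: float = BOOST_CLOCK_HZ) -> float:
    # Each multiply-add counts as 2 operations; lower precisions scale the rate.
    ops_per_clock = (tensor_cores * FP16_FMA_PER_TENSOR_CORE_PER_CLOCK * 2
                     * PRECISION_SPEEDUP[precision])
    return ops_per_clock * clock_hz / 1e12


print(f"FP32: {peak_fp32_tflops():.1f} TFLOPS")
for name in PRECISION_SPEEDUP:
    print(f"{name} (Tensor Cores): {peak_tensor_tops(name):.1f} T(FL)OPS")
```

Under these assumptions the results land within about half a percent of the table's 8.1, 65, 130, and 260 figures, which suggests the published peaks are computed at maximum boost clock; sustained clocks under a 70 W limit may be lower.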