To move a PyTorch tensor to the GPU, you can follow these steps:
- First, check whether a GPU is available by calling torch.cuda.is_available(), which returns a boolean.
- If a GPU is available, create a device object with torch.device("cuda"); this refers to the default GPU.
- Next, move a tensor to the GPU by calling its to() method and passing the device as an argument. For example, if tensor is your PyTorch tensor, tensor.to(device) returns a copy of it on the GPU (to() does not modify the tensor in place).
- If you have multiple GPUs available, you can specify a specific GPU device by passing the corresponding GPU index to torch.device("cuda:X"), where X is the index of the desired GPU.
It is important to note that all subsequent operations involving this tensor will be performed on the GPU. To move the tensor back to the CPU, you can use tensor.to("cpu") or tensor.cpu().
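As a minimal round-trip sketch (the tensor contents are arbitrary; the code falls back to the CPU when no CUDA-capable GPU is present):

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tensor = torch.tensor([1.0, 2.0, 3.0])   # created on the CPU
tensor_on_device = tensor.to(device)     # copy to the GPU when one is available
tensor_back = tensor_on_device.cpu()     # copy back to the CPU

print(tensor_on_device.device, tensor_back.device)
```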
Moving tensors to the GPU can significantly speed up computation, especially for deep learning models and large datasets, as GPUs are optimized for parallel computations.
How to move a PyTorch tensor to the GPU?
To move a PyTorch tensor to the GPU, you can use the to() method provided by PyTorch. Here's how you can do it:
```python
import torch

# Create a tensor on the CPU
tensor_cpu = torch.tensor([1, 2, 3])

# Check if a GPU is available
if torch.cuda.is_available():
    # Move the tensor to the GPU
    tensor_gpu = tensor_cpu.to('cuda')

    # You can also specify the GPU device explicitly as follows:
    # tensor_gpu = tensor_cpu.to('cuda:0')

    # Print the tensor on the GPU
    print(tensor_gpu)
else:
    print("GPU is not available.")
```
The to() method takes an argument specifying the device, which can be either 'cuda' for the default GPU or 'cuda:X' for a specific GPU device (where X is the GPU index). You can check if a GPU is available using torch.cuda.is_available().
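For example, here is a short sketch of selecting a specific GPU by index when more than one is present (the index 1 and the fallback logic are just one reasonable choice):

```python
import torch

# Pick a device: a specific GPU index if several GPUs exist, otherwise the first GPU or the CPU
if torch.cuda.is_available() and torch.cuda.device_count() > 1:
    device = torch.device('cuda:1')   # the second GPU; the index 1 is arbitrary here
elif torch.cuda.is_available():
    device = torch.device('cuda:0')   # the default (first) GPU
else:
    device = torch.device('cpu')

x = torch.randn(3, 3).to(device)
print(x.device)
```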
What is the performance difference between CPU and GPU in PyTorch?
The performance difference between CPU and GPU in PyTorch can be significant depending on the task and the hardware configuration.
Generally, GPUs are specialized hardware designed to handle parallel computations efficiently, which makes them particularly well-suited for performing matrix operations and training deep neural networks. In PyTorch, when computations are executed on a GPU, the parallelism of the GPU architecture allows for faster training and inference times compared to a CPU.
While CPUs are generally capable of performing a wide variety of tasks, they are less efficient at parallel computation compared to GPUs. CPUs are better suited for tasks that require sequential processing or are not massively parallel.
Overall, if you have a compatible GPU, utilizing it for deep learning tasks in PyTorch can substantially speed up training and inference compared to running solely on a CPU. The degree of difference varies with factors such as the specific GPU model, the complexity of the model, the batch size, and the use of GPU-accelerated libraries such as cuDNN (built on top of CUDA).
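As an illustration only (the actual numbers depend heavily on the hardware and the operation), a rough timing sketch for a large matrix multiplication on CPU versus GPU:

```python
import time
import torch

size = 4096
a_cpu = torch.randn(size, size)
b_cpu = torch.randn(size, size)

# Time the matrix multiplication on the CPU
start = time.perf_counter()
c_cpu = a_cpu @ b_cpu
cpu_time = time.perf_counter() - start

if torch.cuda.is_available():
    a_gpu = a_cpu.to('cuda')
    b_gpu = b_cpu.to('cuda')
    torch.cuda.synchronize()          # make sure the copies have finished
    start = time.perf_counter()
    c_gpu = a_gpu @ b_gpu
    torch.cuda.synchronize()          # CUDA kernels run asynchronously; wait for completion
    gpu_time = time.perf_counter() - start
    print(f"CPU: {cpu_time:.3f}s  GPU: {gpu_time:.3f}s")
else:
    print(f"CPU: {cpu_time:.3f}s (no GPU available)")
```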
How to handle out-of-memory errors when moving tensors to the GPU in PyTorch?
When moving tensors to the GPU in PyTorch, you may encounter out-of-memory (OOM) errors due to insufficient GPU memory. Here are some steps you can take to handle such errors:
- Reduce the batch size: One common cause of OOM errors is trying to process a batch of data that exceeds the available GPU memory. Reducing the batch size lowers the memory required per step so that the batch fits on the GPU. However, note that this may also affect the model's convergence.
- Use smaller models: Another approach is to use smaller or more efficient models that require fewer parameters and consume less memory. You can consider using pretrained models or reducing the network's size or complexity.
- Free up memory: After each iteration or batch, explicitly delete any intermediate tensors that are no longer required, for example with the del keyword. Note that PyTorch uses a caching allocator, so freed memory is reused by PyTorch rather than returned to the system; torch.cuda.empty_cache() can release that cached memory back to the driver if needed.
- Utilize mixed precision training: Enabling mixed precision training can help reduce memory usage. PyTorch's Automatic Mixed Precision (AMP) support lets you run selected operations in lower-precision (e.g., half-precision) floating point while keeping full precision where it matters, which can significantly reduce the memory footprint (a sketch appears after this list).
- Use data parallelism: If you have multiple GPUs available, you can use PyTorch's DataParallel or DistributedDataParallel wrappers to automatically split the batch across multiple GPUs. This can distribute the memory usage and prevent OOM errors.
- Use gradient checkpointing: For models that store many large activations for the backward pass (e.g., models with many sequential layers), you can use PyTorch's gradient checkpointing. This trades extra compute time during the backward pass for reduced memory consumption (a second sketch appears at the end of this answer).
- Upgrade to a GPU with more memory: If none of the above solutions work, you may need to upgrade to a GPU with larger memory capacity. Alternatively, you can consider using cloud-based services that provide access to GPUs with more memory.
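As a minimal sketch of the mixed-precision idea mentioned above (the model, optimizer, and data here are hypothetical placeholders, and a CUDA-capable GPU is assumed):

```python
import torch
from torch.cuda.amp import autocast, GradScaler

# Hypothetical model, optimizer, and data just for illustration
model = torch.nn.Linear(1024, 1024).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
scaler = GradScaler()

inputs = torch.randn(32, 1024, device='cuda')
targets = torch.randn(32, 1024, device='cuda')

optimizer.zero_grad()
with autocast():                      # run the forward pass in mixed precision
    outputs = model(inputs)
    loss = torch.nn.functional.mse_loss(outputs, targets)

scaler.scale(loss).backward()         # scale the loss to avoid underflow in float16 gradients
scaler.step(optimizer)
scaler.update()
```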
It's important to remember that handling OOM errors requires a trade-off between memory efficiency and model performance, so it's essential to choose a suitable approach according to your specific requirements.
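Finally, a minimal sketch of the gradient checkpointing technique from the list above, using torch.utils.checkpoint (the layer sizes are arbitrary and a CUDA-capable GPU is assumed):

```python
import torch
from torch.utils.checkpoint import checkpoint_sequential

# A stack of layers whose intermediate activations would normally all be kept for backward
model = torch.nn.Sequential(*[torch.nn.Linear(1024, 1024) for _ in range(8)]).cuda()
x = torch.randn(16, 1024, device='cuda', requires_grad=True)

# Split the sequence into 2 segments; only the segment boundaries are stored,
# and the rest is recomputed during the backward pass
out = checkpoint_sequential(model, 2, x)
out.sum().backward()
```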