Troubleshooting - GPU Memory Profiler

This guide helps you resolve common issues with GPU Memory Profiler.

Common issues

Import errors

ModuleNotFoundError: No module named 'gpumemprof'

Solution:

# Install from a local clone of the repository (run in the repository root)
pip install -e .

# Or install from PyPI
pip install gpu-memory-profiler

ModuleNotFoundError: No module named 'torch'

Solution:

# Install PyTorch
pip install torch

# Or install with CUDA support
pip install torch --index-url https://download.pytorch.org/whl/cu118

ModuleNotFoundError: No module named 'tensorflow'

Solution:

# Install TensorFlow (on Linux, "tensorflow[and-cuda]" also installs the CUDA libraries it needs)
pip install tensorflow
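To see which of these modules the active interpreter can actually import, a short standard-library check is enough. The sketch below uses `importlib.util.find_spec`, which locates a top-level module without importing it, so it stays fast even when PyTorch or TensorFlow is installed. The module list is an illustrative assumption; adjust it to the frameworks you use.

```python
import importlib.util
import sys

# Import names as used in `import` statements (not the pip package names)
MODULES = ["gpumemprof", "tfmemprof", "torch", "tensorflow"]


def check_imports(modules):
    """Return {module_name: True/False} without actually importing the modules."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    print(f"Interpreter: {sys.executable}")
    for name, found in check_imports(MODULES).items():
        print(f"{name:12} {'OK' if found else 'MISSING'}")
```

If a module shows as missing even though `pip` reports it as installed, `pip` and `python` usually belong to different environments; `python -m pip install ...` installs into the interpreter you actually run.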

CUDA issues

CUDA not available

Symptoms:

Error: CUDA not available
Profiler falls back to CPU mode

Solutions:

Check CUDA installation

nvidia-smi
nvcc --version

Verify PyTorch CUDA

import torch
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"CUDA version: {torch.version.cuda}")

Verify TensorFlow CUDA

import tensorflow as tf
print(f"GPU devices: {tf.config.list_physical_devices('GPU')}")

Install CUDA-compatible versions

# PyTorch with CUDA
pip install torch --index-url https://download.pytorch.org/whl/cu118

# TensorFlow with its CUDA libraries (Linux; native Windows GPU support ended with TF 2.10)
pip install "tensorflow[and-cuda]"
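The first check can also be scripted. This sketch looks up `nvidia-smi` and `nvcc` on `PATH` and, when the driver tool is present, asks it for a one-line CSV summary per GPU. A missing `nvcc` is not necessarily a problem: the PyTorch pip wheels bundle their own CUDA runtime, so for them only a working driver (`nvidia-smi`) is strictly required. The function names are illustrative and not part of the profiler.

```python
import shutil
import subprocess


def find_cuda_tools():
    """Map each tool name to its full path on PATH, or None if it is missing."""
    return {tool: shutil.which(tool) for tool in ("nvidia-smi", "nvcc")}


def driver_summary():
    """Return nvidia-smi's CSV summary of each GPU, or None if it cannot run."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,driver_version,memory.total",
         "--format=csv,noheader"],
        capture_output=True, text=True, timeout=30,
    )
    return result.stdout.strip() if result.returncode == 0 else None


if __name__ == "__main__":
    for tool, path in find_cuda_tools().items():
        print(f"{tool:10} {path or 'not found on PATH'}")
    print("GPUs:", driver_summary() or "none reported")
```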

CUDA out of memory

Symptoms:

Error: CUDA out of memory
Training crashes

Solutions:

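Reduce batch size

from torch.utils.data import DataLoader

# `dataset` is your existing Dataset; use a smaller batch, e.g. 16 instead of 64
dataloader = DataLoader(dataset, batch_size=16)

Clear cache

import torch

# Return cached, unused blocks to the driver (tensors that are still referenced are not freed)
torch.cuda.empty_cache()

Gradient checkpointing

from torch.utils.checkpoint import checkpoint

# Wrap memory-heavy layers with checkpoint() so their activations are
# recomputed during the backward pass instead of being stored

Monitor memory

from gpumemprof import GPUMemoryProfiler

profiler = GPUMemoryProfiler()
profiler.start_monitoring(interval=0.5)  # Sample every 0.5 seconds

# Your training code here

profiler.stop_monitoring()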
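When the largest batch that fits is unknown, one common approach is to halve the batch size after each out-of-memory error until a step succeeds. The sketch below is framework-agnostic: `run_step` is a hypothetical callable you supply, and the exception types to treat as out-of-memory are a parameter. With PyTorch you would typically pass `torch.cuda.OutOfMemoryError` (available in recent PyTorch releases) and use `torch.cuda.empty_cache` as the cleanup hook.

```python
def find_working_batch_size(run_step, start, minimum=1,
                            oom_errors=(MemoryError,), on_oom=None):
    """Halve the batch size after each out-of-memory error until a step succeeds.

    run_step:   hypothetical callable that runs one step with the given batch size
    oom_errors: exception types treated as "out of memory"
    on_oom:     optional cleanup hook, e.g. torch.cuda.empty_cache
    """
    batch_size = start
    while batch_size >= minimum:
        try:
            run_step(batch_size)
            return batch_size
        except oom_errors:
            # Leave the except block before cleaning up: the active exception's
            # traceback can keep references to the objects that used the memory
            pass
        if on_oom is not None:
            on_oom()
        batch_size //= 2
    raise RuntimeError(f"no batch size >= {minimum} fit in memory")


# Example with PyTorch (train_step is your own function):
#   import torch
#   bs = find_working_batch_size(train_step, 64,
#                                oom_errors=(torch.cuda.OutOfMemoryError,),
#                                on_oom=torch.cuda.empty_cache)
```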

Memory leak issues

Memory usage keeps increasing

Symptoms:

Memory usage grows over time
Profiler detects memory leaks

Solutions:

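import torch

# Drop references to tensors you no longer need; empty_cache() can only
# release memory whose tensors are no longer referenced anywhere
del tensor
torch.cuda.empty_cache()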
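A steadily rising baseline is the usual signature of a leak, as opposed to the saw-tooth pattern of normal training steps. One simple heuristic, sketched below, fits a least-squares line to periodic memory readings and flags a leak when the slope exceeds a threshold. The function names and the 1 MiB-per-sample threshold are illustrative assumptions, not the profiler's own detection logic; the readings could come from, for example, `torch.cuda.memory_allocated()` sampled once per iteration.

```python
def memory_growth_per_sample(samples):
    """Least-squares slope of memory readings, in bytes per sample."""
    n = len(samples)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    cov = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(samples))
    var = sum((i - mean_x) ** 2 for i in range(n))
    return cov / var


def looks_like_leak(samples, threshold_bytes=1024 * 1024):
    """Flag a leak if memory grows by more than threshold_bytes per sample on average."""
    return memory_growth_per_sample(samples) > threshold_bytes
```

Fitting a line rather than comparing the first and last reading makes the check less sensitive to a single spike at either end of the window.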

CLI issues

gpumemprof: command not found

Solution:

# Reinstall the package
pip install -e .

# Check if entry points are installed
pip show gpu-memory-profiler

CLI commands fail

Solutions:

Check Python path

which python
which gpumemprof

Reinstall with entry points

pip uninstall gpu-memory-profiler
pip install -e .

Use Python module directly

python -m gpumemprof.cli info
python -m tfmemprof.cli info
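The `which python` / `which gpumemprof` comparison can also be done from Python, which works on Windows where `which` is unavailable. The sketch below is a hypothetical helper, not part of the package: it reports the running interpreter, the directory where that interpreter's pip installs console scripts, where `gpumemprof` resolves on `PATH`, and whether the two agree. It assumes a regular or virtual-environment install; `pip install --user` places scripts in a per-user directory (for example `~/.local/bin` on Linux) instead.

```python
import os
import shutil
import sys
import sysconfig


def _norm(path):
    """Normalize a directory path for comparison (case and separators)."""
    return os.path.normcase(os.path.normpath(path))


def cli_path_report(command="gpumemprof"):
    """Compare where `command` resolves on PATH with this interpreter's scripts dir."""
    scripts_dir = sysconfig.get_path("scripts")  # where pip puts console scripts
    found = shutil.which(command)
    path_dirs = [_norm(p) for p in os.environ.get("PATH", "").split(os.pathsep) if p]
    return {
        "python": sys.executable,
        "scripts_dir": scripts_dir,
        "command": found,
        "same_environment": found is not None
        and _norm(os.path.dirname(found)) == _norm(scripts_dir),
        "scripts_dir_on_path": _norm(scripts_dir) in path_dirs,
    }


if __name__ == "__main__":
    for key, value in cli_path_report().items():
        print(f"{key:20} {value}")
```

If `same_environment` is `False`, the command on `PATH` belongs to a different installation; if `scripts_dir_on_path` is `False`, the entry point was installed but its directory is not on `PATH`.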

Visualization issues

Plots don't display

Symptoms:

No plots appear
Error: No display name and no $DISPLAY environment variable

Solutions:

Non-interactive backend

import matplotlib
matplotlib.use('Agg')  # Non-interactive backend; call before importing matplotlib.pyplot

Save to files

from gpumemprof import MemoryVisualizer

# `profiler` is the GPUMemoryProfiler instance that collected the data
visualizer = MemoryVisualizer(profiler)
visualizer.plot_memory_timeline(interactive=False, save_path='timeline.png')

Use Plotly

from gpumemprof import MemoryVisualizer

visualizer = MemoryVisualizer(profiler)
# Export the collected data to JSON instead of rendering a plot on screen
visualizer.export_data(format='json', save_path='dashboard_data')
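Rather than hard-coding the backend, a script can switch to `Agg` only when no display is available. The check below is a rough heuristic and an assumption of this sketch, not something the profiler does: on Linux and other X11/Wayland systems it treats a missing `DISPLAY` and `WAYLAND_DISPLAY` as headless, and it assumes Windows and macOS always have a display (which is not true over SSH).

```python
import os
import sys


def is_headless(environ=None, platform=None):
    """Guess whether interactive plot windows cannot be shown (heuristic)."""
    environ = os.environ if environ is None else environ
    platform = sys.platform if platform is None else platform
    # Assumption: Windows and macOS sessions have a display available
    if platform.startswith("win") or platform == "darwin":
        return False
    # X11 sets DISPLAY, Wayland sets WAYLAND_DISPLAY; neither means no display
    return not (environ.get("DISPLAY") or environ.get("WAYLAND_DISPLAY"))
```

A script would call `is_headless()` first and, if it returns `True`, run `matplotlib.use('Agg')` before importing `matplotlib.pyplot`.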

Dash visualization fails

Symptoms:

ModuleNotFoundError: No module named 'dash'

Solution:

pip install dash

Performance issues

Profiler adds too much overhead

Symptoms:

Training is significantly slower
High CPU usage

Solutions:

Increase sampling interval

profiler = GPUMemoryProfiler()
profiler.start_monitoring(interval=2.0)  # Sample every 2 seconds

Disable tensor tracking

profiler = GPUMemoryProfiler(track_tensors=False)

Selective profiling

# Only profile specific functions
from gpumemprof import profile_function

@profile_function
def critical_function():
    pass
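Function-level profiling boils down to taking a memory reading before and after a call and recording the difference. The decorator below is a hypothetical sketch of that idea, not the implementation of `gpumemprof.profile_function`. The reader is pluggable, so the same pattern works with, for example, `torch.cuda.memory_allocated` on a GPU; the usage example uses the standard library's `tracemalloc` so it runs anywhere.

```python
import functools
import tracemalloc


def measure_memory(read_bytes):
    """Decorator factory: record read_bytes() after minus before each call (hypothetical)."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            before = read_bytes()
            try:
                return func(*args, **kwargs)
            finally:
                wrapper.deltas.append(read_bytes() - before)
        wrapper.deltas = []  # one entry per call, in bytes
        return wrapper
    return decorator


def traced_current():
    """Current Python heap usage as seen by tracemalloc, in bytes."""
    return tracemalloc.get_traced_memory()[0]  # (current, peak) -> current


@measure_memory(traced_current)
def build_list(n):
    return [0] * n


tracemalloc.start()
kept = build_list(100_000)
tracemalloc.stop()
print(build_list.deltas)
```

Only the decorated functions pay the cost of the two readings, which is why profiling selected functions adds less overhead than continuous monitoring.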

Dependency conflicts

typing_extensions version conflict

Symptoms:

Errors when running the TensorFlow CLI (tfmemprof)
Version conflicts between packages

Solutions:

Check versions

pip list | grep typing

Install a compatible version

pip install typing-extensions==4.5.0

Use a virtual environment

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -e .

Platform-specific issues

macOS: CUDA not available

Solution:

CUDA is not available on macOS. Use CPU mode or MPS (Metal Performance Shaders) instead.

# The standard macOS PyTorch wheels include MPS support (macOS 12.3 or later)
pip install torch torchvision
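Code that should run on CUDA machines and on Apple Silicon alike can pick the device at runtime instead of assuming CUDA. The helper below is a hypothetical sketch; it takes the availability flags as arguments so the selection logic can be tested without PyTorch installed, and the comment at the end shows where the real checks, `torch.cuda.is_available()` and `torch.backends.mps.is_available()`, would plug in.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Choose the best available backend: CUDA first, then Apple MPS, then CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


# With PyTorch installed:
#   import torch
#   device = torch.device(pick_device(torch.cuda.is_available(),
#                                     torch.backends.mps.is_available()))
```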

macOS: TensorFlow issues on Apple Silicon

Solution:

# Install TensorFlow (Apple Silicon is supported natively since TF 2.13)
pip install tensorflow

# For Metal GPU acceleration, also install:
pip install tensorflow-metal

Windows: Path issues

Solution:

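Use forward slashes or raw strings for file paths in Python code. If the gpumemprof command itself cannot be found, run the CLI as a module:

python -m gpumemprof.cli info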
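In a regular (non-raw) Python string, a Windows path starting with `C:\Users` is a syntax error because `\U` begins a Unicode escape, and sequences such as `\t` or `\n` silently become tabs and newlines. The sketch below shows the two safe spellings and uses the standard library's `pathlib` to confirm they name the same file; the path itself is a made-up example.

```python
from pathlib import Path, PureWindowsPath

# A raw string keeps every backslash literal
raw = r"C:\Users\me\runs\timeline.png"
# Forward slashes are accepted as separators on Windows too
forward = "C:/Users/me/runs/timeline.png"

# Both spellings describe the same Windows path
print(PureWindowsPath(raw) == PureWindowsPath(forward))  # True

# Joining with pathlib avoids hand-written separators entirely
output = Path("runs") / "timeline.png"
print(output)
```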

Windows: Permission issues

Solution:

# Run the terminal as administrator, or install for your user only with --user (not allowed inside a virtual environment)
pip install --user -e .

Debug mode

Enable debug logging

import logging
logging.basicConfig(level=logging.DEBUG)

from gpumemprof import GPUMemoryProfiler
profiler = GPUMemoryProfiler()
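Setting `level=logging.DEBUG` on the root logger makes every library in the process (matplotlib or urllib3, for instance) verbose as well. A narrower setup keeps the root logger at `WARNING` and raises only the profiler's loggers to `DEBUG`. This sketch assumes the package follows the common `logging.getLogger(__name__)` convention, so its loggers live under the `gpumemprof` name; that is an assumption, not something this guide confirms.

```python
import logging

# Keep everything else at WARNING...
logging.basicConfig(level=logging.WARNING,
                    format="%(levelname)s %(name)s: %(message)s")

# ...and enable DEBUG only for loggers named "gpumemprof" or "gpumemprof.*"
# (assumes the package names its loggers after its modules)
logging.getLogger("gpumemprof").setLevel(logging.DEBUG)
```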

Verbose CLI output

# Commands with more detailed output
gpumemprof info --detailed
gpumemprof monitor --duration 10

Check system information

CLI

# Quickest way to check environment health
gpumemprof info --detailed
tfmemprof info

Python

from gpumemprof import get_gpu_info

info = get_gpu_info()  # Returns GPU details, or {"error": ...} on non-CUDA hosts
print(info)

Getting help

Before asking for help

Check the documentation

Run diagnostics

gpumemprof info --detailed
gpumemprof diagnose --duration 0 --output ./diag_bundle
tfmemprof info
tfmemprof diagnose --duration 0 --output ./tf_diag_bundle

Test with minimal example

from gpumemprof import GPUMemoryProfiler
import torch

profiler = GPUMemoryProfiler()

def test():
    return torch.randn(100, 100).cuda()

profile = profiler.profile_function(test)
summary = profiler.get_summary()
print(profile.to_dict())
print(summary)

Reporting issues

When reporting issues, include:

System information

OS and version
Python version
PyTorch/TensorFlow versions
CUDA version (if applicable)

Error messages

Full error traceback
Any warning messages

Reproduction steps

Minimal code example
Expected vs actual behavior

Environment

Virtual environment details
Package versions (pip freeze)
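Most of the system details above can be collected in one go. The script below is a sketch that relies on the standard library (`platform` and `importlib.metadata`, Python 3.8+); the package list is an assumption to adapt, and the CUDA version is read from PyTorch only if it is installed. Its output can be pasted into the issue along with `pip freeze`.

```python
import importlib.metadata
import platform

# pip distribution names to report on (adjust as needed)
PACKAGES = ["gpu-memory-profiler", "torch", "tensorflow", "typing_extensions"]


def environment_report(packages=PACKAGES):
    """Collect OS, Python and package versions for a bug report."""
    report = {"os": platform.platform(), "python": platform.python_version()}
    for name in packages:
        try:
            report[name] = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            report[name] = "not installed"
    try:
        import torch  # optional: CUDA version this PyTorch build targets
        report["torch CUDA"] = str(torch.version.cuda)  # "None" for non-CUDA builds
    except ImportError:
        report["torch CUDA"] = "PyTorch not installed"
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```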

Community support

GitHub Issues: report bugs and request features

Documentation: browse the complete documentation

Examples: check out example code
