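I have been asked several times why an installed tensorflow cannot use the GPU. I had solved this a few times before, but when someone asked again a couple of days ago, it still took me a long time to get it working. Here is a brief record of the problems I ran into and how I solved them.

I. Installation

(I) Install and update conda

1. Install conda

        Installing conda matters: installing tensorflow-gpu with pip causes too many problems (this post assumes conda is already installed).

2. Update conda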

conda update -n base -c defaults conda --repodata-fn=repodata.json

        Previously, following what I found on Baidu, I always ran:

conda update -n base -c defaults conda

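        However, first, that command cannot update conda to the latest version; second, when the version is checked with conda -V afterwards, the conda version is reported incorrectly.

        The fix with --repodata-fn=repodata.json comes from GitHub: "I got update warning message but unable to update" (Issue #12519, conda/conda), https://github.com/conda/conda/issues/12519

        I update conda's base to the latest version because, as I understand it, this syncs the latest package dependency information; an outdated version may lead to dependency problems.

(II) Create the environment

1. Create the environment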

conda create -n TensorFlow2.4 python=3.9

        Of course, you can pick the tensorflow version that matches your own CUDA version. My CUDA version is 11.3:

(Tensorflow2.4) name@eclab:~$ nvcc -V

nvcc: NVIDIA (R) Cuda compiler driver

Copyright (c) 2005-2021 NVIDIA Corporation

Built on Sun_Mar_21_19:15:46_PDT_2021

Cuda compilation tools, release 11.3, V11.3.58

Build cuda_11.3.r11.3/compiler.29745058_0
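To read the CUDA release programmatically rather than by eye, the output above can be parsed. The following is a minimal sketch, not part of the original setup: the helper names parse_cuda_release and detect_cuda_release are made up for illustration, and the regular expression assumes the "release X.Y" wording seen in the output above.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_cuda_release(nvcc_output: str) -> Optional[str]:
    """Return the CUDA release (e.g. '11.3') found in `nvcc -V` output, or None."""
    match = re.search(r"release\s+(\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def detect_cuda_release() -> Optional[str]:
    """Run `nvcc -V` if nvcc is on PATH; return None when nvcc is not found."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    result = subprocess.run([nvcc, "-V"], capture_output=True, text=True, check=False)
    return parse_cuda_release(result.stdout)


# Sample line copied from the nvcc output shown above.
sample = "Cuda compilation tools, release 11.3, V11.3.58"
print(parse_cuda_release(sample))
print(detect_cuda_release())
```

A None result from detect_cuda_release corresponds to the "command not found" case, which usually means the CUDA bin directory is missing from PATH.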

        If nvcc -V prints nothing or is not found, you can search further for the cause, for example:

Ubuntu 20.04 LTS: CUDA is installed but nvcc -V reports "command not found" (CSDN blog by AISecurity盐究员) https://blog.csdn.net/m0_38068876/article/details/127836484

        Note: when adding the environment variables to the .bashrc file, adjust them to match the actual cuda directories under /usr/local/. This is what mine looks like:

(Tensorflow2.4) name@eclab:~$ cd /usr/local/

(Tensorflow2.4) name@eclab:/usr/local$ ls

bin cuda-11.3 games lib sbin src

cuda etc include man share sunlogin

        There is a cuda symlink here that points to cuda-11.3, so I suggest using the following commands:

# cuda-11.3

export PATH=/usr/local/cuda-11.3/bin${PATH:+:${PATH}}

export LD_LIBRARY_PATH=/usr/local/cuda-11.3/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
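The ${PATH:+:${PATH}} form appends a colon and the old value only when the variable is already set and non-empty, so no stray trailing colon is left behind when the variable starts out empty. A small Python sketch of the same rule, with a hypothetical helper name prepend_entry, may make this easier to see:

```python
import os


def prepend_entry(new_entry: str, current: str) -> str:
    """Mimic `new_entry${VAR:+:${VAR}}`: add ':current' only if current is non-empty."""
    return new_entry + (":" + current if current else "")


# With an empty variable there is no trailing colon.
print(prepend_entry("/usr/local/cuda-11.3/lib64", ""))

# With an existing value the new entry goes first, separated by a colon.
print(prepend_entry("/usr/local/cuda-11.3/bin", "/usr/bin:/bin"))

# Where does the generic symlink point? realpath returns the path unchanged
# if /usr/local/cuda does not exist on this machine.
print(os.path.realpath("/usr/local/cuda"))
```

On the machine described above, the last line would be expected to resolve to the cuda-11.3 directory, since cuda is a symlink to it.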

(III) Install tensorflow-gpu

1. Install

        Activate the environment:

conda activate TensorFlow2.4

        Install tensorflow-gpu:

conda install tensorflow-gpu

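        Note: do not install with pip! Do not install with pip! Do not install with pip!

        I did not specify a tensorflow-gpu version here; conda automatically downloaded tensorflow-gpu==2.4.1 (for the version compatibility table, see "Build from source | TensorFlow").

2. Test

        Run the following two statements in the Python interpreter: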

(Tensorflow2.4) name@eclab:/usr/local$ python

Python 3.9.17 (main, Jul 5 2023, 20:41:20)

[GCC 11.2.0] :: Anaconda, Inc. on linux

Type "help", "copyright", "credits" or "license" for more information.

>>> import tensorflow as tf

2023-07-10 10:22:16.571135: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.1

>>> tf.config.list_physical_devices('GPU')

2023-07-10 10:22:27.565493: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set

2023-07-10 10:22:27.567453: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1

2023-07-10 10:22:27.611185: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:

pciBusID: 0000:02:00.0 name: NVIDIA TITAN X (Pascal) computeCapability: 6.1

coreClock: 1.531GHz coreCount: 28 deviceMemorySize: 11.91GiB deviceMemoryBandwidth: 447.48GiB/s

2023-07-10 10:22:27.612680: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 1 with properties:

pciBusID: 0000:03:00.0 name: NVIDIA TITAN X (Pascal) computeCapability: 6.1

coreClock: 1.531GHz coreCount: 28 deviceMemorySize: 11.91GiB deviceMemoryBandwidth: 447.48GiB/s

2023-07-10 10:22:27.613857: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 2 with properties:

pciBusID: 0000:82:00.0 name: NVIDIA TITAN X (Pascal) computeCapability: 6.1

coreClock: 1.531GHz coreCount: 28 deviceMemorySize: 11.91GiB deviceMemoryBandwidth: 447.48GiB/s

2023-07-10 10:22:27.614783: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 3 with properties:

pciBusID: 0000:83:00.0 name: NVIDIA TITAN X (Pascal) computeCapability: 6.1

coreClock: 1.531GHz coreCount: 28 deviceMemorySize: 11.91GiB deviceMemoryBandwidth: 447.48GiB/s

2023-07-10 10:22:27.614821: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.1

2023-07-10 10:22:27.617316: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.10

2023-07-10 10:22:27.617370: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.10

2023-07-10 10:22:27.619509: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10

2023-07-10 10:22:27.619882: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10

2023-07-10 10:22:27.622449: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10

2023-07-10 10:22:27.623913: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.10

2023-07-10 10:22:27.629319: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.7

2023-07-10 10:22:27.644606: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0, 1, 2, 3

[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:1', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:2', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:3', device_type='GPU')]
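The same check can be put into a small script so it can be rerun after every environment change. This is only a sketch: describe_gpus and visible_gpu_names are illustrative names, and the script relies on nothing beyond tf.config.list_physical_devices, the call used in the session above.

```python
def describe_gpus(device_names):
    """Turn a list of device names into a one-line summary."""
    if not device_names:
        return "No GPU visible to TensorFlow"
    return f"{len(device_names)} GPU(s) visible: " + ", ".join(device_names)


def visible_gpu_names():
    """Return GPU device names, or None if TensorFlow cannot be imported."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return [device.name for device in tf.config.list_physical_devices("GPU")]


names = visible_gpu_names()
if names is None:
    print("TensorFlow is not installed in this environment")
else:
    print(describe_gpus(names))
```

An empty list from list_physical_devices means TensorFlow imported correctly but registered no GPU, which is the failure case this post is about.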

(IV) Problems when installing with pip

        The following shows the problems that appear when tensorflow-gpu is installed with pip.

(base) name@eclab:~$ conda create -n Tensorflow-err python=3.9

(Tensorflow-err) name@eclab:~$ pip install tensorflow-gpu==2.4.1

ERROR: Could not find a version that satisfies the requirement tensorflow-gpu==2.4.1 (from versions: 2.5.0, 2.5.1, 2.5.2, 2.5.3, 2.6.0, 2.6.1, 2.6.2, 2.6.3, 2.6.4, 2.6.5, 2.7.0rc0, 2.7.0rc1, 2.7.0, 2.7.1, 2.7.2, 2.7.3, 2.7.4, 2.8.0rc0, 2.8.0rc1, 2.8.0, 2.8.1, 2.8.2, 2.8.3, 2.8.4, 2.9.0rc0, 2.9.0rc1, 2.9.0rc2, 2.9.0, 2.9.1, 2.9.2, 2.9.3, 2.10.0rc0, 2.10.0rc1, 2.10.0rc2, 2.10.0rc3, 2.10.0, 2.10.1, 2.11.0rc0, 2.11.0rc1, 2.11.0rc2, 2.11.0, 2.12.0)

ERROR: No matching distribution found for tensorflow-gpu==2.4.1

        First, 2.4.1 cannot be installed at all; going by the version list in the error message, I installed 2.5.0 instead:

pip install tensorflow-gpu==2.5
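Picking a version by hand from that long list is error-prone. As an illustration only, the sketch below extracts the list from pip's error text and returns the lowest final (non-rc) release. The helper names are hypothetical, and the sample string is a shortened copy of the error above.

```python
import re


def available_versions(pip_error: str):
    """Extract the version strings listed after 'from versions:' in pip's error."""
    match = re.search(r"\(from versions: ([^)]*)\)", pip_error)
    if not match:
        return []
    return [v.strip() for v in match.group(1).split(",") if v.strip()]


def earliest_final_release(versions):
    """Return the lowest version made only of dotted numbers (skips rc builds)."""
    finals = [v for v in versions if re.fullmatch(r"\d+(\.\d+)*", v)]
    if not finals:
        return None
    # Compare numerically so that 2.10.0 sorts after 2.5.0.
    return min(finals, key=lambda v: tuple(int(part) for part in v.split(".")))


# Shortened copy of the error shown above.
error = (
    "ERROR: Could not find a version that satisfies the requirement "
    "tensorflow-gpu==2.4.1 (from versions: 2.5.0, 2.5.1, 2.7.0rc0, 2.7.0, 2.10.0, 2.12.0)"
)
print(earliest_final_release(available_versions(error)))
```

The numeric comparison matters: comparing the strings directly would place 2.10.0 before 2.5.0.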

        Running the test from step 2 (Test) of section (III):

(Tensorflow-err) name@eclab:~$ python

Python 3.9.17 (main, Jul 5 2023, 20:41:20)

[GCC 11.2.0] :: Anaconda, Inc. on linux

Type "help", "copyright", "credits" or "license" for more information.

>>> import tensorflow as tf

2023-07-10 10:38:37.238756: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0

>>> tf.config.list_physical_devices('GPU')

2023-07-10 10:38:40.413250: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1

2023-07-10 10:38:40.456066: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:

pciBusID: 0000:02:00.0 name: NVIDIA TITAN X (Pascal) computeCapability: 6.1

coreClock: 1.531GHz coreCount: 28 deviceMemorySize: 11.91GiB deviceMemoryBandwidth: 447.48GiB/s

2023-07-10 10:38:40.457549: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 1 with properties:

pciBusID: 0000:03:00.0 name: NVIDIA TITAN X (Pascal) computeCapability: 6.1

coreClock: 1.531GHz coreCount: 28 deviceMemorySize: 11.91GiB deviceMemoryBandwidth: 447.48GiB/s

2023-07-10 10:38:40.458707: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 2 with properties:

pciBusID: 0000:82:00.0 name: NVIDIA TITAN X (Pascal) computeCapability: 6.1

coreClock: 1.531GHz coreCount: 28 deviceMemorySize: 11.91GiB deviceMemoryBandwidth: 447.48GiB/s

2023-07-10 10:38:40.459651: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 3 with properties:

pciBusID: 0000:83:00.0 name: NVIDIA TITAN X (Pascal) computeCapability: 6.1

coreClock: 1.531GHz coreCount: 28 deviceMemorySize: 11.91GiB deviceMemoryBandwidth: 447.48GiB/s

2023-07-10 10:38:40.459700: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0

2023-07-10 10:38:40.464266: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11

2023-07-10 10:38:40.464333: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11

2023-07-10 10:38:40.465775: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcufft.so.10

2023-07-10 10:38:40.466117: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcurand.so.10

2023-07-10 10:38:40.467045: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusolver.so.11

2023-07-10 10:38:40.468303: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusparse.so.11

2023-07-10 10:38:40.468555: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-11.3/lib64

2023-07-10 10:38:40.468578: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1766] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.

Skipping registering GPU devices...

[]

        The relevant error:

tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-11.3/lib64
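The warning names the library TensorFlow tried to open and the directories it searched. Below is a sketch of how one might extract the missing library from such a log and look for it on LD_LIBRARY_PATH. The helper names are illustrative, and the log line is copied from the session above. For comparison, the conda environment in section (III) loaded libcudnn.so.7 and libcudart.so.10.1, which do not match the system CUDA 11.3, so they presumably came from packages that conda installed alongside tensorflow-gpu (an inference from the logs, not something verified here).

```python
import os
import re


def missing_libraries(log_text: str):
    """Return library names from 'Could not load dynamic library' warnings."""
    return re.findall(r"Could not load dynamic library '([^']+)'", log_text)


def find_library(name: str, search_dirs):
    """Return the first existing path to `name` in search_dirs, or None."""
    for directory in search_dirs:
        candidate = os.path.join(directory, name)
        if os.path.exists(candidate):
            return candidate
    return None


log_line = (
    "Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: "
    "cannot open shared object file: No such file or directory; "
    "LD_LIBRARY_PATH: /usr/local/cuda-11.3/lib64"
)

# Search the directories currently listed in LD_LIBRARY_PATH (may be empty).
search_dirs = [d for d in os.environ.get("LD_LIBRARY_PATH", "").split(":") if d]
for library in missing_libraries(log_line):
    location = find_library(library, search_dirs)
    print(library, "->", location or "not found on LD_LIBRARY_PATH")
```

If the library is reported as not found, the pip-installed tensorflow-gpu has no cuDNN 8 to load from the searched directories, which matches the empty device list in the session above.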
