Installing TensorFlow on a Windows PC with a GPU

In "Installing TensorFlow on Raspberry Pi 3" we installed TensorFlow on a Raspberry Pi. This time we install TensorFlow on a Windows PC equipped with a GPU.

Development environment

Windows 10
Python 3.8.2rc2 (64-bit)
pip 19.2.3
TensorFlow 2.6.0
CUDA 11.4
cuDNN v8.2.4
NVIDIA GeForce GT 1030 graphics card (384 CUDA cores)

Installing 64-bit Python 3.8

TensorFlow 2.x supports 64-bit Python. From the python.org download page "Python Releases for Windows", download the 64-bit Windows executable installer for Python 3.8.

Python 3.6 to 3.9 and pip 19.0 or later are required. Install a 64-bit Python 3 release for Windows and select pip as an optional feature.
Check the versions with the following commands.
> python --version
Python 3.8.2rc2
> pip3 --version
pip 19.2.3 from c:\users\ne\appdata\local\programs\python\python38\lib\site-packages\pip (python 3.8)
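The same requirements can also be checked from inside Python. The following is a minimal sketch, assuming the 64-bit Python 3.6 to 3.9 requirement stated above; the function name check_python_for_tensorflow is hypothetical and exists only for this illustration.

```python
import struct
import sys


def check_python_for_tensorflow(version=None, pointer_bits=None):
    """Return a list of problems; an empty list means the interpreter looks suitable.

    Assumes the requirement stated in this article: 64-bit Python 3.6 to 3.9.
    """
    if version is None:
        version = sys.version_info[:2]
    if pointer_bits is None:
        # Size of a C pointer in bits: 64 on a 64-bit interpreter, 32 on a 32-bit one.
        pointer_bits = struct.calcsize("P") * 8

    problems = []
    if not ((3, 6) <= tuple(version) <= (3, 9)):
        problems.append("Python %d.%d is outside the 3.6 to 3.9 range" % tuple(version))
    if pointer_bits != 64:
        problems.append("%d-bit Python found; 64-bit is required" % pointer_bits)
    return problems


if __name__ == "__main__":
    found = check_python_for_tensorflow()
    if found:
        for problem in found:
            print("NG:", problem)
    else:
        print("OK: this interpreter matches the stated requirements")
```

The arguments default to the running interpreter, so calling the function with no arguments checks the Python that executes the script.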

Installing TensorFlow 2.6

Download the TensorFlow 2.6 package with GPU support for Python 3.8 from the following URL. For details, see "Install TensorFlow with pip".

Python 3.8 GPU support: https://storage.googleapis.com/tensorflow/windows/gpu/tensorflow_gpu-2.6.0-cp38-cp38-win_amd64.whl

The same page lists the URLs of the TensorFlow Python packages for Windows.

Install TensorFlow on Windows with the following command.

>pip3 install --upgrade https://storage.googleapis.com/tensorflow/windows/gpu/tensorflow_gpu-2.6.0-cp38-cp38-win_amd64.whl

Error 1: If 32-bit Python 3.8 is installed, the error "whl is not a supported wheel on this platform" occurs, as shown below.

ERROR: tensorflow_gpu-2.6.0-cp38-cp38-win_amd64.whl is not a supported wheel on this platform.

pip cannot install a wheel whose tags it does not support. The following commands list the tags (such as cp38) that pip supports; the output shown here is from 64-bit Python 3.8.

>python
Python 3.8.2rc2 (tags/v3.8.2rc2:777ba07, Feb 18 2020, 09:11:15) [MSC v.1916 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from pip._internal.utils.compatibility_tags import get_supported
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'pip._internal.utils.compatibility_tags'
>>> from pip._internal.pep425tags import get_supported
>>> get_supported()
[('cp38', 'cp38', 'win_amd64'), ('cp38', 'none', 'win_amd64'), ('py3', 'none', 'win_amd64'), ('cp38', 'none', 'any'), ('cp3', 'none', 'any'), ('py38', 'none', 'any'), ('py3', 'none', 'any'), ('py37', 'none', 'any'), ('py36', 'none', 'any'), ('py35', 'none', 'any'), ('py34', 'none', 'any'), ('py33', 'none', 'any'), ('py32', 'none', 'any'), ('py31', 'none', 'any'), ('py30', 'none', 'any')]
>>>
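To see why a particular wheel is rejected, its file name can be compared with the running interpreter. The following is a simplified sketch that uses only the standard library. It checks just the CPython tag and the win_amd64 platform tag, so it is much less complete than pip's own logic, and the function names are hypothetical.

```python
import struct
import sys


def parse_wheel_tags(filename):
    """Split a wheel file name into (python tag, ABI tag, platform tag)."""
    stem = filename[:-len(".whl")] if filename.endswith(".whl") else filename
    # The last three dash-separated fields of a wheel file name are the tags.
    python_tag, abi_tag, platform_tag = stem.split("-")[-3:]
    return python_tag, abi_tag, platform_tag


def interpreter_python_tag():
    """Build the CPython tag of the running interpreter, e.g. 'cp38' for Python 3.8."""
    return "cp%d%d" % sys.version_info[:2]


def explain_mismatch(filename):
    """Return readable reasons why the wheel may not fit this interpreter."""
    python_tag, _abi_tag, platform_tag = parse_wheel_tags(filename)
    reasons = []
    if python_tag != interpreter_python_tag():
        reasons.append("wheel is for %s, interpreter is %s"
                       % (python_tag, interpreter_python_tag()))
    bits = struct.calcsize("P") * 8  # pointer size of this interpreter in bits
    if platform_tag == "win_amd64" and bits != 64:
        reasons.append("wheel is 64-bit (win_amd64), interpreter is %d-bit" % bits)
    return reasons


if __name__ == "__main__":
    print(explain_mismatch("tensorflow_gpu-2.6.0-cp38-cp38-win_amd64.whl"))
```

On a 32-bit Python 3.8 this sketch would report the 64-bit mismatch, which corresponds to Error 1 above.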

Error 2: If the Microsoft Visual C++ Redistributable is not installed, the error "Could not find the DLL(s) 'msvcp140_1.dll'." occurs, as shown below.

>python -c "import tensorflow as tf;print(tf.reduce_sum(tf.random.normal([1000, 1000])))"
    ・・・
ImportError: Could not find the DLL(s) 'msvcp140_1.dll'. TensorFlow requires that these DLLs be installed in a directory that is named in your %PATH% environment variable. You may install these DLLs by downloading "Microsoft C++ Redistributable for Visual Studio 2015, 2017 and 2019" for your platform from this URL: https://support.microsoft.com/help/2977003/the-latest-supported-visual-c-downloads

TensorFlow 2.1.0 and later require the msvcp140_1.dll file, which is included in the Microsoft Visual C++ Redistributable.
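The error message says the DLL must be in a directory named in %PATH%. The following minimal sketch searches those directories for a given file name. The helper name find_on_path is hypothetical, and the check only looks at PATH entries, so it approximates what Windows does when loading a DLL.

```python
import os


def find_on_path(filename, path=None):
    """Return the full path of every copy of `filename` found in the PATH directories."""
    if path is None:
        path = os.environ.get("PATH", "")
    hits = []
    for directory in path.split(os.pathsep):
        candidate = os.path.join(directory, filename)
        # Skip empty PATH entries and directories that do not contain the file.
        if directory and os.path.isfile(candidate):
            hits.append(candidate)
    return hits


if __name__ == "__main__":
    dll = "msvcp140_1.dll"
    found = find_on_path(dll)
    print(dll, "->", found if found else "not found on PATH")
```

The same helper can be pointed at other file names, for example the CUDA and cuDNN DLLs that appear in the error messages later in this article.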

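Installing CUDA/cuDNN

For NVIDIA CUDA, download the CUDA Toolkit build that matches your environment from the "CUDA Toolkit 11.4" page.

Then run the installer and set the installation options.

For NVIDIA cuDNN, download cuDNN from the "NVIDIA cuDNN" page.

Extract the downloaded file and copy the bin, include, and lib folders from the extracted folder into the folders of the same name under "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.4", overwriting existing files.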
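The copy step can also be scripted. The following is a minimal sketch, assuming Python 3.8 or later (for the dirs_exist_ok argument) and that the extracted cuDNN folder contains bin, include, and lib directly. The function name and the extraction path in the comment are hypothetical, and writing under "C:\Program Files" will most likely require a command prompt run as administrator.

```python
import shutil
from pathlib import Path


def copy_cudnn_into_cuda(cudnn_dir, cuda_dir, subfolders=("bin", "include", "lib")):
    """Copy bin/include/lib from the extracted cuDNN folder into the CUDA folder.

    Files with the same name are overwritten; other files in the CUDA folder are kept.
    Returns the names of the subfolders that were actually copied.
    """
    copied = []
    for name in subfolders:
        source = Path(cudnn_dir) / name
        if not source.is_dir():
            continue  # the archive does not contain this folder
        # dirs_exist_ok=True merges into an existing destination folder.
        shutil.copytree(source, Path(cuda_dir) / name, dirs_exist_ok=True)
        copied.append(name)
    return copied


# Example call (the first path is a placeholder for wherever you extracted cuDNN):
# copy_cudnn_into_cuda(r"C:\path\to\extracted\cudnn",
#                      r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.4")
```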

Run the following command at the command prompt and confirm that the path to nvcc.exe is displayed.

>where nvcc
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.4\bin\nvcc.exe

Error 1: If CUDA is not installed, the message "cudart64_110.dll not found" appears, as shown below.

>python -c "import tensorflow as tf;print(tf.reduce_sum(tf.random.normal([1000, 1000])))"
2021-09-25 05:49:26.264775: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2021-09-25 05:49:26.265407: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
    ・・・

Error 2: If cuDNN is not installed, the message "cudnn64_8.dll not found" appears, as shown below.

>python -c "import tensorflow as tf;print(tf.reduce_sum(tf.random.normal([1000, 1000])))"
2021-09-25 17:40:55.805098: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudnn64_8.dll'; dlerror: cudnn64_8.dll not found
    ・・・
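Because these warnings are easy to miss among the other log lines, it can help to extract the missing library names automatically. The following sketch matches the "Could not load dynamic library" wording seen in the two logs above. The hint table is an assumption based only on those two errors and is not exhaustive.

```python
import re

# Matches the warning text printed by TensorFlow 2.6 in the logs above.
_MISSING_LIBRARY = re.compile(r"Could not load dynamic library '([^']+)'")

# Rough mapping taken from the two errors in this article (an assumption, not a full list).
HINTS = {
    "cudart64_110.dll": "install the CUDA Toolkit",
    "cudnn64_8.dll": "copy the cuDNN files into the CUDA folder",
}


def missing_libraries(log_text):
    """Return the library names reported as not loadable, in order and without duplicates."""
    names = []
    for match in _MISSING_LIBRARY.finditer(log_text):
        name = match.group(1)
        if name not in names:
            names.append(name)
    return names


if __name__ == "__main__":
    sample = ("W tensorflow/stream_executor/platform/default/dso_loader.cc:64] "
              "Could not load dynamic library 'cudnn64_8.dll'; "
              "dlerror: cudnn64_8.dll not found")
    for name in missing_libraries(sample):
        print(name, "->", HINTS.get(name, "no hint available"))
```

TensorFlow writes these messages to standard error, so one way to collect them is to redirect standard error to a file when running a script and pass the file contents to missing_libraries.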

Verifying TensorFlow

Verify that TensorFlow works with the following Python one-liner.

>python -c "import tensorflow as tf;print(tf.reduce_sum(tf.random.normal([1000, 1000])))"
2021-09-26 04:41:41.924383: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-09-26 04:41:42.450589: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 1335 MB memory:  -> device: 0, name: NVIDIA GeForce GT 1030, pci bus id: 0000:01:00.0, compute capability: 6.1
tf.Tensor(-40.827362, shape=(), dtype=float32)
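The "Created device ... GPU:0" line in the log shows that the GPU is in use. As a more direct check, the following sketch asks TensorFlow which GPUs it can see, using tf.config.list_physical_devices. The wrapper function list_gpus is a hypothetical helper, and it returns None when TensorFlow cannot be imported so that the script does not crash.

```python
def list_gpus():
    """Return the GPUs TensorFlow can see, or None if TensorFlow cannot be imported."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return tf.config.list_physical_devices("GPU")


if __name__ == "__main__":
    gpus = list_gpus()
    if gpus is None:
        print("TensorFlow could not be imported")
    elif not gpus:
        print("No GPU visible; TensorFlow will run on the CPU")
    else:
        for gpu in gpus:
            print("GPU found:", gpu.name)
```

An empty list here, together with one of the "Could not load dynamic library" warnings, would suggest that the CUDA or cuDNN installation is incomplete.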

Next, create the following Python script to verify the TensorFlow installation, and save it as TensorFlowtest.py.

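import tensorflow as tf

# Load the MNIST handwritten digit dataset bundled with Keras.
mnist = tf.keras.datasets.mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
# Scale pixel values from 0-255 to the 0.0-1.0 range.
x_train, x_test = x_train / 255.0, x_test / 255.0

# Simple fully connected classifier: 28x28 input -> 128 hidden units -> 10 classes.
model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dropout(0.2),
  tf.keras.layers.Dense(10, activation='softmax')
])

# The labels are integers (0-9), so use sparse categorical cross-entropy.
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Train for 5 epochs, then report loss and accuracy on the test set.
model.fit(x_train, y_train, epochs=5)
model.evaluate(x_test, y_test)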

Run the following commands to confirm that the script works.

>cd C:\Users\ne\Desktop\
>python TensorFlowtest.py
2021-09-26 04:40:34.658391: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-09-26 04:40:39.892682: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 1335 MB memory:  -> device: 0, name: NVIDIA GeForce GT 1030, pci bus id: 0000:01:00.0, compute capability: 6.1
2021-09-26 04:40:40.320652: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2)
Epoch 1/5
1875/1875 [==============================] - 7s 2ms/step - loss: 0.2943 - accuracy: 0.9137
Epoch 2/5
1875/1875 [==============================] - 4s 2ms/step - loss: 0.1418 - accuracy: 0.9571
Epoch 3/5
1875/1875 [==============================] - 4s 2ms/step - loss: 0.1042 - accuracy: 0.9688
Epoch 4/5
1875/1875 [==============================] - 4s 2ms/step - loss: 0.0886 - accuracy: 0.9724
Epoch 5/5
1875/1875 [==============================] - 4s 2ms/step - loss: 0.0752 - accuracy: 0.9759
313/313 [==============================] - 1s 2ms/step - loss: 0.0776 - accuracy: 0.9769