This article analyzes how TensorFlow 2 allocates GPU memory. It starts from a real out-of-memory failure, looks at what the TensorFlow documentation says about the default allocation behavior, and then tests the options TensorFlow provides for controlling it.
The exception stack below shows that the cuBLAS handle failed to initialize, and that the error is raised while executing a MatMul. It is therefore fairly safe to conclude that the data being processed was too large and the GPU ran out of memory.
2021-08-10 16:38:04.917501: E tensorflow/stream_executor/cuda/cuda_blas.cc:226] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2021-08-10 16:38:04.960048: E tensorflow/stream_executor/cuda/cuda_blas.cc:226] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2021-08-10 16:38:04.986898: E tensorflow/stream_executor/cuda/cuda_blas.cc:226] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2021-08-10 16:38:04.992366: E tensorflow/stream_executor/cuda/cuda_blas.cc:226] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2021-08-10 16:38:04.992389: W tensorflow/stream_executor/stream.cc:1455] attempting to perform BLAS operation using StreamExecutor without BLAS support
Traceback (most recent call last):
  File "/home/mango/PycharmProjects/DeepLearing/minist_conv.py", line 32, in <module>
    model.fit(train_images, train_labels, epochs=5, batch_size=64)
  File "/usr/local/lib/python3.9/dist-packages/tensorflow/python/keras/engine/training.py", line 1183, in fit
    tmp_logs = self.train_function(iterator)
  File "/usr/local/lib/python3.9/dist-packages/tensorflow/python/eager/def_function.py", line 889, in __call__
    result = self._call(*args, **kwds)
  File "/usr/local/lib/python3.9/dist-packages/tensorflow/python/eager/def_function.py", line 950, in _call
    return self._stateless_fn(*args, **kwds)
  File "/usr/local/lib/python3.9/dist-packages/tensorflow/python/eager/function.py", line 3023, in __call__
    return graph_function._call_flat(
  File "/usr/local/lib/python3.9/dist-packages/tensorflow/python/eager/function.py", line 1960, in _call_flat
    return self._build_call_outputs(self._inference_function.call(
  File "/usr/local/lib/python3.9/dist-packages/tensorflow/python/eager/function.py", line 591, in call
    outputs = execute.execute(
  File "/usr/local/lib/python3.9/dist-packages/tensorflow/python/eager/execute.py", line 59, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InternalError: Blas xGEMM launch failed : a.shape=[1,64,576], b.shape=[1,576,64], m=64, n=64, k=576
	 [[node sequential/dense/MatMul (defined at home/mango/PycharmProjects/DeepLearing/minist_conv.py:32) ]] [Op:__inference_train_function_993]

Function call stack:
train_function
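The script minist_conv.py itself is not reproduced here. For context, a minimal sketch consistent with the traceback (the fit() call and batch size come from the trace; the layer stack, preprocessing and loss are assumptions) might look roughly like this:

# Hypothetical reconstruction of minist_conv.py -- layer sizes and loss are
# assumptions; only the fit() call matches the traceback above.
import tensorflow as tf
from tensorflow.keras import layers, models

(train_images, train_labels), _ = tf.keras.datasets.mnist.load_data()
train_images = train_images.reshape((60000, 28, 28, 1)).astype("float32") / 255.0

model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(28, 28, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="rmsprop",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(train_images, train_labels, epochs=5, batch_size=64)

The environment in which the error occurred: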
mango@mango-ubuntu:~$ /usr/local/cuda/bin/nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Wed_Jul_14_19:41:19_PDT_2021
Cuda compilation tools, release 11.4, V11.4.100
Build cuda_11.4.r11.4/compiler.30188945_0

mango@mango-ubuntu:~$ tail -n 10 /usr/include/cudnn_version.h
#ifndef CUDNN_VERSION_H_
#define CUDNN_VERSION_H_

#define CUDNN_MAJOR 8
#define CUDNN_MINOR 2
#define CUDNN_PATCHLEVEL 2

#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)

#endif /* CUDNN_VERSION_H */

mango@mango-ubuntu:~$ python3 --version
Python 3.9.5

mango@mango-ubuntu:~$ nvidia-smi
Tue Aug 10 19:57:58 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   54C    P0    N/A /  N/A |    329MiB /  2002MiB |      9%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1818      G   /usr/lib/xorg/Xorg                186MiB |
|    0   N/A  N/A      2002      G   /usr/bin/gnome-shell               45MiB |
|    0   N/A  N/A      3435      G   ...AAAAAAAAA= --shared-files       75MiB |
|    0   N/A  N/A      6016      G   python3                            13MiB |
+-----------------------------------------------------------------------------+

mango@mango-ubuntu:~$ python3
Python 3.9.5 (default, May 11 2021, 08:20:37)
[GCC 10.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
2021-08-10 18:33:05.917520: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
>>> tf.__version__
'2.5.0'
>>>
By default, TensorFlow maps nearly all of the GPU memory of all GPUs (subject to CUDA_VISIBLE_DEVICES) visible to the process. This is done to more efficiently use the relatively precious GPU memory resources on the devices by reducing memory fragmentation.
In some cases it is desirable for the process to only allocate a subset of the available memory, or to only grow the memory usage as is needed by the process. TensorFlow provides two methods to control this.
The first option is to turn on memory growth by calling tf.config.experimental.set_memory_growth, which attempts to allocate only as much GPU memory as needed for the runtime allocations: it starts out allocating very little memory, and as the program gets run and more GPU memory is needed, the GPU memory region is extended for the TensorFlow process. Memory is not released since it can lead to memory fragmentation. To turn on memory growth for a specific GPU, use the following code prior to allocating any tensors or executing any ops.
gpus = tf.config.list_physical_devices('GPU')
if gpus:
  try:
    # Currently, memory growth needs to be the same across GPUs
    for gpu in gpus:
      tf.config.experimental.set_memory_growth(gpu, True)
    logical_gpus = tf.config.list_logical_devices('GPU')
    print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
  except RuntimeError as e:
    # Memory growth must be set before GPUs have been initialized
    print(e)
Another way to enable this option is to set the environmental variable TF_FORCE_GPU_ALLOW_GROWTH to true. This configuration is platform specific.
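As a minimal sketch (placing this in a Python script is only one option; the variable just needs to be set before TensorFlow initializes the GPU, e.g. it can equally be exported in the shell as TF_FORCE_GPU_ALLOW_GROWTH=true):

# Set the variable before TensorFlow touches the GPU, i.e. before the import.
import os
os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"

import tensorflow as tf  # memory growth is now enabled for all visible GPUs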
The second method is to configure a virtual GPU device with tf.config.experimental.set_virtual_device_configuration and set a hard limit on the total memory to allocate on the GPU.
This is useful if you want to truly bound the amount of GPU memory available to the TensorFlow process. This is common practice for local development when the GPU is shared with other applications such as a workstation GUI.
gpus = tf.config.list_physical_devices('GPU')
if gpus:
  # Restrict TensorFlow to only allocate 1GB of memory on the first GPU
  try:
    tf.config.set_logical_device_configuration(
        gpus[0],
        [tf.config.LogicalDeviceConfiguration(memory_limit=1024)])
    logical_gpus = tf.config.list_logical_devices('GPU')
    print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
  except RuntimeError as e:
    # Virtual devices must be set before GPUs have been initialized
    print(e)
From the documentation above: by default TensorFlow claims essentially all GPU memory, but it provides two ways to control the allocation strategy flexibly.
We can enable on-demand (grow-as-needed) GPU memory allocation directly:
import tensorflow as tf

# Grow GPU memory usage on demand instead of grabbing it all up front.
physical_gpus = tf.config.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(physical_gpus[0], True)
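If you want to confirm that the flag actually took effect, the matching getter can be queried; this small check is an illustrative addition, not part of the original script:

# Optional sanity check (assumes at least one physical GPU is visible).
print(tf.config.experimental.get_memory_growth(physical_gpus[0]))  # expected: True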
Running nvidia-smi in a loop while the training script executes shows the process peaking at 697 MiB of GPU memory:
mango@mango-ubuntu:~$ while true; do nvidia-smi; sleep 0.2; done;
Tue Aug 10 20:30:58 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   58C    P0    N/A /  N/A |   1026MiB /  2002MiB |     72%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1818      G   /usr/lib/xorg/Xorg                186MiB |
|    0   N/A  N/A      2002      G   /usr/bin/gnome-shell               45MiB |
|    0   N/A  N/A      3435      G   ...AAAAAAAAA= --shared-files       73MiB |
|    0   N/A  N/A      6016      G   python3                            13MiB |
|    0   N/A  N/A     13829      C   /usr/bin/python3.9                697MiB |
+-----------------------------------------------------------------------------+
We can also cap the GPU memory TensorFlow may use at 1024 MiB:
import tensorflow as tf

# Hard-cap the memory TensorFlow may allocate on the first GPU at 1024 MiB.
physical_gpus = tf.config.list_physical_devices('GPU')
tf.config.set_logical_device_configuration(
    physical_gpus[0],
    [tf.config.LogicalDeviceConfiguration(memory_limit=1024)])
The same monitoring loop shows the process peaking at 1455 MiB during the run:
mango@mango-ubuntu:~$ while true; do nvidia-smi; sleep 0.2; done;
Tue Aug 10 20:31:24 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   58C    P0    N/A /  N/A |   1784MiB /  2002MiB |     74%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1818      G   /usr/lib/xorg/Xorg                186MiB |
|    0   N/A  N/A      2002      G   /usr/bin/gnome-shell               46MiB |
|    0   N/A  N/A      3435      G   ...AAAAAAAAA= --shared-files       72MiB |
|    0   N/A  N/A      6016      G   python3                            13MiB |
|    0   N/A  N/A     13570      C   /usr/bin/python3.9               1455MiB |
+-----------------------------------------------------------------------------+
From the test results above we can conclude the following:
The default allocation strategy grabs all GPU memory and does not release any of it during execution, so with a reasonably large training set it is easy to run out of memory.
With a hard memory limit, the measured usage was higher than the configured value. This is probably related to the model and the operations used during training, so the limit has to be tuned for the specific workload. Note that setting it too high can still produce out-of-memory errors, while setting it too low only makes execution slower.
On-demand allocation seems like a reasonable middle ground, and arguably the better option. It is not clear why TensorFlow does not make it the default; I will leave that as an open question and come back to it if I learn more.
Simulating multiple GPUs with a single GPU
When the local development machine has only one GPU but we need to write multi-GPU programs that will run training jobs on a workstation, TensorFlow provides a convenient feature: we can create several simulated GPUs in the local environment, which makes debugging multi-GPU code much easier. The code below creates two virtual GPUs with 1 GB of memory each on top of the physical GPU GPU:0.
gpus = tf.config.list_physical_devices('GPU')
if gpus:
  # Create 2 virtual GPUs with 1GB memory each
  try:
    tf.config.set_logical_device_configuration(
        gpus[0],
        [tf.config.LogicalDeviceConfiguration(memory_limit=1024),
         tf.config.LogicalDeviceConfiguration(memory_limit=1024)])
    logical_gpus = tf.config.list_logical_devices('GPU')
    print(len(gpus), "Physical GPU,", len(logical_gpus), "Logical GPUs")
  except RuntimeError as e:
    # Virtual devices must be set before GPUs have been initialized
    print(e)
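Once the logical devices exist they can be targeted just like real GPUs. As a small illustrative sketch (not part of the original article), work can be pinned to each simulated GPU explicitly:

# Illustrative sketch: run a matmul on each simulated GPU and sum the results.
# Assumes the two logical GPUs created above ('/GPU:0' and '/GPU:1') exist.
import tensorflow as tf

results = []
for device_name in ['/GPU:0', '/GPU:1']:
    with tf.device(device_name):
        a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
        b = tf.constant([[1.0, 1.0], [1.0, 1.0]])
        results.append(tf.matmul(a, b))
print(tf.add_n(results))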
Data parallelism across multiple GPUs
With tf.distribute.Strategy, the model is replicated onto every GPU and each training batch is split across the GPUs, achieving data parallelism.
# Log which device each op runs on, then mirror the model across all logical GPUs.
tf.debugging.set_log_device_placement(True)
gpus = tf.config.list_logical_devices('GPU')
strategy = tf.distribute.MirroredStrategy(gpus)
with strategy.scope():
  inputs = tf.keras.layers.Input(shape=(1,))
  predictions = tf.keras.layers.Dense(1)(inputs)
  model = tf.keras.models.Model(inputs=inputs, outputs=predictions)
  model.compile(loss='mse',
                optimizer=tf.keras.optimizers.SGD(learning_rate=0.2))
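To actually see the data being split, the model can be trained on a small synthetic dataset (this continuation is an illustrative assumption, not code from the original article); with set_log_device_placement(True) enabled, the log shows the ops being placed on the mirrored devices:

# Illustrative continuation: fit on synthetic data; each global batch of 10
# is divided across the GPUs participating in the MirroredStrategy.
import numpy as np

x = np.random.rand(100, 1).astype("float32")
y = 2.0 * x + 1.0
model.fit(x, y, epochs=2, batch_size=10)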