[Stones from Other Hills] Compiling the PyTorch Static Library
Source: https://www.zhihu.com/people/gemfield
01
The documentation on building PyTorch statically is incomplete, and the CMake files carry plenty of bugs and a rather messy overall structure. The goal here is therefore twofold: a minimal CPU build and a minimal CUDA build.
One recurring choice is the BLAS/LAPACK backend:
- MKL, the Intel Math Kernel Library, for Intel processors (not to be confused with MKL-DNN, a separate Intel neural-network library: MKL for Deep Neural Networks);
- ATLAS, Automatically Tuned Linear Algebra Software, cross-platform;
- OpenBLAS, cross-platform;
- Accelerate, on Apple's macOS and iOS;
- Eigen, a header-only library that implements BLAS and part of LAPACK, bundled under PyTorch's third_party;
- cuBLAS, NVIDIA's BLAS implementation for NVIDIA GPUs (alongside cuFFT, cuRAND, cuSPARSE, and others);
- MAGMA, a BLAS/LAPACK implementation on CUDA, HIP, Intel Xeon Phi, and OpenCL.
02
A build switch's final value can come from several sources:
- the default value;
- explicit user setting;
- detection of the system environment;
- detection of which software packages are installed;
- derivation from the logical relations between switches.
A switch, in turn, controls:
- the compilation units that CMakeLists.txt adds;
- compiler and linker command-line arguments;
- #ifdef macros in the code.
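A minimal sketch of that flow for a single switch (USE_FOO, MY_SRCS, and foo_backend.cpp are hypothetical names for illustration, not real PyTorch flags or files):

# Declared with a default; `cmake -DUSE_FOO=ON ..` overrides it.
option(USE_FOO "Enable the hypothetical foo backend" OFF)
if(USE_FOO)
  # 1. extra compilation units
  list(APPEND MY_SRCS foo_backend.cpp)
  # 2./3. a compiler flag that becomes an #ifdef USE_FOO in the code
  add_definitions(-DUSE_FOO)
endif()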
A first large group of boolean switches:
- ATEN_NO_TEST: do not build the ATen test binaries;
- BUILD_BINARY: build C++ binaries;
- BUILD_DOCS: build the Caffe2 documentation;
- BUILD_CAFFE2_MOBILE: build libcaffe2 for mobile, i.e. the choice between libcaffe2 and libtorch mobile; now deprecated, with libtorch mobile the default;
- CAFFE2_USE_MSVC_STATIC_RUNTIME: use the MSVC static runtime libraries;
- BUILD_TEST: build C++ test binaries (needs gtest and gbenchmark);
- BUILD_STATIC_RUNTIME_BENCHMARK: build C++ binaries for static-runtime benchmarks (needs gbenchmark);
- BUILD_TENSOREXPR_BENCHMARK: build C++ binaries for tensorexpr benchmarks (needs gbenchmark);
- BUILD_MOBILE_BENCHMARK: build C++ benchmark binaries for mobile (ARM) targets (needs gtest and gbenchmark);
- BUILD_MOBILE_TEST: build C++ test binaries for mobile (ARM) targets (needs gtest and gbenchmark);
- BUILD_JNI: build the JNI bindings;
- BUILD_MOBILE_AUTOGRAD: build the autograd functions in the mobile build (still under development);
- INSTALL_TEST: install the test binaries if BUILD_TEST is on;
- USE_CPP_CODE_COVERAGE: compile C/C++ with code-coverage flags;
- USE_ASAN: use AddressSanitizer;
- USE_TSAN: use ThreadSanitizer;
- CAFFE2_STATIC_LINK_CUDA: statically link the CUDA libraries;
- USE_STATIC_CUDNN: use the static cuDNN libraries;
- USE_KINETO: use the Kineto profiling library;
- USE_FAKELOWP: use the FakeLowp operators;
- USE_FFMPEG, USE_GFLAGS, USE_GLOG, USE_LEVELDB;
- USE_LITE_PROTO: use lite protobuf instead of the full one;
- USE_LMDB;
- USE_PYTORCH_METAL: use Metal for the PyTorch iOS build;
- USE_NATIVE_ARCH: use -march=native;
- USE_STATIC_NCCL;
- USE_SYSTEM_NCCL: use the system-wide NCCL;
- USE_NNAPI;
- USE_NVRTC: use NVRTC; only available when USE_CUDA is on;
- USE_OBSERVERS: use the observers module;
- USE_OPENCL, USE_OPENCV;
- USE_PROF: use profiling;
- USE_REDIS, USE_ROCKSDB;
- USE_SNPE: use Qualcomm's Snapdragon neural-network engine;
- USE_SYSTEM_EIGEN_INSTALL: use the system Eigen instead of the copy under third_party;
- USE_TENSORRT: use NVIDIA's TensorRT library;
- USE_VULKAN: use the Vulkan GPU backend;
- USE_VULKAN_API: use the Vulkan GPU backend v2;
- USE_VULKAN_SHADERC_RUNTIME: use the Vulkan shader-compilation runtime (needs the shaderc lib);
- USE_VULKAN_RELAXED_PRECISION: use Vulkan relaxed precision (mediump);
- USE_ZMQ, USE_ZSTD;
- USE_MKLDNN_CBLAS: use CBLAS inside MKLDNN;
- USE_TBB;
- HAVE_SOVERSION: whether to add an SOVERSION to the shared objects;
- USE_SYSTEM_LIBS: use all available system-provided libraries;
- USE_SYSTEM_CPUINFO, USE_SYSTEM_SLEEF, USE_SYSTEM_GLOO, USE_SYSTEM_FP16, USE_SYSTEM_PTHREADPOOL, USE_SYSTEM_PSIMD, USE_SYSTEM_FXDIV, USE_SYSTEM_BENCHMARK, USE_SYSTEM_ONNX, USE_SYSTEM_XNNPACK: use the system-provided cpuinfo, sleef, gloo, fp16, pthreadpool, psimd, fxdiv, google benchmark, onnx, and xnnpack respectively.
A second group:
- BUILD_CUSTOM_PROTOBUF: build and use Caffe2's own protobuf under third_party;
- BUILD_PYTHON: build the Python binaries;
- BUILD_CAFFE2: master flag for building Caffe2;
- BUILD_CAFFE2_OPS: build the Caffe2 operators;
- BUILD_SHARED_LIBS: build libcaffe2.so;
- CAFFE2_LINK_LOCAL_PROTOBUF: if set, build protobuf inside libcaffe2.so;
- COLORIZE_OUTPUT: colorize the output during compilation;
- USE_CUDA, USE_CUDNN, USE_ROCM;
- USE_FBGEMM: use FBGEMM (quantized 8-bit server operators);
- USE_METAL: use Metal for the Caffe2 iOS build;
- USE_NCCL: requires UNIX, with USE_CUDA or USE_ROCM enabled;
- USE_NNPACK, USE_NUMPY;
- USE_OPENMP: use OpenMP for parallel code;
- USE_QNNPACK: use QNNPACK (quantized 8-bit operators);
- USE_PYTORCH_QNNPACK: use ATen/QNNPACK (quantized 8-bit operators);
- USE_VULKAN_WRAPPER: use the Vulkan wrapper;
- USE_XNNPACK, USE_DISTRIBUTED;
- USE_MPI: use MPI for Caffe2; only available when USE_DISTRIBUTED is on;
- USE_GLOO: only available when USE_DISTRIBUTED is on;
- USE_TENSORPIPE: only available when USE_DISTRIBUTED is on;
- ONNX_ML: enable the traditional ONNX-ML API;
- USE_NUMA, USE_VALGRIND, USE_MKLDNN;
- BUILDING_WITH_TORCH_LIBS: tell cmake that Caffe2 is being built alongside the torch libs.
Finally, two string-valued options for custom builds:
- SELECTED_OP_LIST: path to a YAML file listing the operators to include in a custom build (all operators are included by default);
- OP_DEPENDENCY: path to a YAML file containing the operator dependency graph for a custom build.
03
Many switches are then adjusted according to the platform:
- USE_DISTRIBUTED: switched off unless on Linux or Win32;
- USE_LIBUV: switched on on macOS when USE_DISTRIBUTED has been enabled manually;
- USE_NUMA: off unless on Linux;
- USE_VALGRIND: off unless on Linux;
- USE_TENSORPIPE: off on Windows;
- USE_KINETO: off on Windows;
- when building libtorch for a mobile platform such as Android or iOS:
set(BUILD_PYTHON OFF)
set(BUILD_CAFFE2_OPS OFF)
set(USE_DISTRIBUTED OFF)
set(FEATURE_TORCH_MOBILE ON)
set(NO_API ON)
set(USE_FBGEMM OFF)
set(USE_QNNPACK OFF)
set(INTERN_DISABLE_ONNX ON)
set(INTERN_USE_EIGEN_BLAS ON)
set(INTERN_DISABLE_MOBILE_INTERP ON)
- USE_MKLDNN: off unless on 64-bit x86_64;
- USE_FBGEMM: off unless on 64-bit x86_64, and also off when AVX512 is unavailable;
- USE_KINETO: off on mobile platforms;
- USE_GLOO: off unless on 64-bit x86_64.
Other switches get corrected based on dependency detection:
- USE_DISTRIBUTED: on Windows, off when libuv cannot be found;
- USE_GLOO: on Windows, off when libuv cannot be found;
- USE_KINETO: off when USE_CUDA is off;
- the MKL-related switches: details omitted here;
- the NNPACK family (QNNPACK, PYTORCH_QNNPACK, XNNPACK): details omitted here;
- USE_BLAS: corrected by its dependencies;
- USE_PTHREADPOOL: corrected by its dependencies;
- USE_LAPACK: off when no LAPACK package can be found, which later fails at runtime with "gels : Lapack library not found in compile time".
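Much of this derivation is expressed with CMake's cmake_dependent_option(). As an illustration, the USE_NCCL constraint from section 02 (UNIX, plus USE_CUDA or USE_ROCM) looks roughly like this in PyTorch's top-level CMakeLists.txt; the exact text may differ between PyTorch versions:

include(CMakeDependentOption)
# USE_NCCL defaults to ON, but only when CUDA or ROCm is enabled
# on a non-Apple UNIX system; otherwise it is forced OFF.
cmake_dependent_option(
    USE_NCCL "Use NCCL" ON
    "USE_CUDA OR USE_ROCM;UNIX;NOT APPLE" OFF)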
If USE_SYSTEM_LIBS is enabled manually, then:
set(USE_SYSTEM_CPUINFO ON)
set(USE_SYSTEM_SLEEF ON)
set(USE_SYSTEM_GLOO ON)
set(BUILD_CUSTOM_PROTOBUF OFF)
set(USE_SYSTEM_EIGEN_INSTALL ON)
set(USE_SYSTEM_FP16 ON)
set(USE_SYSTEM_PTHREADPOOL ON)
set(USE_SYSTEM_PSIMD ON)
set(USE_SYSTEM_FXDIV ON)
set(USE_SYSTEM_BENCHMARK ON)
set(USE_SYSTEM_ONNX ON)
set(USE_SYSTEM_XNNPACK ON)
If the environment variable BUILD_PYTORCH_MOBILE_WITH_HOST_TOOLCHAIN is set, cmake does set(INTERN_BUILD_MOBILE ON), and once INTERN_BUILD_MOBILE is on:
# OFF only when building caffe2 mobile; ON in every other case,
# i.e. the ATen ops always get compiled
set(INTERN_BUILD_ATEN_OPS ON)
set(BUILD_PYTHON OFF)
set(BUILD_CAFFE2_OPS OFF)
set(USE_DISTRIBUTED OFF)
set(FEATURE_TORCH_MOBILE ON)
set(NO_API ON)
set(USE_FBGEMM OFF)
set(USE_QNNPACK OFF)
set(INTERN_DISABLE_ONNX ON)
set(INTERN_USE_EIGEN_BLAS ON)
set(INTERN_DISABLE_MOBILE_INTERP ON)
The configure step then probes the toolchain and environment, including:
- AVX2 support (perfkernels depends on it);
- AVX512 support (fbgemm depends on it);
- a BLAS implementation: Eigen when targeting mobile; otherwise MKL and OpenBLAS are searched for (a miss is not a configure error, but at runtime you will hit "gels : Lapack library not found in compile time");
- Protobuf;
- a Python interpreter;
- NNPACK (the NNPACK backend here is x86-64);
- OpenMP (a dependency of MKL-DNN);
- NUMA;
- pybind11;
- CUDA;
- ONNX;
- MAGMA (BLAS/LAPACK on GPUs and similar devices);
- Metal (Apple ecosystem);
- NEON (ARM ecosystem; naturally not detected on this x86 host);
- MKL-DNN (Intel's deep-learning library);
- the ATen parallel backend: NATIVE;
- Sleef (bundled under third_party);
- RT: /usr/lib/x86_64-linux-gnu/librt.so;
- FFTW3: /usr/lib/x86_64-linux-gnu/libfftw3.so;
- OpenSSL: /usr/lib/x86_64-linux-gnu/libcrypto.so;
- MPI.

The Python side of the build additionally needs a few packages:
root@gemfield:~# pip3 install setuptools
root@gemfield:~# pip3 install pyyaml
root@gemfield:~# pip3 install dataclasses
04
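Before trimming anything, it helps to see what a stock build enables; torch.__config__.show() prints the full build configuration: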
>>> print(torch.__config__.show())
PyTorch built with:
- GCC 7.3
- C++ Version: 201402
- Intel(R) Math Kernel Library Version 2020.0.1 Product Build 20200208 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v1.5.0 (Git Hash e2ac1fac44c5078ca927cb9b90e1b3066a0b2ed0)
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- NNPACK is enabled
- CPU capability usage: AVX2
- CUDA Runtime 10.1
- NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;\
-gencode;arch=compute_50,code=sm_50;\
-gencode;arch=compute_60,code=sm_60;\
-gencode;arch=compute_61,code=sm_61;\
-gencode;arch=compute_70,code=sm_70;\
-gencode;arch=compute_75,code=sm_75;\
-gencode;arch=compute_37,code=compute_37
- CuDNN 7.6.3
- Magma 2.5.2
- Build settings: BLAS=MKL, \
BUILD_TYPE=Release, \
CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp \
-DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK \
-DUSE_VULKAN_WRAPPER -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type \
-Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas \
-Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function \
-Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing \
-Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic \
-Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always \
-faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno \
-fno-trapping-math -Werror=format -Wno-stringop-overflow, \
PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, \
USE_CUDA=ON, \
USE_EXCEPTION_PTR=1, \
USE_GFLAGS=OFF, USE_GLOG=OFF, \
USE_MKL=ON, \
USE_MKLDNN=ON, \
USE_MPI=OFF, \
USE_NCCL=ON, \
USE_NNPACK=ON, \
USE_OPENMP=ON, \
USE_STATIC_DISPATCH=OFF
05
The minimal CPU build switches off caffe2, the standalone binaries, Python bindings, tests, NUMA, distributed support (DISTRIBUTED), ROCM, GLOO, MPI, and CUDA. LAPACK still needs a provider, so install MKL from Intel's APT repository first:
root@gemfield:~# wget https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS-2019.PUB
root@gemfield:~# apt-key add GPG-PUB-KEY-INTEL-SW-PRODUCTS-2019.PUB
root@gemfield:~# echo deb https://apt.repos.intel.com/mkl all main > /etc/apt/sources.list.d/intel-mkl.list
root@gemfield:~# apt update
root@gemfield:~# apt install intel-mkl-64bit-2020.4-912
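With MKL installed, configure the minimal CPU build: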
cmake \
-DCMAKE_VERBOSE_MAKEFILE:BOOL=1 \
-DUSE_CUDA=OFF \
-DBUILD_CAFFE2=OFF \
-DBUILD_PYTHON:BOOL=OFF \
-DBUILD_CAFFE2_OPS=OFF \
-DUSE_DISTRIBUTED=OFF \
-DBUILD_TEST=OFF \
-DBUILD_BINARY=OFF \
-DBUILD_MOBILE_BENCHMARK=0 \
-DBUILD_MOBILE_TEST=0 \
-DUSE_ROCM=OFF \
-DUSE_GLOO=OFF \
-DUSE_LEVELDB=OFF \
-DUSE_MPI:BOOL=OFF \
-DBUILD_CUSTOM_PROTOBUF:BOOL=OFF \
-DUSE_OPENMP:BOOL=OFF \
-DBUILD_SHARED_LIBS:BOOL=OFF \
-DCMAKE_BUILD_TYPE:STRING=Release \
-DPYTHON_EXECUTABLE:PATH=`which python3` \
-DCMAKE_INSTALL_PREFIX:PATH=../libtorch_cpu_mkl \
../pytorch
cmake --build . --target install -- "-j8"
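The install step leaves these static archives under lib/: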
lib/libprotobuf.a
lib/libsleef.a
lib/libclog.a
lib/libcpuinfo.a
lib/libnnpack.a
lib/libasmjit.a
lib/libmkldnn.a
lib/libpytorch_qnnpack.a
lib/libcaffe2_protos.a
lib/libprotobuf-lite.a
lib/libfbgemm.a
lib/libc10.a
lib/libpthreadpool.a
lib/libtorch_cpu.a
lib/libdnnl.a
lib/libqnnpack.a
lib/libprotoc.a
lib/libXNNPACK.a
lib/libtorch.a
lib/libonnx_proto.a
lib/libfmt.a
lib/libonnx.a
lib/libfoxi_loader.a
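The original article stops at producing the archives, but a sketch of a client CMakeLists.txt shows what consuming them involves; the target name, paths, and exact library closure here are assumptions for illustration. The key point: operator registration in libtorch_cpu.a happens via static initializers that nothing references directly, so the linker must be forced to keep them:

cmake_minimum_required(VERSION 3.13)
project(gemfield_client)
set(TORCH_ROOT "${CMAKE_SOURCE_DIR}/../libtorch_cpu_mkl")
file(GLOB TORCH_ARCHIVES "${TORCH_ROOT}/lib/*.a")
add_executable(gemfield_client main.cpp)
target_include_directories(gemfield_client PRIVATE
    "${TORCH_ROOT}/include"
    "${TORCH_ROOT}/include/torch/csrc/api/include")
# --whole-archive keeps the otherwise-unreferenced registration objects;
# --start/end-group resolves circular references among the archives.
target_link_libraries(gemfield_client PRIVATE
    -Wl,--whole-archive "${TORCH_ROOT}/lib/libtorch_cpu.a" -Wl,--no-whole-archive
    -Wl,--start-group ${TORCH_ARCHIVES} -Wl,--end-group
    pthread dl m)

For this MKL build, the MKL archives listed at the end of this section must be appended as well. The author then repeats the configuration with Caffe2 and its operators enabled (BUILD_CAFFE2=ON, BUILD_CAFFE2_OPS=ON, plus OpenMP and MKL-DNN):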
cmake \
-DCMAKE_VERBOSE_MAKEFILE:BOOL=1 \
-DBUILD_CAFFE2=ON \
-DBUILD_CAFFE2_OPS=ON \
-DUSE_OPENMP=ON \
-DUSE_MKLDNN=ON \
-DUSE_GFLAGS=OFF \
-DUSE_GLOG=OFF \
-DUSE_CUDA=OFF \
-DBUILD_PYTHON:BOOL=OFF \
-DUSE_DISTRIBUTED=OFF \
-DBUILD_TEST=OFF \
-DBUILD_BINARY=OFF \
-DBUILD_MOBILE_BENCHMARK=0 \
-DBUILD_MOBILE_TEST=0 \
-DUSE_ROCM=OFF \
-DUSE_GLOO=OFF \
-DUSE_LEVELDB=OFF \
-DUSE_MPI:BOOL=OFF \
-DBUILD_SHARED_LIBS:BOOL=OFF \
-DCMAKE_BUILD_TYPE:STRING=Release \
-DPYTHON_EXECUTABLE:PATH=`which python3` \
-DCMAKE_INSTALL_PREFIX:PATH=../libtorch_cpu_caffe2 \
../pytorch
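Compared with the minimal build, this Caffe2-enabled variant additionally produces the perfkernels archives: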
libCaffe2_perfkernels_avx.a
libCaffe2_perfkernels_avx2.a
libCaffe2_perfkernels_avx512.a
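These carry the vectorized embedding-lookup and TypedAxpy kernels, for example: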
caffe2::EmbeddingLookupIdx_int32_t_float_float_false__avx2_fma
caffe2::EmbeddingLookupIdx_int32_t_float_float_true__avx2_fma
caffe2::EmbeddingLookupIdx_int32_t_half_float_false__avx2_fma
caffe2::EmbeddingLookupIdx_int32_t_half_float_true__avx2_fma
caffe2::EmbeddingLookupIdx_int32_t_uint8_t_float_false__avx2_fma
caffe2::EmbeddingLookupIdx_int32_t_uint8_t_float_true__avx2_fma
caffe2::EmbeddingLookupIdx_int64_t_float_float_false__avx2_fma
caffe2::EmbeddingLookupIdx_int64_t_float_float_true__avx2_fma
caffe2::EmbeddingLookupIdx_int64_t_half_float_false__avx2_fma
caffe2::EmbeddingLookupIdx_int64_t_half_float_true__avx2_fma
caffe2::EmbeddingLookupIdx_int64_t_uint8_t_float_false__avx2_fma
caffe2::EmbeddingLookupIdx_int64_t_uint8_t_float_true__avx2_fma
caffe2::EmbeddingLookup_int32_t_float_float_false__avx2_fma
caffe2::EmbeddingLookup_int32_t_float_float_true__avx2_fma
caffe2::EmbeddingLookup_int32_t_half_float_false__avx2_fma
caffe2::EmbeddingLookup_int32_t_half_float_true__avx2_fma
caffe2::EmbeddingLookup_int32_t_uint8_t_float_false__avx2_fma
caffe2::EmbeddingLookup_int32_t_uint8_t_float_true__avx2_fma
caffe2::EmbeddingLookup_int64_t_float_float_false__avx2_fma
caffe2::EmbeddingLookup_int64_t_float_float_true__avx2_fma
caffe2::EmbeddingLookup_int64_t_half_float_false__avx2_fma
caffe2::EmbeddingLookup_int64_t_half_float_true__avx2_fma
caffe2::EmbeddingLookup_int64_t_uint8_t_float_false__avx2_fma
caffe2::EmbeddingLookup_int64_t_uint8_t_float_true__avx2_fma
caffe2::Fused8BitRowwiseEmbeddingLookupIdx_int32_t_uint8_t_float_false__avx2_fma
caffe2::Fused8BitRowwiseEmbeddingLookupIdx_int64_t_uint8_t_float_false__avx2_fma
caffe2::Fused8BitRowwiseEmbeddingLookup_int32_t_uint8_t_float_false__avx2_fma
caffe2::Fused8BitRowwiseEmbeddingLookup_int64_t_uint8_t_float_false__avx2_fma
caffe2::TypedAxpy__avx2_fma
caffe2::TypedAxpy__avx_f16c
caffe2::TypedAxpyHalffloat__avx2_fma
caffe2::TypedAxpyHalffloat__avx_f16c
caffe2::TypedAxpy_uint8_float__avx2_fma
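Linking a final binary against this MKL build also pulls in MKL's static archives and a few system libraries: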
/opt/intel/mkl/lib/intel64/libmkl_intel_lp64.a
#/opt/intel/mkl/lib/intel64/libmkl_sequential.a
/opt/intel/mkl/lib/intel64/libmkl_gnu_thread.a
/opt/intel/mkl/lib/intel64/libmkl_core.a
/usr/lib/x86_64-linux-gnu/libpthread.so
/usr/lib/x86_64-linux-gnu/libm.so
/usr/lib/x86_64-linux-gnu/libdl.so
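MKL's static archives reference one another cyclically, so with GNU ld they are conventionally wrapped in a link group; a sketch for the hypothetical client target above (libgomp is needed because libmkl_gnu_thread, not libmkl_sequential, is used here):

target_link_libraries(gemfield_client PRIVATE
    -Wl,--start-group
    /opt/intel/mkl/lib/intel64/libmkl_intel_lp64.a
    /opt/intel/mkl/lib/intel64/libmkl_gnu_thread.a
    /opt/intel/mkl/lib/intel64/libmkl_core.a
    -Wl,--end-group
    gomp pthread m dl)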
06
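Building the CPU version against Eigen BLAS instead of MKL takes some surgery in cmake/Dependencies.cmake: the regular BLAS/MKL search is disabled, shown commented out below: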
#if(NOT INTERN_BUILD_MOBILE)
# set(AT_MKL_ENABLED 0)
# set(AT_MKL_MT 0)
# set(USE_BLAS 1)
# if(NOT (ATLAS_FOUND OR OpenBLAS_FOUND OR MKL_FOUND OR VECLIB_FOUND OR GENERIC_BLAS_FOUND))
# message(WARNING "Preferred BLAS (" ${BLAS} ") cannot be found, now searching for a general BLAS library")
# find_package(BLAS)
# if(NOT BLAS_FOUND)
# set(USE_BLAS 0)
# endif()
# endif()
#
# if(MKL_FOUND)
# add_definitions(-DTH_BLAS_MKL)
# if("${MKL_THREADING}" STREQUAL "SEQ")
# add_definitions(-DTH_BLAS_MKL_SEQ=1)
# endif()
# if(MSVC AND MKL_LIBRARIES MATCHES ".*libiomp5md\\.lib.*")
# add_definitions(-D_OPENMP_NOFORCE_MANIFEST)
# set(AT_MKL_MT 1)
# endif()
# set(AT_MKL_ENABLED 1)
# endif()
This leaves only the Eigen BLAS branch active:

if(INTERN_USE_EIGEN_BLAS) # Eigen BLAS for Mobile
  set(USE_BLAS 1)
  set(AT_MKL_ENABLED 0)
  include(${CMAKE_CURRENT_LIST_DIR}/External/EigenBLAS.cmake)
  list(APPEND Caffe2_DEPENDENCY_LIBS eigen_blas)
endif()
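EigenBLAS.cmake itself returns early unless the build targets mobile, so it needs a small patch to work for a desktop build: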
root@gemfield:~# git diff cmake/External/EigenBLAS.cmake
......
set(__EIGEN_BLAS_INCLUDED TRUE)
-
-if(NOT INTERN_BUILD_MOBILE OR NOT INTERN_USE_EIGEN_BLAS)
+if(NOT INTERN_USE_EIGEN_BLAS)
return()
endif()
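With the patch in place, configure with INTERN_USE_EIGEN_BLAS=ON: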
cmake \
-DINTERN_USE_EIGEN_BLAS=ON \
-DCMAKE_VERBOSE_MAKEFILE:BOOL=1 \
-DBUILD_CAFFE2=OFF \
-DBUILD_CAFFE2_OPS=OFF \
-DBUILD_PYTHON:BOOL=OFF \
-DUSE_DISTRIBUTED=OFF \
-DBUILD_TEST=OFF \
-DBUILD_BINARY=OFF \
-DBUILD_MOBILE_BENCHMARK=0 \
-DBUILD_MOBILE_TEST=0 \
-DUSE_ROCM=OFF \
-DUSE_GLOO=OFF \
-DUSE_CUDA=OFF \
-DUSE_LEVELDB=OFF \
-DUSE_MPI:BOOL=OFF \
-DBUILD_CUSTOM_PROTOBUF:BOOL=OFF \
-DUSE_OPENMP:BOOL=OFF \
-DBUILD_SHARED_LIBS:BOOL=OFF \
-DCMAKE_BUILD_TYPE:STRING=Release \
-DPYTHON_EXECUTABLE:PATH=`which python3` \
-DCMAKE_INSTALL_PREFIX:PATH=../libtorch_cpu_eigen \
../pytorch
cmake --build . --target install -- "-j8"
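The resulting archive set matches the earlier MKL build's, now with libeigen_blas.a added: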
lib/libprotobuf.a
lib/libsleef.a
lib/libclog.a
lib/libcpuinfo.a
lib/libeigen_blas.a
lib/libnnpack.a
lib/libasmjit.a
lib/libmkldnn.a
lib/libpytorch_qnnpack.a
lib/libcaffe2_protos.a
lib/libprotobuf-lite.a
lib/libfbgemm.a
lib/libc10.a
lib/libpthreadpool.a
lib/libtorch_cpu.a
lib/libdnnl.a
lib/libqnnpack.a
lib/libprotoc.a
lib/libXNNPACK.a
lib/libtorch.a
lib/libonnx_proto.a
lib/libfmt.a
lib/libonnx.a
lib/libfoxi_loader.a
07
To build the CUDA version, USE_CUDA must be enabled. When ops fall back to the CPU we still need a LAPACK implementation, so MKL is used again (installation covered earlier). One more decision point is whether to use MAGMA; Gemfield skips it for now.
Which CUDA architectures get compiled in is decided in one of three ways:
- autodetect from the GPUs present on the build machine;
- if none are detected, fall back to a default set;
- let the user specify via the TORCH_CUDA_ARCH_LIST environment variable, e.g. TORCH_CUDA_ARCH_LIST="3.5 5.2 6.0 6.1+PTX". The more architectures in the list, the larger the final library. Typical settings per CUDA release:
# CUDA 9
export TORCH_CUDA_ARCH_LIST="3.5;5.0;5.2;6.0;6.1;7.0;7.0+PTX"
# CUDA 10
export TORCH_CUDA_ARCH_LIST="3.5;5.0;5.2;6.0;6.1;7.0;7.5;7.5+PTX"
# CUDA 11
export TORCH_CUDA_ARCH_LIST="3.5;5.0;5.2;6.0;6.1;7.0;7.5;8.0;8.0+PTX"
# CUDA 11.1
export TORCH_CUDA_ARCH_LIST="5.0;7.0;8.0;8.6;8.6+PTX"
cmake \
-DCMAKE_VERBOSE_MAKEFILE:BOOL=1 \
-DUSE_CUDA=ON \
-DBUILD_CAFFE2=OFF \
-DBUILD_CAFFE2_OPS=OFF \
-DUSE_DISTRIBUTED=OFF \
-DBUILD_TEST=OFF \
-DBUILD_BINARY=OFF \
-DBUILD_MOBILE_BENCHMARK=0 \
-DBUILD_MOBILE_TEST=0 \
-DUSE_ROCM=OFF \
-DUSE_GLOO=OFF \
-DUSE_LEVELDB=OFF \
-DUSE_MPI:BOOL=OFF \
-DBUILD_PYTHON:BOOL=OFF \
-DBUILD_CUSTOM_PROTOBUF:BOOL=OFF \
-DUSE_OPENMP:BOOL=OFF \
-DBUILD_SHARED_LIBS:BOOL=OFF \
-DCMAKE_BUILD_TYPE:STRING=Release \
-DPYTHON_EXECUTABLE:PATH=`which python3` \
-DCMAKE_INSTALL_PREFIX:PATH=../libtorch_cuda \
../pytorch
-- CUDA detected: 10.2
-- CUDA nvcc is: /usr/local/cuda/bin/nvcc
-- CUDA toolkit directory: /usr/local/cuda
-- cuDNN: v7.6.5 (include: /usr/include, library: /usr/lib/x86_64-linux-gnu/libcudnn.so)
-- Autodetected CUDA architecture(s): 3.5;5.0;5.2;6.0;6.1;7.0;7.5;7.5+PTX
-- Found CUDA with FP16 support, compiling with torch.cuda.HalfTensor
cmake --build . --target install -- "-j8"
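Relative to the CPU build, the archive list gains libc10_cuda.a, libtorch_cuda.a, and libnccl_static.a: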
libasmjit.a
libc10.a
libc10_cuda.a
libcaffe2_protos.a
libclog.a
libcpuinfo.a
libdnnl.a
libfbgemm.a
libfmt.a
libfoxi_loader.a
libmkldnn.a
libnccl_static.a
libnnpack.a
libonnx.a
libonnx_proto.a
libprotobuf.a
libprotobuf-lite.a
libprotoc.a
libpthreadpool.a
libpytorch_qnnpack.a
libqnnpack.a
libsleef.a
libtorch.a
libtorch_cpu.a
libtorch_cuda.a
libXNNPACK.a
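At link time the CUDA libraries must be on the linker's search path, e.g.: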
root@gemfield:~# export LIBRARY_PATH=$LIBRARY_PATH:/usr/local/cuda-10.2/targets/x86_64-linux/lib/