How to check the version number

CANN

❯ cat /usr/local/Ascend/ascend-toolkit/latest/arm64-linux/ascend_toolkit_install.info
package_name=Ascend-cann-toolkit
version=8.0.RC3
innerversion=V100R001C19SPC001B155
compatible_version=[V100R001C13,V100R001C19],[V100R001C30]
arch=aarch64
os=linux
path=/usr/local/Ascend/ascend-toolkit/8.0.RC3/aarch64-linux
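The `key=value` format of `ascend_toolkit_install.info` above is easy to read programmatically. A minimal sketch (the parser and the sample text are illustrative, not part of any Ascend SDK):

```python
def parse_install_info(text: str) -> dict:
    """Parse key=value lines (as in ascend_toolkit_install.info) into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if line and "=" in line:
            key, _, value = line.partition("=")
            info[key] = value
    return info

sample = """\
package_name=Ascend-cann-toolkit
version=8.0.RC3
arch=aarch64
"""
print(parse_install_info(sample)["version"])  # → 8.0.RC3
```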

Using npu-smi info

  1. npu-smi info is similar to the nvidia-smi command; it prints NPU usage.
  2. It must be run with root / sudo privileges.
  3. Inside Docker, the container needs --privileged:

    sudo docker run --privileged --device /dev/davinci1 \
        --device /dev/davinci_manager \
        --device /dev/devmm_svm \
        --device /dev/hisi_hdc \
        -v /usr/local/dcmi:/usr/local/dcmi \
        -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
        -v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
        -v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \
        -v /etc/ascend_install.info:/etc/ascend_install.info \
        -it --rm ascendai/cann:8.0.rc2.beta1-910b-openeuler22.03-py3.8 bash
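Since npu-smi requires privileges and may be missing inside a container, a small availability probe lets scripts fail gracefully instead of crashing. A sketch (npu-smi itself is only present on Ascend hosts):

```python
import shutil
import subprocess

def npu_smi_available() -> bool:
    """Return True if the npu-smi tool is on PATH and runs successfully."""
    exe = shutil.which("npu-smi")
    if exe is None:
        return False
    try:
        return subprocess.run([exe, "info"], capture_output=True).returncode == 0
    except OSError:
        return False
```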

Using torch + NPU + Docker

python -m venv /venv
source /venv/bin/activate
pip config set global.index-url https://repo.huaweicloud.com/repository/pypi/simple

# /venv/bin/python -m pip install torch==2.1.0 torch-npu==2.1.0.post10 torchvision==0.16
pip3 install torch==2.3.1 torch-npu==2.3.1.post4 torchvision==0.18.1


# Clone detectron2
yum install git g++ gcc make cmake -y
git clone -b device_arg https://github.com/MengqingCao/detectron2
# Note: detectron2 must be installed from inside the venv,
# otherwise the build fails to detect torch
# python -m venv /venv
# source /venv/bin/activate
/venv/bin/python -m pip install ./detectron2
# If a packaging import error occurs, run pip install -U setuptools first

# Test PyTorch
/venv/bin/python -c "import torch;import torch_npu;x=torch.Tensor([1,2]).npu();y=torch.Tensor([3,4]).npu();print(x+y)"

# Test detectron2
python -m detectron2.utils.collect_env

# source /usr/local/Ascend/ascend-toolkit/set_env.sh
# source /usr/local/Ascend/toolbox/set_env.sh
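When the same script must run on machines with and without an NPU, it can pick a torch device by probing for torch_npu. A sketch, assuming the `.npu()` / `"npu"` device string shown above; whether the NPU is actually usable still depends on the driver and CANN being installed:

```python
import importlib.util

def pick_torch_device() -> str:
    """Prefer the Ascend NPU when torch_npu is importable, else fall back to CPU."""
    if importlib.util.find_spec("torch_npu") is not None:
        return "npu"
    return "cpu"

# e.g. tensor = torch.Tensor([1, 2]).to(pick_torch_device())
```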

Alternatively, you can directly use the PyTorch Docker images provided by ascendai.

paddlepaddle + npu + docker

Official Baidu images

  1. Ascend NPU installation guide — PaddlePaddle documentation
  2. Pull the image and run it:

    # Pull the image
    docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/device/paddle-npu:cann80RC1-ubuntu20-x86_64-gcc84-py39 # x86 architecture
    docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/device/paddle-npu:cann80RC1-ubuntu20-aarch64-gcc84-py39 # ARM architecture
    
    # Start the container with a command like the following; ASCEND_RT_VISIBLE_DEVICES selects the visible NPU cards
    docker run -it --name paddle-npu-dev -v $(pwd):/work \
        --privileged --network=host --shm-size=128G -w=/work \
        -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
        -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
        -v /usr/local/dcmi:/usr/local/dcmi \
        -e ASCEND_RT_VISIBLE_DEVICES="0,1,2,3,4,5,6,7" \
        ccr-2vdh3abv-pub.cnc.bj.baidubce.com/device/paddle-npu:cann80RC1-ubuntu20-$(uname -m)-gcc84-py39 /bin/bash
    
    # Check that the Ascend NPU devices are recognized inside the container
    npu-smi info
  3. Install paddlepaddle (CPU build) and paddle-custom-npu

    pip install paddlepaddle -i https://www.paddlepaddle.org.cn/packages/nightly/cpu/
    pip install paddle-custom-npu -i https://www.paddlepaddle.org.cn/packages/nightly/npu/
  4. Test Paddle NPU availability

    python -c "import paddle_custom_device; paddle_custom_device.npu.version()"
    
    # Expected output:
    version: 0.0.0
    commit: 147d506b2baa1971ab47b4550f0571e1f6b201fc
    cann: 8.0.RC1
    ....
    
    # Basic PaddlePaddle health check
    python -c "import paddle; paddle.utils.run_check()"
    # Expected output:
    Running verify PaddlePaddle program ...
    PaddlePaddle works well on 1 npu.
    PaddlePaddle works well on 8 npus.
    PaddlePaddle is installed successfully! Let's start deep learning with PaddlePaddle now.
  5. Install PyTorch

    pip3 install torch==2.1.0 torch-npu==2.1.0.post10 torchvision==0.16
    
    # Verify that torch + NPU works
    python -c "import torch;import torch_npu;x=torch.Tensor([1,2]).npu();y=torch.Tensor([3,4]).npu()"
  6. Using torch, torch_npu, and paddleocr together

    1. Import order:

      • paddleocr
      • torch
      • torch_npu
    2. Fixing the libgomp***.so.1.0.0 error:

      LD_PRELOAD="/venv/lib/python3.9/site-packages/torch/lib/../../torch.libs/libgomp-4dbbc2f2.so.1.0.0:$LD_PRELOAD" /venv/bin/python app.py
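The same LD_PRELOAD fix can be applied when launching the app from Python. This sketch only prepends a library path to the child process's environment; the actual library path is the one from the command above and will differ per install:

```python
import os
import subprocess

def preload_env(lib_path: str, base: dict) -> dict:
    """Return a copy of `base` with lib_path prepended to LD_PRELOAD."""
    env = dict(base)
    existing = env.get("LD_PRELOAD", "")
    env["LD_PRELOAD"] = lib_path + (":" + existing if existing else "")
    return env

def run_with_preload(lib_path: str, cmd: list) -> subprocess.CompletedProcess:
    """Run cmd with lib_path preloaded, leaving the parent environment untouched."""
    return subprocess.run(cmd, env=preload_env(lib_path, os.environ))
```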

How to use the NPU in Docker

  1. Method 1

    docker run -it --rm --privileged --network=host --shm-size=128G -w=/work \
    -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
    -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
    -v /usr/local/dcmi:/usr/local/dcmi \
    -e ASCEND_RT_VISIBLE_DEVICES="0,1,2,3,4,5,6,7" \
    • ASCEND_RT_VISIBLE_DEVICES="0,1,2,3,4,5,6,7" specifies which devices are visible

      docker run -it --name cann_sawyer2 --privileged --network=host --shm-size=128G -w=/work \
             -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
             -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
             -v /usr/local/dcmi:/usr/local/dcmi \
             -e ASCEND_RT_VISIBLE_DEVICES="0,1,2,3,4,5,6,7" \
             ccr-2vdh3abv-pub.cnc.bj.baidubce.com/device/paddle-npu:cann80RC1-ubuntu20-aarch64-gcc84-py39 bash
             # ascendai/cann:8.0.rc2.beta1-910b-openeuler22.03-py3.8 bash
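ASCEND_RT_VISIBLE_DEVICES renumbers devices inside the container, so logical device 0 is the first card in the list, much like CUDA_VISIBLE_DEVICES for GPUs (an assumption worth verifying against the CANN docs). A small helper for mapping a logical index back to the physical card number:

```python
def logical_to_physical(visible: str, logical_id: int) -> int:
    """Map a logical device index (as seen by the process) to the physical
    card number, given an ASCEND_RT_VISIBLE_DEVICES-style string."""
    physical = [int(x) for x in visible.split(",")]
    return physical[logical_id]

print(logical_to_physical("4,5,6,7", 0))  # → 4
```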

paddle + layoutparser + npu

"""
    Examples::
        >>> import layoutparser as lp
        >>> model = lp.models.PaddleDetectionLayoutModel('lp://PubLayNet/ppyolov2_r50vd_dcn_365e/config')
        >>> model.detect(image)
"""


import os

import paddle
import layoutparser as lp
from layoutparser.models import PaddleDetectionLayoutModel

class MdqPaddleDetectionLayoutModel(PaddleDetectionLayoutModel):
    def load_predictor(
        self,
        model_dir,
        device=None,
        enable_mkldnn=False,
        thread_num=10,
    ):
        """set AnalysisConfig, generate AnalysisPredictor
        Args:
            model_dir (str): root path of __model__ and __params__
            device (str): cuda, npu or cpu
        Returns:
            predictor (PaddlePredictor): AnalysisPredictor
        Raises:
            ValueError: predict by TensorRT need enforce_cpu == False.
        """

        config = paddle.inference.Config(
            os.path.join(
                model_dir, "inference.pdmodel"
            ),  # TODO: Move them to some constants
            os.path.join(model_dir, "inference.pdiparams"),
        )


        device = device or "cpu"
        if device.startswith('cuda'):
            # initial GPU memory(M), device ID
            # 2000 is an appropriate value for PaddleDetection model
            config.enable_use_gpu(2000, 0)
            # optimize graph and fuse op
            config.switch_ir_optim(True)
        elif device.startswith('npu'):
            # enable the Ascend NPU with the device_id parsed from e.g. "npu:0"
            device_id = int(device[4:])
            config.enable_use_npu(2000, device_id)
        else:
            config.disable_gpu()
            config.set_cpu_math_library_num_threads(thread_num)
            if enable_mkldnn:
                try:
                    # cache 10 different shapes for mkldnn to avoid memory leak
                    config.set_mkldnn_cache_capacity(10)
                    config.enable_mkldnn()
                except Exception:
                    print(
                        "The current environment does not support `mkldnn`, so disable mkldnn."
                    )

        # disable print log when predict
        config.disable_glog_info()
        # enable shared memory
        config.enable_memory_optim()
        # disable feed, fetch OP, needed by zero_copy_run
        config.switch_use_feed_fetch_ops(False)
        predictor = paddle.inference.create_predictor(config)
        return predictor


if __name__ == '__main__':
    model = MdqPaddleDetectionLayoutModel('lp://PubLayNet/ppyolov2_r50vd_dcn_365e/config')
    img = '/work/demo.jpg'
    model.detect(img)

fastdeploy + paddleocr

import fastdeploy as fd
from fastdeploy.serving.server import SimpleServer
import os
import logging
from pathlib import Path
import paddleocr

logging.getLogger().setLevel(logging.INFO)

# Configurations
lang = 'ch'
if lang == 'ch':
    # det_model_dir = f"det/ch/ch_PP-OCRv3_det_infer"
    # cls_model_dir = "cls/ch/ch_ppocr_mobile_v2.0_cls_infer"
    # rec_model_dir = "rec/ch/ch_PP-OCRv3_rec_infer"
    # rec_label_file = "ppocr_keys_v1.txt"
    det_model_dir = f"det/ch/ch_PP-OCRv4_det_infer"
    cls_model_dir = "cls/ch/ch_ppocr_mobile_v2.0_cls_infer"
    rec_model_dir = "rec/ch/ch_PP-OCRv4_rec_infer__server"
    rec_label_file = Path(paddleocr.__file__).parent.joinpath("ppocr/utils/ppocr_keys_v1.txt")
elif lang == 'en':
    det_model_dir = f"det/ch/ch_PP-OCRv3_det_infer"
    cls_model_dir = "cls/ch/ch_ppocr_mobile_v2.0_cls_infer"
    rec_model_dir = "rec/ch/ch_PP-OCRv3_rec_infer"
    rec_label_file = "ppocr_keys_v1.txt"

device = "npu:0"
# backend: ['paddle', 'trt'], you can also use other backends, but need to modify
# the runtime option below
backend = "paddle"

# Prepare models
# Detection model
model_save_prefix = os.path.expanduser("~/.paddleocr/whl/")
det_model_file = os.path.join(model_save_prefix, det_model_dir, "inference.pdmodel")
det_params_file = os.path.join(model_save_prefix, det_model_dir, "inference.pdiparams")
# Classification model
cls_model_file = os.path.join(model_save_prefix, cls_model_dir, "inference.pdmodel")
cls_params_file = os.path.join(model_save_prefix, cls_model_dir, "inference.pdiparams")
# Recognition model
rec_model_file = os.path.join(model_save_prefix, rec_model_dir, "inference.pdmodel")
rec_params_file = os.path.join(model_save_prefix, rec_model_dir, "inference.pdiparams")

# Setup runtime option to select hardware, backend, etc.
option = fd.RuntimeOption()

if device.lower() == "gpu":
    option.use_gpu()
if backend == "trt":
    option.use_trt_backend()
elif device.lower().startswith('npu'):
    option.use_ascend()
else:
    option.use_paddle_infer_backend()

det_option = option
# det_option.set_trt_input_shape("x", [1, 3, 64, 64], [1, 3, 640, 640], [1, 3, 960, 960])

# det_option.set_trt_cache_file("det_trt_cache.trt")
print(det_model_file, det_params_file)
det_model = fd.vision.ocr.DBDetector(
    det_model_file, det_params_file, runtime_option=det_option
)

cls_batch_size = 1
rec_batch_size = 6

cls_option = option
cls_option.set_trt_input_shape(
    "x", [1, 3, 48, 10], [cls_batch_size, 3, 48, 320], [cls_batch_size, 3, 48, 1024]
)

# cls_option.set_trt_cache_file("cls_trt_cache.trt")
cls_model = fd.vision.ocr.Classifier(
    cls_model_file, cls_params_file, runtime_option=cls_option
)

rec_option = option
rec_option.set_trt_input_shape(
    "x", [1, 3, 48, 10], [rec_batch_size, 3, 48, 320], [rec_batch_size, 3, 48, 2304]
)

# rec_option.set_trt_cache_file("rec_trt_cache.trt")
rec_model = fd.vision.ocr.Recognizer(
    rec_model_file, rec_params_file, rec_label_file, runtime_option=rec_option
)

# Create PPOCRv3 pipeline
ppocr_v3 = fd.vision.ocr.PPOCRv3(
    det_model=det_model, cls_model=cls_model, rec_model=rec_model
)

ppocr_v3.cls_batch_size = cls_batch_size
ppocr_v3.rec_batch_size = rec_batch_size

# Create server, setup REST API
app = SimpleServer()
app.register(
    task_name="fd/ppocrv3",
    model_handler=fd.serving.handler.VisionModelHandler,
    predictor=ppocr_v3,
)
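A client then posts a base64-encoded image to the registered endpoint. The exact JSON schema expected by VisionModelHandler should be verified against the FastDeploy serving docs; this sketch assumes a `{"data": {"image": <base64>}}` body shape and a URL derived from the task_name (e.g. `http://127.0.0.1:8000/fd/ppocrv3`, hypothetical):

```python
import base64
import json

def build_ocr_payload(image_bytes: bytes) -> str:
    """Encode raw image bytes into a JSON body for the fd/ppocrv3 endpoint
    (the field names are an assumption; check the FastDeploy serving docs)."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return json.dumps({"data": {"image": b64}})
```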

docker run mdq-table-extractor

docker run -it --rm \
    --privileged --shm-size=128G -w=/app \
    -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
    -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
    -v /usr/local/dcmi:/usr/local/dcmi \
    -e ASCEND_RT_VISIBLE_DEVICES="0,1,2,3,4,5,6,7" \
    -v /home/matgene/sawyer/source/mdq-table-extractor:/app -p 18088:8088 mdq-table-extractor:dev3.3-npu

docker npu + paddlex

paddlex --pipeline OCR --input ./scripts/demo1.jpg --use_doc_orientation_classify False --use_doc_unwarping False --use_textline_orientation False --save_path ./output --device npu:1


paddlex --pipeline OCR --input general_ocr_002.png --use_doc_orientation_classify False --use_doc_unwarping False --use_textline_orientation False --save_path ./output
from paddlex import create_pipeline

pipeline = create_pipeline(pipeline="./OCR.yaml")

output = pipeline.predict(
    input="./scripts/demo.jpg",
    use_doc_orientation_classify=False,
    use_doc_unwarping=False,
    use_textline_orientation=False,
    device='npu:2'
)
for res in output:
    res.print()
    res.save_to_img("./output/")
    res.save_to_json("./output/")



from paddlex import create_pipeline

pipeline = create_pipeline(pipeline="./OCR.yaml", device="npu:2") # change the device name to npu, mlu, xpu, dcu, or gcu

output = pipeline.predict("./scripts/demo.jpg")
for res in output:
    res.print()
    res.save_to_img("./output/")

Limiting how much NPU memory Paddle occupies

FLAGS_fraction_of_gpu_memory_to_use=0.02

FLAGS_fraction_of_gpu_memory_to_use

  • The amount of device memory reserved at startup; if usage exceeds it, more is taken from the total GPU/NPU memory
  • In other words, it sets the minimum memory footprint of the process
  • On NPU this flag is useful because it keeps PaddlePaddle from occupying all device memory
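The flag is read when paddle initializes, so it must be in the environment before `import paddle`. A minimal sketch (the 0.02 value is the example above, not a recommendation):

```python
import os

# Must be set before paddle is imported, since the flag is read at startup.
os.environ["FLAGS_fraction_of_gpu_memory_to_use"] = "0.02"

# import paddle  # now inherits the memory-fraction setting
```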

On the NPU there is no equivalent of the GPU-side paddle.device.cuda.empty_cache() function.