1. torch version issues (watch out for this)

The code requires torch>=1.1, and torch and CUDA have strict version-compatibility constraints, so pay close attention. I first installed 0.4.0, which didn't work, then 1.4.0, but that required a newer CUDA than mine, so I went back to 1.0.0 to match my cuda-9.0, only to discover the program needs >=1.1.0. That cost me two or three hours.

For CUDA/PyTorch mismatches: first check which CUDA version you have, then look up and install the matching torch version.
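Keeping a small lookup of which torch builds go with which CUDA saves repeating my mistake. A sketch (the version table below is illustrative only, not authoritative; always confirm against the official PyTorch install matrix):

```python
# Illustrative CUDA -> compatible torch versions table (NOT exhaustive;
# confirm against the official PyTorch install matrix before relying on it).
TORCH_FOR_CUDA = {
    "9.0":  ["1.0.0", "1.1.0"],
    "10.0": ["1.1.0", "1.2.0", "1.4.0"],
    "10.1": ["1.4.0", "1.5.0"],
}

def newest_torch_for(cuda_version, minimum="1.1.0"):
    """Pick the newest listed torch build for this CUDA that meets a minimum."""
    as_tuple = lambda v: tuple(int(x) for x in v.split("."))
    candidates = [v for v in TORCH_FOR_CUDA.get(cuda_version, [])
                  if as_tuple(v) >= as_tuple(minimum)]
    return max(candidates, key=as_tuple) if candidates else None

print(newest_torch_for("9.0"))  # -> 1.1.0 with this table: matches my cuda-9.0 story
```

With a table like this, the whole "install, fail, reinstall" loop above collapses into one lookup before you touch pip.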

 

Along the way you are also likely to hit this bug: the network is too slow and the download fails.

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/pip/_vendor/urllib3/response.py", line 397, in _error_catcher
    yield
  File "/usr/local/lib/python3.5/dist-packages/pip/_vendor/urllib3/response.py", line 479, in read
    data = self._fp.read(amt)
  File "/usr/local/lib/python3.5/dist-packages/pip/_vendor/cachecontrol/filewrapper.py", line 62, in read
    data = self.__fp.read(amt)
  File "/usr/lib/python3.5/http/client.py", line 458, in read
    n = self.readinto(b)
  File "/usr/lib/python3.5/http/client.py", line 498, in readinto
    n = self.fp.readinto(b)
  File "/usr/lib/python3.5/socket.py", line 575, in readinto
    return self._sock.recv_into(b)
  File "/usr/lib/python3.5/ssl.py", line 929, in recv_into
    return self.read(nbytes, buffer)
  File "/usr/lib/python3.5/ssl.py", line 791, in read
    return self._sslobj.read(len, buffer)
  File "/usr/lib/python3.5/ssl.py", line 575, in read
    v = self._sslobj.read(len, buffer)
socket.timeout: The read operation timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/pip/_internal/cli/base_command.py", line 188, in main
    status = self.run(options, args)
  File "/usr/local/lib/python3.5/dist-packages/pip/_internal/commands/install.py", line 345, in run
    resolver.resolve(requirement_set)
  File "/usr/local/lib/python3.5/dist-packages/pip/_internal/legacy_resolve.py", line 196, in resolve
    self._resolve_one(requirement_set, req)
  File "/usr/local/lib/python3.5/dist-packages/pip/_internal/legacy_resolve.py", line 359, in _resolve_one
    abstract_dist = self._get_abstract_dist_for(req_to_install)
  File "/usr/local/lib/python3.5/dist-packages/pip/_internal/legacy_resolve.py", line 307, in _get_abstract_dist_for
    self.require_hashes
  File "/usr/local/lib/python3.5/dist-packages/pip/_internal/operations/prepare.py", line 199, in prepare_linked_requirement
    progress_bar=self.progress_bar
  File "/usr/local/lib/python3.5/dist-packages/pip/_internal/download.py", line 1064, in unpack_url
    progress_bar=progress_bar
  File "/usr/local/lib/python3.5/dist-packages/pip/_internal/download.py", line 924, in unpack_http_url
    progress_bar)
  File "/usr/local/lib/python3.5/dist-packages/pip/_internal/download.py", line 1152, in _download_http_url
    _download_url(resp, link, content_file, hashes, progress_bar)
  File "/usr/local/lib/python3.5/dist-packages/pip/_internal/download.py", line 861, in _download_url
    hashes.check_against_chunks(downloaded_chunks)
  File "/usr/local/lib/python3.5/dist-packages/pip/_internal/utils/hashes.py", line 75, in check_against_chunks
    for chunk in chunks:
  File "/usr/local/lib/python3.5/dist-packages/pip/_internal/download.py", line 829, in written_chunks
    for chunk in chunks:
  File "/usr/local/lib/python3.5/dist-packages/pip/_internal/utils/ui.py", line 156, in iter
    for x in it:
  File "/usr/local/lib/python3.5/dist-packages/pip/_internal/download.py", line 818, in resp_read
    decode_content=False):
  File "/usr/local/lib/python3.5/dist-packages/pip/_vendor/urllib3/response.py", line 531, in stream
    data = self.read(amt=amt, decode_content=decode_content)
  File "/usr/local/lib/python3.5/dist-packages/pip/_vendor/urllib3/response.py", line 496, in read
    raise IncompleteRead(self._fp_bytes_read, self.length_remaining)
  File "/usr/lib/python3.5/contextlib.py", line 77, in __exit__
    self.gen.throw(type, value, traceback)
  File "/usr/local/lib/python3.5/dist-packages/pip/_vendor/urllib3/response.py", line 402, in _error_catcher
    raise ReadTimeoutError(self._pool, None, 'Read timed out.')
pip._vendor.urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='pypi.tuna.tsinghua.edu.cn', port=443): Read timed out.

There is also a time limit; raising it from 100 to 1000 seconds fixes this.

The fix:

sudo /usr/bin/python3.5 -m pip install  --default-timeout=1000 --no-cache-dir torch==1.1.0 -i https://pypi.tuna.tsinghua.edu.cn/simple

If the Tsinghua mirror doesn't work, switch to the Aliyun mirror. Downloading torch alone wasted three hours of my life:

sudo /usr/bin/python3.5 -m pip install  --default-timeout=1000 --no-cache-dir torch==1.1.0 -i https://mirrors.aliyun.com/pypi/simple
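When a single mirror keeps timing out, scripting the fallback saves retyping. A minimal sketch, assuming the same mirrors and flags as the commands above:

```python
import subprocess

# Mirrors from this post; add or reorder to taste.
MIRRORS = [
    "https://pypi.tuna.tsinghua.edu.cn/simple",
    "https://mirrors.aliyun.com/pypi/simple",
]

def pip_install_cmd(package, mirror, timeout=1000):
    """Build the same pip invocation as above: long timeout, no cache, custom index."""
    return ["python3", "-m", "pip", "install",
            "--default-timeout=%d" % timeout, "--no-cache-dir",
            package, "-i", mirror]

def install_with_fallback(package):
    """Try each mirror in turn; return the one that succeeded, or None."""
    for mirror in MIRRORS:
        if subprocess.call(pip_install_cmd(package, mirror)) == 0:
            return mirror
    return None
```

Usage would be `install_with_fallback("torch==1.1.0")`; it moves on to the next mirror only when pip exits nonzero.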

 

2.error: command 'x86_64-linux-gnu-gcc' failed with exit status 1

Judging from the error, some supporting dev packages are missing.

error: command 'x86_64-linux-gnu-gcc' failed with exit status 1

The fix suggested in other blogs:

 

sudo apt-get install python3.5-dev

This alone didn't fix my build; change the version number in the middle to match your own Python version.

 

After running the command above, I also installed every Python package listed in the requirements folder, and then the build went through. Apparently it wasn't only Ubuntu system packages that were missing, but some Python packages as well.


 

3.error: [Errno 13] Permission denied: '/usr/local/lib/python3.5/dist-packages/easy-install.pth'

This one is easy: just fix the permissions with chmod (777 is heavy-handed, but it works):

sudo chmod 777 /usr/local/lib/python3.5/dist-packages
sudo chmod 777 /usr/local/lib/python3.5/dist-packages/*

4.error: [Errno 13] Permission denied: '/usr/local/bin/convert-onnx-to-caffe2'

Fix:

sudo chmod 777 /usr/local/bin

 

5. fatal error: torch/extension.h: No such file or directory

This is the classic symptom of the torch version being too old: torch/extension.h only ships from roughly torch 1.0 onward, so a 0.4.0 install will never have it.

My torch was probably still 0.4.0; upgrading to 1.1.0 fixes it.

 

6.from .. import deform_conv_cuda

This one was miserable: test.py kept spamming this error and I couldn't fix it for ages. Other people's blogs say to run:

python setup.py develop

and that should be enough. But when I ran setup.py, it threw another very annoying error, which is the next problem below; the solution is discussed there too.

 

7. ':/usr/local/cuda-10.0/nvcc': No such file or directory

Going by the message, nvcc can't be found in the CUDA folder. But the nvcc file definitely exists; the displayed path is simply wrong.

The fix others describe is simple: just edit your bashrc.

Append a line like this to ~/.bashrc (so it persists), then reload it in a terminal:

export CUDA_HOME=/usr/local/cuda-9.0
source ~/.bashrc
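Before rebuilding, it is worth confirming that CUDA_HOME actually points at a usable nvcc. A stdlib-only sketch (an assumption here: nvcc normally lives under $CUDA_HOME/bin):

```python
import os

def nvcc_path(cuda_home):
    """Return the nvcc under cuda_home if it exists, else None.

    A stray leading colon (as in the error above) is stripped first,
    since that alone is enough to make the path unresolvable.
    """
    candidate = os.path.join(cuda_home.lstrip(":"), "bin", "nvcc")
    return candidate if os.path.isfile(candidate) else None

print(nvcc_path(os.environ.get("CUDA_HOME", "/usr/local/cuda")))
```

If this prints None for the CUDA_HOME you just exported, fix the path before fighting setup.py again; with both cuda-10.0 and cuda-10.1 installed, a check like this would have caught my 10.0-vs-10.1 mixups immediately.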

According to everyone else, that's all it takes. But I fiddled with it for an entire day and it still didn't work; I nearly gave up. My server has both cuda-10.0 and cuda-10.1 installed, and the path sometimes said 10.0 and sometimes 10.1, which made a real mess.

Look closely at the quoted path: there is an extra colon `:` at the front. Following others' advice, I deleted the colon and re-entered the path, which looked right, but for me it still failed!!!! On top of that I was working on the lab's server the whole time, couldn't connect over the shell from home, kept launching ./pycharm.sh from the terminal with no response, and then had to look up PyCharm's process ID and kill it. A very frustrating stretch.

What finally worked: I started a fresh project, set up a Python 3.5 virtual environment, reinstalled every package in the requirements folder, and then setup.py develop succeeded, and test.py also ran normally....

To this day I don't understand why the method that works for everyone else refuses to work on my machine.

 

At this point, setup.py runs cleanly.

Then I ran test.py for a quick check; the model used was mask scoring_rcnn_x101_64x4d_fpn_1x:


It managed to make skinny me look fat in the result image.

 

Problems hit while training on my own dataset:

8.AssertionError: annotation file format <class 'list'> not supported

For training I switched to my own data and changed the json file paths accordingly:

/usr/bin/python3.5 tools/train_chicken.py configs/mask_rcnn_x101_64x4d_fpn_1x.py --gpus 1 --validate --work_dir /work_dir

The json paths are set in configs/mask_rcnn_x101_64x4d_fpn_1x.py, specifically here:

data = dict(
    imgs_per_gpu=2,
    workers_per_gpu=2,
    train=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/train_via_region_data.json',
        img_prefix=data_root + 'train2017/',
        pipeline=train_pipeline),
    val=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/val_via_region_data.json',
        img_prefix=data_root + 'val2017/',
        pipeline=test_pipeline),
    test=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/test_via_region_data.json',
        img_prefix=data_root + 'val2017/',
        pipeline=test_pipeline))

The error says my annotations are wrong. I decided to download the COCO json files first and compare them against mine. (The message itself is also a hint: the top level of my json was a list, whereas the COCO loader expects a dict with images/annotations/categories keys.)

 

The cause: in my json file, the image regions were stored as polygons, while COCO uses rectangles that store only four position values. Switching to rectangles fixed it.
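Converting a polygon region to the rectangle COCO stores can be sketched as follows (assuming a flat [x1, y1, x2, y2, ...] coordinate list, as in the annotation snippets later in this post):

```python
def poly_to_bbox(poly):
    """COCO-style [x, y, width, height] box enclosing a flat polygon list."""
    xs, ys = poly[0::2], poly[1::2]  # every other value: x coords, then y coords
    x, y = min(xs), min(ys)
    return [x, y, max(xs) - x, max(ys) - y]

# The first rectangle from the annotation snippet later in this post:
print(poly_to_bbox([47, 49, 192, 49, 192, 179, 47, 179]))  # [47, 49, 145, 130]
```

The resulting [47, 49, 145, 130] matches the "bbox" field shown in the json further down, and 145 * 130 = 18850 matches its "area".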

I've uploaded my segmentation-to-json script to my CSDN downloads page.

9.KeyError: 'categories'

Still a json problem. After switching to rectangles, the error complained about the categories. You either modify COCO's categories, or map your own class onto one of COCO's existing classes. My class was chicken, which COCO doesn't have, so I changed it to bird.

You also need to add a categories section to the json.


Also modify voc_classes in class_names.py under mmdetection/mmdet/core/evaluation, changing it to the class names of the dataset you want to train on. Note that even if there is only one class, you must keep the trailing comma, or it will error.
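The trailing comma matters because of Python syntax: ('chicken') is just a string, while ('chicken',) is a one-element tuple, and the class-name list must be a sequence of strings:

```python
# Without the trailing comma the parentheses do nothing: this is just a string,
# and iterating over it yields single characters ('c', 'h', ...), not class names.
not_a_tuple = ('chicken')
assert isinstance(not_a_tuple, str)

# With the comma it is the one-element tuple of class names that is expected.
classes = ('chicken',)
assert isinstance(classes, tuple) and list(classes) == ['chicken']
```

Without the comma, code that loops over the "classes" ends up treating each character as a separate class name, which is exactly the kind of confusing error described above.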

 


 

The crux of the problem is how to produce a properly formatted COCO json file.

The tool I uploaded can produce a COCO-style dataset; just use that. Once the COCO dataset is generated, training runs normally.


10.ValueError: need at least one array to concatenate

 

With the COCO dataset built, running the following command errored:

 /usr/bin/python3.5 tools/train_chicken.py configs/mask_rcnn_x101_64x4d_fpn_1x.py

The error:

Traceback (most recent call last):
  File "tools/train_chicken.py", line 142, in <module>
    main()
  File "tools/train_chicken.py", line 138, in main
    meta=meta)
  File "/home/zlee/下载/mmdetection-master/mmdet/apis/train.py", line 111, in train_detector
    meta=meta)
  File "/home/zlee/下载/mmdetection-master/mmdet/apis/train.py", line 225, in _non_dist_train
    runner.run(data_loaders, cfg.workflow, cfg.total_epochs)
  File "/home/zlee/.local/lib/python3.5/site-packages/mmcv/runner/runner.py", line 359, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/home/zlee/.local/lib/python3.5/site-packages/mmcv/runner/runner.py", line 259, in train
    for i, data_batch in enumerate(data_loader):
  File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 193, in __iter__
    return _DataLoaderIter(self)
  File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 493, in __init__
    self._put_indices()
  File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 591, in _put_indices
    indices = next(self.sample_iter, None)
  File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/sampler.py", line 172, in __iter__
    for idx in self.sampler:
  File "/home/zlee/下载/mmdetection-master/mmdet/datasets/loader/sampler.py", line 63, in __iter__
    indices = np.concatenate(indices)
ValueError: need at least one array to concatenate

A ValueError. I haven't solved this one yet.
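From what I've seen, this error typically means the DataLoader received an empty dataset: every image was filtered out, often because the class names in the json don't match the configured classes, or because the annotations list is empty. A quick stdlib sanity check, sketched on an inline example:

```python
def coco_sanity_problems(coco, expected_classes):
    """List the usual causes of an empty dataset (which triggers this ValueError)."""
    names = {c["name"] for c in coco.get("categories", [])}
    problems = []
    missing = sorted(set(expected_classes) - names)
    if missing:
        problems.append("classes missing from json categories: %s" % missing)
    if not coco.get("annotations"):
        problems.append("annotations list is empty")
    return problems

# A deliberately broken example: wrong class name ('chicken' vs 'bird'), no annotations.
broken = {"categories": [{"id": 16, "name": "bird"}], "annotations": []}
print(coco_sanity_problems(broken, ["chicken"]))
```

In real use you would `json.load` your annotation file and pass it in with your configured class names; an empty problem list means the two common causes are ruled out.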

 

11.KeyError: 'category_id'

Generally this means a field is referenced that was never defined.

The category id was missing from the annotations in my json file. Adding it fixes it.

"annotations": [{
			"id": 0,
			"image_id": "0",
			"segmentation": [47, 49, 192, 49, 192, 179, 47, 179],
			"area": 18850,
			"bbox": [47, 49, 145, 130],
			"iscrowd": 0
		},
		{
			"id": 1,
			"image_id": "0",
			"segmentation": [45, 304, 187, 304, 187, 563, 45, 563],
			"area": 36778,
			"bbox": [45, 304, 142, 259],
			"iscrowd": 0
		},

Add category_id and it works. For the specific id, look it up in COCO's categories; mine is bird, whose id is 16.

"annotations": [{
		"id": 0,
		"image_id": "0",
		"segmentation": [47, 49, 192, 49, 192, 179, 47, 179],
		"area": 18850,
		"bbox": [47, 49, 145, 130],
		"iscrowd": 0,
		"category_id": 16
	},
	{
		"id": 1,
		"image_id": "0",
		"segmentation": [45, 304, 187, 304, 187, 563, 45, 563],
		"area": 36778,
		"bbox": [45, 304, 142, 259],
		"iscrowd": 0,
		"category_id": 16
	},
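Patching the field by hand is tedious when there are many annotations; a small stdlib sketch can do it (the default id 16 matches the bird class used above, and the file names in the comment are just examples):

```python
import json

def add_category_id(coco, category_id=16):
    """Fill in category_id on every annotation that lacks it (16 = bird in COCO)."""
    for ann in coco["annotations"]:
        ann.setdefault("category_id", category_id)  # leave existing ids untouched
    return coco

# Typical usage (file names are examples, not from this post):
# with open("train_via_region_data.json") as f:
#     coco = json.load(f)
# with open("train_fixed.json", "w") as f:
#     json.dump(add_category_id(coco), f, indent=2)
```

Using setdefault means annotations that already carry a category_id keep it, so the script is safe to re-run on a partially fixed file.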