A Python + TensorFlow Face Recognition Mini-Program Project (with TensorFlow and PyTorch Versions), Part 2

Contents

1. The face recognition pipeline
   1) Face detection
   2) Face alignment
   3) Face attributes
   4) Face comparison
2. Face recognition datasets
3. Face detection
   1) Problems face detection must solve
   2) Detecting small faces
   3) IoU computation
4. Face detection algorithm
5. Setting up the TensorFlow + SSD environment
   1) Download the project
   2) Install the base packages
   3) Install protobuf and protoc (their versions must match, or errors occur)
6. Face detection dataset
   1) Dataset structure
   2) Parsing the annotation files
   3) Pascal VOC data format
   4) Code: converting the txt annotations to XML
   5) Code: visualizing the face annotations
   6) Packing the VOC dataset into TF-Record format
7. Training the model (ssd_resnet50_v1_fpn)
8. Model optimization notes
9. Exporting the trained model to a .pb file
10. Model testing; wrapping the detector as a Flask web service

1. The face recognition pipeline

The full pipeline breaks down into four sub-problems:

1) Face detection (Face Detection)

2) Face alignment (Face Alignment)

3) Face attributes (Face Attribute)

4) Face comparison (Face Compare)

2. Face recognition datasets

3. Face detection

1) Problems face detection must solve

1. A face may appear at any position in the image.

2. Faces may appear at different scales.

3. Faces may be captured from different viewpoints and in different poses.

4. Faces may be partially occluded.

2) Detecting small faces

Difficulties:

1. At large downsampling ratios, small face regions all but vanish from the feature maps.

2. Faces are tiny relative to the receptive field and the anchor sizes.

3. The anchor matching strategy struggles: the IoU with small faces is low and very sensitive to small shifts.

4. The positive/negative sample ratio is badly imbalanced.

Solutions:

1. Multi-scale strategies.

2. Tuning and optimizing the anchor design.

3. Online hard example mining.

3) IoU computation
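IoU (intersection over union) measures the overlap between a predicted box and a ground-truth box. A minimal sketch for corner-format boxes (x1, y1, x2, y2):

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle.
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    # Clamp to zero when the boxes do not overlap.
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

For small faces this value drops sharply with even a one-pixel shift, which is exactly the anchor-matching sensitivity mentioned above.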

4. Face detection algorithm

We use the SSD detection model (it can later be swapped for the stronger YOLOv5 detector). SSD's advantages as a face detector:

1. End-to-end training.

2. It directly regresses object classes and box locations.

3. It predicts on feature maps at multiple scales.

SSD network architecture (with VGG16 as the feature-extraction backbone):

For a detailed walkthrough of SSD, see my other post: https://blog.csdn.net/guoqingru0311/article/details/130320681

5. Setting up the TensorFlow + SSD environment

TensorFlow-gpu version: 1.13.0
Project: https://github.com/tensorflow/models/tree/r1.13.0/research/object_detection
Installation guide: https://github.com/tensorflow/models/blob/r1.13.0/research/object_detection/g3doc/installation.md

Environment setup:

1) Download the project

https://github.com/tensorflow/models/tree/r1.13.0

2) Install the base packages

```shell
pip install pillow lxml Cython jupyter matplotlib pandas opencv-python --default-timeout=100 -i https://pypi.tuna.tsinghua.edu.cn/simple

# Install the GPU build of TensorFlow; this project uses 1.13.0
pip install tensorflow-gpu==1.13.0 --default-timeout=100 -i https://pypi.tuna.tsinghua.edu.cn/simple
```

3) Install protobuf and protoc; their versions must be kept consistent, or you will get errors

(1) Install protobuf

```shell
# Install protobuf (note: the default install is 2.6.1; a version >= 3 is required)
pip install protobuf==3.15.0
```

(2) Install protoc. The protoc release corresponding to this protobuf is protoc-3.0.0-linux-x86_64.zip.

First check whether protoc is already installed:

```shell
protoc --version
```

Before installing, my machine reported version 2.6.1, which had been installed via:

```shell
sudo apt install protobuf-compiler
```

(If an old version is already installed, the version mismatch can trigger the errors shown below.) Now for the actual installation steps:

```shell
# 1. Download the matching release
wget https://github.com/google/protobuf/releases/download/v3.0.0/protoc-3.0.0-linux-x86_64.zip

# 2. Unzip it
apt-get install unzip
unzip protoc-3.0.0-linux-x86_64.zip -d protoc-3.0.0-linux-x86_64

# 3. Move it to /opt so it is easy to put on PATH
mv protoc-3.0.0-linux-x86_64/ /opt
cd /opt/protoc-3.0.0-linux-x86_64/bin
chmod +x protoc   # essential, otherwise the old 2.6.1 binary keeps being used
export PATH=/opt/protoc-3.0.0-linux-x86_64/bin:$PATH   # add to PATH
source ~/.bashrc
```

Note: the protobuf and protoc versions must be kept consistent here, otherwise errors appear.

Next, compile the proto files. Run the command below from the research directory; if every proto file in object_detection/protos/ now has a corresponding generated .py file, the compilation succeeded.

```shell
# From tensorflow/models/research/
protoc object_detection/protos/*.proto --python_out=.
```

(3) Install Slim and add it to PYTHONPATH

```shell
# 1. The slim package must match the tensorflow / object_detection versions, or it will error out.
#    cd into */research/slim first, then run:
python setup.py install

# 2. slim must also be on the Python path to run anything that uses it.
#    Back in the research directory, run:
export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim
```

(4) Verify the installation

```shell
python object_detection/builders/model_builder_test.py
```

With that, the environment setup is complete; the steps really are involved.

6. Face detection dataset

WIDER FACE dataset: http://shuoyang1213.me/WIDERFACE/ — WIDER FACE was first released in 2015 (v1.0). Its images come from the WIDER dataset; 32,203 images were selected and annotated, for a total of 393,703 labeled faces. Each face also carries additional attributes: blur, expression, illumination, occlusion, and pose.

1) Dataset structure

Next, download the dataset. Only the training set (Training Images), the validation set (Validation Images), and the annotation files (Face annotations) are needed here. Unzip them and arrange the files as follows:

```
├── wider_face                      # dataset root
    ├── WIDER_train                 # unpacked training set
    │   └── images
    │       ├── 0--Parade           # all images of this event class
    │       ├── ........
    │       └── 61--Street_Battle   # all images of this event class
    ├── WIDER_val                   # unpacked validation set
    │   └── images
    │       ├── 0--Parade           # all images of this event class
    │       ├── ........
    │       └── 61--Street_Battle   # all images of this event class
    └── wider_face_split            # unpacked annotation files
        ├── wider_face_train.mat          # training annotations, MATLAB format
        ├── wider_face_train_bbx_gt.txt   # training annotations, txt format
        ├── wider_face_val.mat            # validation annotations, MATLAB format
        ├── wider_face_val_bbx_gt.txt     # validation annotations, txt format
        ├── wider_face_test.mat           # test annotations, MATLAB format
        ├── wider_face_test_filelist.txt  # test file list, txt format
        └── readme.txt                    # annotation format description
```

2) Parsing the annotation files

First, the description in readme.txt:

```
Attached the mappings between attribute names and label values.

blur:
  clear->0
  normal blur->1
  heavy blur->2

expression:
  typical expression->0
  exaggerate expression->1

illumination:
  normal illumination->0
  extreme illumination->1

occlusion:
  no occlusion->0
  partial occlusion->1
  heavy occlusion->2

pose:
  typical pose->0
  atypical pose->1

invalid:
  false->0(valid image)
  true->1(invalid image)

The format of txt ground truth.
File name
Number of bounding box
x1, y1, w, h, blur, expression, illumination, invalid, occlusion, pose
```

A few sample records:

```
0--Parade/0_Parade_marchingband_1_849.jpg
1
449 330 122 149 0 0 0 0 0 0
0--Parade/0_Parade_Parade_0_904.jpg
1
361 98 263 339 0 0 0 0 0 0
0--Parade/0_Parade_marchingband_1_799.jpg
21
78 221 7 8 2 0 0 0 0 0
78 238 14 17 2 0 0 0 0 0
113 212 11 15 2 0 0 0 0 0
134 260 15 15 2 0 0 0 0 0
163 250 14 17 2 0 0 0 0 0
201 218 10 12 2 0 0 0 0 0
182 266 15 17 2 0 0 0 0 0
245 279 18 15 2 0 0 0 0 0
304 265 16 17 2 0 0 0 2 1
328 295 16 20 2 0 0 0 0 0
389 281 17 19 2 0 0 0 2 0
406 293 21 21 2 0 1 0 0 0
436 290 22 17 2 0 0 0 0 0
522 328 21 18 2 0 1 0 0 0
643 320 23 22 2 0 0 0 0 0
653 224 17 25 2 0 0 0 0 0
793 337 23 30 2 0 0 0 0 0
535 311 16 17 2 0 0 0 1 0
29 220 11 15 2 0 0 0 0 0
3 232 11 15 2 0 0 0 2 0
20 215 12 16 2 0 0 0 2 0
```
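As a rough illustration of the record layout above, a hypothetical parser (assuming, as in the released files, that zero-face entries still carry one placeholder row):

```python
def parse_wider_gt(lines):
    """Parse WIDER FACE bbx_gt lines into {image_path: [(x, y, w, h), ...]}."""
    records, i = {}, 0
    while i < len(lines):
        path = lines[i].strip()           # relative image path
        n = int(lines[i + 1])             # number of bounding boxes
        boxes = []
        for j in range(i + 2, i + 2 + n):
            # Keep only x, y, w, h; the attribute columns follow.
            x, y, w, h = map(int, lines[j].split()[:4])
            boxes.append((x, y, w, h))
        records[path] = boxes
        # Entries with n == 0 still occupy one dummy annotation row.
        i += 2 + max(n, 1)
    return records
```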

From my own analysis, the training set contains 12,880 images in total, 4 of which carry no face annotations. Training images without face targets:

0--Parade/0_Parade_Parade_0_452.jpg

2--Demonstration/2_Demonstration_Political_Rally_2_444.jpg

39--Ice_Skating/39_Ice_Skating_iceskiing_39_380.jpg

46--Jockey/46_Jockey_Jockey_46_576.jpg

The validation set contains 3,226 images, 4 of which carry no face annotations. Validation images without face targets:

0--Parade/0_Parade_Parade_0_275.jpg

7--Cheering/7_Cheering_Cheering_7_426.jpg

37--Soccer/37_Soccer_soccer_ball_37_281.jpg

50--Celebration_Or_Party/50_Celebration_Or_Party_houseparty_50_715.jpg

3) Pascal VOC data format
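For reference, a minimal VOC-style annotation with the fields the conversion script in the next section emits (the box comes from the first WIDER sample above; the image size is illustrative only):

```xml
<annotation>
  <folder>widerface</folder>
  <filename>0_Parade_marchingband_1_849.jpg</filename>
  <size>
    <width>1024</width>
    <height>1024</height>
    <depth>3</depth>
  </size>
  <object>
    <name>face</name>
    <pose>Unspecified</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox>
      <!-- VOC stores corners; WIDER stores x, y, w, h, so xmax = x + w, ymax = y + h -->
      <xmin>449</xmin>
      <ymin>330</ymin>
      <xmax>571</xmax>
      <ymax>479</ymax>
    </bndbox>
  </object>
</annotation>
```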

4) Code: converting the txt annotations to XML

```python
import os, cv2, sys, shutil, numpy
from xml.dom.minidom import Document

"""
Convert the WIDER FACE txt annotations to Pascal VOC xml format.
"""

def writexml(filename, saveimg, bboxes, xmlpath):
    doc = Document()
    annotation = doc.createElement('annotation')
    doc.appendChild(annotation)

    folder = doc.createElement('folder')
    folder_name = doc.createTextNode('widerface')
    folder.appendChild(folder_name)
    annotation.appendChild(folder)

    filenamenode = doc.createElement('filename')
    filename_name = doc.createTextNode(filename)
    filenamenode.appendChild(filename_name)
    annotation.appendChild(filenamenode)

    source = doc.createElement('source')
    annotation.appendChild(source)

    database = doc.createElement('database')
    database.appendChild(doc.createTextNode('wider face Database'))
    source.appendChild(database)

    annotation_s = doc.createElement('annotation')
    annotation_s.appendChild(doc.createTextNode('PASCAL VOC2007'))
    source.appendChild(annotation_s)

    image = doc.createElement('image')
    image.appendChild(doc.createTextNode('flickr'))
    source.appendChild(image)

    flickrid = doc.createElement('flickrid')
    flickrid.appendChild(doc.createTextNode('-1'))
    source.appendChild(flickrid)

    owner = doc.createElement('owner')
    annotation.appendChild(owner)

    flickrid_o = doc.createElement('flickrid')
    flickrid_o.appendChild(doc.createTextNode('muke'))
    owner.appendChild(flickrid_o)

    name_o = doc.createElement('name')
    name_o.appendChild(doc.createTextNode('muke'))
    owner.appendChild(name_o)

    size = doc.createElement('size')
    annotation.appendChild(size)

    width = doc.createElement('width')
    width.appendChild(doc.createTextNode(str(saveimg.shape[1])))
    height = doc.createElement('height')
    height.appendChild(doc.createTextNode(str(saveimg.shape[0])))
    depth = doc.createElement('depth')
    depth.appendChild(doc.createTextNode(str(saveimg.shape[2])))
    size.appendChild(width)
    size.appendChild(height)
    size.appendChild(depth)

    segmented = doc.createElement('segmented')
    segmented.appendChild(doc.createTextNode('0'))
    annotation.appendChild(segmented)

    for i in range(len(bboxes)):
        bbox = bboxes[i]
        objects = doc.createElement('object')
        annotation.appendChild(objects)

        object_name = doc.createElement('name')
        object_name.appendChild(doc.createTextNode('face'))
        objects.appendChild(object_name)

        pose = doc.createElement('pose')
        pose.appendChild(doc.createTextNode('Unspecified'))
        objects.appendChild(pose)

        truncated = doc.createElement('truncated')
        truncated.appendChild(doc.createTextNode('0'))
        objects.appendChild(truncated)

        difficult = doc.createElement('difficult')
        difficult.appendChild(doc.createTextNode('0'))
        objects.appendChild(difficult)

        bndbox = doc.createElement('bndbox')
        objects.appendChild(bndbox)

        xmin = doc.createElement('xmin')
        xmin.appendChild(doc.createTextNode(str(bbox[0])))
        bndbox.appendChild(xmin)
        ymin = doc.createElement('ymin')
        ymin.appendChild(doc.createTextNode(str(bbox[1])))
        bndbox.appendChild(ymin)
        xmax = doc.createElement('xmax')
        xmax.appendChild(doc.createTextNode(str(bbox[0] + bbox[2])))
        bndbox.appendChild(xmax)
        ymax = doc.createElement('ymax')
        ymax.appendChild(doc.createTextNode(str(bbox[1] + bbox[3])))
        bndbox.appendChild(ymax)

    f = open(xmlpath, "w")
    f.write(doc.toprettyxml(indent=''))
    f.close()


rootdir = "/home/aiserver/muke/dataset/widerface"
gtfile = "/home/aiserver/muke/dataset/widerface/data/wider_face_split/" \
         "wider_face_train_bbx_gt.txt"
im_folder = "/home/aiserver/muke/dataset/widerface/data/WIDER_train/images"

# This can also point at the test or val split.
fwrite = open("/home/aiserver/muke/dataset/widerface/ImageSets/Main/train.txt", "w")

with open(gtfile, "r") as gt:
    while True:
        gt_con = gt.readline()[:-1]
        if gt_con is None or gt_con == "":
            break
        im_path = im_folder + "/" + gt_con
        print(im_path)
        im_data = cv2.imread(im_path)
        if im_data is None:
            continue

        # Resizing directly would distort the many different aspect ratios,
        # so pad the image to a square instead.
        sc = max(im_data.shape)
        im_data_tmp = numpy.zeros([sc, sc, 3], dtype=numpy.uint8)
        off_w = (sc - im_data.shape[1]) // 2
        off_h = (sc - im_data.shape[0]) // 2
        # Pad around the image so that it becomes square.
        im_data_tmp[off_h:im_data.shape[0] + off_h, off_w:im_data.shape[1] + off_w, ...] = im_data
        im_data = im_data_tmp

        # cv2.imshow("1", im_data)
        # cv2.waitKey(0)

        numbox = int(gt.readline())
        bboxes = []
        for i in range(numbox):
            line = gt.readline()
            infos = line.split(" ")
            # x y w h ...; drop the trailing "\n"
            for j in range(len(infos) - 1):
                infos[j] = int(infos[j])

            # Data cleaning: after resizing to 640x640, only keep faces
            # that are at least 8x8 pixels.
            if infos[2] * 80 < im_data.shape[1] or infos[3] * 80 < im_data.shape[0]:
                continue

            bbox = (infos[0] + off_w, infos[1] + off_h, infos[2], infos[3])
            # cv2.rectangle(im_data, (int(infos[0]) + off_w, int(infos[1]) + off_h),
            #               (int(infos[0]) + off_w + int(infos[2]), int(infos[1]) + off_h + int(infos[3])),
            #               color=(0, 0, 255), thickness=1)
            bboxes.append(bbox)

        # cv2.imshow("1", im_data)
        # cv2.waitKey(0)

        filename = gt_con.replace("/", "_")
        fwrite.write(filename.split(".")[0] + "\n")
        cv2.imwrite("{}/JPEGImages/{}".format(rootdir, filename), im_data)
        xmlpath = "{}/Annotations/{}.xml".format(rootdir, filename.split(".")[0])
        writexml(filename, im_data, bboxes, xmlpath)

fwrite.close()
```
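The pad-to-square step in the script above can be isolated into a small helper (a hypothetical refactoring, for illustration). Note how the same offsets that center the image must also be added to every box's x and y:

```python
import numpy as np

def pad_to_square(img, boxes):
    """Pad an HxWx3 image to SxSx3, S = max(H, W), and shift boxes to match.

    boxes are (x, y, w, h); only x and y need the offset, w and h are unchanged.
    """
    h, w = img.shape[:2]
    s = max(h, w)
    out = np.zeros((s, s, 3), dtype=img.dtype)
    off_w, off_h = (s - w) // 2, (s - h) // 2
    out[off_h:off_h + h, off_w:off_w + w] = img
    shifted = [(x + off_w, y + off_h, bw, bh) for (x, y, bw, bh) in boxes]
    return out, shifted
```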

5) Code: visualizing the face annotations

```python
import os, glob, cv2
import xml.etree.ElementTree as ET
import numpy as np

dir_path = '/home/data/project/deep-learning-for-image-processing-master/pytorch_object_detection/Face_recognition/VOC2007/val'

def get_bboxes(xml_path):
    tree = ET.parse(open(xml_path, 'rb'))
    root = tree.getroot()
    bboxes, cls = [], []
    for obj in root.iter('object'):
        obj_cls = obj.find('name').text
        xmlbox = obj.find('bndbox')
        xmin = float(xmlbox.find('xmin').text)
        xmax = float(xmlbox.find('xmax').text)
        ymin = float(xmlbox.find('ymin').text)
        ymax = float(xmlbox.find('ymax').text)
        bboxes.append([xmin, ymin, xmax, ymax])
        cls.append(obj_cls)
    bboxes = np.asarray(bboxes, int)  # np.int is removed in recent NumPy; use int
    return bboxes, cls

img_dir = os.path.join(dir_path, 'JPEGImages')
xml_dir = os.path.join(dir_path, 'Annotations')
image_path_list = glob.glob(img_dir + '/*')

for image_path in image_path_list:
    image_dir, image_name = os.path.split(image_path)  # split into directory and file name
    xml_path = os.path.join(xml_dir, os.path.splitext(image_name)[0] + '.xml')
    image_data = cv2.imread(image_path)
    # Read the xml annotation file.
    bboxes, cls = get_bboxes(xml_path)
    for num in range(len(bboxes)):
        print(bboxes[num], cls[num])
        cv2.rectangle(image_data, (bboxes[num][0], bboxes[num][1]),
                      (bboxes[num][2], bboxes[num][3]), (255, 0, 255), 2)
        cv2.putText(image_data, cls[num], (bboxes[num][0], bboxes[num][1]),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.75, (255, 0, 0), 2)
    cv2.imshow("image", image_data)
    cv2.waitKey(0)

cv2.destroyAllWindows()
```

6) Packing the VOC dataset into TF-Record format

(1) Arrange the images and xml files produced by wideface.py into the directory structure below: JPEGImages holds the image data, Annotations holds the xml label files, and ImageSets/Main holds the image-name index.

```
-- widerface
   |-- train
   |   |-- Annotations
   |   |-- ImageSets
   |   |   `-- Main
   |   |       `-- train.txt
   |   `-- JPEGImages
   `-- val
       |-- Annotations
       |-- ImageSets
       |   `-- Main
       |       `-- val.txt
       `-- JPEGImages
```

(2) Create the class label file. In the directory shown above, create face_label_map.pbtxt with the following content:

```
item {
  name: "face"
  id: 1
  display_name: "face"
}
```

(3) Pack the data into TF-Record files. Copy the following code into the models-1.13.0/research/object_detection/dataset_tools directory and name it create_pascal_tf_record.py:

```python
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================

r"""Convert raw PASCAL dataset to TFRecord for object_detection.

Example usage:
    python object_detection/dataset_tools/create_pascal_tf_record.py \
        --data_dir=/home/user/VOCdevkit \
        --year=VOC2012 \
        --output_path=/home/user/pascal.record
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import hashlib
import io
import logging
import os

from lxml import etree
import PIL.Image
import tensorflow as tf

from object_detection.utils import dataset_util
from object_detection.utils import label_map_util


flags = tf.app.flags
flags.DEFINE_string('data_dir', '', 'Root directory to raw PASCAL VOC dataset.')  # gqr: should point at the parent directory of the widerface folder
flags.DEFINE_string('set', 'train', 'Convert training set, validation set or '
                    'merged set.')
flags.DEFINE_string('annotations_dir', 'Annotations',
                    '(Relative) path to annotations directory.')
flags.DEFINE_string('year', 'VOC2007', 'Desired challenge year.')
flags.DEFINE_string('output_path', '', 'Path to output TFRecord')  # gqr: where the packed records are written
flags.DEFINE_string('label_map_path', 'data/pascal_label_map.pbtxt',
                    'Path to label map proto')  # gqr: class-id mapping file
flags.DEFINE_boolean('ignore_difficult_instances', False, 'Whether to ignore '
                     'difficult instances')
FLAGS = flags.FLAGS

SETS = ['train', 'val', 'trainval', 'test']  # gqr: modified
YEARS = ['VOC2007', 'VOC2012', 'widerface']  # gqr: modified


def dict_to_tf_example(data,
                       dataset_directory,
                       label_map_dict,
                       ignore_difficult_instances=False,
                       image_subdirectory='JPEGImages'):
  """Convert XML derived dict to tf.Example proto.

  Notice that this function normalizes the bounding box coordinates provided
  by the raw data.

  Args:
    data: dict holding PASCAL XML fields for a single image (obtained by
      running dataset_util.recursive_parse_xml_to_dict)
    dataset_directory: Path to root directory holding PASCAL dataset
    label_map_dict: A map from string label names to integers ids.
    ignore_difficult_instances: Whether to skip difficult instances in the
      dataset (default: False).
    image_subdirectory: String specifying subdirectory within the
      PASCAL dataset directory holding the actual image data.

  Returns:
    example: The converted tf.Example.

  Raises:
    ValueError: if the image pointed to by data['filename'] is not a valid JPEG
  """
  img_path = os.path.join(data['folder'], FLAGS.set, image_subdirectory, data['filename'])
  # print('*******************', img_path)
  full_path = os.path.join(dataset_directory, img_path)
  # print('*******************', full_path)
  with tf.gfile.GFile(full_path, 'rb') as fid:
    encoded_jpg = fid.read()
  encoded_jpg_io = io.BytesIO(encoded_jpg)
  image = PIL.Image.open(encoded_jpg_io)
  if image.format != 'JPEG':
    raise ValueError('Image format not JPEG')
  key = hashlib.sha256(encoded_jpg).hexdigest()

  width = int(data['size']['width'])
  height = int(data['size']['height'])

  xmin = []
  ymin = []
  xmax = []
  ymax = []
  classes = []
  classes_text = []
  truncated = []
  poses = []
  difficult_obj = []
  if 'object' in data:
    for obj in data['object']:
      difficult = bool(int(obj['difficult']))
      if ignore_difficult_instances and difficult:
        continue
      difficult_obj.append(int(difficult))
      xmin.append(float(obj['bndbox']['xmin']) / width)
      ymin.append(float(obj['bndbox']['ymin']) / height)
      xmax.append(float(obj['bndbox']['xmax']) / width)
      ymax.append(float(obj['bndbox']['ymax']) / height)
      classes_text.append(obj['name'].encode('utf8'))
      classes.append(label_map_dict[obj['name']])
      truncated.append(int(obj['truncated']))
      poses.append(obj['pose'].encode('utf8'))

  example = tf.train.Example(features=tf.train.Features(feature={
      'image/height': dataset_util.int64_feature(height),
      'image/width': dataset_util.int64_feature(width),
      'image/filename': dataset_util.bytes_feature(
          data['filename'].encode('utf8')),
      'image/source_id': dataset_util.bytes_feature(
          data['filename'].encode('utf8')),
      'image/key/sha256': dataset_util.bytes_feature(key.encode('utf8')),
      'image/encoded': dataset_util.bytes_feature(encoded_jpg),
      'image/format': dataset_util.bytes_feature('jpeg'.encode('utf8')),
      'image/object/bbox/xmin': dataset_util.float_list_feature(xmin),
      'image/object/bbox/xmax': dataset_util.float_list_feature(xmax),
      'image/object/bbox/ymin': dataset_util.float_list_feature(ymin),
      'image/object/bbox/ymax': dataset_util.float_list_feature(ymax),
      'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
      'image/object/class/label': dataset_util.int64_list_feature(classes),
      'image/object/difficult': dataset_util.int64_list_feature(difficult_obj),
      'image/object/truncated': dataset_util.int64_list_feature(truncated),
      'image/object/view': dataset_util.bytes_list_feature(poses),
  }))
  return example


def main(_):
  if FLAGS.set not in SETS:
    raise ValueError('set must be in : {}'.format(SETS))
  if FLAGS.year not in YEARS:
    raise ValueError('year must be in : {}'.format(YEARS))

  data_dir = FLAGS.data_dir
  years = ['fddb', 'widerface']  # gqr: modified
  if FLAGS.year != 'merged':
    years = [FLAGS.year]

  writer = tf.python_io.TFRecordWriter(FLAGS.output_path)  # gqr: the TF-Record writer instance

  label_map_dict = label_map_util.get_label_map_dict(FLAGS.label_map_path)

  for year in years:
    logging.info('Reading from PASCAL %s dataset.', year)
    examples_path = os.path.join(data_dir, year, FLAGS.set, 'ImageSets', 'Main',
                                 FLAGS.set + '.txt')  # gqr: modified
    annotations_dir = os.path.join(data_dir, year, FLAGS.set, FLAGS.annotations_dir)
    examples_list = dataset_util.read_examples_list(examples_path)
    for idx, example in enumerate(examples_list):
      if idx % 100 == 0:
        logging.info('On image %d of %d', idx, len(examples_list))
      path = os.path.join(annotations_dir, example + '.xml')
      print('---------------------', path)
      with tf.gfile.GFile(path, 'r') as fid:
        xml_str = fid.read()
      xml = etree.fromstring(xml_str)
      data = dataset_util.recursive_parse_xml_to_dict(xml)['annotation']
      tf_example = dict_to_tf_example(data, FLAGS.data_dir, label_map_dict,
                                      FLAGS.ignore_difficult_instances)
      writer.write(tf_example.SerializeToString())

  writer.close()


if __name__ == '__main__':
  tf.app.run()
```

```shell
python ./object_detection/dataset_tools/create_pascal_tf_record.py \
    --data_dir=/home/data/project/deep-learning-for-image-processing-master/pytorch_object_detection/Face_recognition \
    --year=widerface \
    --output_path=../TF-recorder-data/face_val.record \
    --set=train \
    --label_map_path=object_detection/data/face_label_map.pbtxt
```

Run the command above from the models-1.13.0/research/ directory; the training and validation sets must be packed in separate runs with the flags adjusted accordingly.

--data_dir: directory holding the dataset, i.e. the parent of the widerface folder
--year: name of the dataset folder
--output_path: path of the TF-Record file to generate
--set: train when packing the training set, val for the validation set
--label_map_path: path to the label map file

Running the command may raise the following error:

ModuleNotFoundError: No module named 'object_detection'

Solution: run the following in the models-1.13.0/research/ directory:

```shell
python setup.py install
```

7. Training the model (ssd_resnet50_v1_fpn)

```python
class SSDResnet50V1FpnFeatureExtractor(_SSDResnetV1FpnFeatureExtractor):
  """SSD Resnet50 V1 FPN feature extractor."""

  def __init__(self,
               is_training,
               depth_multiplier,
               min_depth,
               pad_to_multiple,
               conv_hyperparams_fn,
               fpn_min_level=3,
               fpn_max_level=7,
               additional_layer_depth=256,
               reuse_weights=None,
               use_explicit_padding=False,
               use_depthwise=False,
               override_base_feature_extractor_hyperparams=False):
    """SSD Resnet50 V1 FPN feature extractor based on Resnet v1 architecture.

    Args:
      is_training: whether the network is in training mode.
      depth_multiplier: float depth multiplier for feature extractor.
        UNUSED currently.
      min_depth: minimum feature extractor depth. UNUSED Currently.
      pad_to_multiple: the nearest multiple to zero pad the input height and
        width dimensions to.
      conv_hyperparams_fn: A function to construct tf slim arg_scope for conv2d
        and separable_conv2d ops in the layers that are added on top of the
        base feature extractor.
      fpn_min_level: the minimum level in feature pyramid networks.
      fpn_max_level: the maximum level in feature pyramid networks.
      additional_layer_depth: additional feature map layer channel depth.
      reuse_weights: Whether to reuse variables. Default is None.
      use_explicit_padding: Whether to use explicit padding when extracting
        features. Default is False. UNUSED currently.
      use_depthwise: Whether to use depthwise convolutions. UNUSED currently.
      override_base_feature_extractor_hyperparams: Whether to override
        hyperparameters of the base feature extractor with the one from
        `conv_hyperparams_fn`.
    """
    super(SSDResnet50V1FpnFeatureExtractor, self).__init__(
        is_training,
        depth_multiplier,
        min_depth,
        pad_to_multiple,
        conv_hyperparams_fn,
        resnet_v1.resnet_v1_50,
        'resnet_v1_50',
        'fpn',
        fpn_min_level,
        fpn_max_level,
        additional_layer_depth,
        reuse_weights=reuse_weights,
        use_explicit_padding=use_explicit_padding,
        use_depthwise=use_depthwise,
        override_base_feature_extractor_hyperparams=
        override_base_feature_extractor_hyperparams)
```

(1) Edit the pipeline config. In models-1.13.0/research/object_detection/samples/configs, copy ssd_resnet50_v1_fpn_shared_box_predictor_640x640_coco14_sync.config to ssd_resnet50_v1_fpn_shared_box_predictor_640x640_face_sync.config and make the following changes: set num_classes to 1 on line 15 (face detection has a single class); since this run does not use pretrained weights, comment out line 139; and set a batch_size that fits your GPU memory.

Point lines 179 and 192 at your training and validation TF-Record files, and lines 181 and 194 at your label map file.

(2) Start training with the model_main.py script in the models-1.13.0/research/object_detection/ directory. The flags it accepts are:

```python
flags.DEFINE_string(
    'model_dir', None, 'Path to output model directory '
    'where event and checkpoint files will be written.')
flags.DEFINE_string('pipeline_config_path', None, 'Path to pipeline config '
                    'file.')
flags.DEFINE_integer('num_train_steps', None, 'Number of train steps.')
flags.DEFINE_boolean('eval_training_data', False,
                     'If training data should be evaluated for this job. Note '
                     'that one call only use this in eval-only mode, and '
                     '`checkpoint_dir` must be supplied.')
flags.DEFINE_integer('sample_1_of_n_eval_examples', 1, 'Will sample one of '
                     'every n eval input examples, where n is provided.')
flags.DEFINE_integer('sample_1_of_n_eval_on_train_examples', 5, 'Will sample '
                     'one of every n train input examples for evaluation, '
                     'where n is provided. This is only used if '
                     '`eval_training_data` is True.')
flags.DEFINE_string(
    'hparams_overrides', None, 'Hyperparameter overrides, '
    'represented as a string containing comma-separated '
    'hparam_name=value pairs.')
flags.DEFINE_string(
    'checkpoint_dir', None, 'Path to directory holding a checkpoint. If '
    '`checkpoint_dir` is provided, this binary operates in eval-only mode, '
    'writing resulting metrics to `model_dir`.')
flags.DEFINE_boolean(
    'run_once', False, 'If running in eval-only mode, whether to run just '
    'one round of eval vs running continuously (default).'
)
FLAGS = flags.FLAGS
```

Run the following from the **models-1.13.0/research/** directory:

```shell
python object_detection/model_main.py \
    --pipeline_config_path=object_detection/samples/configs/ssd_resnet50_v1_fpn_shared_box_predictor_640x640_face_sync.config \
    --model_dir=../result_model \
    --num_train_steps=100000 \
    --alsologtostderr
```

--pipeline_config_path: path to the model config file
--model_dir: where the model and log files are written
--alsologtostderr: also print training progress to the console

8. Model optimization notes

The tunable parameters live in the ssd_resnet50_v1_fpn_shared_box_predictor_640x640_face_sync.config file under models-1.13.0/research/object_detection/samples/configs:

- Anchor aspect ratios: face boxes are roughly upright, so the three ratios aspect_ratios: [1.0, 2.0, 0.5] generally suffice.
- Input resolution: images are resized to 640x640; the smaller the input, the more information from the original image is lost. Common choices are 640x640, 512x512, and 384x384.
- Backbone: the ssd_resnet50_v1_fpn feature extractor, taking feature maps from levels 3 through 7.
- Losses: classification uses focal loss (which accounts for the class imbalance); box regression uses smooth L1.
- NMS: faces are usually small targets, so do not set the IoU threshold (iou_threshold) too high.
- fine_tune_checkpoint: path to pretrained weights; not used in this run, so it is commented out.
- batch_size: set according to your GPU memory.
- num_steps: number of training iterations, 100,000 in this run.
- The config also holds the data augmentation options and the optimizer settings.
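Focal loss, mentioned above, down-weights easy examples so that the huge number of easy negatives does not dominate training. A minimal binary sketch in NumPy (with the commonly used defaults alpha = 0.25, gamma = 2.0; illustrative only, not the TensorFlow implementation the config uses):

```python
import numpy as np

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Binary focal loss: FL(p_t) = -alpha_t * (1 - p_t)**gamma * log(p_t).

    p: predicted foreground probabilities; y: 0/1 labels.
    With gamma = 0 and alpha = 0.5 it reduces to half the cross-entropy.
    """
    p = np.clip(p, 1e-7, 1 - 1e-7)               # numerical safety
    p_t = np.where(y == 1, p, 1 - p)             # probability of the true class
    alpha_t = np.where(y == 1, alpha, 1 - alpha)
    return -alpha_t * (1 - p_t) ** gamma * np.log(p_t)
```

The (1 - p_t)^gamma factor is the point: a well-classified example (p_t near 1) contributes almost nothing, while hard examples keep nearly their full cross-entropy weight.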

9. Exporting the trained model to a .pb file

Run the following in the models-1.13.0/research/ directory:

```shell
# xxxxx below is the checkpoint's iteration count
python object_detection/export_inference_graph.py \
    --input_type image_tensor \
    --pipeline_config_path path/to/ssd_inception_v2.config \
    --trained_checkpoint_prefix path/to/model.ckpt-xxxxx \
    --output_directory path/to/exported_model_directory
```

Flags:
--input_type: input data type, image_tensor
--pipeline_config_path: path to the model config file
--trained_checkpoint_prefix: checkpoint prefix, model.ckpt-xxxxx, where xxxxx is the iteration count
--output_directory: where the exported pb file is written

Mind the paths in the command above, or it will fail. The command actually used:

```shell
python object_detection/export_inference_graph_face.py \
    --input_type=image_tensor \
    --pipeline_config_path=object_detection/samples/configs/ssd_resnet50_v1_fpn_shared_box_predictor_640x640_face_sync.config \
    --trained_checkpoint_prefix=../result_model/model.ckpt-100000 \
    --output_directory export_graph_result
```

After a successful conversion, the files below are generated in the specified output directory.

Tip: save the command above into a shell script (e.g. pd.sh) so it is easy to reuse later.

10. Model testing

Create a new script test_face_graph.py under models-1.13.0/research/object_detection/:

```python
import tensorflow as tf
import os, glob, cv2
from utils import visualization_utils as vis_util
from PIL import Image
import numpy as np
from object_detection.utils import ops as utils_ops
from utils import label_map_util

# Path to frozen detection graph. This is the actual model that is used for the object detection.
PATH_TO_FROZEN_GRAPH = '../export_graph_result/frozen_inference_graph.pb'  # gqr: exported pb file

# List of the strings that is used to add correct label for each box.
PATH_TO_LABELS = os.path.join('data', 'face_label_map.pbtxt')  # gqr: class label map
category_index = label_map_util.create_category_index_from_labelmap(PATH_TO_LABELS, use_display_name=True)

# gqr: load the frozen graph into memory
detection_graph = tf.Graph()
with detection_graph.as_default():
    od_graph_def = tf.GraphDef()
    with tf.gfile.GFile(PATH_TO_FROZEN_GRAPH, 'rb') as fid:
        serialized_graph = fid.read()
        od_graph_def.ParseFromString(serialized_graph)
        tf.import_graph_def(od_graph_def, name='')

# gqr: collect the test images
image_path_list = glob.glob("../face_test/*")

# gqr: model input size
Image_size = (640, 640)

# gqr: forward inference
def run_inference_for_single_image(image, graph):
    with graph.as_default():
        with tf.Session() as sess:
            # Get handles to input and output tensors
            ops = tf.get_default_graph().get_operations()
            all_tensor_names = {output.name for op in ops for output in op.outputs}
            tensor_dict = {}
            # gqr: the graph nodes whose outputs we want to fetch
            for key in [
                'num_detections', 'detection_boxes', 'detection_scores',
                'detection_classes', 'detection_masks'
            ]:
                tensor_name = key + ':0'
                if tensor_name in all_tensor_names:
                    tensor_dict[key] = tf.get_default_graph().get_tensor_by_name(
                        tensor_name)
            if 'detection_masks' in tensor_dict:
                # The following processing is only for single image
                detection_boxes = tf.squeeze(tensor_dict['detection_boxes'], [0])
                detection_masks = tf.squeeze(tensor_dict['detection_masks'], [0])
                # Reframe is required to translate mask from box coordinates to
                # image coordinates and fit the image size.
                real_num_detection = tf.cast(tensor_dict['num_detections'][0], tf.int32)
                detection_boxes = tf.slice(detection_boxes, [0, 0], [real_num_detection, -1])
                detection_masks = tf.slice(detection_masks, [0, 0, 0], [real_num_detection, -1, -1])
                detection_masks_reframed = utils_ops.reframe_box_masks_to_image_masks(
                    detection_masks, detection_boxes, image.shape[0], image.shape[1])
                detection_masks_reframed = tf.cast(
                    tf.greater(detection_masks_reframed, 0.5), tf.uint8)
                # Follow the convention by adding back the batch dimension
                tensor_dict['detection_masks'] = tf.expand_dims(
                    detection_masks_reframed, 0)
            image_tensor = tf.get_default_graph().get_tensor_by_name('image_tensor:0')

            # gqr: feed the image and run inference
            output_dict = sess.run(tensor_dict,
                                   feed_dict={image_tensor: np.expand_dims(image, 0)})

            # all outputs are float32 numpy arrays, so convert types as appropriate
            output_dict['num_detections'] = int(output_dict['num_detections'][0])
            output_dict['detection_classes'] = output_dict[
                'detection_classes'][0].astype(np.uint8)
            output_dict['detection_boxes'] = output_dict['detection_boxes'][0]
            output_dict['detection_scores'] = output_dict['detection_scores'][0]
            if 'detection_masks' in output_dict:
                output_dict['detection_masks'] = output_dict['detection_masks'][0]
    return output_dict

for image_path in image_path_list:
    image_data = cv2.imread(image_path)
    image_shape = image_data.shape
    new_image = cv2.resize(image_data, Image_size)
    # gqr: run inference on the image
    output_dict = run_inference_for_single_image(new_image, detection_graph)
    for index in range(len(output_dict["detection_scores"])):
        if output_dict["detection_scores"][index] > 0.4:  # gqr: confidence threshold
            bbox = output_dict["detection_boxes"][index]
            # gqr: SSD predicts normalized [ymin, xmin, ymax, xmax] in [0, 1],
            # so scale the coordinates back to the input size
            y1 = int(bbox[0] * Image_size[0])
            x1 = int(bbox[1] * Image_size[1])
            y2 = int(bbox[2] * Image_size[0])
            x2 = int(bbox[3] * Image_size[1])
            cv2.rectangle(new_image, (x1, y1), (x2, y2), (255, 0, 255), 2)
            cv2.putText(new_image, "face", (x1, y1), cv2.FONT_HERSHEY_SIMPLEX, 0.75, (0, 255, 0), 2)
    cv2.imshow("new_image", new_image)
    cv2.waitKey(1)

cv2.destroyAllWindows()
```
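The coordinate rescaling inside the loop above can be factored into a small helper (hypothetical name, assuming SSD's normalized [ymin, xmin, ymax, xmax] output convention):

```python
def denormalize_box(bbox, size):
    """Convert a normalized SSD box [ymin, xmin, ymax, xmax] in [0, 1]
    into pixel corners (x1, y1, x2, y2) for an image of (height, width) = size."""
    h, w = size
    y1, x1, y2, x2 = bbox
    return int(x1 * w), int(y1 * h), int(x2 * w), int(y2 * h)
```

Note the axis swap: the model emits y before x, while OpenCV drawing calls expect (x, y) points.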

The detection results are shown below.

Note: adjust the confidence threshold to tune which detections are kept.

Wrapping the face detection model as a Flask web service

In this project, a WeChat mini-program on Windows calls a Flask-wrapped face detection web API running on Ubuntu.

1. Wrap the detection model as a Flask web service on the Ubuntu server. Install flask and gevent:

```shell
pip install flask gevent
```

Directory layout:

```
|-- object_detection_tf
|   |-- data
|   |   `-- face_label_map.pbtxt
|   |-- export_graph_result
|   |   |-- checkpoint
|   |   |-- frozen_inference_graph.pb
|   |   |-- model.ckpt.data-00000-of-00001
|   |   |-- model.ckpt.index
|   |   |-- model.ckpt.meta
|   |   |-- pipeline.config
|   |   `-- saved_model
|   |       |-- saved_model.pb
|   |       `-- variables
|   |-- face_test
|   |   |-- 1.jpg
|   |   |-- 2.jpeg
|   |   |-- 3.jpg
|   |   `-- 5.jpg
|   |-- test_face_graph.py
|   `-- utils
|       |-- label_map_util.py
|       |-- ops.py
|       `-- visualization_utils.py
`-- server_Tensorflow.py
```

Here server_Tensorflow.py is the program that starts the service.

2. Set up the WeChat DevTools. This project uses version v1.02.1904091-x64; mind the version, or you may hit version-conflict errors. Download: https://download.csdn.net/download/guoqingru0311/88299030

After downloading and installing, the interface shown below appears; a test account is sufficient. The main directory structure is shown in the figure below.
