Clothing Detection and Matching with DeepFashion
Author | 李秋键
Produced by | AI科技大本营
Header image | CSDN, paid download from 视觉中国
Computer vision plays an important role in daily life, especially in domains such as clothing, jewelry, and decoration, where appearance strongly influences people's choices. Studying the visual aspects of user preferences and product characteristics has therefore become an important task.
In recent years, the matching and recommendation of clothing and similar products has received wide attention, and vision-based recommendation has achieved promising results. However, existing work typically represents products in a generic visual feature space, such as the output-layer features of a CNN (Convolutional Neural Network). Such representations are sensitive to product category but struggle to model differences in product style.
This makes them hard to use effectively in recommender systems: items of a similar style are often bought together by the same person, yet they are not close in the generic visual feature space, which limits recommendation quality. The clothing landmark detection based on FashionNet, proposed in the paper DeepFashion: Powering Robust Clothes Recognition and Retrieval with Rich Annotations (CVPR 2016), addresses exactly this problem.
Preparation Before the Experiment
We use Python 3.6.5. The required modules are listed below; a minimal import check follows the list.
OpenCV is used for image processing and for reading and saving images.
NumPy is used for matrix computations.
tensorflow-gpu is a widely used deep-learning framework for building and training models, accelerated on the GPU.
scikit-learn is a common machine-learning library for Python.
PIL covers batch image processing, preview generation, format conversion, and general image manipulation, including basic operations, pixel-level processing, and color handling.
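As a quick sanity check of the environment, the imports used throughout the code below look roughly like this (a minimal sketch; the original does not pin exact module versions):
# Minimal environment check: import the libraries the article relies on.
import cv2                       # OpenCV: image processing and I/O
import numpy as np               # matrix computations
import tensorflow as tf          # tensorflow-gpu backend for Keras
from PIL import Image            # image loading and preprocessing
import sklearn                   # scikit-learn utilities

print(cv2.__version__, np.__version__, tf.__version__)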
Defining and Training the Network Model
The forward pass of FashionNet consists of three stages. In the first stage, a clothing image is fed into the blue branch of the network, which predicts whether each clothing landmark is visible and where it is located. In the second stage, based on the landmark locations predicted in the previous step, a landmark pooling layer extracts local features of the garment. In the third stage, the global features from the "fc6_global" layer and the local features from "fc6_local" are concatenated into "fc7_fusion", which serves as the final image feature. FashionNet introduces four loss functions and optimizes them with an iterative training scheme: a regression loss for landmark localization, softmax losses for landmark visibility and clothing category, a cross-entropy loss for attribute prediction, and a triplet loss for learning similarity between garments. The authors compared FashionNet against other methods on clothing classification, attribute prediction, and clothes retrieval, and it performed clearly better on all three.
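To make the third, feature-fusion stage concrete, the sketch below shows how global and local features can be concatenated in Keras. The layer names follow the paper, but the input shapes and the landmark-pooled features are assumptions; this is not the full FashionNet:
# Hedged sketch of FashionNet's fusion stage (stage three), not the full model.
from keras.layers import Input, Dense, Flatten, Concatenate
from keras.models import Model

conv_features = Input(shape=(7, 7, 512))     # backbone output (assumed shape)
local_features = Input(shape=(8, 512))       # landmark-pooled local features (assumed shape)

fc6_global = Dense(1024, activation='relu', name='fc6_global')(Flatten()(conv_features))
fc6_local = Dense(1024, activation='relu', name='fc6_local')(Flatten()(local_features))

# Concatenate global and local features into the final image representation.
fc7_fusion = Dense(4096, activation='relu', name='fc7_fusion')(
    Concatenate()([fc6_global, fc6_local]))

fusion_model = Model([conv_features, local_features], fc7_fusion)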
(1) Defining the network: this covers the optimizer, the classification head, and the layer definitions. The code is as follows:
def create_model(is_input_bottleneck, is_load_weights, input_shape, output_classes, optimizer='Adagrad', learn_rate=None, decay=0.0, momentum=0.0, activation='relu', dropout_rate=0.5):
    logging.debug('input_shape {}'.format(input_shape))
    logging.debug('input_shape {}'.format(type(input_shape)))

    # Optimizer
    optimizer, learn_rate = get_optimizer(optimizer, learn_rate, decay, momentum)

    # Train: inputs are pre-computed bottleneck features
    if is_input_bottleneck is True:
        model_inputs = Input(shape=input_shape)
        common_inputs = model_inputs
    # Predict: inputs are raw images, input_shape = (img_width, img_height, 3)
    else:
        base_model = applications.VGG16(weights='imagenet', include_top=False, input_shape=input_shape)
        # base_model = applications.inception_v3.InceptionV3(include_top=False, weights='imagenet', input_shape=input_shape)
        logging.debug('base_model inputs {}'.format(base_model.input))    # shape=(?, 224, 224, 3)
        logging.debug('base_model outputs {}'.format(base_model.output))  # shape=(?, 7, 7, 512)
        model_inputs = base_model.input
        common_inputs = base_model.output

    ## Classification head
    x = Flatten()(common_inputs)
    x = Dense(256, activation='tanh')(x)
    x = Dropout(dropout_rate)(x)
    predictions_class = Dense(output_classes, activation='softmax', name='predictions_class')(x)

    ## Regression head: IOU score
    x = Flatten()(common_inputs)
    x = Dense(256, activation='tanh')(x)
    x = Dropout(dropout_rate)(x)
    x = Dense(256, activation='tanh')(x)
    x = Dropout(dropout_rate)(x)
    predictions_iou = Dense(1, activation='sigmoid', name='predictions_iou')(x)

    ## Create the two-headed model
    model = Model(inputs=model_inputs, outputs=[predictions_class, predictions_iou])
    # logging.debug('model summary {}'.format(model.summary()))

    ## Load weights
    if is_load_weights is True:
        model.load_weights(top_model_weights_path_load, by_name=True)

    ## Compile
    model.compile(optimizer=optimizer,
                  loss={'predictions_class': 'sparse_categorical_crossentropy',
                        'predictions_iou': 'mean_squared_error'},
                  metrics=['accuracy'],
                  loss_weights={'predictions_class': predictions_class_weight,
                                'predictions_iou': predictions_iou_weight})

    logging.info('optimizer:{} learn_rate:{} decay:{} momentum:{} activation:{} dropout_rate:{}'.format(
        optimizer, learn_rate, decay, momentum, activation, dropout_rate))

    return model
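For reference, a hypothetical call for the bottleneck-training case might look like the following. The feature shape (7, 7, 512) matches the VGG16 output logged above; the other argument values are assumptions:
# Hypothetical usage of create_model for training on cached VGG16 features.
model = create_model(is_input_bottleneck=True, is_load_weights=False,
                     input_shape=(7, 7, 512), output_classes=19,
                     optimizer='Adagrad', dropout_rate=0.5)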
(2) Initializing the model:
def init():
    global batch_size
    batch_size = batch_size_train
    logging.debug('batch_size {}'.format(batch_size))

    global class_names
    class_names = sorted(get_subdir_list(dataset_train_path))
    logging.debug('class_names {}'.format(class_names))

    global input_shape
    input_shape = (img_width, img_height, img_channel)
    logging.debug('input_shape {}'.format(input_shape))

    # Create the output, log, and bottleneck directories if they do not exist yet.
    if not os.path.exists(output_path_name):
        os.makedirs(output_path_name)
    if not os.path.exists(logs_path_name):
        os.makedirs(logs_path_name)
    if not os.path.exists(btl_path):
        os.makedirs(btl_path)
    if not os.path.exists(btl_train_path):
        os.makedirs(btl_train_path)
    if not os.path.exists(btl_val_path):
        os.makedirs(btl_val_path)
(3) Saving the bottleneck files. Here, "bottleneck" refers to the output features of the frozen, pre-trained VGG16 convolutional base (the layer just before the custom heads): each image is pushed through the base once and the resulting features are cached to .npy files, so that afterwards only the small classification and regression heads need to be trained.
def save_bottleneck():
    logging.debug('class_names {}'.format(class_names))
    logging.debug('batch_size {}'.format(batch_size))
    logging.debug('epochs {}'.format(epochs))
    logging.debug('input_shape {}'.format(input_shape))

    ## Build the VGG16 network (convolutional base only)
    model = applications.VGG16(include_top=False, weights='imagenet', input_shape=input_shape)
    # model = applications.inception_v3.InceptionV3(include_top=False, weights='imagenet', input_shape=input_shape)

    for train_val in ['train', 'validation']:
        with open('bottleneck/btl_' + train_val + '.txt', 'w') as f_image:
            for class_name in class_names:
                dataset_train_class_path = os.path.join(dataset_path, train_val, class_name)
                logging.debug('dataset_train_class_path {}'.format(dataset_train_class_path))

                images_list = []
                images_name_list = []
                images_path_name = sorted(glob.glob(dataset_train_class_path + '/*.jpg'))
                logging.debug('images_path_name {}'.format(len(images_path_name)))

                for index, image in enumerate(images_path_name):
                    img = Image.open(image)
                    img = preprocess_image(img)

                    current_batch_size = len(images_list)
                    images_list.append(img)
                    images_name_list.append(image)
                    images_list_arr = np.array(images_list)

                    # TODO: Skipping last images of a class which do not sum up to batch_size
                    if current_batch_size < batch_size - 1:
                        continue

                    # Run the accumulated batch through the VGG16 base.
                    X = images_list_arr
                    bottleneck_features_train_class = model.predict(X, batch_size)

                    ## Save the bottleneck file (np.save takes a path or a binary handle)
                    btl_save_file_name = btl_path + train_val + '/btl_' + train_val + '_' + class_name + '.' + str(index).zfill(7) + '.npy'
                    logging.info('btl_save_file_name {}'.format(btl_save_file_name))
                    np.save(btl_save_file_name, bottleneck_features_train_class)

                    # Record which images went into this batch.
                    for name in images_name_list:
                        f_image.write(str(name) + '\n')

                    images_list = []
                    images_name_list = []
def train_model():
    ## Build network (the cached bottleneck features already come from this base)
    model = applications.VGG16(include_top=False, weights='imagenet', input_shape=input_shape)
    # model = applications.inception_v3.InceptionV3(include_top=False, weights='imagenet', input_shape=input_shape)

    # Get sorted bottleneck file names
    btl_train_names = sorted(glob.glob(btl_train_path + '/*.npy'))
    btl_val_names = sorted(glob.glob(btl_val_path + '/*.npy'))

    ## Train labels
    btl_train_list = []
    train_labels_class = []
    train_labels_iou = []

    # Load bottleneck files to create the training and validation sets.
    # (The loop that fills train_data, val_data and the label arrays is
    # abridged in the original code; after loading they are NumPy arrays.)
    val_data = []

    model = create_model(True, False, input_shape_btl_layer, len(class_names), optimizer, learn_rate, decay, momentum, activation, dropout_rate)

    logging.info('train_labels_iou {}'.format(train_labels_iou.shape))
    logging.info('train_labels_class {}'.format(train_labels_class.shape))
    logging.info('train_data {}'.format(train_data.shape))
    logging.info('val_labels_iou {}'.format(val_labels_iou.shape))
    logging.info('val_labels_class {}'.format(val_labels_class.shape))
    logging.info('val_data {}'.format(val_data.shape))

    # TODO: class_weight_val wrong
    model.fit(train_data, [train_labels_class, train_labels_iou],
              class_weight=[class_weight_val, class_weight_val],  # dictionary mapping classes to a weight value, used for scaling the loss function (during training only)
              epochs=epochs,
              batch_size=batch_size,
              validation_data=(val_data, [val_labels_class, val_labels_iou]),
              callbacks=callbacks_list)

    # TODO: These are not the best weights
    model.save_weights(top_model_weights_path_save)
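The loop that actually fills train_data and the label arrays is abridged above. A minimal sketch of what loading the cached features back might look like, assuming the file naming produced by save_bottleneck and a class label derived from each file name (everything here is an assumption, not the author's original loader):
# Hedged sketch: load cached bottleneck .npy files and build label arrays.
# File names follow save_bottleneck, e.g. 'btl_train_Anorak.0000031.npy'.
import glob
import numpy as np

def load_bottlenecks(btl_dir, split, class_names):
    features, labels = [], []
    for path in sorted(glob.glob('{}/btl_{}_*.npy'.format(btl_dir, split))):
        batch = np.load(path)                                    # (batch_size, 7, 7, 512)
        class_name = path.split('btl_{}_'.format(split))[-1].split('.')[0]
        features.append(batch)
        labels.extend([class_names.index(class_name)] * len(batch))
    return np.concatenate(features), np.array(labels)

# train_data, train_labels_class = load_bottlenecks(btl_train_path, 'train', class_names)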
Using the Model
(1) Segmenting the image into candidate regions: selective search splits the image into patches covering its different parts.
def selective_search_bbox(image):
    logging.debug('image {}'.format(image))

    # Load image (skimage returns an array of shape (rows, cols, channels))
    img = skimage.io.imread(image)
    width, height, channels = img.shape
    logging.debug('img {}'.format(img.shape))
    logging.debug('img {}'.format(type(img)))

    # Discard regions smaller than 1% of the image area.
    region_pixels_threshold = (width * height) / 100
    logging.debug('region_pixels_threshold {}'.format(region_pixels_threshold))

    # Perform selective search
    img_lbl, regions = selectivesearch.selective_search(img, scale=500, sigma=0.9, min_size=10)
    logging.debug('regions {}'.format(len(regions)))

    candidates = set()
    for r in regions:
        x, y, w, h = r['rect']

        # Exclude duplicates of the same rectangle (with different segments)
        if r['rect'] in candidates:
            continue

        # Exclude regions smaller than the pixel threshold
        if r['size'] < region_pixels_threshold:
            logging.debug('Discarding - region_pixels_threshold - {} < {} - x:{} y:{} w:{} h:{}'.format(region_pixels_threshold, r['size'], x, y, w, h))
            continue

        # Exclude extremely elongated boxes
        if h != 0 and w / h > 6:
            logging.debug('Discarding w/h {} - x:{} y:{} w:{} h:{}'.format(w / h, x, y, w, h))
            continue
        if w != 0 and h / w > 6:
            logging.debug('Discarding h/w {} - x:{} y:{} w:{} h:{}'.format(h / w, x, y, w, h))
            continue

        candidates.add(r['rect'])

    return candidates
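Each candidate box can then be cropped out and saved as its own image patch for the classifier. A minimal sketch of that step, assuming PIL and a hypothetical 'name-x_y_w_h.jpg' naming scheme that matches the bbox parsing done later in the prediction script:
# Hedged sketch: crop each selective-search box into a separate patch file.
# The output naming is an assumption, chosen so the prediction script below
# can recover the box coordinates from the file name.
from PIL import Image

def save_bbox_crops(image_path, bboxes, out_dir='dataset_prediction/images_crop'):
    img = Image.open(image_path)
    base = image_path.split('/')[-1].split('.jpg')[0]
    for (x, y, w, h) in bboxes:
        crop = img.crop((x, y, x + w, y + h))   # PIL uses (left, upper, right, lower)
        crop.save('{}/{}-{}_{}_{}_{}.jpg'.format(out_dir, base, x, y, w, h))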
(2) Model prediction: this covers initializing the model, reading the images, loading the trained weights, and visualizing the results.
def init():
    global batch_size
    batch_size = batch_size_predict
    logging.debug('batch_size {}'.format(batch_size))

    global input_shape
    input_shape = (img_width, img_height, img_channel)
    logging.debug('input_shape {}'.format(input_shape))

    global class_names
    # TODO: Remove hardcoding if dataset available
    class_names = ['Anorak', 'Bomber', 'Button-Down', 'Capris', 'Chinos', 'Coat', 'Flannel', 'Hoodie', 'Jeans', 'Jeggings', 'Jersey', 'Kaftan', 'Parka', 'Peacoat', 'Poncho', 'Robe', 'Sweatshorts', 'Trunks', 'Turtleneck']
    # class_names = get_subdir_list(dataset_train_path)
    logging.debug('class_names {}'.format(class_names))

def get_images():
    images_path_name = sorted(glob.glob(prediction_dataset_path + '/*.jpg'))
    return images_path_name

def get_bbox(images_path_name):
    # TODO: Currently for 1 image only
    for index, image in enumerate(images_path_name):
        bboxes = selective_search_bbox(image)
        logging.debug('bboxes {}'.format(bboxes))
        return bboxes
# Build the prediction model and load the trained top weights.
# model = create_model_predict(input_shape, optimizer, learn_rate, decay, momentum, activation, dropout_rate)
model = create_model(False, True, input_shape, len(class_names), optimizer, learn_rate, decay, momentum, activation, dropout_rate)

images_list = []
images_name_list = []
prediction_class = []
prediction_iou = []
prediction_class_prob = []
prediction_class_name = []

## Folder containing the candidate patches
prediction_dataset_path = 'dataset_prediction/images/'

# 'images_names' is the list of patch file names produced by the earlier steps.
for index, image in enumerate(images_names):
    logging.debug('\n\n++++++++++++++++++++++++++++++++++++++++')
    image_path_name = prediction_dataset_path + image
    logging.debug('image_path_name {}'.format(image_path_name))

    img = Image.open(image_path_name)
    logging.debug('img {}'.format(img))
    logging.debug('img len {}'.format(img.size))

    img = preprocess_image(img)
    img = np.expand_dims(img, 0)

    # Two-headed output: [class probabilities, IOU score]
    prediction = model.predict(img, batch_size, verbose=1)

    prediction_class_ = prediction[0][0]
    prediction_class.append(prediction_class_)

    prediction_iou_ = prediction[1][0][0]
    logging.debug('prediction_iou_ {}'.format(prediction_iou_))
    prediction_iou.append(prediction_iou_)

    prediction_class_index = np.argmax(prediction[0])
    logging.debug('prediction_class_index {}'.format(prediction_class_index))

    prediction_class_prob_ = prediction[0][0][prediction_class_index]
    logging.debug('prediction_class_prob_ {}'.format(prediction_class_prob_))
    prediction_class_prob.append(prediction_class_prob_)

    prediction_class_name_ = class_names[prediction_class_index]
    logging.debug('prediction_class_name_ {}'.format(prediction_class_name_))
    prediction_class_name.append(prediction_class_name_)

    images_list.append(img)
    images_name_list.append(image_path_name)

logging.debug('prediction_iou {}'.format(prediction_iou))
logging.debug('prediction_class_prob {}'.format(prediction_class_prob))
logging.debug('prediction_class_name {}'.format(prediction_class_name))

# Recover each patch's bounding box from its file name ('name-x_y_w_h.jpg').
bboxes = []
for image_path_name in images_name_list:
    bbox_ = image_path_name.split('/')[-1].split('.jpg')[0].split('-')[1]
    x = int(bbox_.split('_')[0])
    y = int(bbox_.split('_')[1])
    w = int(bbox_.split('_')[2])
    h = int(bbox_.split('_')[3])
    bboxes.append((x, y, w, h))
bboxes = set(bboxes)
logging.debug('bboxes {}'.format(bboxes))

# Draw the boxes and predictions on the original image(s).
orig_image_path_name = sorted(glob.glob('dataset_prediction/images' + '/*.jpg'))
logging.debug('orig_image_path_name {}'.format(orig_image_path_name))
display_bbox(orig_image_path_name, bboxes, prediction_class_name, prediction_class_prob, prediction_iou, images_name_list)

# Batch prediction over all patches at once.
logging.debug('images_list {}'.format(len(images_list)))
images_list_arr = np.vstack(images_list)   # stack (1, h, w, 3) patches into (N, h, w, 3)
logging.debug('images_list_arr type {}'.format(type(images_list_arr)))
prediction = model.predict(images_list_arr, batch_size, verbose=1)
logging.debug('prediction shape {} {}'.format(len(prediction), len(prediction[0])))

print('')
for index, preds in enumerate(prediction):
    for index2, pred in enumerate(preds):
        print('images_name_list index2 : {:110} '.format(images_name_list[index2]), end='')
        for p in pred:
            print('{:8f}'.format(float(p)), end='')
        print('')
    print('')
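The display_bbox helper is not shown in the excerpt above. A minimal matplotlib sketch of what such a visualization function might look like, under the assumption that it draws each box with its predicted class and probability on the original image:
# Hedged sketch of a display_bbox-style helper; the author's actual
# implementation is not included in the excerpt above.
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
import skimage.io

def display_bbox(image_paths, bboxes, class_names, class_probs, ious, patch_names):
    img = skimage.io.imread(image_paths[0])
    fig, ax = plt.subplots(figsize=(8, 8))
    ax.imshow(img)
    for (x, y, w, h), name, prob in zip(bboxes, class_names, class_probs):
        ax.add_patch(mpatches.Rectangle((x, y), w, h, fill=False, edgecolor='red', linewidth=2))
        ax.text(x, y - 4, '{} {:.2f}'.format(name, prob), color='red')
    plt.show()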
The results are shown in the figure below:
Source code download link:
https://pan.baidu.com/s/1_XCUbhup--4b11dBOtYZvg
Extraction code: 5wna
李秋键 is a CSDN blog expert and author of CSDN premium courses, currently a master's student at China University of Mining and Technology. His projects include an Android wuxia game on TapTap, a VIP video parser, a text paraphrasing tool, and a writing bot; he has published several papers and won multiple awards in advanced mathematics competitions.