Clothing Detection and Matching with DeepFashion
Author | 李秋键
Produced by | AI科技大本营
Header image | CSDN, paid download from 视觉中国
Computer vision plays an important role in daily life, especially in domains such as clothing, jewelry, and decoration, where appearance strongly influences people's choices. Studying the visual aspects of user preferences and product characteristics has therefore become an important task.
In recent years, the matching and recommendation of clothing and similar products has received wide attention, and vision-based recommendation has achieved promising results. However, existing work typically represents products in a generic visual feature space, such as the output-layer features of a CNN (Convolutional Neural Network). Such representations are sensitive to product category but struggle to model differences in product style.
This makes them hard to use effectively in recommender systems: items of a similar style are often bought together by the same person, yet they are not close in the generic visual feature space, which limits recommendation quality. The clothing landmark detection based on FashionNet, proposed in the paper DeepFashion: Powering Robust Clothes Recognition and Retrieval with Rich Annotations (CVPR 2016), addresses exactly this problem.
Preparation Before the Experiment
We use Python 3.6.5. The required modules are listed below; a minimal import check follows the list.
OpenCV is used for image processing and for reading and saving images.
NumPy is used for matrix computations.
tensorflow-gpu is a widely used deep-learning framework for building and training models, accelerated on the GPU.
scikit-learn is a common machine-learning library for Python.
PIL covers batch image processing, preview generation, format conversion, and general image manipulation, including basic operations, pixel-level processing, and color handling.
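As a quick sanity check of the environment, the imports used throughout the code below look roughly like this (a minimal sketch; the original does not pin exact module versions):
# Minimal environment check: import the libraries the article relies on.
import cv2                       # OpenCV: image processing and I/O
import numpy as np               # matrix computations
import tensorflow as tf          # tensorflow-gpu backend for Keras
from PIL import Image            # image loading and preprocessing
import sklearn                   # scikit-learn utilities

print(cv2.__version__, np.__version__, tf.__version__)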
Defining and Training the Network Model
The forward pass of FashionNet consists of three stages. In the first stage, a clothing image is fed into the blue branch of the network, which predicts whether each clothing landmark is visible and where it is located. In the second stage, based on the landmark locations predicted in the previous step, a landmark pooling layer extracts local features of the garment. In the third stage, the global features from the "fc6_global" layer and the local features from "fc6_local" are concatenated into "fc7_fusion", which serves as the final image feature. FashionNet introduces four loss functions and optimizes them with an iterative training scheme: a regression loss for landmark localization, softmax losses for landmark visibility and clothing category, a cross-entropy loss for attribute prediction, and a triplet loss for learning similarity between garments. The authors compared FashionNet against other methods on clothing classification, attribute prediction, and clothes retrieval, and it performed clearly better on all three.
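To make the third, feature-fusion stage concrete, the sketch below shows how global and local features can be concatenated in Keras. The layer names follow the paper, but the input shapes and the landmark-pooled features are assumptions; this is not the full FashionNet:
# Hedged sketch of FashionNet's fusion stage (stage three), not the full model.
from keras.layers import Input, Dense, Flatten, Concatenate
from keras.models import Model

conv_features = Input(shape=(7, 7, 512))     # backbone output (assumed shape)
local_features = Input(shape=(8, 512))       # landmark-pooled local features (assumed shape)

fc6_global = Dense(1024, activation='relu', name='fc6_global')(Flatten()(conv_features))
fc6_local = Dense(1024, activation='relu', name='fc6_local')(Flatten()(local_features))

# Concatenate global and local features into the final image representation.
fc7_fusion = Dense(4096, activation='relu', name='fc7_fusion')(
    Concatenate()([fc6_global, fc6_local]))

fusion_model = Model([conv_features, local_features], fc7_fusion)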
(1) Defining the network: this covers the optimizer, the classification head, and the layer definitions. The code is as follows:
def create_model(is_input_bottleneck, is_load_weights, input_shape, output_classes, optimizer='Adagrad', learn_rate=None, decay=0.0, momentum=0.0, activation='relu', dropout_rate=0.5):
    logging.debug('input_shape {}'.format(input_shape))
    logging.debug('input_shape {}'.format(type(input_shape)))

    # Optimizer
    optimizer, learn_rate = get_optimizer(optimizer, learn_rate, decay, momentum)

    # Train: inputs are pre-computed bottleneck features
    if is_input_bottleneck is True:
        model_inputs = Input(shape=input_shape)
        common_inputs = model_inputs
    # Predict: inputs are raw images, input_shape = (img_width, img_height, 3)
    else:
        base_model = applications.VGG16(weights='imagenet', include_top=False, input_shape=input_shape)
        # base_model = applications.inception_v3.InceptionV3(include_top=False, weights='imagenet', input_shape=input_shape)
        logging.debug('base_model inputs {}'.format(base_model.input))    # shape=(?, 224, 224, 3)
        logging.debug('base_model outputs {}'.format(base_model.output))  # shape=(?, 7, 7, 512)
        model_inputs = base_model.input
        common_inputs = base_model.output

    ## Classification head
    x = Flatten()(common_inputs)
    x = Dense(256, activation='tanh')(x)
    x = Dropout(dropout_rate)(x)
    predictions_class = Dense(output_classes, activation='softmax', name='predictions_class')(x)

    ## Regression head: IOU score
    x = Flatten()(common_inputs)
    x = Dense(256, activation='tanh')(x)
    x = Dropout(dropout_rate)(x)
    x = Dense(256, activation='tanh')(x)
    x = Dropout(dropout_rate)(x)
    predictions_iou = Dense(1, activation='sigmoid', name='predictions_iou')(x)

    ## Create the two-headed model
    model = Model(inputs=model_inputs, outputs=[predictions_class, predictions_iou])
    # logging.debug('model summary {}'.format(model.summary()))

    ## Load weights
    if is_load_weights is True:
        model.load_weights(top_model_weights_path_load, by_name=True)

    ## Compile
    model.compile(optimizer=optimizer,
                  loss={'predictions_class': 'sparse_categorical_crossentropy',
                        'predictions_iou': 'mean_squared_error'},
                  metrics=['accuracy'],
                  loss_weights={'predictions_class': predictions_class_weight,
                                'predictions_iou': predictions_iou_weight})

    logging.info('optimizer:{} learn_rate:{} decay:{} momentum:{} activation:{} dropout_rate:{}'.format(
        optimizer, learn_rate, decay, momentum, activation, dropout_rate))

    return model
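For reference, a hypothetical call for the bottleneck-training case might look like the following. The feature shape (7, 7, 512) matches the VGG16 output logged above; the other argument values are assumptions:
# Hypothetical usage of create_model for training on cached VGG16 features.
model = create_model(is_input_bottleneck=True, is_load_weights=False,
                     input_shape=(7, 7, 512), output_classes=19,
                     optimizer='Adagrad', dropout_rate=0.5)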
(2) Initializing the model:
def init():
    global batch_size
    batch_size = batch_size_train
    logging.debug('batch_size {}'.format(batch_size))

    global class_names
    class_names = sorted(get_subdir_list(dataset_train_path))
    logging.debug('class_names {}'.format(class_names))

    global input_shape
    input_shape = (img_width, img_height, img_channel)
    logging.debug('input_shape {}'.format(input_shape))

    # Create the output, log, and bottleneck directories if they do not exist yet.
    if not os.path.exists(output_path_name):
        os.makedirs(output_path_name)
    if not os.path.exists(logs_path_name):
        os.makedirs(logs_path_name)
    if not os.path.exists(btl_path):
        os.makedirs(btl_path)
    if not os.path.exists(btl_train_path):
        os.makedirs(btl_train_path)
    if not os.path.exists(btl_val_path):
        os.makedirs(btl_val_path)
(3) Saving the bottleneck files. Here, "bottleneck" refers to the output features of the frozen, pre-trained VGG16 convolutional base (the layer just before the custom heads): each image is pushed through the base once and the resulting features are cached to .npy files, so that afterwards only the small classification and regression heads need to be trained.
def save_bottleneck():
    logging.debug('class_names {}'.format(class_names))
    logging.debug('batch_size {}'.format(batch_size))
    logging.debug('epochs {}'.format(epochs))
    logging.debug('input_shape {}'.format(input_shape))

    ## Build the VGG16 network (convolutional base only)
    model = applications.VGG16(include_top=False, weights='imagenet', input_shape=input_shape)
    # model = applications.inception_v3.InceptionV3(include_top=False, weights='imagenet', input_shape=input_shape)

    for train_val in ['train', 'validation']:
        with open('bottleneck/btl_' + train_val + '.txt', 'w') as f_image:
            for class_name in class_names:
                dataset_train_class_path = os.path.join(dataset_path, train_val, class_name)
                logging.debug('dataset_train_class_path {}'.format(dataset_train_class_path))

                images_list = []
                images_name_list = []
                images_path_name = sorted(glob.glob(dataset_train_class_path + '/*.jpg'))
                logging.debug('images_path_name {}'.format(len(images_path_name)))

                for index, image in enumerate(images_path_name):
                    img = Image.open(image)
                    img = preprocess_image(img)

                    current_batch_size = len(images_list)
                    images_list.append(img)
                    images_name_list.append(image)
                    images_list_arr = np.array(images_list)

                    # TODO: Skipping last images of a class which do not sum up to batch_size
                    if current_batch_size < batch_size - 1:
                        continue

                    # Run the accumulated batch through the VGG16 base.
                    X = images_list_arr
                    bottleneck_features_train_class = model.predict(X, batch_size)

                    ## Save the bottleneck file (np.save takes a path or a binary handle)
                    btl_save_file_name = btl_path + train_val + '/btl_' + train_val + '_' + class_name + '.' + str(index).zfill(7) + '.npy'
                    logging.info('btl_save_file_name {}'.format(btl_save_file_name))
                    np.save(btl_save_file_name, bottleneck_features_train_class)

                    # Record which images went into this batch.
                    for name in images_name_list:
                        f_image.write(str(name) + '\n')

                    images_list = []
                    images_name_list = []
def train_model():
    ## Build network (the cached bottleneck features already come from this base)
    model = applications.VGG16(include_top=False, weights='imagenet', input_shape=input_shape)
    # model = applications.inception_v3.InceptionV3(include_top=False, weights='imagenet', input_shape=input_shape)

    # Get sorted bottleneck file names
    btl_train_names = sorted(glob.glob(btl_train_path + '/*.npy'))
    btl_val_names = sorted(glob.glob(btl_val_path + '/*.npy'))

    ## Train labels
    btl_train_list = []
    train_labels_class = []
    train_labels_iou = []

    # Load bottleneck files to create the training and validation sets.
    # (The loop that fills train_data, val_data and the label arrays is
    # abridged in the original code; after loading they are NumPy arrays.)
    val_data = []

    model = create_model(True, False, input_shape_btl_layer, len(class_names), optimizer, learn_rate, decay, momentum, activation, dropout_rate)

    logging.info('train_labels_iou {}'.format(train_labels_iou.shape))
    logging.info('train_labels_class {}'.format(train_labels_class.shape))
    logging.info('train_data {}'.format(train_data.shape))
    logging.info('val_labels_iou {}'.format(val_labels_iou.shape))
    logging.info('val_labels_class {}'.format(val_labels_class.shape))
    logging.info('val_data {}'.format(val_data.shape))

    # TODO: class_weight_val wrong
    model.fit(train_data, [train_labels_class, train_labels_iou],
              class_weight=[class_weight_val, class_weight_val],  # dictionary mapping classes to a weight value, used for scaling the loss function (during training only)
              epochs=epochs,
              batch_size=batch_size,
              validation_data=(val_data, [val_labels_class, val_labels_iou]),
              callbacks=callbacks_list)

    # TODO: These are not the best weights
    model.save_weights(top_model_weights_path_save)
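The loop that actually fills train_data and the label arrays is abridged above. A minimal sketch of what loading the cached features back might look like, assuming the file naming produced by save_bottleneck and a class label derived from each file name (everything here is an assumption, not the author's original loader):
# Hedged sketch: load cached bottleneck .npy files and build label arrays.
# File names follow save_bottleneck, e.g. 'btl_train_Anorak.0000031.npy'.
import glob
import numpy as np

def load_bottlenecks(btl_dir, split, class_names):
    features, labels = [], []
    for path in sorted(glob.glob('{}/btl_{}_*.npy'.format(btl_dir, split))):
        batch = np.load(path)                                    # (batch_size, 7, 7, 512)
        class_name = path.split('btl_{}_'.format(split))[-1].split('.')[0]
        features.append(batch)
        labels.extend([class_names.index(class_name)] * len(batch))
    return np.concatenate(features), np.array(labels)

# train_data, train_labels_class = load_bottlenecks(btl_train_path, 'train', class_names)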
Using the Model
(1) Segmenting the image into candidate regions: selective search splits the image into patches covering its different parts.
def selective_search_bbox(image):
    logging.debug('image {}'.format(image))

    # Load image (skimage returns an array of shape (rows, cols, channels))
    img = skimage.io.imread(image)
    width, height, channels = img.shape
    logging.debug('img {}'.format(img.shape))
    logging.debug('img {}'.format(type(img)))

    # Discard regions smaller than 1% of the image area.
    region_pixels_threshold = (width * height) / 100
    logging.debug('region_pixels_threshold {}'.format(region_pixels_threshold))

    # Perform selective search
    img_lbl, regions = selectivesearch.selective_search(img, scale=500, sigma=0.9, min_size=10)
    logging.debug('regions {}'.format(len(regions)))

    candidates = set()
    for r in regions:
        x, y, w, h = r['rect']

        # Exclude duplicates of the same rectangle (with different segments)
        if r['rect'] in candidates:
            continue

        # Exclude regions smaller than the pixel threshold
        if r['size'] < region_pixels_threshold:
            logging.debug('Discarding - region_pixels_threshold - {} < {} - x:{} y:{} w:{} h:{}'.format(region_pixels_threshold, r['size'], x, y, w, h))
            continue

        # Exclude extremely elongated boxes
        if h != 0 and w / h > 6:
            logging.debug('Discarding w/h {} - x:{} y:{} w:{} h:{}'.format(w / h, x, y, w, h))
            continue
        if w != 0 and h / w > 6:
            logging.debug('Discarding h/w {} - x:{} y:{} w:{} h:{}'.format(h / w, x, y, w, h))
            continue

        candidates.add(r['rect'])

    return candidates
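Each candidate box can then be cropped out and saved as its own image patch for the classifier. A minimal sketch of that step, assuming PIL and a hypothetical 'name-x_y_w_h.jpg' naming scheme that matches the bbox parsing done later in the prediction script:
# Hedged sketch: crop each selective-search box into a separate patch file.
# The output naming is an assumption, chosen so the prediction script below
# can recover the box coordinates from the file name.
from PIL import Image

def save_bbox_crops(image_path, bboxes, out_dir='dataset_prediction/images_crop'):
    img = Image.open(image_path)
    base = image_path.split('/')[-1].split('.jpg')[0]
    for (x, y, w, h) in bboxes:
        crop = img.crop((x, y, x + w, y + h))   # PIL uses (left, upper, right, lower)
        crop.save('{}/{}-{}_{}_{}_{}.jpg'.format(out_dir, base, x, y, w, h))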
(2) Model prediction: this covers initializing the model, reading the images, loading the trained weights, and visualizing the results.
def init():
    global batch_size
    batch_size = batch_size_predict
    logging.debug('batch_size {}'.format(batch_size))

    global input_shape
    input_shape = (img_width, img_height, img_channel)
    logging.debug('input_shape {}'.format(input_shape))

    global class_names
    # TODO: Remove hardcoding if dataset available
    class_names = ['Anorak', 'Bomber', 'Button-Down', 'Capris', 'Chinos', 'Coat', 'Flannel', 'Hoodie', 'Jeans', 'Jeggings', 'Jersey', 'Kaftan', 'Parka', 'Peacoat', 'Poncho', 'Robe', 'Sweatshorts', 'Trunks', 'Turtleneck']
    # class_names = get_subdir_list(dataset_train_path)
    logging.debug('class_names {}'.format(class_names))

def get_images():
    images_path_name = sorted(glob.glob(prediction_dataset_path + '/*.jpg'))
    return images_path_name

def get_bbox(images_path_name):
    # TODO: Currently for 1 image only
    for index, image in enumerate(images_path_name):
        bboxes = selective_search_bbox(image)
        logging.debug('bboxes {}'.format(bboxes))
        return bboxes
# Build the prediction model and load the trained top weights.
# model = create_model_predict(input_shape, optimizer, learn_rate, decay, momentum, activation, dropout_rate)
model = create_model(False, True, input_shape, len(class_names), optimizer, learn_rate, decay, momentum, activation, dropout_rate)

images_list = []
images_name_list = []
prediction_class = []
prediction_iou = []
prediction_class_prob = []
prediction_class_name = []

## Folder containing the candidate patches
prediction_dataset_path = 'dataset_prediction/images/'

# 'images_names' is the list of patch file names produced by the earlier steps.
for index, image in enumerate(images_names):
    logging.debug('\n\n++++++++++++++++++++++++++++++++++++++++')
    image_path_name = prediction_dataset_path + image
    logging.debug('image_path_name {}'.format(image_path_name))

    img = Image.open(image_path_name)
    logging.debug('img {}'.format(img))
    logging.debug('img len {}'.format(img.size))

    img = preprocess_image(img)
    img = np.expand_dims(img, 0)

    # Two-headed output: [class probabilities, IOU score]
    prediction = model.predict(img, batch_size, verbose=1)

    prediction_class_ = prediction[0][0]
    prediction_class.append(prediction_class_)

    prediction_iou_ = prediction[1][0][0]
    logging.debug('prediction_iou_ {}'.format(prediction_iou_))
    prediction_iou.append(prediction_iou_)

    prediction_class_index = np.argmax(prediction[0])
    logging.debug('prediction_class_index {}'.format(prediction_class_index))

    prediction_class_prob_ = prediction[0][0][prediction_class_index]
    logging.debug('prediction_class_prob_ {}'.format(prediction_class_prob_))
    prediction_class_prob.append(prediction_class_prob_)

    prediction_class_name_ = class_names[prediction_class_index]
    logging.debug('prediction_class_name_ {}'.format(prediction_class_name_))
    prediction_class_name.append(prediction_class_name_)

    images_list.append(img)
    images_name_list.append(image_path_name)

logging.debug('prediction_iou {}'.format(prediction_iou))
logging.debug('prediction_class_prob {}'.format(prediction_class_prob))
logging.debug('prediction_class_name {}'.format(prediction_class_name))

# Recover each patch's bounding box from its file name ('name-x_y_w_h.jpg').
bboxes = []
for image_path_name in images_name_list:
    bbox_ = image_path_name.split('/')[-1].split('.jpg')[0].split('-')[1]
    x = int(bbox_.split('_')[0])
    y = int(bbox_.split('_')[1])
    w = int(bbox_.split('_')[2])
    h = int(bbox_.split('_')[3])
    bboxes.append((x, y, w, h))
bboxes = set(bboxes)
logging.debug('bboxes {}'.format(bboxes))

# Draw the boxes and predictions on the original image(s).
orig_image_path_name = sorted(glob.glob('dataset_prediction/images' + '/*.jpg'))
logging.debug('orig_image_path_name {}'.format(orig_image_path_name))
display_bbox(orig_image_path_name, bboxes, prediction_class_name, prediction_class_prob, prediction_iou, images_name_list)

# Batch prediction over all patches at once.
logging.debug('images_list {}'.format(len(images_list)))
images_list_arr = np.vstack(images_list)   # stack (1, h, w, 3) patches into (N, h, w, 3)
logging.debug('images_list_arr type {}'.format(type(images_list_arr)))
prediction = model.predict(images_list_arr, batch_size, verbose=1)
logging.debug('prediction shape {} {}'.format(len(prediction), len(prediction[0])))

print('')
for index, preds in enumerate(prediction):
    for index2, pred in enumerate(preds):
        print('images_name_list index2 : {:110} '.format(images_name_list[index2]), end='')
        for p in pred:
            print('{:8f}'.format(float(p)), end='')
        print('')
    print('')
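The display_bbox helper is not shown in the excerpt above. A minimal matplotlib sketch of what such a visualization function might look like, under the assumption that it draws each box with its predicted class and probability on the original image:
# Hedged sketch of a display_bbox-style helper; the author's actual
# implementation is not included in the excerpt above.
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
import skimage.io

def display_bbox(image_paths, bboxes, class_names, class_probs, ious, patch_names):
    img = skimage.io.imread(image_paths[0])
    fig, ax = plt.subplots(figsize=(8, 8))
    ax.imshow(img)
    for (x, y, w, h), name, prob in zip(bboxes, class_names, class_probs):
        ax.add_patch(mpatches.Rectangle((x, y), w, h, fill=False, edgecolor='red', linewidth=2))
        ax.text(x, y - 4, '{} {:.2f}'.format(name, prob), color='red')
    plt.show()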
The results are shown in the figure below:
Source code download link:
https://pan.baidu.com/s/1_XCUbhup--4b11dBOtYZvg
Extraction code: 5wna
李秋键 is a CSDN blog expert and author of CSDN premium courses, currently a master's student at China University of Mining and Technology. His projects include an Android wuxia game on TapTap, a VIP video parser, a text paraphrasing tool, and a writing bot; he has published several papers and won multiple awards in advanced mathematics competitions.