Baidu App iOS Package Size 50 MB Optimization in Practice (Part 3): Resource Optimization
The following article is from 百度App技术 Author RichardYang
GEEK TALK
01
Preface
02
Mach-O File Format in Detail
2.1 Overview
2.2 Tools for Analyzing Mach-O Files
2.2.1 Analyzing with MachOView
MachOView download: http://sourceforge.net/projects/machoview/
MachOView source code: https://github.com/gdbinit/MachOView
Contents of (__DATA,__objc_classlist) section
0000000100008238 0x100009980
isa 0x1000099a8
superclass 0x0 _OBJC_CLASS_$_UIViewController
cache 0x0 __objc_empty_cache
vtable 0x0
data 0x1000083e8
flags 0x90
instanceStart 8
instanceSize 8
reserved 0x0
ivarLayout 0x0
name 0x100007349 ViewController
baseMethods 0x1000082d8
entsize 24
count 11
name 0x100006424 test4
types 0x1000073e4 v16@0:8
imp 0x100004c58
name 0x1000063b4 viewDidLoad
*****
Common otool commands are listed below:
2.3 Checking the File Format
~ % file /Users/ycx/Desktop/demo.app/demo
/Users/ycx/Desktop/demo.app/demo: Mach-O 64-bit executable arm64
~ % lipo -info /Users/ycx/Desktop/demo.app/demo
Non-fat file: /Users/ycx/Desktop/demo.app/demo is architecture: arm64
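2.4 File Structure
2.4.1 Overall Structure
2.4.2 Header
2.4.2.1 Data Structure
Header: describes the basic information of the current Mach-O file (CPU type, file type, and so on). It is defined in the XNU source at EXTERNAL_HEADERS/mach-o/loader.h, with the following data structure: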
struct mach_header_64 {
    uint32_t magic; /* mach magic number identifier */
    cpu_type_t cputype; /* cpu specifier */
    cpu_subtype_t cpusubtype; /* machine specifier */
    uint32_t filetype; /* type of file */
    uint32_t ncmds; /* number of load commands */
    uint32_t sizeofcmds; /* the size of all the load commands */
    uint32_t flags; /* flags */
    uint32_t reserved; /* reserved */
};
2.4.2.2 Viewing Field Values
% otool -hv demo
demo:
Mach header
magic cputype cpusubtype caps filetype ncmds sizeofcmds flags
MH_MAGIC_64 ARM64 ALL 0x00 EXECUTE 22 3040 NOUNDEFS DYLDLINK TWOLEVEL PIE
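The fields that otool prints above can also be read straight from the first 32 bytes of the file. The following is a minimal sketch, not the article's tooling: it assumes a thin (non-fat), little-endian 64-bit Mach-O file, and the function name parse_mach_header_64 and the constant names other than those from loader.h are hypothetical.

```python
import struct

MH_MAGIC_64 = 0xFEEDFACF      # magic number of a 64-bit Mach-O file
CPU_TYPE_ARM64 = 0x0100000C   # CPU_TYPE_ARM | CPU_ARCH_ABI64
MH_EXECUTE = 0x2              # filetype of a main executable

# Eight 32-bit fields; cputype and cpusubtype are signed integers in loader.h
HEADER_FORMAT = "<IiiIIIII"
HEADER_FIELDS = ("magic", "cputype", "cpusubtype", "filetype",
                 "ncmds", "sizeofcmds", "flags", "reserved")


def parse_mach_header_64(data):
    """Decode the mach_header_64 at the start of data (assumed little-endian)."""
    values = struct.unpack_from(HEADER_FORMAT, data, 0)
    header = dict(zip(HEADER_FIELDS, values))
    if header["magic"] != MH_MAGIC_64:
        raise ValueError("not a thin little-endian 64-bit Mach-O file")
    return header
```

To try it on a real binary, read the file in binary mode and pass the bytes in, for example parse_mach_header_64(open(path, "rb").read(32)). The ncmds and sizeofcmds values should match the otool -hv output.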
2.4.2.3 Field Meanings
The meaning of each field is as follows:
2.4.3 Load Commands
2.4.3.1 Data Structure
struct load_command {
    uint32_t cmd; /* type of load command */
    uint32_t cmdsize; /* total size of command in bytes */
};
/* Constants for the cmd field of all load commands, the type */
#define LC_SEGMENT 0x1 /* segment of this file to be mapped */
#define LC_SYMTAB 0x2 /* link-edit stab symbol table info */
#define LC_SYMSEG 0x3 /* link-edit gdb symbol table info (obsolete) */
#define LC_THREAD 0x4 /* thread */
#define LC_UNIXTHREAD 0x5 /* unix thread (includes a stack) */
#define LC_LOADFVMLIB 0x6 /* load a specified fixed VM shared library */
#define LC_IDFVMLIB 0x7 /* fixed VM shared library identification */
#define LC_IDENT 0x8 /* object identification info (obsolete) */
#define LC_FVMFILE 0x9 /* fixed VM file inclusion (internal use) */
#define LC_PREPAGE 0xa /* prepage command (internal use) */
#define LC_DYSYMTAB 0xb /* dynamic link-edit symbol table info */
#define LC_LOAD_DYLIB 0xc /* load a dynamically linked shared library */
#define LC_ID_DYLIB 0xd /* dynamically linked shared lib ident */
#define LC_LOAD_DYLINKER 0xe /* load a dynamic linker */
#define LC_ID_DYLINKER 0xf /* dynamic linker identification */
#define LC_PREBOUND_DYLIB 0x10 /* modules prebound for a dynamically */
*****
2.4.3.2 Viewing Field Values
2.4.3.3 cmd Types and Their Roles
Common cmd types and their roles are shown in the table below:
2.4.3.4 LC_SEGMENT_64
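2.4.3.4.1 Data Structure
Among the many load commands, the ones to focus on are LC_SEGMENT and LC_SEGMENT_64. LC_SEGMENT is used by 32-bit files and LC_SEGMENT_64 by 64-bit files; current mainstream devices use LC_SEGMENT_64. LC_SEGMENT_64 describes how each segment in the Data region is loaded into memory, and most of the code and data belonging to our app lives in these segments. Its data structure is segment_command_64, defined in the XNU source at EXTERNAL_HEADERS/mach-o/loader.h, as shown below: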
struct segment_command_64 { /* for 64-bit architectures */
    uint32_t cmd; /* LC_SEGMENT_64 */
    uint32_t cmdsize; /* includes sizeof section_64 structs */
    char segname[16]; /* segment name */
    uint64_t vmaddr; /* memory address of this segment */
    uint64_t vmsize; /* memory size of this segment */
    uint64_t fileoff; /* file offset of this segment */
    uint64_t filesize; /* amount to map from the file */
    vm_prot_t maxprot; /* maximum VM protection */
    vm_prot_t initprot; /* initial VM protection */
    uint32_t nsects; /* number of sections in segment */
    uint32_t flags; /* flags */
};
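To see how load_command and segment_command_64 fit together, the load commands can be walked one by one: each starts with cmd and cmdsize, and cmdsize gives the distance to the next command. The sketch below is illustrative only. It assumes a thin, little-endian 64-bit Mach-O file already read into memory, and list_segments is a hypothetical name.

```python
import struct

LC_SEGMENT_64 = 0x19        # cmd value of a 64-bit segment load command
MACH_HEADER_64_SIZE = 32    # sizeof(struct mach_header_64)
# cmd, cmdsize, segname[16], vmaddr, vmsize, fileoff, filesize,
# maxprot, initprot, nsects, flags (72 bytes in total)
SEGMENT_FORMAT = "<II16sQQQQiiII"


def list_segments(data):
    """Return (segname, fileoff, filesize) for every LC_SEGMENT_64 in data."""
    # ncmds is the fifth 32-bit field of mach_header_64, at byte offset 16
    ncmds = struct.unpack_from("<I", data, 16)[0]
    offset = MACH_HEADER_64_SIZE
    segments = []
    for _ in range(ncmds):
        cmd, cmdsize = struct.unpack_from("<II", data, offset)
        if cmdsize < 8:
            raise ValueError("corrupt load command at offset %d" % offset)
        if cmd == LC_SEGMENT_64:
            fields = struct.unpack_from(SEGMENT_FORMAT, data, offset)
            segname = fields[2].rstrip(b"\x00").decode("ascii")
            segments.append((segname, fields[5], fields[6]))
        # cmdsize covers the whole command, including trailing section_64 structs
        offset += cmdsize
    return segments
```

Summing filesize per segment gives a rough view of how much of the binary each segment (for example __TEXT or __DATA) occupies on disk.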
2.4.3.4.4 __DATA
2.4.4 Data
03
Resource Optimization
3.1 Overview
3.2 Large Resource Optimization
3.2.1 Finding Large Resources
import os


def findBigResources(path, threshold):
    """Print every file under path whose size exceeds threshold (in KB)."""
    for allDir in os.listdir(path):
        child = os.path.join(path, allDir)
        if os.path.isfile(child):
            # Get the file extension
            end = os.path.splitext(child)[-1]
            # Skip .dylib system libraries and Assets.car
            if end != ".dylib" and end != ".car":
                # Convert the size from bytes to KB
                fileLen = os.path.getsize(child) / 1024
                if fileLen > threshold:
                    print(child + " length is " + str(fileLen))
        else:
            # Recurse into subdirectories
            findBigResources(child, threshold)
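3.2.2 Optimization Methods
Asynchronous download: if the app does not need the resource on first launch, or needs it on first launch but uses it infrequently, the resource can be downloaded asynchronously.
Resource compression: if the app needs the resource on first launch and uses it frequently, the large resource can be compressed before it is bundled into the app, then decompressed on a background thread during launch before use.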
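Before choosing the compression route, it helps to check how much a given resource would actually shrink. The following build-time sketch is an assumption-laden illustration: the article does not name a compression format, so zlib stands in here, and estimate_compressed_size is a hypothetical helper.

```python
import os
import zlib


def estimate_compressed_size(path, level=9):
    """Return (original_bytes, compressed_bytes) for the file at path.

    zlib is an assumption; substitute whichever format the app can
    decompress at launch.
    """
    compressor = zlib.compressobj(level)
    compressed = 0
    with open(path, "rb") as f:
        # Read in 1 MB chunks so large resources are not loaded all at once
        for chunk in iter(lambda: f.read(1 << 20), b""):
            compressed += len(compressor.compress(chunk))
    compressed += len(compressor.flush())
    return os.path.getsize(path), compressed
```

Formats that are already compressed (PNG, WebP, most video) usually gain little, so the saving is worth measuring per file rather than assuming.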
3.3 Unused Configuration Files
3.3.1 Collecting Configuration Files
import os


def findProfileResources(path):
    """Print every file under path that is not a library, asset catalog, image, JS, or CSS file."""
    # Extensions that are not treated as configuration files
    skipped = (".dylib", ".car", ".png", ".webp", ".gif", ".js", ".css")
    for allDir in os.listdir(path):
        child = os.path.join(path, allDir)
        if os.path.isfile(child):
            # Get the file extension
            end = os.path.splitext(child)[-1]
            if end not in skipped:
                print(child + " extension " + end)
        else:
            # Recurse into subdirectories
            findProfileResources(child)
# Dump the string literals in the (__TEXT,__cstring) section of the Mach-O file at path
lines = os.popen('/usr/bin/otool -v -s __TEXT __cstring %s' % path).readlines()
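3.3.3 Identifying Unused Configuration Files
Diff the sets collected in the previous steps to find the unused configuration files, and delete them once confirmed to reduce the package size. If a resource name is assembled by string concatenation, this method cannot match it, so every resource must be confirmed individually before deletion.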
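The diff step can be sketched as follows. This is a hedged illustration rather than the article's script: it assumes each data line of the otool -v -s __TEXT __cstring output has the form "<hex address>  <string>", and the names parse_cstrings and find_unused_files are hypothetical.

```python
import os


def parse_cstrings(otool_lines):
    """Collect string literals from `otool -v -s __TEXT __cstring` output.

    Assumes each data line looks like "<hex address>  <string>"; header
    lines such as "Contents of (...) section" are skipped.
    """
    strings = set()
    for line in otool_lines:
        parts = line.rstrip("\n").split(None, 1)
        if len(parts) != 2:
            continue
        try:
            int(parts[0], 16)   # the first column must be a hex address
        except ValueError:
            continue
        strings.add(parts[1])
    return strings


def find_unused_files(file_paths, strings):
    """Return the paths whose file name appears in no string literal."""
    unused = []
    for path in file_paths:
        # Match on the name without extension, which also covers
        # literals that include the extension
        stem = os.path.splitext(os.path.basename(path))[0]
        if not any(stem in s for s in strings):
            unused.append(path)
    return unused
```

Substring matching errs on the side of keeping files: a short name that happens to occur inside an unrelated literal is reported as used. Names built by concatenation are still missed, which is why the manual confirmation step remains necessary.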
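3.3.4 Detecting Unused JS and CSS Files
JS and CSS files are a special case: they can be referenced from OC code and also loaded by HTML files, and the same is true of images. The __TEXT segment of the Mach-O file described above only covers references from OC code, whereas loading from HTML is the dominant scenario. For this case, Baidu App uses a solution similar to its unused-image detection.
3.4 Duplicate Resource Optimization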
import hashlib
import os


def get_file_library(path, file_dict):
    """Group the files under path by MD5 digest: file_dict maps digest -> file names."""
    for allDir in os.listdir(path):
        child = os.path.join(path, allDir)
        if os.path.isfile(child):
            # Store the file name under its MD5 digest
            md5 = img_to_md5(child)
            file_dict.setdefault(md5, []).append(allDir)
            continue
        # Recurse into subdirectories
        get_file_library(child, file_dict)


def img_to_md5(path):
    """Return the hex MD5 digest of the file's contents."""
    with open(path, 'rb') as fd:
        return hashlib.md5(fd.read()).hexdigest()
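Once files are grouped by digest, the duplicates are simply the groups with more than one member. The sketch below is a self-contained variant under stated assumptions: it uses os.walk instead of manual recursion and records full paths rather than bare file names, and find_duplicate_files is a hypothetical name, not part of the article's script.

```python
import hashlib
import os


def find_duplicate_files(root):
    """Return {md5: [paths]} for every group of files under root sharing a digest."""
    by_digest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                digest = hashlib.md5(f.read()).hexdigest()
            by_digest.setdefault(digest, []).append(path)
    # Keep only digests shared by two or more files
    return {d: sorted(p) for d, p in by_digest.items() if len(p) > 1}
```

Recording full paths makes it easier to decide which copy to keep when the same file name exists in several bundles or directories.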
04
Summary