Swift: Parsing a Struct the Way HandyJSON Does

  • HandyJSON
  • Parsing a struct from the source code
  • Obtaining TargetStructMetadata
  • Obtaining TargetStructDescriptor
  • Implementing TargetRelativeDirectPointer
  • FieldDescriptor and FieldRecord
  • Computing offsets with fieldOffsetVectorOffset
  • Verifying the code


HandyJSON

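HandyJSON is a framework developed by Alibaba that converts JSON data into the corresponding model in Swift. Compared with other popular Swift JSON libraries, its distinguishing feature is that it supports pure Swift classes and is simple to use. When deserializing (converting JSON into a model), it does not require the model to inherit from NSObject (because it is not based on the KVC mechanism), nor does it require you to define a mapping function for the model. Once you define the model class and declare that it conforms to the HandyJSON protocol, HandyJSON can parse values out of the JSON string on its own, using each property's name as the key. However, because HandyJSON is built on Swift metadata, it may stop working if the layout of that metadata changes. Alibaba has kept maintaining the framework, so it can be expected to follow changes in the Swift source.
HandyJSON on GitHub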

Parsing a struct from the source code

Obtaining TargetStructMetadata

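Because HandyJSON is built on Swift metadata, parsing a struct requires understanding that metadata first. In what follows we look for it in the Swift source code.
First, searching Metadata.h for StructMetadata shows that its real type is TargetStructMetadata: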

using StructMetadata = TargetStructMetadata<InProcess>;

Next, looking at the definition of TargetStructMetadata, we find that it inherits from TargetValueMetadata, which in turn inherits from TargetMetadata:

struct TargetStructMetadata : public TargetValueMetadata<Runtime> {
struct TargetValueMetadata : public TargetMetadata<Runtime> {

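So we can reconstruct the layout of TargetStructMetadata by following this inheritance chain.
From the code we can see that the first field of TargetStructMetadata is Kind (inherited from TargetMetadata). Besides it there is a Description field (from TargetValueMetadata), which points to the type's descriptor.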

struct TargetMetadata {
	......
	private:
	  /// The kind. Only valid for non-class metadata; getKind() must be used to get
	  /// the kind value.
	  StoredPointer Kind;
	......
}

struct TargetValueMetadata : public TargetMetadata<Runtime> {
  using StoredPointer = typename Runtime::StoredPointer;
  TargetValueMetadata(MetadataKind Kind,
                      const TargetTypeContextDescriptor<Runtime> *description)
      : TargetMetadata<Runtime>(Kind), Description(description) {}
  // Records the description of the metadata
  /// An out-of-line description of the type.
  TargetSignedPointer<Runtime, const TargetValueTypeDescriptor<Runtime> * __ptrauth_swift_type_descriptor> Description;
  ......
}

This gives us the following layout for TargetStructMetadata:

struct TargetStructMetadata<T> {
    // StoredPointer Kind; on a 64-bit system `using StoredPointer = uint64_t;`, i.e. Int
    var kind: Int
    // Declared as UnsafeMutablePointer<T> for now; T is a generic placeholder
    // until the layout of typeDescriptor is analyzed below
    var typeDescriptor: UnsafeMutablePointer<T>
}
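
As a quick sanity check of this layout, the sketch below reads only the first word of a type's metadata. It assumes a 64-bit platform on which a metatype value (Any.Type) is a pointer to the metadata; metadataKind(of:), Point and Direction are hypothetical names used for illustration only. In the runtime's MetadataKind enumeration, struct metadata has kind 0x200 and enum metadata has kind 0x201.

```swift
// Hypothetical sample types, used only for this illustration.
struct Point { var x = 0; var y = 0 }
enum Direction { case up, down }

// Reads the first pointer-sized word of the metadata, which is the Kind field
// for non-class metadata. Assumption: Any.Type is a plain metadata pointer.
func metadataKind(of type: Any.Type) -> Int {
    let firstWord = unsafeBitCast(type, to: UnsafePointer<Int>.self)
    return firstWord.pointee
}

let structKind = metadataKind(of: Point.self)
let enumKind = metadataKind(of: Direction.self)
print(String(structKind, radix: 16), String(enumKind, radix: 16))
```

Checking the kind before reinterpreting the rest of the metadata is a cheap guard against passing a class or enum type to code that expects a struct.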

Obtaining TargetStructDescriptor

Next we analyze Description. The source shows that, for a struct, Description is really a TargetStructDescriptor:

const TargetStructDescriptor<Runtime> *getDescription() const {
    return llvm::cast<TargetStructDescriptor<Runtime>>(this->Description);
  }

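Looking up TargetStructDescriptor, we find that it inherits from TargetValueTypeDescriptor and has two fields of its own: NumFields (the number of stored properties) and FieldOffsetVectorOffset (the position, within the metadata, of the vector that holds each property's offset).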

class TargetStructDescriptor final
    : public TargetValueTypeDescriptor<Runtime>,
      public TrailingGenericContextObjects<TargetStructDescriptor<Runtime>,
                            TargetTypeGenericContextDescriptorHeader,
                            /*additional trailing objects*/
                            TargetForeignMetadataInitialization<Runtime>,
                            TargetSingletonMetadataInitialization<Runtime>,
                            TargetCanonicalSpecializedMetadatasListCount<Runtime>,
                            TargetCanonicalSpecializedMetadatasListEntry<Runtime>,
                            TargetCanonicalSpecializedMetadatasCachingOnceToken<Runtime>> {
	......
  /// The number of stored properties in the struct.
  /// If there is a field offset vector, this is its length.
  uint32_t NumFields; // number of stored properties
  /// The offset of the field offset vector for this struct's stored
  /// properties in its metadata, if any. 0 means there is no field offset
  /// vector.
  uint32_t FieldOffsetVectorOffset; // where the field offset vector sits in the metadata

TargetValueTypeDescriptor inherits from TargetTypeContextDescriptor, which has three fields: Name (the name of the type), AccessFunctionPtr (a pointer to the type's metadata access function) and Fields (a pointer to the type's field descriptor).

class TargetValueTypeDescriptor
    : public TargetTypeContextDescriptor<Runtime> {
public:
  static bool classof(const TargetContextDescriptor<Runtime> *cd) {
    return cd->getKind() == ContextDescriptorKind::Struct ||
           cd->getKind() == ContextDescriptorKind::Enum;
  }
};
class TargetTypeContextDescriptor
    : public TargetContextDescriptor<Runtime> {
public:
  /// The name of the type.
  // Name of the type
  TargetRelativeDirectPointer<Runtime, const char, /*nullable*/ false> Name;

  /// A pointer to the metadata access function for this type.
  ///
  /// The function type here is a stand-in. You should use getAccessFunction()
  /// to wrap the function pointer in an accessor that uses the proper calling
  /// convention for a given number of arguments.
  // Pointer to the metadata access function of this type
  TargetRelativeDirectPointer<Runtime, MetadataResponse(...),
                              /*Nullable*/ true> AccessFunctionPtr;
  
  /// A pointer to the field descriptor for the type, if any.
  // Pointer to the field descriptor of the type
  TargetRelativeDirectPointer<Runtime, const reflection::FieldDescriptor,
                              /*nullable*/ true> Fields;
	......
}

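TargetTypeContextDescriptor in turn inherits from the base class TargetContextDescriptor, which has two fields: Flags (flags describing the context, including its kind and version) and Parent (the parent context, that is, the enclosing context; it is NULL for a top-level context).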

/// Base class for all context descriptors.
template<typename Runtime>
struct TargetContextDescriptor {
  /// Flags describing the context, including its kind and format version.
  // Flags describing the context, including its kind and version
  ContextDescriptorFlags Flags;
  
  /// The parent context, or null if this is a top-level context.
  // The parent (enclosing) context; NULL for a top-level context
  TargetRelativeContextPointer<Runtime> Parent;
  ......
}

At this point the layout of TargetStructDescriptor is clear, so we can write it down and replace the generic T in TargetStructMetadata with it.

struct TargetStructMetadata {
    var kind: Int
    var typeDescriptor: UnsafeMutablePointer<TargetStructDescriptor>
}

struct TargetStructDescriptor {
    // Flags describing the context, including its kind and version
    var flags: Int32 // ContextDescriptorFlags wraps a 32-bit value
    // The parent (enclosing) context; NULL for a top-level context
    var parent: TargetRelativeContextPointer<UnsafeRawPointer> // relative address
    // Name of the type
    var name: TargetRelativeDirectPointer<CChar> // relative address
    // Pointer to the metadata access function of this type
    var accessFunctionPointer: TargetRelativeDirectPointer<UnsafeRawPointer> // relative address
    // Pointer to the field descriptor of the type
    var fieldDescriptor: TargetRelativeDirectPointer<FieldDescriptor> // relative address
    // Number of stored properties
    var numFields: Int32
    // Where the field offset vector sits in the metadata
    var fieldOffsetVectorOffset: Int32
}

// Type details for some of the fields above
/// Common flags stored in the first 32-bit word of any context descriptor.
// flags is a 32-bit value
struct ContextDescriptorFlags {
	private:
	  uint32_t Value;
}
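
To see the flags word in practice, here is a minimal sketch that follows the Description pointer and masks out the descriptor kind. It assumes a 64-bit platform without pointer authentication (on arm64e the descriptor pointer is signed and would need stripping first) and that the argument is a struct or enum type; contextDescriptorKind(of:) and Point are hypothetical names. In the runtime's ContextDescriptorKind enumeration the kind lives in the low five bits of the flags, and Struct has the value 17.

```swift
// Hypothetical sample type, used only for this illustration.
struct Point { var x = 0; var y = 0 }

// Loads the descriptor pointer stored right after Kind in value-type metadata,
// then reads the 32-bit flags at the start of the descriptor.
// Assumption: the pointer is not signed (no pointer authentication).
func contextDescriptorKind(of type: Any.Type) -> UInt32 {
    let metadata = unsafeBitCast(type, to: UnsafeRawPointer.self)
    let descriptor = metadata.load(fromByteOffset: MemoryLayout<Int>.size,
                                   as: UnsafeRawPointer.self)
    let flags = descriptor.load(as: UInt32.self)
    return flags & 0x1F // the low five bits hold the ContextDescriptorKind
}

let pointDescriptorKind = contextDescriptorKind(of: Point.self)
print(pointDescriptorKind)
```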

Implementing TargetRelativeDirectPointer

For the relative address type TargetRelativeDirectPointer, searching the source shows that it is simply an alias for RelativeDirectPointer:

template <typename Runtime, typename Pointee, bool Nullable = true>
using TargetRelativeDirectPointer
  = typename Runtime::template RelativeDirectPointer<Pointee, Nullable>;

Next, in RelativePointer.h we find RelativeDirectPointer. It inherits from the base class RelativeDirectPointerImpl, which has a single field, RelativeOffset (the offset), and a method that resolves that offset to the real memory address.

template <typename T, bool Nullable = true, typename Offset = int32_t,
          typename = void>
class RelativeDirectPointer;

/// A direct relative reference to an object that is not a function pointer.
// Offset is int32_t by default
template <typename T, bool Nullable, typename Offset>
class RelativeDirectPointer<T, Nullable, Offset,
    typename std::enable_if<!std::is_function<T>::value>::type>
    : private RelativeDirectPointerImpl<T, Nullable, Offset>
{
	......
}

/// A relative reference to a function, intended to reference private metadata
/// functions for the current executable or dynamic library image from
/// position-independent constant data.
template<typename T, bool Nullable, typename Offset>
class RelativeDirectPointerImpl {
	private:
  /// The relative offset of the function's entry point from *this.
  Offset RelativeOffset;
  ......
  // Resolves the offset to an absolute address and returns it as a pointer to T
  PointerTy get() const & {
    // Check for null.
    if (Nullable && RelativeOffset == 0)
      return nullptr;
    
    // The value is addressed relative to `this`.
    uintptr_t absolute = detail::applyRelativeOffset(this, RelativeOffset);
    return reinterpret_cast<PointerTy>(absolute);
  }
  ......
}

/// Apply a relative offset to a base pointer. The offset is applied to the base
/// pointer using sign-extended, wrapping arithmetic.
// Applies the offset to the base address
template<typename BasePtrTy, typename Offset>
static inline uintptr_t applyRelativeOffset(BasePtrTy *basePtr, Offset offset) {
  static_assert(std::is_integral<Offset>::value &&
                std::is_signed<Offset>::value,
                "offset type should be signed integer");

  auto base = reinterpret_cast<uintptr_t>(basePtr);
  // We want to do wrapping arithmetic, but with a sign-extended
  // offset. To do this in C, we need to do signed promotion to get
  // the sign extension, but we need to perform arithmetic on unsigned values,
  // since signed overflow is undefined behavior.
  auto extendOffset = (uintptr_t)(intptr_t)offset;
  // address of the pointer itself + stored offset = target address
  return base + extendOffset;
}

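So we can write TargetRelativeDirectPointer as:

// Generic over the pointee type
struct TargetRelativeDirectPointer<Pointee> {
    var offset: Int32
    
    // Resolves the relative offset to an absolute pointer.
    // Must be called on the value in place (through a pointer into the metadata);
    // a copy lives at a different address and would resolve to the wrong location.
    mutating func getmeasureRelativeOffset() -> UnsafeMutablePointer<Pointee> {
        let offset = self.offset
        
        return withUnsafePointer(to: &self) { p in
            // Advance from the address of self by offset bytes, then treat the result as Pointee
            return UnsafeMutablePointer(mutating: UnsafeRawPointer(p).advanced(by: numericCast(offset)).assumingMemoryBound(to: Pointee.self))
        }
    }
}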
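
The "address of the field plus stored offset" rule can be tried out without touching any runtime metadata. The sketch below lays out a relative pointer and its target by hand in a scratch buffer; RelativePointer and resolve(_:) are hypothetical names for this illustration, and the buffer layout (offset field at byte 0, target at byte 16) is an arbitrary choice.

```swift
// Hypothetical stand-in for a relative pointer: just a signed 32-bit offset.
struct RelativePointer<Pointee> {
    var offset: Int32
}

// Resolves relative to the address where the offset field itself is stored.
func resolve<Pointee>(_ field: UnsafePointer<RelativePointer<Pointee>>) -> UnsafePointer<Pointee> {
    let base = UnsafeRawPointer(field)
    return (base + Int(field.pointee.offset)).assumingMemoryBound(to: Pointee.self)
}

// Scratch buffer: the offset field lives at byte 0, the target Int at byte 16.
let buffer = UnsafeMutableRawPointer.allocate(byteCount: 32, alignment: 8)
buffer.storeBytes(of: Int32(16), as: Int32.self)
buffer.storeBytes(of: 42, toByteOffset: 16, as: Int.self)

let field = UnsafeRawPointer(buffer).assumingMemoryBound(to: RelativePointer<Int>.self)
let resolved = resolve(field).pointee
print(resolved)
buffer.deallocate()
```

Taking a pointer to the field, rather than the field's value, makes the dependency on the storage address explicit, which is the same reason the method above has to run on the in-place value.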

With that, TargetStructDescriptor can be revised to:

struct TargetStructDescriptor {
    // Flags describing the context, including its kind and version
    var flags: Int32
    // The parent (enclosing) context; NULL for a top-level context
    var parent: Int32 // not parsed here, so declared as Int32 for now
    // Name of the type
    var name: TargetRelativeDirectPointer<CChar>
    // Pointer to the metadata access function of this type
    var accessFunctionPointer: TargetRelativeDirectPointer<UnsafeRawPointer>
    // Pointer to the field descriptor of the type
    var fieldDescriptor: TargetRelativeDirectPointer<FieldDescriptor>
    // Number of stored properties
    var numFields: Int32
    // Where the field offset vector sits in the metadata
    var fieldOffsetVectorOffset: Int32
}

// TargetRelativeContextPointer is not parsed here; per the source it stores an int32_t offset, so Int32 is used for now
template<typename Runtime,
         template<typename _Runtime> class Context = TargetContextDescriptor>
using TargetRelativeContextPointer =
  RelativeIndirectablePointer<const Context<Runtime>,
                              /*nullable*/ true, int32_t,
                              TargetSignedContextPointer<Runtime, Context>>;
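
With this layout the Name field sits 8 bytes into the descriptor (after the 4-byte flags and the 4-byte parent). The following sketch resolves it with raw loads instead of the structs above. It assumes a 64-bit platform without pointer authentication and a struct or enum type; typeName(of:) and Account are hypothetical names.

```swift
// Hypothetical sample type, used only for this illustration.
struct Account { var id = 0 }

// Follows metadata -> descriptor -> Name (a relative pointer at byte offset 8).
// Assumption: the descriptor pointer is not signed (no pointer authentication).
func typeName(of type: Any.Type) -> String {
    let metadata = unsafeBitCast(type, to: UnsafeRawPointer.self)
    let descriptor = metadata.load(fromByteOffset: MemoryLayout<Int>.size,
                                   as: UnsafeRawPointer.self)
    let nameField = descriptor + 8                               // flags (4) + parent (4)
    let name = nameField + Int(nameField.load(as: Int32.self))   // relative to the field itself
    return String(cString: name.assumingMemoryBound(to: CChar.self))
}

let accountName = typeName(of: Account.self)
print(accountName)
```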

FieldDescriptor and FieldRecord

The next step is FieldDescriptor, which looks like this in the source:

// Field descriptors contain a collection of field records for a single
// class, struct or enum declaration.
class FieldDescriptor {
  const FieldRecord *getFieldRecordBuffer() const {
    return reinterpret_cast<const FieldRecord *>(this + 1);
  }

public:
  const RelativeDirectPointer<const char> MangledTypeName;
  const RelativeDirectPointer<const char> Superclass;

  FieldDescriptor() = delete;

  const FieldDescriptorKind Kind;
  const uint16_t FieldRecordSize;
  const uint32_t NumFields;
  ......
  // Returns all fields; each one is wrapped in a FieldRecord
  llvm::ArrayRef<FieldRecord> getFields() const {
    return {getFieldRecordBuffer(), NumFields};
  }
  ......
}

// FieldDescriptorKind is backed by uint16_t (UInt16 in Swift)
enum class FieldDescriptorKind : uint16_t {
	......
}

FieldRecord is defined in the source as:

class FieldRecord {
  const FieldRecordFlags Flags;

public:
  const RelativeDirectPointer<const char> MangledTypeName;
  const RelativeDirectPointer<const char> FieldName;
  ......
}

// Field records describe the type of a single stored property or case member
// of a class, struct or enum.
// FieldRecordFlags wraps a uint32_t (32 bits)
class FieldRecordFlags {
  using int_type = uint32_t;
  ......
}
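
Putting FieldDescriptor and FieldRecord together, the field names of a struct can be listed with raw loads alone. This is a sketch under stated assumptions: a 64-bit platform without pointer authentication, reflection metadata not stripped from the binary, and the byte offsets implied by the declarations above (Fields at offset 16 in the type descriptor; FieldRecordSize at 10, NumFields at 12 and the first record at 16 in the FieldDescriptor; FieldName at 8 in each record). fieldNames(of:) and Person are hypothetical names.

```swift
// Hypothetical sample type, used only for this illustration.
struct Person { var age = 18; var name = "LL" }

// Walks metadata -> type descriptor -> FieldDescriptor -> FieldRecords.
func fieldNames(of type: Any.Type) -> [String] {
    let metadata = unsafeBitCast(type, to: UnsafeRawPointer.self)
    let descriptor = metadata.load(fromByteOffset: MemoryLayout<Int>.size,
                                   as: UnsafeRawPointer.self)
    // Fields is a nullable relative pointer: an offset of 0 means "no field descriptor".
    let fieldsField = descriptor + 16
    let fieldsOffset = Int(fieldsField.load(as: Int32.self))
    guard fieldsOffset != 0 else { return [] }
    let fieldDescriptor = fieldsField + fieldsOffset

    let recordSize = Int(fieldDescriptor.load(fromByteOffset: 10, as: UInt16.self))
    let count = Int(fieldDescriptor.load(fromByteOffset: 12, as: UInt32.self))
    let firstRecord = fieldDescriptor + 16 // records follow the 16-byte header

    return (0..<count).map { index in
        let nameField = firstRecord + index * recordSize + 8 // flags (4) + mangled type name (4)
        let name = nameField + Int(nameField.load(as: Int32.self))
        return String(cString: name.assumingMemoryBound(to: CChar.self))
    }
}

let personFields = fieldNames(of: Person.self)
let mirrorFields = Mirror(reflecting: Person()).children.compactMap { $0.label }
print(personFields, mirrorFields)
```

Comparing against Mirror is a convenient cross-check, since Mirror reads the same reflection metadata through the runtime.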

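Computing offsets with fieldOffsetVectorOffset

Finally, we need fieldOffsetVectorOffset, which tells us where in the metadata the vector of per-property offsets is stored. What the source gives us is:

// StoredPointer is pointer-sized (uint64_t on 64-bit, as noted earlier).
// Note that `offset` is added to a pointer to pointer-sized words (asWords),
// so it is counted in words, not in bytes.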
  /// Get a pointer to the field offset vector, if present, or null.
  const StoredPointer *getFieldOffsets() const {
    assert(isTypeMetadata());
    auto offset = getDescription()->getFieldOffsetVectorOffset();
    if (offset == 0)
      return nullptr;
    auto asWords = reinterpret_cast<const void * const*>(this);
    return reinterpret_cast<const StoredPointer *>(asWords + offset);
  }

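However, indexing an Int32 pointer with this offset directly returns the wrong data, because the offset is counted in pointer-sized words. HandyJSON's source handles it like this:

// On 64-bit platforms the offset is multiplied by 2 (one 8-byte word is two Int32 slots)
return Int(UnsafePointer<Int32>(pointer)[vectorOffset * (is64BitPlatform ? 2 : 1) + $0])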
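
The word-versus-Int32 distinction is easier to see when the vector offset is converted to bytes explicitly. The sketch below does that and cross-checks the result against MemoryLayout.offset(of:). It assumes a 64-bit platform without pointer authentication, a struct type, and that NumFields and FieldOffsetVectorOffset sit at byte offsets 20 and 24 of the descriptor, as in the layout above; fieldOffsets(of:) and Sample are hypothetical names.

```swift
// Hypothetical sample type; alignment padding makes the offsets non-trivial.
struct Sample { var a: Int8 = 1; var b: Int = 2; var c: Bool = true }

// Reads the field offset vector: 32-bit entries, located
// FieldOffsetVectorOffset pointer-sized words into the metadata.
func fieldOffsets(of type: Any.Type) -> [Int] {
    let metadata = unsafeBitCast(type, to: UnsafeRawPointer.self)
    let descriptor = metadata.load(fromByteOffset: MemoryLayout<Int>.size,
                                   as: UnsafeRawPointer.self)
    let numFields = Int(descriptor.load(fromByteOffset: 20, as: UInt32.self))
    let vectorOffset = Int(descriptor.load(fromByteOffset: 24, as: UInt32.self))
    guard vectorOffset != 0 else { return [] } // 0 means there is no vector

    // Words -> bytes; this is the same adjustment as "* 2" on an Int32 pointer.
    let vector = (metadata + vectorOffset * MemoryLayout<UnsafeRawPointer>.size)
        .assumingMemoryBound(to: UInt32.self)
    return (0..<numFields).map { Int(vector[$0]) }
}

let sampleOffsets = fieldOffsets(of: Sample.self)
let expectedOffsets = [MemoryLayout<Sample>.offset(of: \Sample.a)!,
                       MemoryLayout<Sample>.offset(of: \Sample.b)!,
                       MemoryLayout<Sample>.offset(of: \Sample.c)!]
print(sampleOffsets, expectedOffsets)
```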

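At this point we have a fairly clear picture of the structures, as follows:

// Resolves a relative offset to a memory address; generic over the pointee type
struct TargetRelativeDirectPointer<Pointee> {
    var offset: Int32
    
    // Resolves the relative offset to an absolute pointer.
    // Must be called on the value in place (through a pointer into the metadata).
    mutating func getmeasureRelativeOffset() -> UnsafeMutablePointer<Pointee> {
        let offset = self.offset
        
        return withUnsafePointer(to: &self) { p in
            // Advance from the address of self by offset bytes, then treat the result as Pointee
            return UnsafeMutablePointer(mutating: UnsafeRawPointer(p).advanced(by: numericCast(offset)).assumingMemoryBound(to: Pointee.self))
        }
    }
}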

struct TargetStructMetadata {
    var kind: Int
    var typeDescriptor: UnsafeMutablePointer<TargetStructDescriptor>
}


struct TargetStructDescriptor {
    var flags: Int32
    var parent: Int32
    var name: TargetRelativeDirectPointer<CChar>
    var accessFunctionPointer: TargetRelativeDirectPointer<UnsafeRawPointer>
    var fieldDescriptor: TargetRelativeDirectPointer<FieldDescriptor>
    var numFields: Int32
    var fieldOffsetVectorOffset: Int32
    
    func getFieldOffsets(_ metadata: UnsafeRawPointer) -> UnsafePointer<Int32> {
        print(metadata)
        return metadata.assumingMemoryBound(to: Int32.self).advanced(by: numericCast(self.fieldOffsetVectorOffset) * 2)
    }
    
    // Used to locate the generic argument vector in the metadata (in words)
    var genericArgumentOffset: Int {
        return 2
    }
}

struct FieldDescriptor {
    var MangledTypeName: TargetRelativeDirectPointer<CChar>
    var Superclass: TargetRelativeDirectPointer<CChar>
    var kind: UInt16
    var fieldRecordSize: UInt16
    var numFields: Int32
    var fields: FieldRecordBuffer<FieldRecord>
}

struct FieldRecord {
    var fieldRecordFlags: Int32
    var mangledTypeName: TargetRelativeDirectPointer<CChar>
    var fieldName: TargetRelativeDirectPointer<UInt8>
}

// Accesses the FieldRecords stored right after the FieldDescriptor header
struct FieldRecordBuffer<Element> {
    var element: Element
    
    mutating func buffer(n: Int) -> UnsafeBufferPointer<Element> {
        return withUnsafePointer(to: &self) {
            let ptr = $0.withMemoryRebound(to: Element.self, capacity: 1) { start in
                return start
            }
            return UnsafeBufferPointer(start: ptr, count: n)
        }
    }
    
    mutating func index(of i: Int) -> UnsafeMutablePointer<Element> {
        return withUnsafePointer(to: &self) {
            return UnsafeMutablePointer(mutating: UnsafeRawPointer($0).assumingMemoryBound(to: Element.self).advanced(by: i))
        }
    }
}

Verifying the code

Now let us verify these structures with code.

protocol BrigeProtocol {}

extension BrigeProtocol {
    // Rebinds the pointer to the conforming type and returns the value
    static func get(from pointor: UnsafeRawPointer) -> Any {
        // Self is the real type
        pointor.assumingMemoryBound(to: Self.self).pointee
    }
}

struct BrigeMetadataStruct {
    let type: Any.Type
    let witness: Int
}

func custom(type: Any.Type) -> BrigeProtocol.Type {
    let container = BrigeMetadataStruct(type: type, witness: 0)
    let cast = unsafeBitCast(container, to: BrigeProtocol.Type.self)
    return cast
}
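// The runtime function used below has to be declared before it can be called.
// Assumption: this declaration is not in the original listing; the parameter and
// return types here are one workable spelling of the runtime's C signature.
@_silgen_name("swift_getTypeByMangledNameInContext")
func swift_getTypeByMangledNameInContext(
    _ name: UnsafePointer<CChar>,
    _ nameLength: Int,
    _ genericContext: UnsafeRawPointer?,
    _ genericArguments: UnsafeRawPointer?) -> UnsafeRawPointer?

// The LLPerson struct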
struct LLPerson {
    var age: Int = 18
    var name: String = "LL"
    var nameTwo: String = "LLLL"
}
// Create an instance
var p = LLPerson()
// Reinterpret LLPerson's metadata as TargetStructMetadata: LLPerson.self is treated as UnsafeMutablePointer<TargetStructMetadata>
let ptr = unsafeBitCast(LLPerson.self as Any.Type, to: UnsafeMutablePointer<TargetStructMetadata>.self)

// Get the name of the struct
let namePtr = ptr.pointee.typeDescriptor.pointee.name.getmeasureRelativeOffset()
print("struct name: \(String(cString: namePtr))")
// Get the number of properties
let numFields = ptr.pointee.typeDescriptor.pointee.numFields
print("number of properties: \(numFields)")

// Get the vector of property offsets stored in the metadata
let offsets = ptr.pointee.typeDescriptor.pointee.getFieldOffsets(UnsafeRawPointer(ptr).assumingMemoryBound(to: Int.self))

print("----------- start fetch field -------------")

for i in 0..<numFields {
    // Get the property name
    let fieldName = ptr.pointee.typeDescriptor.pointee.fieldDescriptor.getmeasureRelativeOffset().pointee.fields.index(of: Int(i)).pointee.fieldName.getmeasureRelativeOffset()
    print("----- field \(String(cString: fieldName))  -----")

    // Get the offset of this property, measured in bytes
    let fieldOffset = offsets[Int(i)]
    print("\(String(cString: fieldName)) offset: \(fieldOffset) bytes")
    // This is the Swift-mangled type name; it has to be turned into the real type
    let typeMangleName = ptr.pointee.typeDescriptor.pointee.fieldDescriptor.getmeasureRelativeOffset().pointee.fields.index(of: Int(i)).pointee.mangledTypeName.getmeasureRelativeOffset()
//    print("\(String(cString: typeMangleName))")
    let genericVector = UnsafeRawPointer(ptr).advanced(by: ptr.pointee.typeDescriptor.pointee.genericArgumentOffset * MemoryLayout<UnsafeRawPointer>.size).assumingMemoryBound(to: Any.Type.self)
    // Uses the runtime function swift_getTypeByMangledNameInContext, which takes four arguments
    let fieldType = swift_getTypeByMangledNameInContext(
        typeMangleName, // the mangled name
        256,            // length of the mangled name; it should be computed, but HandyJSON simply passes 256
        UnsafeRawPointer(ptr.pointee.typeDescriptor), // the context: the type descriptor
        UnsafeRawPointer(genericVector).assumingMemoryBound(to: Optional<UnsafeRawPointer>.self)) // the current generic arguments, used to resolve the symbol

    // Reinterpret fieldType as Any.Type
    let type = unsafeBitCast(fieldType, to: Any.Type.self)
    // Bridge through the protocol to recover the real type
    let value = custom(type: type)

    // Get a pointer to the instance p, converted to UnsafeRawPointer and bound to a 1-byte type (Int8),
    // because the offsets are in bytes; without this, advancing would step by the size of the struct
    let instanceAddress = withUnsafePointer(to: &p){return UnsafeRawPointer($0).assumingMemoryBound(to: Int8.self)}

    print("fieldType: \(type) \nfieldValue: \(value.get(from: instanceAddress.advanced(by: Int(fieldOffset))))")
}

print("----------- end fetch field -------------")
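
The last step of the loop, reading a value at a byte offset inside the instance, can be isolated into a small sketch that needs no metadata at all. Here the field type is known statically and the offset comes from MemoryLayout.offset(of:) rather than from the field offset vector; Pair is a hypothetical type used only for this illustration.

```swift
// Hypothetical sample type, used only for this illustration.
struct Pair { var id: Int = 7; var label: String = "seven" }

let pair = Pair()
// Byte offset of `label` inside Pair, as reported by the standard library.
let labelOffset = MemoryLayout<Pair>.offset(of: \Pair.label)!

// View the instance as raw bytes and load a String from the field's offset.
let label = withUnsafeBytes(of: pair) { bytes in
    bytes.load(fromByteOffset: labelOffset, as: String.self)
}
print(labelOffset, label)
```

The protocol bridge in the verification code exists because there the field type is only known at run time; when the type is known at compile time, a typed load like this is enough.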

Printed output:

[Image: console output of the code above]


The memory addresses also show how the properties are laid out.

[Image: memory layout of the instance]