What is Protobuf?



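Protobuf (Protocol Buffers) is a data storage and transfer format developed by Google. Its job is to serialize data before it is transmitted. The Protobuf encoding is binary: it is not human-readable and is hard to edit by hand, which makes the data harder to analyze or tamper with. The encoding is also compact, which improves transfer efficiency. Put simply, Protobuf plays the same role as JSON serialization; only the implementation differs.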
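
To make the compactness point concrete, here is a minimal sketch of how two fields are laid out in the Protobuf wire format, compared with the same data as JSON. The field names (id, version) and field numbers (1, 2) are made up for illustration; only the varint and length-delimited encodings are standard.

```python
import json


def encode_varint(value):
    """Encode a non-negative integer as a Protobuf base-128 varint."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def encode_varint_field(field_number, value):
    """Tag (field number << 3, wire type 0) followed by the varint value."""
    return encode_varint(field_number << 3) + encode_varint(value)


def encode_string_field(field_number, text):
    """Tag (wire type 2) followed by the byte length and the UTF-8 bytes."""
    data = text.encode("utf-8")
    return encode_varint((field_number << 3) | 2) + encode_varint(len(data)) + data


# Hypothetical message: field 1 = 150, field 2 = "3.5.2".
binary = encode_varint_field(1, 150) + encode_string_field(2, "3.5.2")
as_json = json.dumps({"id": 150, "version": "3.5.2"}).encode("utf-8")

print(binary.hex())               # 0896011205332e352e32
print(len(binary), len(as_json))  # the binary form is the shorter one
```

Field names never appear in the binary form, only field numbers, which is where most of the saving comes from and also why the bytes are unreadable without the .proto definition.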



Installing Protobuf




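Download the release that matches your platform, unzip it, and add the directory containing protoc to your PATH environment variable.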





Serialization and Deserialization



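Protobuf serialization requires the developer to define the message format in a .proto file and then use the Protobuf compiler (protoc) to generate message-handling code for the chosen language. The code can also be generated online in one step via the official site. With the generated files you can serialize and deserialize.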


The following example shows how to deserialize a payload with the help of JavaScript reverse engineering. Target URL (Base64-encoded): aHR0cHM6Ly93d3cueGlhb2hvbmdzaHUuY29tL2V4cGxvcmUvNjRkYzg2OGEwMDAwMDAwMDBhMDFiZDgz.


Open the target URL and press F12 to capture traffic. The request parameter of the collect endpoint is Base64-encoded.




After decoding, the data looks like this:

춐]
6discovery-undefined0.0.00:
xhs-pc-webB3.5.2pm
 5bc331f43e6e73244d2b51c2999b1e02HyYjdqDYqjyF8yYjdqDYq2I24qyKAfI4WlxWh7idWx1y1vK28SqduD0888yW2yWj8DDiqd0qy"
61c3e3e9000000001000d0df*264dc868a000000000a01bd83p:B
$2cd55f67-ae5a-446a-9571-cb81e171d8360J167Xຊִx˅1BJ
$9bab7cd2-3eae-4469-9553-06cc2e5c8492oMozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36<https://www.xiaohongshu.com/explore/64dc868a000000000a01bd83"/explore/:noteId*Lhttps://www.xiaohongshu.com/explore/64e19dc2000000000103c666?m_source=pinpaiZ"
64dc868a000000000a01bd83rlink
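
One detail of this dump is worth a closer look: stray characters such as the "B" glued between "xhs-pc-web" and "3.5.2". A rough sketch of pulling the printable runs out of such a blob is shown below. The sample bytes are modelled on that fragment of the dump; the helper name printable_runs is made up for this illustration.

```python
import re


def printable_runs(blob, min_len=4):
    """Return the runs of printable ASCII in blob that are at least min_len long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [run.decode("ascii") for run in pattern.findall(blob)]


# 0x3a, 0x0a and 0x42, 0x05 are binary framing bytes, not text. 0x3a and 0x42
# happen to be printable (":" and "B"), so they leak into the dump as letters.
sample = b"\x3a\x0a" + b"xhs-pc-web" + b"\x42\x05" + b"3.5.2"

print(printable_runs(sample))  # ['xhs-pc-webB', '3.5.2']
```

Readable strings interleaved with short bursts of non-text bytes are a hint that the payload is some length-prefixed binary format, though not proof that it is Protobuf.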

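You can see unreadable bytes mixed in with the text. At this point it is still not possible to tell whether Protobuf serialization was used. On some sites the Content-Type request header gives it away by naming Protobuf explicitly.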



The target site, however, Base64-encodes the serialized result, so its Content-Type header looks the same as that of an ordinary request.
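
The header check can be sketched as a small helper. The media types it looks for (anything containing "protobuf", or starting with "application/grpc") are common conventions rather than an official list, and the two header dictionaries are invented examples, not captures from the target site.

```python
def content_type_suggests_protobuf(headers):
    """Return True if the Content-Type header hints at a Protobuf or gRPC body."""
    media_type = ""
    for name, value in headers.items():
        if name.lower() == "content-type":
            # Drop parameters such as "; charset=UTF-8".
            media_type = value.split(";", 1)[0].strip().lower()
    return "protobuf" in media_type or media_type.startswith("application/grpc")


plain = {"Content-Type": "application/json;charset=UTF-8"}
binary = {"content-type": "application/x-protobuf"}

print(content_type_suggests_protobuf(plain))   # False
print(content_type_suggests_protobuf(binary))  # True
```

A False result proves nothing, as the target site shows: once the bytes are wrapped in Base64 and sent as an ordinary parameter, the header no longer reflects the inner format.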



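In this case, dynamic debugging is needed to find out what the data really is: inspect the call stack, locate the suspicious code, and set a breakpoint there.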



Step into the call and set another breakpoint at the position marked in the original screenshot.



Keep stepping until you reach the key location. Here the signature is obvious: identifiers such as "proto" and "serializeBinary" are telltale signs of Protobuf.
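
When the script is large, the same check can be done offline by scanning the downloaded JavaScript for such identifiers. The sketch below is one way to do it. The marker list is an assumption based on the names that Google's JavaScript Protobuf runtime typically leaves in generated code, and the sample snippet is invented, not copied from the target site.

```python
import re

# Identifiers that generated Protobuf JavaScript usually contains (assumed list).
MARKERS = (
    "serializeBinary",
    "deserializeBinary",
    "serializeBinaryToWriter",
    "jspb.BinaryWriter",
    "jspb.BinaryReader",
)


def find_protobuf_markers(js_source):
    """Count whole-identifier occurrences of each marker in the given source."""
    hits = {}
    for marker in MARKERS:
        # \b keeps "serializeBinary" from matching inside "deserializeBinary".
        count = len(re.findall(r"\b" + re.escape(marker) + r"\b", js_source))
        if count:
            hits[marker] = count
    return hits


sample = (
    "proto.App.prototype.serializeBinary = function () {"
    " var w = new jspb.BinaryWriter();"
    " proto.App.serializeBinaryToWriter(this, w);"
    " return w.getResultBuffer(); };"
)

print(find_protobuf_markers(sample))
```

Minified bundles usually keep these method names because they are property accesses, which is why a plain text search tends to work even on obfuscated code.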



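Next, write a custom .proto file based on the patterns in the source code. Before doing so you need to know the syntax and data types of .proto files. For reasons of space, readers can consult other tutorials for those; this article focuses on the reverse-engineering part.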

Writing the .proto file


The target site's message format is a Tracker message that contains many sub-messages, such as APP, Mobile, and Device.



From this we can write the outermost proto:


syntax = "proto3";
package xhs;
message Tracker {
  repeated APP app = 1;
  repeated Mobile mobile = 2;
  repeated Device device = 3;
  repeated User user = 4;
  repeated Network network = 5;
  repeated Page page = 6;
  repeated Event event = 7;
  repeated Browser browser = 9;
  repeated NoteTarget noteTarget = 11;
  repeated NoteCommentTarget noteCommentTarget = 12;
  repeated TagTarget tagTarget = 13;
  repeated UserTarget userTarget = 14;
  repeated MallBannerTarget mallBannerTarget = 15;
  repeated MallGoodsTarget mallGoodsTarget = 16;
  repeated MallVendorTarget mallVendorTarget = 17;
  repeated MallCouponTarget mallCouponTarget = 18;
  repeated SearchTarget searchTarget = 30;
  repeated BrandingUserTarget brandingUserTarget = 40;
  repeated BrowserTarget browserTarget = 51;
  repeated ChannelTabTarget channelTabTarget = 100;
  repeated MessageTarget messageTarget = 151;
  repeated AdsTarget adsTarget = 152;
  repeated HeyTarget heyTarget = 153;
  repeated DebugTarget debugTarget = 154;
  repeated ActivityTarget activityTarget = 157;
  repeated LiveTarget liveTarget = 164;
  repeated CircleTarget circleTarget = 167;
  repeated GrowthPetTaskTarget growthPetTaskTarget = 195;
  repeated HideType hideType = 197;
  repeated WebTarget webTarget = 219;

}

Then step into proto.App.serializeBinaryToWriter and write the proto for App.


message APP {
  enum NameTracker {
    DEFAULT_1 = 0;
    IOST = 1;
    ANDRT = 2;
    RNT = 3;
    MPT = 4;
    WAPT = 5;
    WXMPT = 6;
    BDMPT = 7;
    TTMPT = 8;
    QQMPT = 9;
    APMPT = 10;
    MINI_ANDRT = 11;
  }
  NameTracker nameTracker = 1;
  string AppVersion = 2;
  string TrackerVersion = 3;
  string SessionId = 4;
  string AppMarket = 5;
  enum Platform {
    DEFAULT_13 = 0;
    IOS = 1;
    ANDROID = 2;
    REACTNATIVE = 3;
    MOBILEBROWSER = 4;
    WECHATBROWSER = 5;
    WECHATMINIPROGRAM = 6;
    PC = 7;
    IOSBROWSER = 8;
    ANDROIDBROWSER = 9;
    FLUTTER = 10;
  };
  Platform platform = 6;
  string ArtifactName = 7;
  string ArtifactVersion = 8;
  enum AppMode {
    app_mode = 0;
  };
  AppMode appMode = 9;
  string LaunchId = 10;
  string MpScene = 11;
  string AppStartMode = 12;
  string BuildVersion = 13;
  int32 EventSeqIdInSession = 14;
  bool DarkMode = 15;
  string StartupId = 16;
  enum Orientation {
    DEFAULT_60 = 0;
    PORTRAIT = 1;
    LANDSCAPE = 2;
    LANDSCAPE_SPLIT = 3;
    PORTRAIT_SPLIT = 4;
    PORTRAIT_SPLIT_MAGIC = 5;
    LANDSCAPE_SPLIT_MAGIC = 6;
    LANDSCAPE_MAGIC = 7;
    PORTRAIT_MAGIC = 8;
  };
  Orientation orientation = 17;
  string BuildId = 1001;
  string Package = 1002;
  string AppName = 1003;
  string SdkName = 1004;
  string SdkVersion = 1005;
  enum Environment {
    DEFAULT_64 = 0;
    ENVIRONMENT_DEVELOP = 1;
    ENVIRONMENT_RELEASE = 2;
  };
  Environment environment = 1006;
  int64 ColdStartId = 1007;
  bool IsTeenagerMode = 1008;
  string DeviceType = 1009;

}

An enum type predefines a set of allowed values for a field. The predefined values can be found in the source code by keyword search.
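
On the wire an enum is just an integer, so the names only matter when reading the decoded result. As a rough illustration, the Platform enum above can be mirrored with Python's IntEnum to translate a raw number back into its name. The helper platform_name and its fallback label are made up for this sketch; the code generated by protoc provides its own lookup.

```python
from enum import IntEnum


class Platform(IntEnum):
    """Mirror of the Platform enum from the APP message above."""
    DEFAULT_13 = 0
    IOS = 1
    ANDROID = 2
    REACTNATIVE = 3
    MOBILEBROWSER = 4
    WECHATBROWSER = 5
    WECHATMINIPROGRAM = 6
    PC = 7
    IOSBROWSER = 8
    ANDROIDBROWSER = 9
    FLUTTER = 10


def platform_name(number):
    """Translate a raw enum number into its name, tolerating unknown values."""
    try:
        return Platform(number).name
    except ValueError:
        return f"UNKNOWN_{number}"


print(platform_name(7))   # PC
print(platform_name(99))  # UNKNOWN_99
```

If a value found in the source is missing from your enum, the decoded output shows a bare number instead of a name, which is a quick way to spot gaps in the reconstructed definition.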



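Following the same pattern, you can write out the complete .proto file. From it you can generate message-handling code for any supported language. Taking Python as an example, running "protoc --python_out=. ./collect.proto" produces a Python file, collect_pb2.py. Now test deserialization: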

import base64
from utils import collect_pb2
a = 'jgXsthBdCjYIBRITZGlzY292ZXJ5LXVuZGVmaW5lZBoFMC4wLjAwBzoKeGhzLXBjLXdlYkIFMy41LjJwgwESABptCiA1YmMzMzFmNDNlNmU3MzI0NGQyYjUxYzI5OTliMWUwMooBSHlZamRxRFlxanlGOHlZamRxRFlxMkkyNHF5S0FmSTRXbHhXaDdpZFd4MXkxdksyOFNxZHVEMDg4OHlXMnlXajhERGlxZDBxeSIaChg2MWMzZTNlOTAwMDAwMDAwMTAwMGQwZGYqAggEMh0I+xcSGDY0ZGM4NjhhMDAwMDAwMDAwYTAxYmQ4MzpECiRjMThhYzliYS1mY2JiLTQ3YTYtOTMwOC1hMTM4MGVmZTQ1YzIgATAfSgMxMzFY4NOG49T0gAN4z8UBiALs1eGxojFKtQIKJDliYWI3Y2QyLTNlYWUtNDQ2OS05NTUzLTA2Y2MyZTVjODQ5MhJvTW96aWxsYS81LjAgKFdpbmRvd3MgTlQgMTAuMDsgV2luNjQ7IHg2NCkgQXBwbGVXZWJLaXQvNTM3LjM2IChLSFRNTCwgbGlrZSBHZWNrbykgQ2hyb21lLzExNS4wLjAuMCBTYWZhcmkvNTM3LjM2GjxodHRwczovL3d3dy54aWFvaG9uZ3NodS5jb20vZXhwbG9yZS82NGRjODY4YTAwMDAwMDAwMGEwMWJkODMiEC9leHBsb3JlLzpub3RlSWQqTGh0dHBzOi8vd3d3LnhpYW9ob25nc2h1LmNvbS9leHBsb3JlLzY0ZTE5ZGMyMDAwMDAwMDAwMTAzYzY2Nj9tX3NvdXJjZT1waW5wYWlaIgoYNjRkYzg2OGEwMDAwMDAwMDBhMDFiZDgzEAFyBGxpbms='
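# Decode the Base64 request parameter into raw bytes.
b = base64.urlsafe_b64decode(a)
tracker = collect_pb2.Tracker()
# Skip the 4-byte prefix that precedes the Protobuf message, then parse.
tracker.ParseFromString(b[4:])
print(tracker)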



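Deserialization now succeeds. Two details deserve a note. First, the script decodes with urlsafe_b64decode. Which Base64 alphabet is needed depends on how the site encoded the parameter, not on whether the payload contains URLs; urlsafe_b64decode is a safe choice here because in Python it accepts the URL-safe characters "-" and "_" as well as the standard "+" and "/". Second, the first four bytes of the decoded data are dropped, because the encoder prepends four bytes that are not part of the Protobuf message.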
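
On the choice of decoder, a quick check suggests that Python's urlsafe_b64decode handles both Base64 alphabets, so it works whichever one the site used. The sketch below also wraps the two decoding steps in a helper. The name decode_param, the padding repair, and the dummy 4-byte prefix are additions for illustration only.

```python
import base64

raw = b"\xfb\xff"
standard = base64.b64encode(raw)          # b'+/8='
url_safe = base64.urlsafe_b64encode(raw)  # b'-_8='

# Both spellings decode to the same bytes with the URL-safe decoder.
print(base64.urlsafe_b64decode(standard) == raw)  # True
print(base64.urlsafe_b64decode(url_safe) == raw)  # True


def decode_param(param, prefix_len=4):
    """Base64-decode a request parameter and drop its leading prefix bytes."""
    # Restore any '=' padding that was stripped in transit.
    padded = param + "=" * (-len(param) % 4)
    return base64.urlsafe_b64decode(padded)[prefix_len:]


# Dummy prefix 00 01 02 03 followed by a two-byte Protobuf field (tag 0x08, value 5).
demo = base64.urlsafe_b64encode(b"\x00\x01\x02\x03" + b"\x08\x05").decode("ascii")
print(decode_param(demo))  # b'\x08\x05'
```

The prefix length is kept as a parameter because the value 4 comes from observing this particular site; another endpoint may frame its payload differently.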


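Many tutorials suggest capturing the traffic with Fiddler, saving the body as a .bin file, running "protoc --decode_raw < 1.bin" on the command line to reveal the Protobuf structure, and then writing the proto from that output. This approach only suits experienced practitioners; anyone new to Protobuf who follows such a tutorial is likely to get badly stuck.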
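
For readers curious about what a raw decode actually does, the idea can be sketched in a few lines of Python: walk the bytes, read each tag, and split it into a field number and a wire type. This is a simplified stand-in, not a reimplementation of protoc. It only lists top-level fields and does not recurse into sub-messages. The sample bytes are modelled on the start of the APP sub-message in the captured payload.

```python
def read_varint(buf, pos):
    """Read one base-128 varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def top_level_fields(buf):
    """List (field number, wire type, value) for each top-level field in buf."""
    fields = []
    pos = 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:      # varint: ints, bools, enums
            value, pos = read_varint(buf, pos)
        elif wire_type == 1:    # fixed 64-bit
            value, pos = buf[pos:pos + 8], pos + 8
        elif wire_type == 2:    # length-delimited: strings, bytes, sub-messages
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        elif wire_type == 5:    # fixed 32-bit
            value, pos = buf[pos:pos + 4], pos + 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        fields.append((field_number, wire_type, value))
    return fields


sample = (
    b"\x08\x05"                          # field 1, varint 5
    b"\x12\x13" b"discovery-undefined"   # field 2, 19-byte string
    b"\x1a\x05" b"0.0.0"                 # field 3
    b"\x30\x07"                          # field 6, varint 7
    b"\x3a\x0a" b"xhs-pc-web"            # field 7
    b"\x42\x05" b"3.5.2"                 # field 8
)

for number, wire_type, value in top_level_fields(sample):
    print(number, wire_type, value)
```

The output has field numbers and values but no names or declared types, and that gap is what makes the raw-decode route hard for beginners: the names still have to come from the site's JavaScript.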
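This article is for learning and discussion only. All key information has been desensitized; if it infringes any rights, please get in touch to have it removed.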