WebRTC Probe Source Code Analysis (M92)

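Active bandwidth probing is a critical part of GCC congestion control in WebRTC: it lets the sender determine the available network bandwidth quickly and reliably. This article walks through the relevant classes in the M92 WebRTC source in detail.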

1 BitrateProber

1.1 Basic ProberConfig

BitrateProberConfig::BitrateProberConfig(
    const WebRtcKeyValueConfig* key_value_config)
    : min_probe_packets_sent("min_probe_packets_sent", 5),
      min_probe_delta("min_probe_delta", TimeDelta::Millis(1)),
      min_probe_duration("min_probe_duration", TimeDelta::Millis(15)),
      max_probe_delay("max_probe_delay", TimeDelta::Millis(10)),
      abort_delayed_probes("abort_delayed_probes", true) {
  ParseFieldTrial(
      {&min_probe_packets_sent, &min_probe_delta, &min_probe_duration,
       &max_probe_delay, &abort_delayed_probes},
      key_value_config->Lookup("WebRTC-Bwe-ProbingConfiguration"));
  ParseFieldTrial(
      {&min_probe_packets_sent, &min_probe_delta, &min_probe_duration,
       &max_probe_delay, &abort_delayed_probes},
      key_value_config->Lookup("WebRTC-Bwe-ProbingBehavior"));
}

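Each probe cluster contains at least 5 packets (4 inter-packet deltas), and each delta is no shorter than 1 ms. The minimum probe duration is 15 ms, meaning a cluster must send at the probe bitrate for at least 15 ms worth of data.
max_probe_delay is the maximum allowed probe delay, 10 ms.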

1.2 Creating a probe cluster

A cluster is created for a given probe bitrate. In addition, any cluster in clusters_ that was created more than 5 s ago is treated as expired and removed from clusters_.

void BitrateProber::CreateProbeCluster(DataRate bitrate,
                                       Timestamp now,
                                       int cluster_id) {
  RTC_DCHECK(probing_state_ != ProbingState::kDisabled);
  RTC_DCHECK_GT(bitrate, DataRate::Zero());

  total_probe_count_++;
  // A cluster created more than 5 s ago is no longer probed (presumably the
  // network state has changed and its parameters are stale).
  // 1. Drop expired probe clusters.
  while (!clusters_.empty() &&
         now - clusters_.front().created_at > kProbeClusterTimeout) {
    clusters_.pop();
    total_failed_probe_count_++;
  }
  // 2. Create the new probe cluster (bitrate, cluster_id).
  ProbeCluster cluster;
  cluster.created_at = now;
  // At least 5 packets.
  cluster.pace_info.probe_cluster_min_probes = config_.min_probe_packets_sent;
  // Byte count equals bitrate * 15 ms.
  cluster.pace_info.probe_cluster_min_bytes =
      (bitrate * config_.min_probe_duration.Get()).bytes();
  RTC_DCHECK_GE(cluster.pace_info.probe_cluster_min_bytes, 0);
  // Probe bitrate.
  cluster.pace_info.send_bitrate_bps = bitrate.bps();
  // Probe cluster id.
  cluster.pace_info.probe_cluster_id = cluster_id;
  // Add to the queue.
  clusters_.push(cluster);

  RTC_LOG(LS_INFO) << "Probe cluster (bitrate:min bytes:min packets): ("
                   << cluster.pace_info.send_bitrate_bps << ":"
                   << cluster.pace_info.probe_cluster_min_bytes << ":"
                   << cluster.pace_info.probe_cluster_min_probes << ")";
  // If we are already probing, continue to do so. Otherwise set it to
  // kInactive and wait for OnIncomingPacket to start the probing.
  if (probing_state_ != ProbingState::kActive)
    probing_state_ = ProbingState::kInactive;
}

// A probe cluster consists of a set of probes. Each probe in turn can be
// divided into a number of packets to accommodate the MTU on the network.
struct ProbeCluster {
  PacedPacketInfo pace_info;

  int sent_probes = 0;
  int sent_bytes = 0;
  Timestamp created_at = Timestamp::MinusInfinity();
  Timestamp started_at = Timestamp::MinusInfinity();
  int retries = 0;
};

pace_info holds the basic probe parameters, sent_probes is the number of probes sent so far, and sent_bytes is the number of probe bytes sent so far. Together they track the state of one cluster.
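
To make the cluster parameters concrete, here is a minimal, self-contained sketch of the same bookkeeping with plain integers. SimpleCluster, ProbeClusterMinBytes, and AddCluster are hypothetical names used only for illustration, and the byte count uses truncating integer division, so it may differ by one byte from what WebRTC's DataRate/TimeDelta unit types produce.

```cpp
#include <cassert>
#include <cstdint>
#include <queue>

// Hypothetical, simplified stand-in for one queued probe cluster.
struct SimpleCluster {
  int64_t created_at_ms;
  int64_t bitrate_bps;
  int64_t min_bytes;
  int min_probes;
};

// Bytes needed to send at bitrate_bps for duration_ms.
// Assumption: truncating integer division, not WebRTC's unit arithmetic.
int64_t ProbeClusterMinBytes(int64_t bitrate_bps, int64_t duration_ms) {
  assert(bitrate_bps > 0 && duration_ms > 0);
  return bitrate_bps * duration_ms / 1000 / 8;
}

// Drops clusters older than timeout_ms, then queues a new cluster using the
// defaults quoted in the text (15 ms minimum duration, 5 minimum probes).
// Returns how many expired clusters were dropped.
int AddCluster(std::queue<SimpleCluster>& clusters, int64_t bitrate_bps,
               int64_t now_ms, int64_t timeout_ms = 5000) {
  int expired = 0;
  while (!clusters.empty() &&
         now_ms - clusters.front().created_at_ms > timeout_ms) {
    clusters.pop();
    ++expired;
  }
  clusters.push(
      {now_ms, bitrate_bps, ProbeClusterMinBytes(bitrate_bps, 15), 5});
  return expired;
}
```

For a 900 kbps probe this gives 13500 bits, or 1687 bytes after truncation, spread over at least 5 probes.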

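1.3 Next probe time

(1) In PacingController::ProcessPackets
When is_probing is set, i.e. the pacer is in the probing state, BitrateProber::ProbeSent is called each time a probe is sent, where size_t bytes is the number of bytes just sent.
ProbeSent updates the byte and probe counts of the current cluster and computes the next send time. Once the sent bytes and the probe count both reach their minimums, probing for this cluster is complete.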

void BitrateProber::ProbeSent(Timestamp now, size_t bytes) {
  RTC_DCHECK(probing_state_ == ProbingState::kActive);
  RTC_DCHECK_GT(bytes, 0);

  if (!clusters_.empty()) {
    ProbeCluster* cluster = &clusters_.front();
    if (cluster->sent_probes == 0) {
      RTC_DCHECK(cluster->started_at.IsInfinite());
      cluster->started_at = now;
    }
    // Update the sent byte and probe counts.
    cluster->sent_bytes += static_cast<int>(bytes);
    cluster->sent_probes += 1;
    // Compute the send time of the next probe.
    next_probe_time_ = CalculateNextProbeTime(*cluster);
    // Once the sent bytes and sent probe count both meet their minimums, the
    // cluster is done and is popped.
    if (cluster->sent_bytes >= cluster->pace_info.probe_cluster_min_bytes &&
        cluster->sent_probes >= cluster->pace_info.probe_cluster_min_probes) {
      RTC_HISTOGRAM_COUNTS_100000("WebRTC.BWE.Probing.ProbeClusterSizeInBytes",
                                  cluster->sent_bytes);
      RTC_HISTOGRAM_COUNTS_100("WebRTC.BWE.Probing.ProbesPerCluster",
                               cluster->sent_probes);
      RTC_HISTOGRAM_COUNTS_10000("WebRTC.BWE.Probing.TimePerProbeCluster",
                                 (now - cluster->started_at).ms());

      clusters_.pop();
    }
    // If no clusters remain, set probing_state_ to suspended.
    if (clusters_.empty())
      probing_state_ = ProbingState::kSuspended;
  }
}

Computing the next probe time (an instant):

// Compute the send time of the next probe.
Timestamp BitrateProber::CalculateNextProbeTime(
    const ProbeCluster& cluster) const {
  RTC_CHECK_GT(cluster.pace_info.send_bitrate_bps, 0);
  RTC_CHECK(cluster.started_at.IsFinite());

  // Compute the time delta from the cluster start to ensure probe bitrate stays
  // close to the target bitrate. Result is in milliseconds.
  DataSize sent_bytes = DataSize::Bytes(cluster.sent_bytes);
  DataRate send_bitrate =
      DataRate::BitsPerSec(cluster.pace_info.send_bitrate_bps);
  TimeDelta delta = sent_bytes / send_bitrate;
  return cluster.started_at + delta;
}

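Dividing the bytes sent so far by the probe bitrate gives the time delta that this much data should have taken at the target rate; adding the cluster start time gives the instant of the next send.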
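
The pacing rule and the completion check can be sketched with plain integers as follows. PacedCluster, NextProbeTimeUs, and OnProbeSent are hypothetical names for illustration only, and times are in microseconds rather than WebRTC's Timestamp type.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical simplified cluster state; all times are in microseconds.
struct PacedCluster {
  int64_t bitrate_bps;
  int64_t min_bytes;
  int min_probes;
  int64_t started_at_us;
  int64_t sent_bytes;
  int sent_probes;
};

// Instant at which sent_bytes would have been sent at exactly bitrate_bps,
// measured from the cluster start.
int64_t NextProbeTimeUs(const PacedCluster& c) {
  assert(c.bitrate_bps > 0);
  return c.started_at_us + c.sent_bytes * 8 * 1000000 / c.bitrate_bps;
}

// Records one sent probe. Returns true once both the byte minimum and the
// probe-count minimum are met, i.e. the cluster is finished.
bool OnProbeSent(PacedCluster& c, int64_t now_us, int64_t bytes) {
  assert(bytes > 0);
  if (c.sent_probes == 0) c.started_at_us = now_us;
  c.sent_bytes += bytes;
  c.sent_probes += 1;
  return c.sent_bytes >= c.min_bytes && c.sent_probes >= c.min_probes;
}
```

Because the next send time is always derived from the cluster start rather than from the previous send, a late probe does not shift the schedule and the average rate stays close to the target.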

2 ProbeController

ProbeController is responsible for generating probe clusters. It does so in the following situations:

2.1 kInit

By default, probes are issued at 3x and 6x the start bitrate.

std::vector<ProbeClusterConfig> ProbeController::InitiateExponentialProbing(
    int64_t at_time_ms) {
  RTC_DCHECK(network_available_);
  RTC_DCHECK(state_ == State::kInit);
  RTC_DCHECK_GT(start_bitrate_bps_, 0);

  // When probing at 1.8 Mbps ( 6x 300), this represents a threshold of
  // 1.2 Mbps to continue probing.
  std::vector<int64_t> probes = {static_cast<int64_t>(
      config_.first_exponential_probe_scale * start_bitrate_bps_)};
  if (config_.second_exponential_probe_scale) {
    probes.push_back(config_.second_exponential_probe_scale.Value() *
                     start_bitrate_bps_);
  }
  // Two probe bitrates have been added above (900 kbps and 1.8 Mbps for a
  // 300 kbps start bitrate).
  // InitiateProbing builds the probe clusters.
  return InitiateProbing(at_time_ms, probes, true);
}

std::vector<ProbeClusterConfig> ProbeController::InitiateProbing(
    int64_t now_ms,
    std::vector<int64_t> bitrates_to_probe,
    bool probe_further) {
  int64_t max_probe_bitrate_bps =
      max_bitrate_bps_ > 0 ? max_bitrate_bps_ : kDefaultMaxProbingBitrateBps;
  if (limit_probes_with_allocateable_rate_ &&
      max_total_allocated_bitrate_ > 0) {
    // If a max allocated bitrate has been configured, allow probing up to 2x
    // that rate. This allows some overhead to account for bursty streams,
    // which otherwise would have to ramp up when the overshoot is already in
    // progress.
    // It also avoids minor quality reduction caused by probes often being
    // received at slightly less than the target probe bitrate.
    max_probe_bitrate_bps =
        std::min(max_probe_bitrate_bps, max_total_allocated_bitrate_ * 2);
  }

  // 1. Build the probe clusters.
  std::vector<ProbeClusterConfig> pending_probes;
  for (int64_t bitrate : bitrates_to_probe) {
    RTC_DCHECK_GT(bitrate, 0);

    if (bitrate > max_probe_bitrate_bps) {
      bitrate = max_probe_bitrate_bps;
      probe_further = false;
    }

    ProbeClusterConfig config;
    config.at_time = Timestamp::Millis(now_ms);
    config.target_data_rate =
        DataRate::BitsPerSec(rtc::dchecked_cast<int>(bitrate));
    config.target_duration = TimeDelta::Millis(kMinProbeDurationMs);
    config.target_probe_count = kMinProbePacketsSent;
    config.id = next_probe_cluster_id_;
    next_probe_cluster_id_++;
    MaybeLogProbeClusterCreated(event_log_, config);
    pending_probes.push_back(config);
  }
  time_last_probing_initiated_ms_ = now_ms;
  // 2. Decide whether to probe further.
  if (probe_further) {
    state_ = State::kWaitingForProbingResult;
    // Further probing requires the measured bitrate to exceed
    // further_probe_threshold (0.7) times the last requested probe bitrate.
    min_bitrate_to_probe_further_bps_ =
        (*(bitrates_to_probe.end() - 1)) * config_.further_probe_threshold;
  } else {
    // Probing is finished; no further probes.
    state_ = State::kProbingComplete;
    min_bitrate_to_probe_further_bps_ = kExponentialProbingDisabled;
  }
  return pending_probes;
}
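
As a worked example of the initial exponential probing, the sketch below reproduces the capping and threshold logic with plain integers. ProbePlan and PlanInitialProbes are hypothetical names; the scales 3.0 and 6.0 and the 0.7 threshold are the default values discussed above, and using 0 to mean "further probing disabled" is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical summary of what one round of initial probing decides.
struct ProbePlan {
  std::vector<int64_t> bitrates_bps;         // probe bitrates after capping
  bool probe_further;                        // false once any probe hit the cap
  int64_t min_bitrate_to_probe_further_bps;  // 0 when further probing is off
};

ProbePlan PlanInitialProbes(int64_t start_bps, int64_t max_probe_bps,
                            double first_scale = 3.0,
                            double second_scale = 6.0,
                            double further_threshold = 0.7) {
  assert(start_bps > 0 && max_probe_bps > 0);
  std::vector<int64_t> requested = {
      static_cast<int64_t>(first_scale * start_bps),
      static_cast<int64_t>(second_scale * start_bps)};
  ProbePlan plan{{}, true, 0};
  for (int64_t bitrate : requested) {
    if (bitrate > max_probe_bps) {
      bitrate = max_probe_bps;  // clamp to the cap and stop probing further
      plan.probe_further = false;
    }
    plan.bitrates_bps.push_back(bitrate);
  }
  if (plan.probe_further) {
    // As in the listing, the threshold uses the last *requested* bitrate.
    plan.min_bitrate_to_probe_further_bps =
        static_cast<int64_t>(requested.back() * further_threshold);
  }
  return plan;
}
```

With a 300 kbps start bitrate and a generous cap, this yields probes at 900 kbps and 1.8 Mbps and a continuation threshold of about 1.26 Mbps.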

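2.2 Periodic probing

When the encoder output bitrate is too low and the sender is in ALR (application-limited region), periodic ALR probing can be started.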

std::vector<ProbeClusterConfig> ProbeController::Process(int64_t at_time_ms) {
  // 1. More than 1 s has passed (a result should have arrived by now).
  if (at_time_ms - time_last_probing_initiated_ms_ >
      kMaxWaitingTimeForProbingResultMs) {
    mid_call_probing_waiting_for_result_ = false;
    // No result arrived in time: timed out.
    if (state_ == State::kWaitingForProbingResult) {
      RTC_LOG(LS_INFO) << "kWaitingForProbingResult: timeout";
      state_ = State::kProbingComplete;
      min_bitrate_to_probe_further_bps_ = kExponentialProbingDisabled;
    }
  }
  // 2. Periodic ALR probing.
  if (enable_periodic_alr_probing_ && state_ == State::kProbingComplete) {
    // Probe bandwidth periodically when in ALR state.
    if (alr_start_time_ms_ && estimated_bitrate_bps_ > 0) {
      // Compute the next probe time (5 s interval).
      int64_t next_probe_time_ms =
          std::max(*alr_start_time_ms_, time_last_probing_initiated_ms_) +
          config_.alr_probing_interval->ms();
      // If that time has passed, start probing; alr_probe_scale = 2.
      if (at_time_ms >= next_probe_time_ms) {
        return InitiateProbing(at_time_ms,
                               {static_cast<int64_t>(estimated_bitrate_bps_ *
                                                     config_.alr_probe_scale)},
                               true);
      }
    }
  }
  return std::vector<ProbeClusterConfig>();
}

Periodic probes run at 2x the current estimated bandwidth (alr_probe_scale = 2).
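
The timing rule for periodic ALR probing can be sketched as a single function. MaybeAlrProbeBps is a hypothetical name; the 5 s interval and the 2x scale are the defaults quoted above, and the sketch assumes the controller is already in ALR and in the probing-complete state.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Returns the bitrate to probe at, or 0 if no ALR probe is due yet.
// Assumes the caller has already checked that ALR probing is enabled, that
// ALR has started, and that no other probe result is pending.
int64_t MaybeAlrProbeBps(int64_t now_ms, int64_t alr_start_ms,
                         int64_t last_probe_ms, int64_t estimate_bps,
                         int64_t interval_ms = 5000, double scale = 2.0) {
  assert(estimate_bps > 0);
  // The interval is counted from whichever happened later: entering ALR or
  // the most recent probe.
  int64_t next_probe_ms = std::max(alr_start_ms, last_probe_ms) + interval_ms;
  if (now_ms < next_probe_ms) return 0;
  return static_cast<int64_t>(estimate_bps * scale);
}
```

Taking the maximum of the two timestamps means that entering ALR does not trigger an immediate probe, and that any other probe resets the 5 s timer.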

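2.3 Bandwidth-triggered probing (estimate is high)

The current bandwidth has not reached the maximum capacity, and the condition for further probing is met.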

std::vector<ProbeClusterConfig> ProbeController::SetEstimatedBitrate(
    int64_t bitrate_bps,
    int64_t at_time_ms) {
  if (mid_call_probing_waiting_for_result_ &&
      bitrate_bps >= mid_call_probing_succcess_threshold_) {
    RTC_HISTOGRAM_COUNTS_10000("WebRTC.BWE.MidCallProbing.Success",
                               mid_call_probing_bitrate_bps_ / 1000);
    RTC_HISTOGRAM_COUNTS_10000("WebRTC.BWE.MidCallProbing.ProbedKbps",
                               bitrate_bps / 1000);
    mid_call_probing_waiting_for_result_ = false;
  }
  std::vector<ProbeClusterConfig> pending_probes;
  // 1. Further probing is possible because maximum capacity has not been reached.
  if (state_ == State::kWaitingForProbingResult) {
    // Continue probing if probing results indicate channel has greater
    // capacity.
    RTC_LOG(LS_INFO) << "Measured bitrate: " << bitrate_bps
                     << " Minimum to probe further: "
                     << min_bitrate_to_probe_further_bps_;
    // The estimated bitrate exceeds the minimum required for further probing,
    // so start a new probe.
    if (min_bitrate_to_probe_further_bps_ != kExponentialProbingDisabled &&
        bitrate_bps > min_bitrate_to_probe_further_bps_) {
      pending_probes = InitiateProbing(
          at_time_ms,
          {static_cast<int64_t>(config_.further_exponential_probe_scale *
                                bitrate_bps)},
          true);
    }
  }
  // 2. Record the time and size of a large bitrate drop.
  if (bitrate_bps < kBitrateDropThreshold * estimated_bitrate_bps_) {
    time_of_last_large_drop_ms_ = at_time_ms;
    bitrate_before_last_large_drop_bps_ = estimated_bitrate_bps_;
  }

  estimated_bitrate_bps_ = bitrate_bps;
  return pending_probes;
}
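
The two decisions made in SetEstimatedBitrate can be condensed into the sketch below. ProbeState and OnNewEstimate are hypothetical names; the 2x further-probe scale and the 0.66 drop threshold are assumed values for further_exponential_probe_scale and kBitrateDropThreshold, which the listing does not show, and the state-machine check and histogram reporting are left out.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical subset of ProbeController state needed for this sketch.
struct ProbeState {
  int64_t estimate_bps;
  int64_t min_bitrate_to_probe_further_bps;  // 0 means disabled (assumed)
  int64_t time_of_last_large_drop_ms;
  int64_t bitrate_before_last_large_drop_bps;
};

// Handles a new estimate. Returns the bitrate of a follow-up probe, or 0 if
// none should be sent.
int64_t OnNewEstimate(ProbeState& s, int64_t bitrate_bps, int64_t now_ms,
                      double further_scale = 2.0,      // assumed default
                      double drop_threshold = 0.66) {  // assumed default
  assert(bitrate_bps > 0);
  int64_t next_probe_bps = 0;
  // 1. The estimate beat the threshold, so the link may have more capacity.
  if (s.min_bitrate_to_probe_further_bps != 0 &&
      bitrate_bps > s.min_bitrate_to_probe_further_bps) {
    next_probe_bps = static_cast<int64_t>(further_scale * bitrate_bps);
  }
  // 2. Remember a large drop so that a later probe request can try to recover.
  if (bitrate_bps < drop_threshold * s.estimate_bps) {
    s.time_of_last_large_drop_ms = now_ms;
    s.bitrate_before_last_large_drop_bps = s.estimate_bps;
  }
  s.estimate_bps = bitrate_bps;
  return next_probe_bps;
}
```

Note that the drop check compares against the previous estimate before it is overwritten, which is why the assignment comes last.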

If the bandwidth drops sharply, a probe is requested through RequestProbe.

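2.4 Bandwidth-triggered probing (estimate dropped sharply)

// If the estimated bandwidth drops sharply, probe again after a while based on
// the bandwidth before the drop.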
std::vector<ProbeClusterConfig> ProbeController::RequestProbe(
    int64_t at_time_ms) {
  // Called once we have returned to normal state after a large drop in
  // estimated bandwidth. The current response is to initiate a single probe
  // session (if not already probing) at the previous bitrate.
  //
  // If the probe session fails, the assumption is that this drop was a
  // real one from a competing flow or a network change.
  bool in_alr = alr_start_time_ms_.has_value();
  bool alr_ended_recently =
      (alr_end_time_ms_.has_value() &&
       at_time_ms - alr_end_time_ms_.value() < kAlrEndedTimeoutMs);
  if (in_alr || alr_ended_recently || in_rapid_recovery_experiment_) {
    if (state_ == State::kProbingComplete) {
      // Use 0.85 * the bitrate before the large drop as the new probe bitrate.
      uint32_t suggested_probe_bps =
          kProbeFractionAfterDrop * bitrate_before_last_large_drop_bps_;
      uint32_t min_expected_probe_result_bps =
          (1 - kProbeUncertainty) * suggested_probe_bps;
      int64_t time_since_drop_ms = at_time_ms - time_of_last_large_drop_ms_;
      int64_t time_since_probe_ms = at_time_ms - last_bwe_drop_probing_time_ms_;
      if (min_expected_probe_result_bps > estimated_bitrate_bps_ &&
          time_since_drop_ms < kBitrateDropTimeoutMs &&
          time_since_probe_ms > kMinTimeBetweenAlrProbesMs) {
        RTC_LOG(LS_INFO) << "Detected big bandwidth drop, start probing.";
        // Track how often we probe in response to bandwidth drop in ALR.
        RTC_HISTOGRAM_COUNTS_10000(
            "WebRTC.BWE.BweDropProbingIntervalInS",
            (at_time_ms - last_bwe_drop_probing_time_ms_) / 1000);
        last_bwe_drop_probing_time_ms_ = at_time_ms;
        // Start the new probe.
        return InitiateProbing(at_time_ms, {suggested_probe_bps}, false);
      }
    }
  }
  return std::vector<ProbeClusterConfig>();
}
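
The numeric conditions inside RequestProbe can be isolated as follows. DropRecoveryProbeBps is a hypothetical name; 0.85 comes from the comment in the listing, while the 0.05 uncertainty and the two 5 s time limits are assumed values for kProbeUncertainty, kBitrateDropTimeoutMs, and kMinTimeBetweenAlrProbesMs. The ALR and state checks that gate the whole function are omitted.

```cpp
#include <cassert>
#include <cstdint>

// Returns the bitrate to probe at after a large drop, or 0 if no probe
// should be sent. All default constants are assumptions of this sketch.
int64_t DropRecoveryProbeBps(int64_t bitrate_before_drop_bps,
                             int64_t estimate_bps,
                             int64_t time_since_drop_ms,
                             int64_t time_since_probe_ms,
                             double probe_fraction = 0.85,
                             double probe_uncertainty = 0.05,
                             int64_t drop_timeout_ms = 5000,
                             int64_t min_probe_gap_ms = 5000) {
  assert(bitrate_before_drop_bps >= 0 && estimate_bps >= 0);
  int64_t suggested_bps =
      static_cast<int64_t>(probe_fraction * bitrate_before_drop_bps);
  int64_t min_expected_bps =
      static_cast<int64_t>((1 - probe_uncertainty) * suggested_bps);
  // Probe only if a successful probe would actually raise the estimate, the
  // drop is recent, and the previous drop-triggered probe is old enough.
  if (min_expected_bps > estimate_bps &&
      time_since_drop_ms < drop_timeout_ms &&
      time_since_probe_ms > min_probe_gap_ms) {
    return suggested_bps;
  }
  return 0;
}
```

For example, after a drop from 2 Mbps to 1 Mbps the suggested probe is about 1.7 Mbps, and it is only sent while the drop is still recent.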

3 ProbeBitrateEstimator

Per-cluster statistics are recorded in an AggregatedCluster struct:

struct AggregatedCluster {
  int num_probes = 0;
  Timestamp first_send = Timestamp::PlusInfinity();
  Timestamp last_send = Timestamp::MinusInfinity();
  Timestamp first_receive = Timestamp::PlusInfinity();
  Timestamp last_receive = Timestamp::MinusInfinity();
  DataSize size_last_send = DataSize::Zero();
  DataSize size_first_receive = DataSize::Zero();
  DataSize size_total = DataSize::Zero();
};

absl::optional<DataRate> ProbeBitrateEstimator::HandleProbeAndEstimateBitrate(
    const PacketResult& packet_feedback) {
  int cluster_id = packet_feedback.sent_packet.pacing_info.probe_cluster_id;
  RTC_DCHECK_NE(cluster_id, PacedPacketInfo::kNotAProbe);

  EraseOldClusters(packet_feedback.receive_time);

  // 1. Find the matching cluster.
  AggregatedCluster* cluster = &clusters_[cluster_id];
  // 2. Update its fields.
  if (packet_feedback.sent_packet.send_time < cluster->first_send) {
    cluster->first_send = packet_feedback.sent_packet.send_time;
  }
  if (packet_feedback.sent_packet.send_time > cluster->last_send) {
    cluster->last_send = packet_feedback.sent_packet.send_time;
    cluster->size_last_send = packet_feedback.sent_packet.size;
  }
  if (packet_feedback.receive_time < cluster->first_receive) {
    cluster->first_receive = packet_feedback.receive_time;
    cluster->size_first_receive = packet_feedback.sent_packet.size;
  }
  if (packet_feedback.receive_time > cluster->last_receive) {
    cluster->last_receive = packet_feedback.receive_time;
  }
  // Update the total probe size and the probe count.
  cluster->size_total += packet_feedback.sent_packet.size;
  cluster->num_probes += 1;

  RTC_DCHECK_GT(
      packet_feedback.sent_packet.pacing_info.probe_cluster_min_probes, 0);
  RTC_DCHECK_GT(packet_feedback.sent_packet.pacing_info.probe_cluster_min_bytes,
                0);
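  // With the default of 5 probes per cluster, at least 4 must be received
  // before an estimate is computed.
  int min_probes =
      packet_feedback.sent_packet.pacing_info.probe_cluster_min_probes *
      kMinReceivedProbesRatio;
  // Minimum number of bytes that must be received.
  DataSize min_size =
      DataSize::Bytes(
          packet_feedback.sent_packet.pacing_info.probe_cluster_min_bytes) *
      kMinReceivedBytesRatio;
  // The conditions for estimating are not met yet.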
  if (cluster->num_probes < min_probes || cluster->size_total < min_size)
    return absl::nullopt;

  // Compute the send and receive intervals.
  TimeDelta send_interval = cluster->last_send - cluster->first_send;
  TimeDelta receive_interval = cluster->last_receive - cluster->first_receive;

  if (send_interval <= TimeDelta::Zero() || send_interval > kMaxProbeInterval ||
      receive_interval <= TimeDelta::Zero() ||
      receive_interval > kMaxProbeInterval) {
    RTC_LOG(LS_INFO) << "Probing unsuccessful, invalid send/receive interval"
                        " [cluster id: "
                     << cluster_id
                     << "] [send interval: " << ToString(send_interval)
                     << "]"
                        " [receive interval: "
                     << ToString(receive_interval) << "]";
    if (event_log_) {
      event_log_->Log(std::make_unique<RtcEventProbeResultFailure>(
          cluster_id, ProbeFailureReason::kInvalidSendReceiveInterval));
    }
    return absl::nullopt;
  }
  // Since the |send_interval| does not include the time it takes to actually
  // send the last packet the size of the last sent packet should not be
  // included when calculating the send bitrate.
  // Compute the send size and rate.
  RTC_DCHECK_GT(cluster->size_total, cluster->size_last_send);
  DataSize send_size = cluster->size_total - cluster->size_last_send;
  DataRate send_rate = send_size / send_interval;

  // Since the |receive_interval| does not include the time it takes to
  // actually receive the first packet the size of the first received packet
  // should not be included when calculating the receive bitrate.
  RTC_DCHECK_GT(cluster->size_total, cluster->size_first_receive);
  // Compute the receive size and rate.
  DataSize receive_size = cluster->size_total - cluster->size_first_receive;
  DataRate receive_rate = receive_size / receive_interval;

  // The receive/send rate ratio must not exceed 2.0.
  double ratio = receive_rate / send_rate;
  if (ratio > kMaxValidRatio) {
    RTC_LOG(LS_INFO) << "Probing unsuccessful, receive/send ratio too high"
                        " [cluster id: "
                     << cluster_id << "] [send: " << ToString(send_size)
                     << " / " << ToString(send_interval) << " = "
                     << ToString(send_rate)
                     << "]"
                        " [receive: "
                     << ToString(receive_size) << " / "
                     << ToString(receive_interval) << " = "
                     << ToString(receive_rate)
                     << " ]"
                        " [ratio: "
                     << ToString(receive_rate) << " / " << ToString(send_rate)
                     << " = " << ratio << " > kMaxValidRatio ("
                     << kMaxValidRatio << ")]";
    if (event_log_) {
      event_log_->Log(std::make_unique<RtcEventProbeResultFailure>(
          cluster_id, ProbeFailureReason::kInvalidSendReceiveRatio));
    }
    return absl::nullopt;
  }
  RTC_LOG(LS_INFO) << "Probing successful"
                      " [cluster id: "
                   << cluster_id << "] [send: " << ToString(send_size) << " / "
                   << ToString(send_interval) << " = " << ToString(send_rate)
                   << " ]"
                      " [receive: "
                   << ToString(receive_size) << " / "
                   << ToString(receive_interval) << " = "
                   << ToString(receive_rate) << "]";
  // Take the minimum of the send and receive rates.
  DataRate res = std::min(send_rate, receive_rate);
  // If we're receiving at significantly lower bitrate than we were sending at,
  // it suggests that we've found the true capacity of the link. In this case,
  // set the target bitrate slightly lower to not immediately overuse.
  if (receive_rate < kMinRatioForUnsaturatedLink * send_rate) {
    RTC_DCHECK_GT(send_rate, receive_rate);
    res = kTargetUtilizationFraction * receive_rate;
  }
  if (event_log_) {
    event_log_->Log(
        std::make_unique<RtcEventProbeResultSuccess>(cluster_id, res.bps()));
  }
  // The estimated rate.
  estimated_data_rate_ = res;
  return estimated_data_rate_;
}
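
The arithmetic at the end of HandleProbeAndEstimateBitrate can be reproduced with doubles as a worked example. ProbeEstimate and EstimateFromCluster are hypothetical names; the 2.0 ratio limit comes from the comment in the listing, while 0.9 and 0.95 are assumed values for kMinRatioForUnsaturatedLink and kTargetUtilizationFraction. The upper bound on the intervals (kMaxProbeInterval) is left out.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical result type: valid is false when the probe is rejected.
struct ProbeEstimate {
  bool valid;
  double bps;
};

ProbeEstimate EstimateFromCluster(double total_bytes, double last_send_bytes,
                                  double first_receive_bytes,
                                  double send_interval_s,
                                  double receive_interval_s,
                                  double max_valid_ratio = 2.0,
                                  double min_ratio_unsaturated = 0.9,  // assumed
                                  double target_utilization = 0.95) {  // assumed
  assert(total_bytes > last_send_bytes && total_bytes > first_receive_bytes);
  if (send_interval_s <= 0 || receive_interval_s <= 0) return {false, 0.0};
  // The last sent packet is excluded: the send interval ends when it starts.
  double send_bps = (total_bytes - last_send_bytes) * 8 / send_interval_s;
  // The first received packet is excluded: the receive interval starts
  // after it has arrived.
  double receive_bps =
      (total_bytes - first_receive_bytes) * 8 / receive_interval_s;
  // Receiving much faster than sending indicates a bad measurement.
  if (receive_bps / send_bps > max_valid_ratio) return {false, 0.0};
  double result = std::min(send_bps, receive_bps);
  // A clearly lower receive rate suggests the link is saturated, so back off
  // slightly from the measured capacity.
  if (receive_bps < min_ratio_unsaturated * send_bps) {
    result = target_utilization * receive_bps;
  }
  return {true, result};
}
```

With five 1200-byte probes sent over 10 ms and received over 20 ms, the send rate is 3.84 Mbps, the receive rate is 1.92 Mbps, and the estimate is 0.95 times the receive rate, about 1.824 Mbps.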

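In summary, active probing in WebRTC can happen at any time. It always runs at startup so that the initial bandwidth is determined quickly; if ALR probing is enabled, periodic probes are sent at 5 s intervals; and probing is also triggered when the estimate rises or drops sharply. In every case the goal is to determine the true bandwidth as quickly as possible.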