The Informer Mechanism

In Kubernetes, components communicate with each other over HTTP and, without relying on any middleware, still need guarantees such as real-time delivery, reliability, and ordering of messages. How does Kubernetes achieve this? The answer is the Informer mechanism. Kubernetes components communicate with the Kubernetes API Server through client-go's Informer mechanism.

1. Informer Architecture Design

The Informer architecture and how the Informer runs are shown in the figure below.

[Figure: Informer architecture (Reflector, DeltaFIFO, Indexer)]

The Informer architecture contains several core components, described below.

1. Reflector: the Reflector watches the specified Kubernetes resource. When the watched resource changes, it triggers the corresponding change event, such as Added, Updated, or Deleted, and stores the resource object in the local DeltaFIFO cache.

2. DeltaFIFO: the name can be read in two parts. FIFO is a first-in, first-out queue with the usual queue operations such as Add, Update, Delete, List, Pop, and Close, while Delta is a resource-object store that records the operation type applied to each object, such as Added, Updated, Deleted, and Sync.

3. Indexer: the Indexer is client-go's local store for resource objects, with built-in indexing. Resource objects consumed from DeltaFIFO are stored in the Indexer, and the Indexer's data stays consistent with the data in the Etcd cluster. client-go can then read resource objects from the local store instead of fetching them from the remote Etcd cluster on every access, which reduces the load on the Kubernetes API Server and the Etcd cluster.

Reading the Informer code directly can be fairly opaque; it is easier to build intuition from the Informers Example below:

package main

import (
	"log"
	"time"

	"k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", "/root/.kube/config")
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}
	stopCh := make(chan struct{})
	defer close(stopCh)
	sharedInformers := informers.NewSharedInformerFactory(clientset, time.Minute)
	informer := sharedInformers.Core().V1().Pods().Informer()
	informer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) {
			mObj := obj.(v1.Object)
			log.Printf("New Pod Added to Store: %s", mObj.GetName())
		},
		UpdateFunc: func(oldObj, newObj interface{}) {
			oObj := oldObj.(v1.Object)
			nObj := newObj.(v1.Object)
			log.Printf("%s Pod Updated to %s", oObj.GetName(), nObj.GetName())
		},
		DeleteFunc: func(obj interface{}) {
			mObj := obj.(v1.Object)
			log.Printf("Pod Deleted from Store: %s", mObj.GetName())
		},
	})
	informer.Run(stopCh)
}

First, kubernetes.NewForConfig creates the clientset object; the Informer needs this ClientSet to interact with the Kubernetes API Server. A stopCh channel is also created; it is used to tell the Informer to exit before the process terminates, because the Informer is a long-running goroutine.

The informers.NewSharedInformerFactory function instantiates the SharedInformer object. It takes two arguments: the first, clientset, is the client used to interact with the Kubernetes API Server; the second, time.Minute, sets how often a resync is performed. A resync periodically replays a List of all resources into the Informer store; passing 0 disables resync. A factory created with custom options is sketched below.
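
When more control is needed, the factory can also be created with options. The sketch below is illustrative only (the helper name newPodInformerFactory is made up); it assumes a clientset built as in the example above, sets a 10-minute resync, and scopes all informers created by this factory to a single namespace:

package example

import (
	"time"

	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
)

// newPodInformerFactory builds a factory whose informers watch only the
// "default" namespace and resync every 10 minutes (pass 0 to disable resync).
func newPodInformerFactory(clientset kubernetes.Interface) informers.SharedInformerFactory {
	return informers.NewSharedInformerFactoryWithOptions(
		clientset,
		10*time.Minute, // defaultResync; 0 disables periodic resync
		informers.WithNamespace("default"),
	)
}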

In the Informers Example, sharedInformers.Core().V1().Pods().Informer() returns the Informer object for the Pod resource, and informer.AddEventHandler registers event callbacks for Pod resources. Three callbacks are supported, described below.

● AddFunc: callback triggered when a Pod resource object is created.

● UpdateFunc: callback triggered when a Pod resource object is updated.

● DeleteFunc: callback triggered when a Pod resource object is deleted.

In normal use, other Kubernetes components that rely on the Informer mechanism use these callbacks to push resource objects into a WorkQueue or another queue; in the Informers Example we simply log the triggered event. Finally, informer.Run runs the current Informer, which internally creates the Informer for the Pod resource type. A typical enqueue-style handler is sketched after this paragraph.
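
As a sketch of that enqueue pattern (the helper name addQueueHandlers is invented for illustration), the callbacks below do no real work themselves; they only push object keys of the form namespace/name into a rate-limited work queue, which a separate worker drains later:

package example

import (
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/util/workqueue"
)

// addQueueHandlers registers handlers that enqueue object keys instead of
// handling the objects inline.
func addQueueHandlers(informer cache.SharedIndexInformer, queue workqueue.RateLimitingInterface) {
	informer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) {
			if key, err := cache.MetaNamespaceKeyFunc(obj); err == nil {
				queue.Add(key)
			}
		},
		UpdateFunc: func(oldObj, newObj interface{}) {
			if key, err := cache.MetaNamespaceKeyFunc(newObj); err == nil {
				queue.Add(key)
			}
		},
		DeleteFunc: func(obj interface{}) {
			// DeletionHandlingMetaNamespaceKeyFunc also copes with tombstone objects.
			if key, err := cache.DeletionHandlingMetaNamespaceKeyFunc(obj); err == nil {
				queue.Add(key)
			}
		},
	})
}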

With the Informer mechanism it is easy to watch the resource events we care about. For example, when watching Pod resources, any Added, Updated, or Deleted event is delivered to client-go, which is notified that the resource changed and can react accordingly.

1. Resource Informers

Every Kubernetes resource implements the Informer mechanism, and every Informer exposes Informer and Lister methods; PodInformer is shown below. Code path: vendor/k8s.io/client-go/informers/core/v1/pod.go

// PodInformer provides access to a shared informer and lister for
// Pods.
type PodInformer interface {
Informer() cache.SharedIndexInformer
Lister() v1.PodLister
}

Informers for different resources are obtained in the same way, for example:

podInformer := sharedInformers.Core().V1().Pods().Informer()
nodeInformer := sharedInformers.Node().V1beta1().RuntimeClasses().Informer()

Defining Informers for different resources lets you watch events on those resources. For example, by watching the Node resource, client-go is notified promptly when a new node joins the Kubernetes cluster, as sketched below.
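
A minimal sketch of such a Node watcher follows (the helper name watchNodes is made up; it assumes a SharedInformerFactory created as shown earlier):

package example

import (
	"log"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/tools/cache"
)

// watchNodes logs every Node that is added to the local cache, i.e. every
// node that joins the cluster (plus the initial List).
func watchNodes(factory informers.SharedInformerFactory, stopCh <-chan struct{}) {
	nodeInformer := factory.Core().V1().Nodes().Informer()
	nodeInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) {
			node := obj.(*corev1.Node)
			log.Printf("New Node added: %s", node.Name)
		},
	})
	nodeInformer.Run(stopCh)
}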

2. The Shared Informer Mechanism

The Informer is also called a Shared Informer because it is meant to be shared. When writing programs with client-go, if the Informer for the same resource is instantiated several times and each instance runs its own Reflector, many identical ListAndWatch loops run in parallel, and the redundant serialization and deserialization puts excessive load on the Kubernetes API Server.

The Shared Informer lets Informers for the same resource type share a single Reflector, which saves a lot of resources. Sharing is implemented with a map: sharedInformerFactory keeps a map field that holds all Informers, as shown below:

Code path: vendor/k8s.io/client-go/informers/factory.go

type sharedInformerFactory struct {
client kubernetes.Interface
namespace string
tweakListOptions internalinterfaces.TweakListOptionsFunc
lock sync.Mutex
defaultResync time.Duration
customResync map[reflect.Type]time.Duration

informers map[reflect.Type]cache.SharedIndexInformer
// startedInformers is used for tracking which informers have been started.
// This allows Start() to be called multiple times safely.
startedInformers map[reflect.Type]bool
}

// InternalInformerFor returns the SharedIndexInformer for obj using an internal
// client.
func (f *sharedInformerFactory) InformerFor(obj runtime.Object, newFunc internalinterfaces.NewInformerFunc) cache.SharedIndexInformer {
f.lock.Lock()
defer f.lock.Unlock()

informerType := reflect.TypeOf(obj)
informer, exists := f.informers[informerType]
if exists {
return informer
}

resyncPeriod, exists := f.customResync[informerType]
if !exists {
resyncPeriod = f.defaultResync
}

informer = newFunc(f.client, resyncPeriod)
f.informers[informerType] = informer

return informer
}

// Start initializes all requested informers.
func (f *sharedInformerFactory) Start(stopCh <-chan struct{}) {
f.lock.Lock()
defer f.lock.Unlock()

for informerType, informer := range f.informers {
if !f.startedInformers[informerType] {
go informer.Run(stopCh)
f.startedInformers[informerType] = true
}
}
}

The informers field maps resource types to their SharedIndexInformer. InformerFor adds Informers for different resources; if an Informer for the same type already exists, it returns that Informer instead of adding a new one.

Finally, the Shared Informer's Start method runs each informer in f.informers in its own long-running goroutine, as in the startup sequence sketched below.
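
A typical startup sequence with the factory looks roughly like the sketch below (the helper name startFactory is illustrative): request the informers you need, start them all with a single Start call, then block until every local cache has completed its initial List:

package example

import (
	"k8s.io/client-go/informers"
	"k8s.io/client-go/tools/cache"
)

// startFactory starts all registered informers and waits for their caches to sync.
func startFactory(factory informers.SharedInformerFactory, stopCh <-chan struct{}) {
	podInformer := factory.Core().V1().Pods().Informer()
	svcInformer := factory.Core().V1().Services().Informer()

	factory.Start(stopCh) // each registered informer runs in its own goroutine

	cache.WaitForCacheSync(stopCh, podInformer.HasSynced, svcInformer.HasSynced)
}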

2. Reflector

The Informer can watch resources on the Kubernetes API Server; the resource type can be a built-in Kubernetes resource or a CRD custom resource. Its most central component is the Reflector. The Reflector watches the specified Kubernetes resource, triggers the corresponding change event (Added, Updated, Deleted) when the resource changes, and stores the resource object in the local DeltaFIFO cache.

A Reflector is instantiated with NewReflector, which must be given a ListerWatcher interface object with List and Watch methods for listing and watching resources; any object that implements List and Watch qualifies as a ListerWatcher. The Reflector is started with its Run function, which watches and handles watch events. Within the Reflector source, the key function is ListAndWatch, which lists and watches the specified resource on the Kubernetes API Server. A minimal manual setup is sketched below.
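
For illustration, the sketch below wires a Reflector by hand (the helper name runPodReflector is made up): cache.NewListWatchFromClient supplies the ListerWatcher, and the Reflector keeps a plain Store in sync with Pods. The shared informer does essentially the same thing, but with a DeltaFIFO as the store:

package example

import (
	"time"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/fields"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
)

// runPodReflector lists and watches Pods and mirrors them into a local store.
func runPodReflector(clientset kubernetes.Interface, stopCh <-chan struct{}) {
	lw := cache.NewListWatchFromClient(
		clientset.CoreV1().RESTClient(), "pods", metav1.NamespaceAll, fields.Everything())
	store := cache.NewStore(cache.MetaNamespaceKeyFunc)
	reflector := cache.NewReflector(lw, &corev1.Pod{}, store, time.Minute)
	reflector.Run(stopCh)
}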

The ListAndWatch implementation has two parts: the first obtains the resource list, the second watches the resource objects.

1. Obtaining the resource list

On its first run, ListAndWatch lists all objects of the resource and stores them in DeltaFIFO. Using the Informers Example, which lists all Pod resources, the ListAndWatch List flow is shown in the figure below.

[Figure: ListAndWatch List flow]

(1) r.listerWatcher.List fetches all objects of the resource, for example all Pod resources. What is fetched is controlled by the ResourceVersion field in options: if ResourceVersion is 0, all Pod resources are listed; if it is non-zero, listing resumes from that resource version, somewhat like resuming an interrupted file transfer, where the next connection continues from where the last one stopped. This keeps the local cache consistent with the data in the Etcd cluster.

(2) listMetaInterface.GetResourceVersion returns the resource version. ResourceVersion is important: every Kubernetes resource carries this field, and it identifies the current version of the resource object. Each time the object is modified, the Kubernetes API Server updates its ResourceVersion, so that client-go can tell during Watch operations whether the object has changed. See section 6.5.2, "ResourceVersion", for more details.

(3) meta.ExtractList converts the list result into a list of resource objects, turning a runtime.Object into []runtime.Object. Because r.listerWatcher.List returns all objects of a resource, for example all Pod resources, the result is a resource list.

(4) r.syncWith stores the resource objects and the resource version into DeltaFIFO, replacing any objects already stored.

(5) r.setLastSyncResourceVersion records the latest resource version. The ListAndWatch List code is shown below. Code path: vendor/k8s.io/client-go/tools/cache/reflector.go

// ListAndWatch first lists all items and get the resource version at the moment of call,
// and then use the resource version to watch.
// It returns error if ListAndWatch didn't even try to initialize watch.
func (r *Reflector) ListAndWatch(stopCh <-chan struct{}) error {
klog.V(3).Infof("Listing and watching %v from %s", r.expectedTypeName, r.name)
var resourceVersion string

options := metav1.ListOptions{ResourceVersion: r.relistResourceVersion()}

if err := func() error {
initTrace := trace.New("Reflector ListAndWatch", trace.Field{"name", r.name})
defer initTrace.LogIfLong(10 * time.Second)
var list runtime.Object
var paginatedResult bool
var err error
listCh := make(chan struct{}, 1)
panicCh := make(chan interface{}, 1)
go func() {
defer func() {
if r := recover(); r != nil {
panicCh <- r
}
}()
// Attempt to gather list in chunks, if supported by listerWatcher, if not, the first
// list request will return the full response.
pager := pager.New(pager.SimplePageFunc(func(opts metav1.ListOptions) (runtime.Object, error) {
return r.listerWatcher.List(opts)
}))
switch {
case r.WatchListPageSize != 0:
pager.PageSize = r.WatchListPageSize
case r.paginatedResult:
// We got a paginated result initially. Assume this resource and server honor
// paging requests (i.e. watch cache is probably disabled) and leave the default
// pager size set.
case options.ResourceVersion != "" && options.ResourceVersion != "0":
// User didn't explicitly request pagination.
//
// With ResourceVersion != "", we have a possibility to list from watch cache,
// but we do that (for ResourceVersion != "0") only if Limit is unset.
// To avoid thundering herd on etcd (e.g. on master upgrades), we explicitly
// switch off pagination to force listing from watch cache (if enabled).
// With the existing semantic of RV (result is at least as fresh as provided RV),
// this is correct and doesn't lead to going back in time.
//
// We also don't turn off pagination for ResourceVersion="0", since watch cache
// is ignoring Limit in that case anyway, and if watch cache is not enabled
// we don't introduce regression.
pager.PageSize = 0
}

list, paginatedResult, err = pager.List(context.Background(), options)
if isExpiredError(err) || isTooLargeResourceVersionError(err) {
r.setIsLastSyncResourceVersionUnavailable(true)
// Retry immediately if the resource version used to list is unavailable.
// The pager already falls back to full list if paginated list calls fail due to an "Expired" error on
// continuation pages, but the pager might not be enabled, the full list might fail because the
// resource version it is listing at is expired or the cache may not yet be synced to the provided
// resource version. So we need to fallback to resourceVersion="" in all to recover and ensure
// the reflector makes forward progress.
list, paginatedResult, err = pager.List(context.Background(), metav1.ListOptions{ResourceVersion: r.relistResourceVersion()})
}
close(listCh)
}()
select {
case <-stopCh:
return nil
case r := <-panicCh:
panic(r)
case <-listCh:
}
initTrace.Step("Objects listed", trace.Field{"error", err})
if err != nil {
klog.Warningf("%s: failed to list %v: %v", r.name, r.expectedTypeName, err)
return fmt.Errorf("failed to list %v: %v", r.expectedTypeName, err)
}

// We check if the list was paginated and if so set the paginatedResult based on that.
// However, we want to do that only for the initial list (which is the only case
// when we set ResourceVersion="0"). The reasoning behind it is that later, in some
// situations we may force listing directly from etcd (by setting ResourceVersion="")
// which will return paginated result, even if watch cache is enabled. However, in
// that case, we still want to prefer sending requests to watch cache if possible.
//
// Paginated result returned for request with ResourceVersion="0" mean that watch
// cache is disabled and there are a lot of objects of a given type. In such case,
// there is no need to prefer listing from watch cache.
if options.ResourceVersion == "0" && paginatedResult {
r.paginatedResult = true
}

r.setIsLastSyncResourceVersionUnavailable(false) // list was successful
listMetaInterface, err := meta.ListAccessor(list)
if err != nil {
return fmt.Errorf("unable to understand list result %#v: %v", list, err)
}
resourceVersion = listMetaInterface.GetResourceVersion()
initTrace.Step("Resource version extracted")
items, err := meta.ExtractList(list)
if err != nil {
return fmt.Errorf("unable to understand list result %#v (%v)", list, err)
}
initTrace.Step("Objects extracted")
if err := r.syncWith(items, resourceVersion); err != nil {
return fmt.Errorf("unable to sync list result: %v", err)
}
initTrace.Step("SyncWith done")
r.setLastSyncResourceVersion(resourceVersion)
initTrace.Step("Resource version updated")
return nil
}(); err != nil {
return err
}

resyncerrc := make(chan error, 1)
cancelCh := make(chan struct{})
defer close(cancelCh)
go func() {
resyncCh, cleanup := r.resyncChan()
defer func() {
cleanup() // Call the last one written into cleanup
}()
for {
select {
case <-resyncCh:
case <-stopCh:
return
case <-cancelCh:
return
}
if r.ShouldResync == nil || r.ShouldResync() {
klog.V(4).Infof("%s: forcing resync", r.name)
if err := r.store.Resync(); err != nil {
resyncerrc <- err
return
}
}
cleanup()
resyncCh, cleanup = r.resyncChan()
}
}()

for {
// give the stopCh a chance to stop the loop, even in case of continue statements further down on errors
select {
case <-stopCh:
return nil
default:
}

timeoutSeconds := int64(minWatchTimeout.Seconds() * (rand.Float64() + 1.0))
options = metav1.ListOptions{
ResourceVersion: resourceVersion,
// We want to avoid situations of hanging watchers. Stop any watchers that do not
// receive any events within the timeout window.
TimeoutSeconds: &timeoutSeconds,
// To reduce load on kube-apiserver on watch restarts, you may enable watch bookmarks.
// Reflector doesn't assume bookmarks are returned at all (if the server do not support
// watch bookmarks, it will ignore this field).
AllowWatchBookmarks: true,
}

// start the clock before sending the request, since some proxies won't flush headers until after the first watch event is sent
start := r.clock.Now()
w, err := r.listerWatcher.Watch(options)
if err != nil {
// If this is "connection refused" error, it means that most likely apiserver is not responsive.
// It doesn't make sense to re-list all objects because most likely we will be able to restart
// watch where we ended.
// If that's the case begin exponentially backing off and resend watch request.
// Do the same for "429" errors.
if utilnet.IsConnectionRefused(err) || apierrors.IsTooManyRequests(err) {
<-r.initConnBackoffManager.Backoff().C()
continue
}
return err
}

if err := r.watchHandler(start, w, &resourceVersion, resyncerrc, stopCh); err != nil {
if err != errorStopRequested {
switch {
case isExpiredError(err):
// Don't set LastSyncResourceVersionUnavailable - LIST call with ResourceVersion=RV already
// has a semantic that it returns data at least as fresh as provided RV.
// So first try to LIST with setting RV to resource version of last observed object.
klog.V(4).Infof("%s: watch of %v closed with: %v", r.name, r.expectedTypeName, err)
case apierrors.IsTooManyRequests(err):
klog.V(2).Infof("%s: watch of %v returned 429 - backing off", r.name, r.expectedTypeName)
<-r.initConnBackoffManager.Backoff().C()
continue
default:
klog.Warningf("%s: watch of %v ended with: %v", r.name, r.expectedTypeName, err)
}
}
return nil
}
}
}

The r.listerWatcher.List function actually calls the ListFunc defined by the Pod Informer, which uses the ClientSet to retrieve the Pod list from the Kubernetes API Server, as shown below:

Code path: k8s.io/client-go/informers/core/v1/pod.go (ListFunc)

func NewFilteredPodInformer(client kubernetes.Interface, namespace string, resyncPeriod time.Duration, indexers cache.Indexers, tweakListOptions internalinterfaces.TweakListOptionsFunc) cache.SharedIndexInformer {
return cache.NewSharedIndexInformer(
&cache.ListWatch{
ListFunc: func(options metav1.ListOptions) (runtime.Object, error) {
if tweakListOptions != nil {
tweakListOptions(&options)
}
return client.CoreV1().Pods(namespace).List(context.TODO(), options)
},
WatchFunc: func(options metav1.ListOptions) (watch.Interface, error) {
if tweakListOptions != nil {
tweakListOptions(&options)
}
return client.CoreV1().Pods(namespace).Watch(context.TODO(), options)
},
},
&corev1.Pod{},
resyncPeriod,
indexers,
)
}

2. Watching resource objects

A Watch operation establishes a long-lived HTTP connection with the Kubernetes API Server and receives resource change events from it. Watch is implemented on top of HTTP chunked transfer encoding: when client-go issues the request, the Kubernetes API Server sets Transfer-Encoding: chunked in the response header, the client keeps the connection open, and each subsequent chunk carries the next resource event. A bare Watch call, without the Informer machinery, is sketched below.
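
Stripped of the Informer machinery, such a long-lived watch can be issued directly through the ClientSet. In the sketch below (the helper name watchPodsOnce is illustrative), every chunk the server writes arrives as one event on ResultChan, and the loop ends when the server closes the connection:

package example

import (
	"context"
	"log"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// watchPodsOnce opens a single watch on Pods and logs every event it receives.
func watchPodsOnce(clientset kubernetes.Interface) error {
	w, err := clientset.CoreV1().Pods(metav1.NamespaceAll).Watch(context.TODO(), metav1.ListOptions{})
	if err != nil {
		return err
	}
	defer w.Stop()
	for event := range w.ResultChan() {
		log.Printf("event type: %s, object: %T", event.Type, event.Object)
	}
	return nil
}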

The ListAndWatch Watch code is shown below. Code path: vendor/k8s.io/client-go/tools/cache/reflector.go

func (r *Reflector) ListAndWatch(stopCh <-chan struct{}) error {
....
for {
// give the stopCh a chance to stop the loop, even in case of continue statements further down on errors
select {
case <-stopCh:
return nil
default:
}

timeoutSeconds := int64(minWatchTimeout.Seconds() * (rand.Float64() + 1.0))
options = metav1.ListOptions{
ResourceVersion: resourceVersion,
// We want to avoid situations of hanging watchers. Stop any watchers that do not
// receive any events within the timeout window.
TimeoutSeconds: &timeoutSeconds,
// To reduce load on kube-apiserver on watch restarts, you may enable watch bookmarks.
// Reflector doesn't assume bookmarks are returned at all (if the server do not support
// watch bookmarks, it will ignore this field).
AllowWatchBookmarks: true,
}

// start the clock before sending the request, since some proxies won't flush headers until after the first watch event is sent
start := r.clock.Now()
w, err := r.listerWatcher.Watch(options)
if err != nil {
// If this is "connection refused" error, it means that most likely apiserver is not responsive.
// It doesn't make sense to re-list all objects because most likely we will be able to restart
// watch where we ended.
// If that's the case begin exponentially backing off and resend watch request.
// Do the same for "429" errors.
if utilnet.IsConnectionRefused(err) || apierrors.IsTooManyRequests(err) {
<-r.initConnBackoffManager.Backoff().C()
continue
}
return err
}

if err := r.watchHandler(start, w, &resourceVersion, resyncerrc, stopCh); err != nil {
if err != errorStopRequested {
switch {
case isExpiredError(err):
// Don't set LastSyncResourceVersionUnavailable - LIST call with ResourceVersion=RV already
// has a semantic that it returns data at least as fresh as provided RV.
// So first try to LIST with setting RV to resource version of last observed object.
klog.V(4).Infof("%s: watch of %v closed with: %v", r.name, r.expectedTypeName, err)
case apierrors.IsTooManyRequests(err):
klog.V(2).Infof("%s: watch of %v returned 429 - backing off", r.name, r.expectedTypeName)
<-r.initConnBackoffManager.Backoff().C()
continue
default:
klog.Warningf("%s: watch of %v ended with: %v", r.name, r.expectedTypeName, err)
}
}
return nil
}
}
}

The r.listerWatcher.Watch function actually calls the WatchFunc defined by the Pod Informer, which uses the ClientSet to open a long-lived connection to the Kubernetes API Server and watch for changes to the specified resource, as shown below:

Code path: k8s.io/client-go/informers/core/v1/pod.go

WatchFunc: func(options metav1.ListOptions) (watch.Interface, error) {
if tweakListOptions != nil {
tweakListOptions(&options)
}
return client.CoreV1().Pods(namespace).Watch(context.TODO(), options)
},

r.watchHandler processes the resource change events: when an Added, Updated, or Deleted event arrives, the corresponding resource object is written to the local DeltaFIFO cache and the last-seen ResourceVersion is updated. A simplified sketch of this loop follows.
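
The sketch below is a simplified illustration of that loop, not the real watchHandler from reflector.go (which additionally handles bookmark events, error events, and watch timeouts); it shows how each event type maps to a store operation and how the last-seen resource version advances:

package example

import (
	"fmt"

	"k8s.io/apimachinery/pkg/api/meta"
	"k8s.io/apimachinery/pkg/watch"
	"k8s.io/client-go/tools/cache"
)

// handleWatchEvents applies each watch event to the store and records the
// resource version of the last object seen.
func handleWatchEvents(w watch.Interface, store cache.Store, resourceVersion *string) error {
	for event := range w.ResultChan() {
		m, err := meta.Accessor(event.Object)
		if err != nil {
			return err
		}
		switch event.Type {
		case watch.Added:
			if err := store.Add(event.Object); err != nil {
				return err
			}
		case watch.Modified:
			if err := store.Update(event.Object); err != nil {
				return err
			}
		case watch.Deleted:
			if err := store.Delete(event.Object); err != nil {
				return err
			}
		default:
			return fmt.Errorf("unexpected watch event %v", event.Type)
		}
		*resourceVersion = m.GetResourceVersion()
	}
	return nil
}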


3. DeltaFIFO

DeltaFIFO can be read in two parts: FIFO is a first-in, first-out queue with the usual queue operations such as Add, Update, Delete, List, Pop, and Close, while Delta is a resource-object store that records the operation type applied to each object, such as Added, Updated, Deleted, and Sync. The DeltaFIFO structure is shown below:

Code path: vendor/k8s.io/client-go/tools/cache/delta_fifo.go

type DeltaFIFO struct {
// lock/cond protects access to 'items' and 'queue'.
lock sync.RWMutex
cond sync.Cond

// `items` maps a key to a Deltas.
// Each such Deltas has at least one Delta.
items map[string]Deltas

// `queue` maintains FIFO order of keys for consumption in Pop().
// There are no duplicates in `queue`.
// A key is in `queue` if and only if it is in `items`.
queue []string

// populated is true if the first batch of items inserted by Replace() has been populated
// or Delete/Add/Update/AddIfNotPresent was called first.
populated bool
// initialPopulationCount is the number of items inserted by the first call of Replace()
initialPopulationCount int

// keyFunc is used to make the key used for queued item
// insertion and retrieval, and should be deterministic.
keyFunc KeyFunc

// knownObjects list keys that are "known" --- affecting Delete(),
// Replace(), and Resync()
knownObjects KeyListerGetter

// Used to indicate a queue is closed so a control loop can exit when a queue is empty.
// Currently, not used to gate any of CRUD operations.
closed bool

// emitDeltaTypeReplaced is whether to emit the Replaced or Sync
// DeltaType when Replace() is called (to preserve backwards compat).
emitDeltaTypeReplaced bool
}

// Delta is a member of Deltas (a list of Delta objects) which
// in its turn is the type stored by a DeltaFIFO. It tells you what
// change happened, and the object's state after* that change.
//
// [*] Unless the change is a deletion, and then you'll get the final
// state of the object before it was deleted.
type Delta struct {
Type DeltaType
Object interface{}
}

// Deltas is a list of one or more 'Delta's to an individual object.
// The oldest delta is at index 0, the newest delta is the last one.
type Deltas []Delta

What sets DeltaFIFO apart from other queues is that it keeps every operation applied to a resource object (obj): the queue can hold the same object with different operation types, so the consumer processing the object knows everything that happened to it. The queue field stores object keys computed by the KeyOf function, and the items field is a map whose value is the Deltas slice for that object. The DeltaFIFO storage layout is shown in the figure below.

[Figure: DeltaFIFO storage structure]

DeltaFIFO is essentially a first-in, first-out queue with producers and consumers: the producer is the Reflector calling the Add method, and the consumer is the Controller calling the Pop method. The sketch below shows both roles; the rest of this section then walks through DeltaFIFO's core features: the producer methods, the consumer method, and the Resync mechanism.
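
The sketch below plays both roles against a standalone DeltaFIFO (the function name deltaFIFODemo is made up): Add and Update act as the producer, and a single Pop consumes all Deltas accumulated for the object's key. The single-argument process callback matches the version of Pop shown later in this section; newer client-go versions pass an extra bool argument to the callback:

package example

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/tools/cache"
)

// deltaFIFODemo enqueues two deltas for the same Pod and drains them with one Pop.
func deltaFIFODemo() {
	fifo := cache.NewDeltaFIFOWithOptions(cache.DeltaFIFOOptions{
		KeyFunction: cache.MetaNamespaceKeyFunc,
	})
	pod := &corev1.Pod{ObjectMeta: metav1.ObjectMeta{Namespace: "default", Name: "demo"}}

	fifo.Add(pod)    // producer: queues an Added delta
	fifo.Update(pod) // producer: queues an Updated delta for the same key

	// consumer: one Pop returns every Delta queued for "default/demo"
	fifo.Pop(func(obj interface{}) error {
		for _, d := range obj.(cache.Deltas) {
			fmt.Println(d.Type) // Added, Updated
		}
		return nil
	})
}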

1. Producer methods

For Added, Updated, and Deleted events alike, resource objects entering the DeltaFIFO queue go through the queueActionLocked function, which is the key to the DeltaFIFO implementation. Code path: vendor/k8s.io/client-go/tools/cache/delta_fifo.go

// queueActionLocked appends to the delta list for the object.
// Caller must lock first.
func (f *DeltaFIFO) queueActionLocked(actionType DeltaType, obj interface{}) error {
id, err := f.KeyOf(obj)
if err != nil {
return KeyError{obj, err}
}
oldDeltas := f.items[id]
newDeltas := append(oldDeltas, Delta{actionType, obj})
newDeltas = dedupDeltas(newDeltas)

if len(newDeltas) > 0 {
if _, exists := f.items[id]; !exists {
f.queue = append(f.queue, id)
}
f.items[id] = newDeltas
f.cond.Broadcast()
} else {
// This never happens, because dedupDeltas never returns an empty list
// when given a non-empty list (as it is here).
// If somehow it happens anyway, deal with it but complain.
if oldDeltas == nil {
klog.Errorf("Impossible dedupDeltas for id=%q: oldDeltas=%#+v, obj=%#+v; ignoring", id, oldDeltas, obj)
return nil
}
klog.Errorf("Impossible dedupDeltas for id=%q: oldDeltas=%#+v, obj=%#+v; breaking invariant by storing empty Deltas", id, oldDeltas, obj)
f.items[id] = newDeltas
return fmt.Errorf("Impossible dedupDeltas for id=%q: oldDeltas=%#+v, obj=%#+v; broke DeltaFIFO invariant by storing empty Deltas", id, oldDeltas, obj)
}
return nil
}

queueActionLocked executes as follows.

(1) Compute the object's key with f.KeyOf.

(2) Append the new Delta to the existing Deltas for that key (oldDeltas) and deduplicate the result with dedupDeltas.

(3) Store the resulting Deltas and call cond.Broadcast to wake up all blocked consumers.

2. Consumer method

The Pop method is the consumer: it takes the oldest object from the head of the DeltaFIFO queue. Pop must be passed a process callback that receives and handles the object. Code path: vendor/k8s.io/client-go/tools/cache/delta_fifo.go

func (f *DeltaFIFO) Pop(process PopProcessFunc) (interface{}, error) {
f.lock.Lock()
defer f.lock.Unlock()
for {
for len(f.queue) == 0 {
// When the queue is empty, invocation of Pop() is blocked until new item is enqueued.
// When Close() is called, the f.closed is set and the condition is broadcasted.
// Which causes this loop to continue and return from the Pop().
if f.closed {
return nil, ErrFIFOClosed
}

f.cond.Wait()
}
id := f.queue[0]
f.queue = f.queue[1:]
depth := len(f.queue)
if f.initialPopulationCount > 0 {
f.initialPopulationCount--
}
item, ok := f.items[id]
if !ok {
// This should never happen
klog.Errorf("Inconceivable! %q was in f.queue but not f.items; ignoring.", id)
continue
}
delete(f.items, id)
// Only log traces if the queue depth is greater than 10 and it takes more than
// 100 milliseconds to process one item from the queue.
// Queue depth never goes high because processing an item is locking the queue,
// and new items can't be added until processing finish.
// https://github.com/kubernetes/kubernetes/issues/103789
if depth > 10 {
trace := utiltrace.New("DeltaFIFO Pop Process",
utiltrace.Field{Key: "ID", Value: id},
utiltrace.Field{Key: "Depth", Value: depth},
utiltrace.Field{Key: "Reason", Value: "slow event handlers blocking the queue"})
defer trace.LogIfLong(100 * time.Millisecond)
}
err := process(item)
if e, ok := err.(ErrRequeue); ok {
f.addIfNotPresent(id, item)
err = e.Err
}
// Don't need to copyDeltas here, because we're transferring
// ownership to the caller.
return item, err
}
}

When the queue is empty, f.cond.Wait blocks until a cond.Broadcast signals that data has been added, which releases the block. If the queue is not empty, the key at the head of f.queue is removed, the corresponding Deltas are passed to the process callback, and the upper-layer consumer handles them. If the process callback returns an ErrRequeue error, the object is put back into the queue.

The Controller's processLoop pops items from the DeltaFIFO queue and hands them to the process callback, which is HandleDeltas. Code path: vendor/k8s.io/client-go/tools/cache/shared_informer.go

func (s *sharedIndexInformer) HandleDeltas(obj interface{}) error {
s.blockDeltas.Lock()
defer s.blockDeltas.Unlock()

// from oldest to newest
for _, d := range obj.(Deltas) {
switch d.Type {
case Sync, Replaced, Added, Updated:
s.cacheMutationDetector.AddObject(d.Object)
if old, exists, err := s.indexer.Get(d.Object); err == nil && exists {
if err := s.indexer.Update(d.Object); err != nil {
return err
}

isSync := false
switch {
case d.Type == Sync:
// Sync events are only propagated to listeners that requested resync
isSync = true
case d.Type == Replaced:
if accessor, err := meta.Accessor(d.Object); err == nil {
if oldAccessor, err := meta.Accessor(old); err == nil {
// Replaced events that didn't change resourceVersion are treated as resync events
// and only propagated to listeners that requested resync
isSync = accessor.GetResourceVersion() == oldAccessor.GetResourceVersion()
}
}
}
s.processor.distribute(updateNotification{oldObj: old, newObj: d.Object}, isSync)
} else {
if err := s.indexer.Add(d.Object); err != nil {
return err
}
s.processor.distribute(addNotification{newObj: d.Object}, false)
}
case Deleted:
if err := s.indexer.Delete(d.Object); err != nil {
return err
}
s.processor.distribute(deleteNotification{oldObj: d.Object}, false)
}
}
return nil
}

HandleDeltas is the process callback: for the Sync, Replaced, Added, and Updated operation types, the resource object is stored in the Indexer (a concurrency-safe store) and distributed to the SharedInformer's listeners via the distribute function; for Deleted, it is removed from the Indexer and likewise distributed. Remember the Informers Example? There, informer.AddEventHandler registered the functions that handle resource events, and distribute is what delivers the resource object to those handlers.

3. The Resync mechanism

The Resync mechanism replays the resource objects held in the Indexer local store back into DeltaFIFO with the Sync operation type. Resync is triggered periodically from the Reflector, on the period set by the resyncPeriod argument passed to NewReflector. The Resync and syncKeyLocked code follows. Code path: vendor/k8s.io/client-go/tools/cache/delta_fifo.go

// Resync adds, with a Sync type of Delta, every object listed by
// `f.knownObjects` whose key is not already queued for processing.
// If `f.knownObjects` is `nil` then Resync does nothing.
func (f *DeltaFIFO) Resync() error {
f.lock.Lock()
defer f.lock.Unlock()

if f.knownObjects == nil {
return nil
}

keys := f.knownObjects.ListKeys()
for _, k := range keys {
if err := f.syncKeyLocked(k); err != nil {
return err
}
}
return nil
}

func (f *DeltaFIFO) syncKeyLocked(key string) error {
obj, exists, err := f.knownObjects.GetByKey(key)
if err != nil {
klog.Errorf("Unexpected error %v during lookup of key %v, unable to queue object for sync", err, key)
return nil
} else if !exists {
klog.Infof("Key %v does not exist in known objects store, unable to queue object for sync", key)
return nil
}

// If we are doing Resync() and there is already an event queued for that object,
// we ignore the Resync for it. This is to avoid the race, in which the resync
// comes with the previous value of object (since queueing an event for the object
// doesn't trigger changing the underlying store <knownObjects>.
id, err := f.KeyOf(obj)
if err != nil {
return KeyError{obj, err}
}
if len(f.items[id]) > 0 {
return nil
}

if err := f.queueActionLocked(Sync, obj); err != nil {
return fmt.Errorf("couldn't queue object: %v", err)
}
return nil
}

f.knownObjects is the Indexer local store; through it, all resource objects currently cached by client-go can be listed. The Indexer is handed to the DeltaFIFO when the DeltaFIFO is instantiated, as sketched below.
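
As a sketch of that wiring (the helper name newFIFOWithIndexer is illustrative; current client-go passes the Indexer through DeltaFIFOOptions.KnownObjects, while the older NewDeltaFIFO constructor took it as a parameter):

package example

import "k8s.io/client-go/tools/cache"

// newFIFOWithIndexer hands the Indexer to DeltaFIFO as knownObjects, so that
// Resync (and deletion detection in Replace) can consult the local store.
func newFIFOWithIndexer() (*cache.DeltaFIFO, cache.Indexer) {
	indexer := cache.NewIndexer(cache.MetaNamespaceKeyFunc, cache.Indexers{})
	fifo := cache.NewDeltaFIFOWithOptions(cache.DeltaFIFOOptions{
		KnownObjects:          indexer,
		EmitDeltaTypeReplaced: true,
	})
	return fifo, indexer
}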

4. Indexer

The Indexer is client-go's local store for resource objects, with built-in indexing. Resource objects consumed from DeltaFIFO are stored in the Indexer, whose data stays consistent with the data in the Etcd cluster. client-go can then read resource objects from the local store instead of fetching them from the remote Etcd cluster every time, which reduces the load on the Kubernetes API Server and the Etcd cluster.

Before looking at the Indexer itself, a word about ThreadSafeMap. ThreadSafeMap is a concurrency-safe store with the usual create, update, delete, and query methods, such as Add, Update, Delete, List, Get, Replace, and Resync. The Indexer wraps ThreadSafeMap: it inherits ThreadSafeMap's operations and adds indexing on top, with methods such as Index, IndexKeys, and GetIndexers. The Indexer storage layout is shown in the figure below.

[Figure: Indexer storage structure]

1. ThreadSafeMap: a concurrency-safe store

ThreadSafeMap is an in-memory store; its data is never written to local disk, and every create, update, delete, and query operation takes a lock to keep the data consistent. ThreadSafeMap keeps resource objects in a map, whose structure is shown below:

Code path: vendor/k8s.io/client-go/tools/cache/thread_safe_store.go

// threadSafeMap implements ThreadSafeStore
type threadSafeMap struct {
lock sync.RWMutex
items map[string]interface{}

// indexers maps a name to an IndexFunc
indexers Indexers
// indices maps a name to an Index
indices Indices
}

The items field holds the resource object data. Each item's key is computed by a keyFunc, which by default is MetaNamespaceKeyFunc; it derives a key of the form <namespace>/<name> from the object, or just <name> when the object's namespace is empty, and the item's value is the resource object itself. A small example of this key format follows.
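
A small illustration of the default key format (the function name keyDemo is made up):

package example

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/tools/cache"
)

// keyDemo prints the keys MetaNamespaceKeyFunc computes for a namespaced and
// a cluster-scoped object.
func keyDemo() {
	pod := &corev1.Pod{ObjectMeta: metav1.ObjectMeta{Namespace: "kube-system", Name: "coredns-0"}}
	node := &corev1.Node{ObjectMeta: metav1.ObjectMeta{Name: "node-1"}} // no namespace

	k1, _ := cache.MetaNamespaceKeyFunc(pod)
	k2, _ := cache.MetaNamespaceKeyFunc(node)
	fmt.Println(k1) // kube-system/coredns-0
	fmt.Println(k2) // node-1
}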

2. The Indexer

Every time ThreadSafeMap data is added, updated, or deleted, the indexes are updated through updateIndices or deleteFromIndices. The Indexer is designed so that index functions can be customized, in line with Kubernetes' emphasis on extensibility. The Indexer is built on four important data structures: Indices, Index, Indexers, and IndexFunc. Reading the code directly can be opaque, so the Indexer Example below makes it easier to understand:

package main

import (
	"fmt"
	"strings"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/tools/cache"
)

// UsersIndexFunc indexes a Pod by the user names listed in its "users" annotation.
func UsersIndexFunc(obj interface{}) ([]string, error) {
	pod := obj.(*corev1.Pod)
	usersString := pod.Annotations["users"]
	return strings.Split(usersString, ","), nil
}

func main() {
	index := cache.NewIndexer(cache.MetaNamespaceKeyFunc,
		cache.Indexers{"byUser": UsersIndexFunc})

	pod1 := &corev1.Pod{ObjectMeta: metav1.ObjectMeta{Name: "one",
		Annotations: map[string]string{"users": "ernie,bert"}}}
	pod2 := &corev1.Pod{ObjectMeta: metav1.ObjectMeta{Name: "two",
		Annotations: map[string]string{"users": "bert,oscar"}}}
	pod3 := &corev1.Pod{ObjectMeta: metav1.ObjectMeta{Name: "tre",
		Annotations: map[string]string{"users": "ernie,elmo"}}}

	index.Add(pod1)
	index.Add(pod2)
	index.Add(pod3)

	erniePods, err := index.ByIndex("byUser", "ernie")
	if err != nil {
		panic(err)
	}
	for _, erniePod := range erniePods {
		fmt.Println(erniePod.(*corev1.Pod).Name)
	}
}

// Output:
one
tre

We first define an index function, UsersIndexFunc, which indexes Pods by the user names listed under the users key in their Annotations.

cache.NewIndexer instantiates the Indexer. It takes two arguments: the first is the KeyFunc used to compute object keys, here the default cache.MetaNamespaceKeyFunc; the second is cache.Indexers, which defines the indexers, where the key is the indexer name (byUser here) and the value is the index function. Three Pod objects are added with index.Add, and index.ByIndex then queries the byUser indexer for Pods matching ernie. The Indexer Example finally retrieves the Pods named one and tre.

With that in mind, the four important Indexer data structures, Indexers, IndexFunc, Indices, and Index, are easy to understand. Code path: vendor/k8s.io/client-go/tools/cache/index.go

// Index maps the indexed value to a set of keys in the store that match on that value
type Index map[string]sets.String

// Indexers maps a name to an IndexFunc
type Indexers map[string]IndexFunc

// Indices maps a name to an Index
type Indices map[string]Index

// IndexFunc knows how to compute the set of indexed values for an object.
type IndexFunc func(obj interface{}) ([]string, error)

The Indexer data structures are as follows.

● Indexers: stores the indexers; the key is the indexer name and the value is the index function that implements it.

● IndexFunc: the index function; it takes a resource object and returns the list of indexed values for it.

● Indices: stores the indexes; the key is the index name (in the Indexer Example it matches the indexer name) and the value is the cached index data.

● Index: the cached index data, mapping each indexed value to a set of object keys. The structures produced by the Indexer Example are illustrated below.
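
For the Indexer Example above, the byUser index would conceptually hold the following (values written out by hand for illustration, not read back from client-go):

package example

import (
	"k8s.io/apimachinery/pkg/util/sets"
	"k8s.io/client-go/tools/cache"
)

// byUserIndices shows the Indices/Index contents after the three Adds in the
// Indexer Example; the object keys are just the Pod names because those Pods
// have no namespace.
var byUserIndices = cache.Indices{
	"byUser": cache.Index{
		"ernie": sets.NewString("one", "tre"),
		"bert":  sets.NewString("one", "two"),
		"oscar": sets.NewString("two"),
		"elmo":  sets.NewString("tre"),
	},
}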

3. Core of the Indexer implementation

index.ByIndex looks up the index that an index function has built and returns the matching items, as shown below:

Code path: vendor/k8s.io/client-go/tools/cache/thread_safe_store.go

// ByIndex returns a list of the items whose indexed values in the given index include the given indexed value
func (c *threadSafeMap) ByIndex(indexName, indexedValue string) ([]interface{}, error) {
c.lock.RLock()
defer c.lock.RUnlock()

indexFunc := c.indexers[indexName]
if indexFunc == nil {
return nil, fmt.Errorf("Index with name %s does not exist", indexName)
}

index := c.indices[indexName]

set := index[indexedValue]
list := make([]interface{}, 0, set.Len())
for key := range set {
list = append(list, c.items[key])
}

return list, nil
}

ByIndex takes two arguments: indexName (the indexer name) and indexedValue (the value to look up). It first checks that the named index function exists in c.indexers, fetches the corresponding Index from c.indices, looks up the set of keys stored under indexedValue, and returns the matching objects from c.items.

The data cached in an Index is a set (Set), which holds unique elements. Since the Go standard library provides no set type and map keys cannot repeat, Kubernetes builds its set on top of a map, using the map keys as the set members to get deduplication for free. A short example follows.
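
A short sketch of that set type (the function name setDemo is made up); sets.String from apimachinery is the concrete type used by Index, and it is essentially a map whose keys are the set members:

package example

import (
	"fmt"

	"k8s.io/apimachinery/pkg/util/sets"
)

// setDemo shows that duplicate inserts are collapsed by the map-backed set.
func setDemo() {
	s := sets.NewString("one")
	s.Insert("one") // duplicate, ignored
	s.Insert("tre")
	fmt.Println(s.List()) // [one tre]
}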