Skip to main content

Home/ Larvata/ Group items tagged node

Rss Feed Group items tagged

張 旭

[Elasticsearch] 分散式特性 & 分散式搜尋的機制 | 小信豬的原始部落 - 0 views

  • 水平擴展儲存空間
  • Data HA:若有 node 掛掉,資料不會遺失
  • 若是要查詢 cluster 中的 node 狀態,可以使用 GET /_cat/nodes API
  • ...39 more annotations...
  • 決定每個 shard 要被分配到哪個 data node 上
  • 為 cluster 設置多個 master node
  • 一旦發現被選中的 master node 出現問題,就會選出新的 master node
  • 每個 node 啟動時就預設是一個 master eligible node,可以透過設定 node.master: false 取消此預設設定
  • 處理 request 的 node 稱為 Coordinating Node,其功能是將 request 轉發到合適的 node 上
  • 所有的 node 都預設是 Coordinating Node
  • coordinating node 可以直接接收 search request 並處理,不需要透過 master node 轉過來
  • 可以保存資料的 node,每個 node 啟動後都會預設是 data node,可以透過設定 node.data: false 停用 data node 功能
  • 由 master node 決定如何把分片分發到不同的 data node 上
  • 每個 node 上都保存了 cluster state
  • 只有 master 才可以修改 cluster state 並負責同步給其他 node
  • 每個 node 都會詳細紀錄本身的狀態資訊
  • shard 是 Elasticsearch 分散式儲存的基礎,包含 primary shard & replica shard
  • 每一個 shard 就是一個 Lucene instance
  • primary shard 功能是將一份被索引後的資料,分散到多個 data node 上存放,實現儲存方面的水平擴展
  • primary shard 的數量在建立 index 時就會指定,後續是無法修改的,若要修改就必須要進行 reindex
  • 當 primary shard 遺失時,replica shard 就可以被 promote 成 primary shard 來保持資料完整性
  • replica shard 數量可以動態調整,讓每個 data node 上都有完整的資料
  • ES 7.0 開始,primary shard 預設為 1,replica shard 預設為 0
  • replica shard 若設定過多,會降低 cluster 整體的寫入效能
  • replica shard 必須和 primary shard 被分配在不同的 data node 上
  • 所有的 primary shard 可以在同一個 data node 上
  • 透過 GET _cluster/health/<target> 可以取得目前 cluster 的健康狀態
  • Yellow:表示 primary shard 可以正常分配,但 replica shard 分配有問題
  • 透過 GET /_cat/shards/<target> 可以取得目前的 shard 狀態
  • replica shard 無法被分配,因此 cluster 健康狀態為黃色
  • 若是擔心 reboot 機器造成 failover 動作開始執行,可以設定將 replication 延遲一段時間後再執行(透過調整 settings 中的 index.unassigned.node_left.delayed_timeout 參數),避免無謂的 data copy 動作 (此功能稱為 delay allocation)
  • 集群變紅,代表有 primary shard 丟失,這個時候會影響讀寫。
  • 如果 node 重新回來,會從 translog 中恢復沒有寫入的資料
  • 設定 index settings 之後,primary shard 數量無法隨意變更
  • 不建議直接發送請求到master節點,雖然也會工作,但是大量請求發送到 master,會有潛在的性能問題
  • shard 是 ES 中最小的工作單元
  • shard 是一個 Lucene 的 index
  • 將 Index Buffer 中的內容寫入 Segment,而這寫入的過程就稱為 Refresh
  • 當 document 被 refresh 進入到 segment 之後,就可以被搜尋到了
  • 在進行 refresh 時先將 segment 寫入 cache 以開放查詢
  • 將 document 進行索引時,同時也會寫入 transaction log,且預設都會寫入磁碟中
  • 每個 shard 都會有對應的 transaction log
  • 由於 transaction log 都會寫入磁碟中,因此當 node 從故障中恢復時,就會優先讀取 transaction log 來恢復資料
張 旭

Manage nodes in a swarm | Docker Documentation - 0 views

  • Drain means the scheduler doesn’t assign new tasks to the node. The scheduler shuts down any existing tasks and schedules them on an available node.
  • Reachable means the node is a manager node participating in the Raft consensus quorum. If the leader node becomes unavailable, the node is eligible for election as the new leader.
  • If a manager node becomes unavailable, you should either join a new manager node to the swarm or promote a worker node to be a manager.
  • ...8 more annotations...
  • docker node inspect self --pretty
  • docker node update --availability drain node
  • use node labels in service constraints
  • The labels you set for nodes using docker node update apply only to the node entity within the swarm
  • node labels can be used to limit critical tasks to nodes that meet certain requirements
  • promote a worker node to the manager role
  • demote a manager node to the worker role
  • If the last manager node leaves the swarm, the swarm becomes unavailable requiring you to take disaster recovery measures.
張 旭

Kubernetes Components | Kubernetes - 0 views

  • A Kubernetes cluster consists of a set of worker machines, called nodes, that run containerized applications
  • Every cluster has at least one worker node.
  • The control plane manages the worker nodes and the Pods in the cluster.
  • ...29 more annotations...
  • The control plane's components make global decisions about the cluster
  • Control plane components can be run on any machine in the cluster.
  • for simplicity, set up scripts typically start all control plane components on the same machine, and do not run user containers on this machine
  • The API server is the front end for the Kubernetes control plane.
  • kube-apiserver is designed to scale horizontally—that is, it scales by deploying more instances. You can run several instances of kube-apiserver and balance traffic between those instances.
  • Kubernetes cluster uses etcd as its backing store, make sure you have a back up plan for those data.
  • watches for newly created Pods with no assigned node, and selects a node for them to run on.
  • Factors taken into account for scheduling decisions include: individual and collective resource requirements, hardware/software/policy constraints, affinity and anti-affinity specifications, data locality, inter-workload interference, and deadlines.
  • each controller is a separate process, but to reduce complexity, they are all compiled into a single binary and run in a single process.
  • Node controller
  • Job controller
  • Endpoints controller
  • Service Account & Token controllers
  • The cloud controller manager lets you link your cluster into your cloud provider's API, and separates out the components that interact with that cloud platform from components that only interact with your cluster.
  • If you are running Kubernetes on your own premises, or in a learning environment inside your own PC, the cluster does not have a cloud controller manager.
  • An agent that runs on each node in the cluster. It makes sure that containers are running in a Pod.
  • The kubelet takes a set of PodSpecs that are provided through various mechanisms and ensures that the containers described in those PodSpecs are running and healthy.
  • The kubelet doesn't manage containers which were not created by Kubernetes.
  • kube-proxy is a network proxy that runs on each node in your cluster, implementing part of the Kubernetes Service concept.
  • kube-proxy maintains network rules on nodes. These network rules allow network communication to your Pods from network sessions inside or outside of your cluster.
  • kube-proxy uses the operating system packet filtering layer if there is one and it's available.
  • Kubernetes supports several container runtimes: Docker, containerd, CRI-O, and any implementation of the Kubernetes CRI (Container Runtime Interface).
  • Addons use Kubernetes resources (DaemonSet, Deployment, etc) to implement cluster features
  • namespaced resources for addons belong within the kube-system namespace.
  • all Kubernetes clusters should have cluster DNS,
  • Cluster DNS is a DNS server, in addition to the other DNS server(s) in your environment, which serves DNS records for Kubernetes services.
  • Containers started by Kubernetes automatically include this DNS server in their DNS searches.
  • Container Resource Monitoring records generic time-series metrics about containers in a central database, and provides a UI for browsing that data.
  • A cluster-level logging mechanism is responsible for saving container logs to a central log store with search/browsing interface.
張 旭

Swarm mode key concepts | Docker Documentation - 0 views

  • The cluster management and orchestration features embedded in the Docker Engine are built using SwarmKit.
  • Docker engines participating in a cluster are running in swarm mode
  • A swarm is a cluster of Docker engines, or nodes, where you deploy services
  • ...19 more annotations...
  • When you run Docker without using swarm mode, you execute container commands.
  • When you run the Docker in swarm mode, you orchestrate services.
  • You can run swarm services and standalone containers on the same Docker instances.
  • A node is an instance of the Docker engine participating in the swarm
  • You can run one or more nodes on a single physical computer or cloud server
  • To deploy your application to a swarm, you submit a service definition to a manager node.
  • Manager nodes also perform the orchestration and cluster management functions required to maintain the desired state of the swarm.
  • Manager nodes elect a single leader to conduct orchestration tasks.
  • Worker nodes receive and execute tasks dispatched from manager nodes.
  • service is the definition of the tasks to execute on the worker nodes
  • When you create a service, you specify which container image to use and which commands to execute inside running containers.
  • replicated services model, the swarm manager distributes a specific number of replica tasks among the nodes based upon the scale you set in the desired state.
  • global services, the swarm runs one task for the service on every available node in the cluster.
  • A task carries a Docker container and the commands to run inside the container
  • Manager nodes assign tasks to worker nodes according to the number of replicas set in the service scale.
  • Once a task is assigned to a node, it cannot move to another node
  • If you do not specify a port, the swarm manager assigns the service a port in the 30000-32767 range.
  • External components, such as cloud load balancers, can access the service on the PublishedPort of any node in the cluster whether or not the node is currently running the task for the service.
  • Swarm mode has an internal DNS component that automatically assigns each service in the swarm a DNS entry.
張 旭

Improving Kubernetes reliability: quicker detection of a Node down | Fatal failure - 0 views

  • when a Node gets down, the pods of the broken node are still running for some time and they still get requests, and those requests, will fail.
  • 1- The Kubelet posts its status to the masters using –node-status-update-frequency=10s 2- A node dies 3- The kube controller manager is the one monitoring the nodes, using –-node-monitor-period=5s it checks, in the masters, the node status reported by the Kubelet. 4- Kube controller manager will see the node is unresponsive, and has this grace period –node-monitor-grace-period=40s until it considers the node unhealthy.
  • node-status-update-frequency x (N-1) != node-monitor-grace-period
  • ...2 more annotations...
  • 5- Once the node is marked as unhealthy, the kube controller manager will remove its pods based on –pod-eviction-timeout=5m0s
  • 6- Kube proxy has a watcher over the API, so the very first moment the pods are evicted the proxy will notice and update the iptables of the node, removing the endpoints from the services so the failing pods won’t be accessible anymore.
張 旭

Manage swarm security with public key infrastructure (PKI) | Docker Documentation - 0 views

  • The nodes in a swarm use mutual Transport Layer Security (TLS) to authenticate, authorize, and encrypt the communications with other nodes in the swarm.
  • By default, the manager node generates a new root Certificate Authority (CA) along with a key pair, which are used to secure communications with other nodes that join the swarm.
  • The manager node also generates two tokens to use when you join additional nodes to the swarm: one worker token and one manager token.
  • ...3 more annotations...
  • Each time a new node joins the swarm, the manager issues a certificate to the node
  • By default, each node in the swarm renews its certificate every three months.
  • a cluster CA key or a manager node is compromised, you can rotate the swarm root CA so that none of the nodes trust certificates signed by the old root CA anymore.
  •  
    "The nodes in a swarm use mutual Transport Layer Security (TLS) to authenticate, authorize, and encrypt the communications with other nodes in the swarm."
張 旭

Volumes - Kubernetes - 0 views

  • On-disk files in a Container are ephemeral,
  • when a Container crashes, kubelet will restart it, but the files will be lost - the Container starts with a clean state
  • In Docker, a volume is simply a directory on disk or in another Container.
  • ...105 more annotations...
  • A Kubernetes volume, on the other hand, has an explicit lifetime - the same as the Pod that encloses it.
  • a volume outlives any Containers that run within the Pod, and data is preserved across Container restarts.
    • 張 旭
       
      Kubernetes Volume 是跟著 Pod 的生命週期在走
  • Kubernetes supports many types of volumes, and a Pod can use any number of them simultaneously.
  • To use a volume, a Pod specifies what volumes to provide for the Pod (the .spec.volumes field) and where to mount those into Containers (the .spec.containers.volumeMounts field).
  • A process in a container sees a filesystem view composed from their Docker image and volumes.
  • Volumes can not mount onto other volumes or have hard links to other volumes.
  • Each Container in the Pod must independently specify where to mount each volume
  • localnfs
  • cephfs
  • awsElasticBlockStore
  • glusterfs
  • vsphereVolume
  • An awsElasticBlockStore volume mounts an Amazon Web Services (AWS) EBS Volume into your Pod.
  • the contents of an EBS volume are preserved and the volume is merely unmounted.
  • an EBS volume can be pre-populated with data, and that data can be “handed off” between Pods.
  • create an EBS volume using aws ec2 create-volume
  • the nodes on which Pods are running must be AWS EC2 instances
  • EBS only supports a single EC2 instance mounting a volume
  • check that the size and EBS volume type are suitable for your use!
  • A cephfs volume allows an existing CephFS volume to be mounted into your Pod.
  • the contents of a cephfs volume are preserved and the volume is merely unmounted.
    • 張 旭
       
      相當於自己的 AWS EBS
  • CephFS can be mounted by multiple writers simultaneously.
  • have your own Ceph server running with the share exported
  • configMap
  • The configMap resource provides a way to inject configuration data into Pods
  • When referencing a configMap object, you can simply provide its name in the volume to reference it
  • volumeMounts: - name: config-vol mountPath: /etc/config volumes: - name: config-vol configMap: name: log-config items: - key: log_level path: log_level
  • create a ConfigMap before you can use it.
  • A Container using a ConfigMap as a subPath volume mount will not receive ConfigMap updates.
  • An emptyDir volume is first created when a Pod is assigned to a Node, and exists as long as that Pod is running on that node.
  • When a Pod is removed from a node for any reason, the data in the emptyDir is deleted forever.
  • By default, emptyDir volumes are stored on whatever medium is backing the node - that might be disk or SSD or network storage, depending on your environment.
  • you can set the emptyDir.medium field to "Memory" to tell Kubernetes to mount a tmpfs (RAM-backed filesystem)
  • volumeMounts: - mountPath: /cache name: cache-volume volumes: - name: cache-volume emptyDir: {}
  • An fc volume allows an existing fibre channel volume to be mounted in a Pod.
  • configure FC SAN Zoning to allocate and mask those LUNs (volumes) to the target WWNs beforehand so that Kubernetes hosts can access them.
  • Flocker is an open-source clustered Container data volume manager. It provides management and orchestration of data volumes backed by a variety of storage backends.
  • emptyDir
  • flocker
  • A flocker volume allows a Flocker dataset to be mounted into a Pod
  • have your own Flocker installation running
  • A gcePersistentDisk volume mounts a Google Compute Engine (GCE) Persistent Disk into your Pod.
  • Using a PD on a Pod controlled by a ReplicationController will fail unless the PD is read-only or the replica count is 0 or 1
  • A glusterfs volume allows a Glusterfs (an open source networked filesystem) volume to be mounted into your Pod.
  • have your own GlusterFS installation running
  • A hostPath volume mounts a file or directory from the host node’s filesystem into your Pod.
  • a powerful escape hatch for some applications
  • access to Docker internals; use a hostPath of /var/lib/docker
  • allowing a Pod to specify whether a given hostPath should exist prior to the Pod running, whether it should be created, and what it should exist as
  • specify a type for a hostPath volume
  • the files or directories created on the underlying hosts are only writable by root.
  • hostPath: # directory location on host path: /data # this field is optional type: Directory
  • An iscsi volume allows an existing iSCSI (SCSI over IP) volume to be mounted into your Pod.
  • have your own iSCSI server running
  • A feature of iSCSI is that it can be mounted as read-only by multiple consumers simultaneously.
  • A local volume represents a mounted local storage device such as a disk, partition or directory.
  • Local volumes can only be used as a statically created PersistentVolume.
  • Compared to hostPath volumes, local volumes can be used in a durable and portable manner without manually scheduling Pods to nodes, as the system is aware of the volume’s node constraints by looking at the node affinity on the PersistentVolume.
  • If a node becomes unhealthy, then the local volume will also become inaccessible, and a Pod using it will not be able to run.
  • PersistentVolume spec using a local volume and nodeAffinity
  • PersistentVolume nodeAffinity is required when using local volumes. It enables the Kubernetes scheduler to correctly schedule Pods using local volumes to the correct node.
  • PersistentVolume volumeMode can now be set to “Block” (instead of the default value “Filesystem”) to expose the local volume as a raw block device.
  • When using local volumes, it is recommended to create a StorageClass with volumeBindingMode set to WaitForFirstConsumer
  • An nfs volume allows an existing NFS (Network File System) share to be mounted into your Pod.
  • NFS can be mounted by multiple writers simultaneously.
  • have your own NFS server running with the share exported
  • A persistentVolumeClaim volume is used to mount a PersistentVolume into a Pod.
  • PersistentVolumes are a way for users to “claim” durable storage (such as a GCE PersistentDisk or an iSCSI volume) without knowing the details of the particular cloud environment.
  • A projected volume maps several existing volume sources into the same directory.
  • All sources are required to be in the same namespace as the Pod. For more details, see the all-in-one volume design document.
  • Each projected volume source is listed in the spec under sources
  • A Container using a projected volume source as a subPath volume mount will not receive updates for those volume sources.
  • RBD volumes can only be mounted by a single consumer in read-write mode - no simultaneous writers allowed
  • A secret volume is used to pass sensitive information, such as passwords, to Pods
  • store secrets in the Kubernetes API and mount them as files for use by Pods
  • secret volumes are backed by tmpfs (a RAM-backed filesystem) so they are never written to non-volatile storage.
  • create a secret in the Kubernetes API before you can use it
  • A Container using a Secret as a subPath volume mount will not receive Secret updates.
  • StorageOS runs as a Container within your Kubernetes environment, making local or attached storage accessible from any node within the Kubernetes cluster.
  • Data can be replicated to protect against node failure. Thin provisioning and compression can improve utilization and reduce cost.
  • StorageOS provides block storage to Containers, accessible via a file system.
  • A vsphereVolume is used to mount a vSphere VMDK Volume into your Pod.
  • supports both VMFS and VSAN datastore.
  • create VMDK using one of the following methods before using with Pod.
  • share one volume for multiple uses in a single Pod.
  • The volumeMounts.subPath property can be used to specify a sub-path inside the referenced volume instead of its root.
  • volumeMounts: - name: workdir1 mountPath: /logs subPathExpr: $(POD_NAME)
  • env: - name: POD_NAME valueFrom: fieldRef: apiVersion: v1 fieldPath: metadata.name
  • Use the subPathExpr field to construct subPath directory names from Downward API environment variables
  • enable the VolumeSubpathEnvExpansion feature gate
  • The subPath and subPathExpr properties are mutually exclusive.
  • There is no limit on how much space an emptyDir or hostPath volume can consume, and no isolation between Containers or between Pods.
  • emptyDir and hostPath volumes will be able to request a certain amount of space using a resource specification, and to select the type of media to use, for clusters that have several media types.
  • the Container Storage Interface (CSI) and Flexvolume. They enable storage vendors to create custom storage plugins without adding them to the Kubernetes repository.
  • all volume plugins (like volume types listed above) were “in-tree” meaning they were built, linked, compiled, and shipped with the core Kubernetes binaries and extend the core Kubernetes API.
  • Container Storage Interface (CSI) defines a standard interface for container orchestration systems (like Kubernetes) to expose arbitrary storage systems to their container workloads.
  • Once a CSI compatible volume driver is deployed on a Kubernetes cluster, users may use the csi volume type to attach, mount, etc. the volumes exposed by the CSI driver.
  • The csi volume type does not support direct reference from Pod and may only be referenced in a Pod via a PersistentVolumeClaim object.
  • This feature requires CSIInlineVolume feature gate to be enabled:--feature-gates=CSIInlineVolume=true
  • In-tree plugins that support CSI Migration and have a corresponding CSI driver implemented are listed in the “Types of Volumes” section above.
  • Mount propagation allows for sharing volumes mounted by a Container to other Containers in the same Pod, or even to other Pods on the same node.
  • Mount propagation of a volume is controlled by mountPropagation field in Container.volumeMounts.
  • HostToContainer - This volume mount will receive all subsequent mounts that are mounted to this volume or any of its subdirectories.
  • Bidirectional - This volume mount behaves the same the HostToContainer mount. In addition, all volume mounts created by the Container will be propagated back to the host and to all Containers of all Pods that use the same volume.
  • Edit your Docker’s systemd service file. Set MountFlags as follows:MountFlags=shared
張 旭

Creating a cluster with kubeadm | Kubernetes - 0 views

  • (Recommended) If you have plans to upgrade this single control-plane kubeadm cluster to high availability you should specify the --control-plane-endpoint to set the shared endpoint for all control-plane nodes
  • set the --pod-network-cidr to a provider-specific value.
  • kubeadm tries to detect the container runtime by using a list of well known endpoints.
  • ...12 more annotations...
  • kubeadm uses the network interface associated with the default gateway to set the advertise address for this particular control-plane node's API server. To use a different network interface, specify the --apiserver-advertise-address=<ip-address> argument to kubeadm init
  • Do not share the admin.conf file with anyone and instead grant users custom permissions by generating them a kubeconfig file using the kubeadm kubeconfig user command.
  • The token is used for mutual authentication between the control-plane node and the joining nodes. The token included here is secret. Keep it safe, because anyone with this token can add authenticated nodes to your cluster.
  • You must deploy a Container Network Interface (CNI) based Pod network add-on so that your Pods can communicate with each other. Cluster DNS (CoreDNS) will not start up before a network is installed.
  • Take care that your Pod network must not overlap with any of the host networks
  • Make sure that your Pod network plugin supports RBAC, and so do any manifests that you use to deploy it.
  • You can install only one Pod network per cluster.
  • The cluster created here has a single control-plane node, with a single etcd database running on it.
  • The node-role.kubernetes.io/control-plane label is such a restricted label and kubeadm manually applies it using a privileged client after a node has been created.
  • By default, your cluster will not schedule Pods on the control plane nodes for security reasons.
  • kubectl taint nodes --all node-role.kubernetes.io/control-plane-
  • remove the node-role.kubernetes.io/control-plane:NoSchedule taint from any nodes that have it, including the control plane nodes, meaning that the scheduler will then be able to schedule Pods everywhere.
張 旭

Logging Architecture | Kubernetes - 0 views

  • Application logs can help you understand what is happening inside your application
  • container engines are designed to support logging.
  • The easiest and most adopted logging method for containerized applications is writing to standard output and standard error streams.
  • ...26 more annotations...
  • In a cluster, logs should have a separate storage and lifecycle independent of nodes, pods, or containers. This concept is called cluster-level logging.
  • Cluster-level logging architectures require a separate backend to store, analyze, and query logs
  • Kubernetes does not provide a native storage solution for log data.
  • use kubectl logs --previous to retrieve logs from a previous instantiation of a container.
  • A container engine handles and redirects any output generated to a containerized application's stdout and stderr streams
  • The Docker JSON logging driver treats each line as a separate message.
  • By default, if a container restarts, the kubelet keeps one terminated container with its logs.
  • An important consideration in node-level logging is implementing log rotation, so that logs don't consume all available storage on the node
  • You can also set up a container runtime to rotate an application's logs automatically.
  • The two kubelet flags container-log-max-size and container-log-max-files can be used to configure the maximum size for each log file and the maximum number of files allowed for each container respectively.
  • The kubelet and container runtime do not run in containers.
  • On machines with systemd, the kubelet and container runtime write to journald. If systemd is not present, the kubelet and container runtime write to .log files in the /var/log directory.
  • System components inside containers always write to the /var/log directory, bypassing the default logging mechanism.
  • Kubernetes does not provide a native solution for cluster-level logging
  • Use a node-level logging agent that runs on every node.
  • implement cluster-level logging by including a node-level logging agent on each node.
  • the logging agent is a container that has access to a directory with log files from all of the application containers on that node.
  • the logging agent must run on every node, it is recommended to run the agent as a DaemonSet
  • Node-level logging creates only one agent per node and doesn't require any changes to the applications running on the node.
  • Containers write stdout and stderr, but with no agreed format. A node-level agent collects these logs and forwards them for aggregation.
  • Each sidecar container prints a log to its own stdout or stderr stream.
  • It is not recommended to write log entries with different formats to the same log stream
  • writing logs to a file and then streaming them to stdout can double disk usage.
  • If you have an application that writes to a single file, it's recommended to set /dev/stdout as the destination
  • it's recommended to use stdout and stderr directly and leave rotation and retention policies to the kubelet.
  • Using a logging agent in a sidecar container can lead to significant resource consumption. Moreover, you won't be able to access those logs using kubectl logs because they are not controlled by the kubelet.
張 旭

Considerations for large clusters | Kubernetes - 0 views

  • A cluster is a set of nodes (physical or virtual machines) running Kubernetes agents, managed by the control plane.
  • Kubernetes v1.23 supports clusters with up to 5000 nodes.
  • criteria: No more than 110 pods per node No more than 5000 nodes No more than 150000 total pods No more than 300000 total containers
  • ...14 more annotations...
  • In-use IP addresses
  • run one or two control plane instances per failure zone, scaling those instances vertically first and then scaling horizontally after reaching the point of falling returns to (vertical) scale.
  • Kubernetes nodes do not automatically steer traffic towards control-plane endpoints that are in the same failure zone
  • store Event objects in a separate dedicated etcd instance.
  • start and configure additional etcd instance
  • Kubernetes resource limits help to minimize the impact of memory leaks and other ways that pods and containers can impact on other components.
  • Addons' default limits are typically based on data collected from experience running each addon on small or medium Kubernetes clusters.
  • When running on large clusters, addons often consume more of some resources than their default limits.
  • Many addons scale horizontally - you add capacity by running more pods
  • The VerticalPodAutoscaler can run in recommender mode to provide suggested figures for requests and limits.
  • Some addons run as one copy per node, controlled by a DaemonSet: for example, a node-level log aggregator.
  • VerticalPodAutoscaler is a custom resource that you can deploy into your cluster to help you manage resource requests and limits for pods.
  • The cluster autoscaler integrates with a number of cloud providers to help you run the right number of nodes for the level of resource demand in your cluster.
  • The addon resizer helps you in resizing the addons automatically as your cluster's scale changes.
張 旭

Boosting your kubectl productivity ♦︎ Learnk8s - 0 views

  • kubectl is your cockpit to control Kubernetes.
  • kubectl is a client for the Kubernetes API
  • Kubernetes API is an HTTP REST API.
  • ...75 more annotations...
  • This API is the real Kubernetes user interface.
  • Kubernetes is fully controlled through this API
  • every Kubernetes operation is exposed as an API endpoint and can be executed by an HTTP request to this endpoint.
  • the main job of kubectl is to carry out HTTP requests to the Kubernetes API
  • Kubernetes maintains an internal state of resources, and all Kubernetes operations are CRUD operations on these resources.
  • Kubernetes is a fully resource-centred system
  • Kubernetes API reference is organised as a list of resource types with their associated operations.
  • This is how kubectl works for all commands that interact with the Kubernetes cluster.
  • kubectl simply makes HTTP requests to the appropriate Kubernetes API endpoints.
  • it's totally possible to control Kubernetes with a tool like curl by manually issuing HTTP requests to the Kubernetes API.
  • Kubernetes consists of a set of independent components that run as separate processes on the nodes of a cluster.
  • components on the master nodes
  • Storage backend: stores resource definitions (usually etcd is used)
  • API server: provides Kubernetes API and manages storage backend
  • Controller manager: ensures resource statuses match specifications
  • Scheduler: schedules Pods to worker nodes
  • component on the worker nodes
  • Kubelet: manages execution of containers on a worker node
  • triggers the ReplicaSet controller, which is a sub-process of the controller manager.
  • the scheduler, who watches for Pod definitions that are not yet scheduled to a worker node.
  • creating and updating resources in the storage backend on the master node.
  • The kubelet of the worker node your ReplicaSet Pods have been scheduled to instructs the configured container runtime (which may be Docker) to download the required container images and run the containers.
  • Kubernetes components (except the API server and the storage backend) work by watching for resource changes in the storage backend and manipulating resources in the storage backend.
  • However, these components do not access the storage backend directly, but only through the Kubernetes API.
    • 張 旭
       
      很精彩,相互之間都是使用 API call 溝通,良好的微服務行為。
  • double usage of the Kubernetes API for internal components as well as for external users is a fundamental design concept of Kubernetes.
  • All other Kubernetes components and users read, watch, and manipulate the state (i.e. resources) of Kubernetes through the Kubernetes API
  • The storage backend stores the state (i.e. resources) of Kubernetes.
  • command completion is a shell feature that works by the means of a completion script.
  • A completion script is a shell script that defines the completion behaviour for a specific command. Sourcing a completion script enables completion for the corresponding command.
  • kubectl completion zsh
  • /etc/bash_completion.d directory (create it, if it doesn't exist)
  • source <(kubectl completion bash)
  • source <(kubectl completion zsh)
  • autoload -Uz compinit compinit
  • the API reference, which contains the full specifications of all resources.
  • kubectl api-resources
  • displays the resource names in their plural form (e.g. deployments instead of deployment). It also displays the shortname (e.g. deploy) for those resources that have one. Don't worry about these differences. All of these name variants are equivalent for kubectl.
  • .spec
  • custom columns output format comes in. It lets you freely define the columns and the data to display in them. You can choose any field of a resource to be displayed as a separate column in the output
  • kubectl get pods -o custom-columns='NAME:metadata.name,NODE:spec.nodeName'
  • kubectl explain pod.spec.
  • kubectl explain pod.metadata.
  • browse the resource specifications and try it out with any fields you like!
  • JSONPath is a language to extract data from JSON documents (it is similar to XPath for XML).
  • with kubectl explain, only a subset of the JSONPath capabilities is supported
  • Many fields of Kubernetes resources are lists, and this operator allows you to select items of these lists. It is often used with a wildcard as [*] to select all items of the list.
  • kubectl get pods -o custom-columns='NAME:metadata.name,IMAGES:spec.containers[*].image'
  • a Pod may contain more than one container.
  • The availability zones for each node are obtained through the special failure-domain.beta.kubernetes.io/zone label.
  • kubectl get nodes -o yaml kubectl get nodes -o json
  • The default kubeconfig file is ~/.kube/config
  • with multiple clusters, then you have connection parameters for multiple clusters configured in your kubeconfig file.
  • Within a cluster, you can set up multiple namespaces (a namespace is kind of "virtual" clusters within a physical cluster)
  • overwrite the default kubeconfig file with the --kubeconfig option for every kubectl command.
  • Namespace: the namespace to use when connecting to the cluster
  • a one-to-one mapping between clusters and contexts.
  • When kubectl reads a kubeconfig file, it always uses the information from the current context.
  • just change the current context in the kubeconfig file
  • to switch to another namespace in the same cluster, you can change the value of the namespace element of the current context
  • kubectl also provides the --cluster, --user, --namespace, and --context options that allow you to overwrite individual elements and the current context itself, regardless of what is set in the kubeconfig file.
  • for switching between clusters and namespaces is kubectx.
  • kubectl config get-contexts
  • just have to download the shell scripts named kubectl-ctx and kubectl-ns to any directory in your PATH and make them executable (for example, with chmod +x)
  • kubectl proxy
  • kubectl get roles
  • kubectl get pod
  • Kubectl plugins are distributed as simple executable files with a name of the form kubectl-x. The prefix kubectl- is mandatory,
  • To install a plugin, you just have to copy the kubectl-x file to any directory in your PATH and make it executable (for example, with chmod +x)
  • krew itself is a kubectl plugin
  • check out the kubectl-plugins GitHub topic
  • The executable can be of any type, a Bash script, a compiled Go program, a Python script, it really doesn't matter. The only requirement is that it can be directly executed by the operating system.
  • kubectl plugins can be written in any programming or scripting language.
  • you can write more sophisticated plugins with real programming languages, for example, using a Kubernetes client library. If you use Go, you can also use the cli-runtime library, which exists specifically for writing kubectl plugins.
  • a kubeconfig file consists of a set of contexts
  • changing the current context means changing the cluster, if you have only a single context per cluster.
張 旭

Options for Highly Available Topology | Kubernetes - 0 views

  • With stacked control plane nodes, where etcd nodes are colocated with control plane nodes
  • A stacked HA cluster is a topology where the distributed data storage cluster provided by etcd is stacked on top of the cluster formed by the nodes managed by kubeadm that run control plane components.
  • Each control plane node runs an instance of the kube-apiserver, kube-scheduler, and kube-controller-manager
  • ...6 more annotations...
  • Each control plane node creates a local etcd member and this etcd member communicates only with the kube-apiserver of this node.
  • This topology couples the control planes and etcd members on the same nodes.
  • a stacked cluster runs the risk of failed coupling. If one node goes down, both an etcd member and a control plane instance are lost
  • An HA cluster with external etcd is a topology where the distributed data storage cluster provided by etcd is external to the cluster formed by the nodes that run control plane components.
  • etcd members run on separate hosts, and each etcd host communicates with the kube-apiserver of each control plane node.
  • This topology decouples the control plane and etcd member. It therefore provides an HA setup where losing a control plane instance or an etcd member has less impact and does not affect the cluster redundancy as much as the stacked HA topology.
張 旭

Monitor Node Health | Kubernetes - 0 views

  • Node Problem Detector is a daemon for monitoring and reporting about a node's health
  • Node Problem Detector collects information about node problems from various daemons and reports these conditions to the API server as NodeCondition and Event.
  • Node Problem Detector only supports file based kernel log. Log tools such as journald are not supported.
  • ...2 more annotations...
  • kubectl provides the most flexible management of Node Problem Detector.
  • run the Node Problem Detector in your cluster to monitor node health.
張 旭

Cluster Networking - Kubernetes - 0 views

  • Networking is a central part of Kubernetes, but it can be challenging to understand exactly how it is expected to work
  • Highly-coupled container-to-container communications
  • Pod-to-Pod communications
  • ...57 more annotations...
  • this is the primary focus of this document
    • 張 旭
       
      Cluster Networking 所關注處理的是: Pod 到 Pod 之間的連線
  • Pod-to-Service communications
  • External-to-Service communications
  • Kubernetes is all about sharing machines between applications.
  • sharing machines requires ensuring that two applications do not try to use the same ports.
  • Dynamic port allocation brings a lot of complications to the system
  • Every Pod gets its own IP address
  • do not need to explicitly create links between Pods
  • almost never need to deal with mapping container ports to host ports.
  • Pods can be treated much like VMs or physical hosts from the perspectives of port allocation, naming, service discovery, load balancing, application configuration, and migration.
  • pods on a node can communicate with all pods on all nodes without NAT
  • agents on a node (e.g. system daemons, kubelet) can communicate with all pods on that node
  • pods in the host network of a node can communicate with all pods on all nodes without NAT
  • If your job previously ran in a VM, your VM had an IP and could talk to other VMs in your project. This is the same basic model.
  • containers within a Pod share their network namespaces - including their IP address
  • containers within a Pod can all reach each other’s ports on localhost
  • containers within a Pod must coordinate port usage
  • “IP-per-pod” model.
  • request ports on the Node itself which forward to your Pod (called host ports), but this is a very niche operation
  • The Pod itself is blind to the existence or non-existence of host ports.
  • AOS is an Intent-Based Networking system that creates and manages complex datacenter environments from a simple integrated platform.
  • Cisco Application Centric Infrastructure offers an integrated overlay and underlay SDN solution that supports containers, virtual machines, and bare metal servers.
  • AOS Reference Design currently supports Layer-3 connected hosts that eliminate legacy Layer-2 switching problems.
  • The AWS VPC CNI offers integrated AWS Virtual Private Cloud (VPC) networking for Kubernetes clusters.
  • users can apply existing AWS VPC networking and security best practices for building Kubernetes clusters.
  • Using this CNI plugin allows Kubernetes pods to have the same IP address inside the pod as they do on the VPC network.
  • The CNI allocates AWS Elastic Networking Interfaces (ENIs) to each Kubernetes node and using the secondary IP range from each ENI for pods on the node.
  • Big Cloud Fabric is a cloud native networking architecture, designed to run Kubernetes in private cloud/on-premises environments.
  • Cilium is L7/HTTP aware and can enforce network policies on L3-L7 using an identity based security model that is decoupled from network addressing.
  • CNI-Genie is a CNI plugin that enables Kubernetes to simultaneously have access to different implementations of the Kubernetes network model in runtime.
  • CNI-Genie also supports assigning multiple IP addresses to a pod, each from a different CNI plugin.
  • cni-ipvlan-vpc-k8s contains a set of CNI and IPAM plugins to provide a simple, host-local, low latency, high throughput, and compliant networking stack for Kubernetes within Amazon Virtual Private Cloud (VPC) environments by making use of Amazon Elastic Network Interfaces (ENI) and binding AWS-managed IPs into Pods using the Linux kernel’s IPvlan driver in L2 mode.
  • to be straightforward to configure and deploy within a VPC
  • Contiv provides configurable networking
  • Contrail, based on Tungsten Fabric, is a truly open, multi-cloud network virtualization and policy management platform.
  • DANM is a networking solution for telco workloads running in a Kubernetes cluster.
  • Flannel is a very simple overlay network that satisfies the Kubernetes requirements.
  • Any traffic bound for that subnet will be routed directly to the VM by the GCE network fabric.
  • sysctl net.ipv4.ip_forward=1
  • Jaguar provides overlay network using vxlan and Jaguar CNIPlugin provides one IP address per pod.
  • Knitter is a network solution which supports multiple networking in Kubernetes.
  • Kube-OVN is an OVN-based kubernetes network fabric for enterprises.
  • Kube-router provides a Linux LVS/IPVS-based service proxy, a Linux kernel forwarding-based pod-to-pod networking solution with no overlays, and iptables/ipset-based network policy enforcer.
  • If you have a “dumb” L2 network, such as a simple switch in a “bare-metal” environment, you should be able to do something similar to the above GCE setup.
  • Multus is a Multi CNI plugin to support the Multi Networking feature in Kubernetes using CRD based network objects in Kubernetes.
  • NSX-T can provide network virtualization for a multi-cloud and multi-hypervisor environment and is focused on emerging application frameworks and architectures that have heterogeneous endpoints and technology stacks.
  • NSX-T Container Plug-in (NCP) provides integration between NSX-T and container orchestrators such as Kubernetes
  • Nuage uses the open source Open vSwitch for the data plane along with a feature rich SDN Controller built on open standards.
  • OpenVSwitch is a somewhat more mature but also complicated way to build an overlay network
  • OVN is an opensource network virtualization solution developed by the Open vSwitch community.
  • Project Calico is an open source container networking provider and network policy engine.
  • Calico provides a highly scalable networking and network policy solution for connecting Kubernetes pods based on the same IP networking principles as the internet
  • Calico can be deployed without encapsulation or overlays to provide high-performance, high-scale data center networking.
  • Calico can also be run in policy enforcement mode in conjunction with other networking solutions such as Flannel, aka canal, or native GCE, AWS or Azure networking.
  • Romana is an open source network and security automation solution that lets you deploy Kubernetes without an overlay network
  • Weave Net runs as a CNI plug-in or stand-alone. In either version, it doesn’t require any configuration or extra code to run, and in both cases, the network provides one IP address per pod - as is standard for Kubernetes.
  • The network model is implemented by the container runtime on each node.
張 旭

Getting Started with MariaDB Galera Cluster - MariaDB Knowledge Base - 0 views

  • most users are not going to bootstrap a server by executing "mysqld --wsrep-new-cluster" manually.
  • galera_new_cluster
  • Prerequisites
  • ...7 more annotations...
  • Once you have a cluster running and you want to add/reconnect another node to it, you must supply an address of one of the cluster members in the cluster address URL
  • The new node only needs to connect to one of the existing members
  • It will automatically retrieve the cluster map and reconnect to the rest of the nodes
  • it's better to list all nodes of the cluster so that any node can join a cluster connecting to any other node, even if one or more are down
  • The wsrep_cluster_address parameter should be added in my.cnf of each node, listing all the nodes of the cluster,
  • the minimum recommended number of nodes in a cluster is 3
  • While two of the members will be engaged in state transfer, the remaining member(s) will be able to keep on serving client requests.
張 旭

MySQL cluster vs Galera - How to make the right choice - 0 views

  • there is no “one size fits all” solution when coming to database clustering.
  • MySQL cluster contains the data nodes that store the cluster data and management node that store the cluster’s configuration.
  • MySQL clients first communicate with the management node and then connect directly to these data nodes.
  • ...13 more annotations...
  • For synchronization of data in the data nodes, MySQL cluster uses a special data engine called NDB (Network Database).
  • it uses automatic shrading aka splitting of a large database into small units.
  • MySQL cluster avoids single point failure and ensures 99.99% availability.
  • MySQL cluster can provide a response time as low as less than 3 ms.
  • Galera Cluster consists of a database server and uses the Galera Replication Plugin to manage replication.
  • a multi-master database cluster that supports synchronous replication.
  • it provides multiple, up-to-date copies of the data.
  • there is a need for instant fail-over.
  • Galera cluster allows the read and write of data in any node.
  • Galera cluster include guaranteed write consistency, automatic node provisioning, etc.
  • Upon restoring the connection, the separated nodes will sync back and rejoin the cluster automatically.
  • there is no need to have management node like MySQL cluster.
  • it gives best results with the InnoDB storage engine.
張 旭

Production environment | Kubernetes - 0 views

  • to promote an existing cluster for production use
  • Separating the control plane from the worker nodes.
  • Having enough worker nodes available
  • ...22 more annotations...
  • You can use role-based access control (RBAC) and other security mechanisms to make sure that users and workloads can get access to the resources they need, while keeping workloads, and the cluster itself, secure. You can set limits on the resources that users and workloads can access by managing policies and container resources.
  • you need to plan how to scale to relieve increased pressure from more requests to the control plane and worker nodes or scale down to reduce unused resources.
  • Managed control plane: Let the provider manage the scale and availability of the cluster's control plane, as well as handle patches and upgrades.
  • The simplest Kubernetes cluster has the entire control plane and worker node services running on the same machine.
  • You can deploy a control plane using tools such as kubeadm, kops, and kubespray.
  • Secure communications between control plane services are implemented using certificates.
  • Certificates are automatically generated during deployment or you can generate them using your own certificate authority.
  • Separate and backup etcd service: The etcd services can either run on the same machines as other control plane services or run on separate machines
  • Create multiple control plane systems: For high availability, the control plane should not be limited to a single machine
  • Some deployment tools set up Raft consensus algorithm to do leader election of Kubernetes services. If the primary goes away, another service elects itself and take over.
  • Groups of zones are referred to as regions.
  • if you installed with kubeadm, there are instructions to help you with Certificate Management and Upgrading kubeadm clusters.
  • Production-quality workloads need to be resilient and anything they rely on needs to be resilient (such as CoreDNS).
  • Add nodes to the cluster: If you are managing your own cluster you can add nodes by setting up your own machines and either adding them manually or having them register themselves to the cluster’s apiserver.
  • Set up node health checks: For important workloads, you want to make sure that the nodes and pods running on those nodes are healthy.
  • Authentication: The apiserver can authenticate users using client certificates, bearer tokens, an authenticating proxy, or HTTP basic auth.
  • Authorization: When you set out to authorize your regular users, you will probably choose between RBAC and ABAC authorization.
  • Role-based access control (RBAC): Lets you assign access to your cluster by allowing specific sets of permissions to authenticated users. Permissions can be assigned for a specific namespace (Role) or across the entire cluster (ClusterRole).
  • Attribute-based access control (ABAC): Lets you create policies based on resource attributes in the cluster and will allow or deny access based on those attributes.
  • Set limits on workload resources
  • Set namespace limits: Set per-namespace quotas on things like memory and CPU
  • Prepare for DNS demand: If you expect workloads to massively scale up, your DNS service must be ready to scale up as well.
張 旭

Deploy services to a swarm | Docker Documentation - 0 views

  • Swarm services use a declarative model, which means that you define the desired state of the service, and rely upon Docker to maintain this state.
  • To create a single-replica service with no extra configuration, you only need to supply the image name.
  • A service can be in a pending state if its image is unavailable
  • ...12 more annotations...
  • If your image is available on a private registry which requires login, use the --with-registry-auth flag
  • When you update a service, Docker stops its containers and restarts them with the new configuration.
  • When updating an existing service, the flag is --publish-add. There is also a --publish-rm flag to remove a port that was previously published.
  • To update the command an existing service runs, you can use the --args flag.
  • force the service to use a specific version of the image
  • If the manager can’t resolve the tag to a digest, each worker node is responsible for resolving the tag to a digest, and different nodes may use different versions of the image.
  • After you create a service, its image is never updated unless you explicitly run docker service update with the --image flag as described below.
  • When you run service update with the --image flag, the swarm manager queries Docker Hub or your private Docker registry for the digest the tag currently points to and updates the service tasks to use that digest.
  • You can publish a service task’s port directly on the swarm node where that service is running.
  • You can rely on the routing mesh. When you publish a service port, the swarm makes the service accessible at the target port on every node, regardless of whether there is a task for the service running on that node or not.
  • To publish a service’s ports externally to the swarm, use the --publish <PUBLISHED-PORT>:<SERVICE-PORT> flag.
  • published port on every swarm node
張 旭

[Kubernetes] Taints and Tolerations | 小信豬的原始部落 - 0 views

  • 如果有特定的 node 被加上了 taint(汙點),pod 就不會被分派到上面,除非 pod spec 有設定 toleration(容忍) 來接受這些 taint (必須全部 taint 都接受才行)
  • 假設某個 node 被設定了 effect 為 NoExecute 的 taint,那 k8s 還會把已經存在該 node 上的 pod 趕走,也不會把該 pod 分派到該 node 上。
  • taint 機制設計的目的,就是不要讓 pod 被分派到某個 node 上
  • ...1 more annotation...
  • 當 node 發生問題時(或是任何其他會造成該 node 無法繼續提供服務的情況),管理者需要考慮驅逐目前在上面運行中的 pod,可以透過加上 taint(Effect=NoExecute) 的方式達成
張 旭

Flynn: first preview release | Hacker News - 0 views

  • Etcd and Zookeeper provide essentially the same functionality. They are both a strongly consistent key/value stores that support notifications to clients of changes. These two projects are limited to service discovery
  • So lets say you had a client application that would talk to a node application that could be on any number of servers. What you could do is hard code that list into your application and randomly select one, in order to "fake" load balancing. However every time a machine went up or down you would have to update that list.
  • What Consul provides is you just tell your app to connect to "mynodeapp.consul" and then consul will give you the proper address of one of your node apps.
  • ...9 more annotations...
  • Consul and Skydock are both applications that build on top of a tool like Zookeeper and Etcd.
  • What a developer ideally wants to do is just push code and not have to worry about what servers are running what, and worry about failover and the like
  • What Flynn provides (if I get it), is a diy Heroku like platform
  • Raft is a consensus algorithm for keeping a set of distributed state machines in a consistent state.
  • a self hosted Heroku
  • Google Omega is Google's answer to Apache Mesos
  • Omega would need a service like Raft to understand what services are currently available
  • Another project that I believe may be similar to Flynn is Apache Mesos.
  • I want to use Docker, but it has no easy way to say "take this file that contains instructions and make everything". You can write Dockerfiles, but you can only use one part of the stack in them, otherwise you run into trouble.
  •  
    " So lets say you had a client application that would talk to a node application that could be on any number of servers. What you could do is hard code that list into your application and randomly select one, in order to "fake" load balancing. However every time a machine went up or down you would have to update that list. What Consul provides is you just tell your app to connect to "mynodeapp.consul" and then consul will give you the proper address of one of your node apps."
1 - 20 of 85 Next › Last »
Showing 20 items per page