Home/ Larvata/ Group items tagged data

張 旭

Kubernetes Components | Kubernetes - 0 views

  • A Kubernetes cluster consists of a set of worker machines, called nodes, that run containerized applications
  • Every cluster has at least one worker node.
  • The control plane manages the worker nodes and the Pods in the cluster.
  • The control plane's components make global decisions about the cluster
  • Control plane components can be run on any machine in the cluster.
  • for simplicity, set up scripts typically start all control plane components on the same machine, and do not run user containers on this machine
  • The API server is the front end for the Kubernetes control plane.
  • kube-apiserver is designed to scale horizontally—that is, it scales by deploying more instances. You can run several instances of kube-apiserver and balance traffic between those instances.
  • If your Kubernetes cluster uses etcd as its backing store, make sure you have a backup plan for that data.
  • watches for newly created Pods with no assigned node, and selects a node for them to run on.
  • Factors taken into account for scheduling decisions include: individual and collective resource requirements, hardware/software/policy constraints, affinity and anti-affinity specifications, data locality, inter-workload interference, and deadlines.
  • Logically, each controller is a separate process, but to reduce complexity, they are all compiled into a single binary and run in a single process.
  • Node controller
  • Job controller
  • Endpoints controller
  • Service Account & Token controllers
  • The cloud controller manager lets you link your cluster into your cloud provider's API, and separates out the components that interact with that cloud platform from components that only interact with your cluster.
  • If you are running Kubernetes on your own premises, or in a learning environment inside your own PC, the cluster does not have a cloud controller manager.
  • An agent that runs on each node in the cluster. It makes sure that containers are running in a Pod.
  • The kubelet takes a set of PodSpecs that are provided through various mechanisms and ensures that the containers described in those PodSpecs are running and healthy.
  • The kubelet doesn't manage containers which were not created by Kubernetes.
  • kube-proxy is a network proxy that runs on each node in your cluster, implementing part of the Kubernetes Service concept.
  • kube-proxy maintains network rules on nodes. These network rules allow network communication to your Pods from network sessions inside or outside of your cluster.
  • kube-proxy uses the operating system packet filtering layer if there is one and it's available.
  • Kubernetes supports several container runtimes: Docker, containerd, CRI-O, and any implementation of the Kubernetes CRI (Container Runtime Interface).
  • Addons use Kubernetes resources (DaemonSet, Deployment, etc) to implement cluster features
  • namespaced resources for addons belong within the kube-system namespace.
  • all Kubernetes clusters should have cluster DNS,
  • Cluster DNS is a DNS server, in addition to the other DNS server(s) in your environment, which serves DNS records for Kubernetes services.
  • Containers started by Kubernetes automatically include this DNS server in their DNS searches.
  • Container Resource Monitoring records generic time-series metrics about containers in a central database, and provides a UI for browsing that data.
  • A cluster-level logging mechanism is responsible for saving container logs to a central log store with search/browsing interface.
張 旭

[Elasticsearch] Distributed Characteristics & Distributed Search Mechanism | 小信豬的原始部落 - 0 views

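  • Scale storage capacity horizontally
  • Data HA: if a node goes down, no data is lost
  • To check the status of the nodes in a cluster, use the GET /_cat/nodes API
  • Decides which data node each shard is allocated to
  • Set up multiple master nodes for the cluster
  • Once the elected master node is found to have a problem, a new master node is elected
  • Every node is a master-eligible node by default when it starts; this default can be turned off by setting node.master: false
  • The node that handles a request is called the Coordinating Node; its job is to forward the request to the appropriate nodes
  • Every node is a Coordinating Node by default
  • A coordinating node can receive and process a search request directly, without having it relayed by the master node
  • A data node is a node that can store data; every node is a data node by default after it starts, and the data node role can be disabled by setting node.data: false
  • The master node decides how shards are distributed across the data nodes
  • Every node keeps a copy of the cluster state
  • Only the master can modify the cluster state, and it is responsible for syncing it to the other nodes
  • Every node records detailed information about its own state
  • Shards are the foundation of Elasticsearch's distributed storage; there are primary shards and replica shards
  • Each shard is a Lucene instance
  • Primary shards spread indexed data across multiple data nodes, which gives horizontal scaling of storage
  • The number of primary shards is specified when the index is created and cannot be changed afterwards; changing it requires a reindex
  • When a primary shard is lost, a replica shard can be promoted to primary to preserve data integrity
  • The number of replica shards can be adjusted dynamically, so that every data node can hold a complete copy of the data
  • Since ES 7.0, the default is 1 primary shard and 1 replica shard
  • Configuring too many replica shards reduces the overall write performance of the cluster
  • A replica shard must be allocated to a different data node from its primary shard
  • All primary shards may sit on the same data node
  • GET _cluster/health/<target> returns the current health status of the cluster
  • Yellow: primary shards are allocated normally, but there is a problem allocating replica shards
  • GET /_cat/shards/<target> returns the current shard status
  • The replica shard cannot be allocated, so the cluster health status is yellow
  • If you are worried that rebooting a machine will trigger failover, you can delay replication for a period of time (by adjusting the index.unassigned.node_left.delayed_timeout setting) to avoid needless data copying (this feature is called delayed allocation)
  • When the cluster turns red, a primary shard has been lost, and reads and writes are affected.
  • If the node comes back, data that had not yet been written is recovered from the translog
  • Once the index settings are in place, the number of primary shards cannot be changed freely
  • Sending requests directly to the master node is not recommended; it works, but sending a large volume of requests to the master can cause performance problems
  • A shard is the smallest unit of work in ES
  • A shard is a Lucene index
  • Writing the contents of the Index Buffer into a Segment is called a Refresh
  • Once a document has been refreshed into a segment, it becomes searchable
  • During a refresh, the segment is first written to cache so that it can be queried
  • When a document is indexed, it is also written to the transaction log, which is written to disk by default
  • Every shard has its own transaction log
  • Because the transaction log is written to disk, a node recovering from a failure reads the transaction log first to restore its data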
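The health colours described in these notes (red when a primary shard is lost, yellow when only a replica cannot be allocated) can be illustrated with a minimal Python sketch. The `Shard` class and `cluster_health` function are hypothetical names made up for this illustration, and green is assumed to mean that every shard is assigned; this is not how Elasticsearch computes health internally.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Shard:
    index: str
    number: int
    primary: bool
    node: Optional[str]  # None means the shard is unassigned


def cluster_health(shards: List[Shard]) -> str:
    """Return 'red', 'yellow', or 'green' following the rules in the notes."""
    if any(s.primary and s.node is None for s in shards):
        return "red"      # at least one primary shard is missing
    if any(not s.primary and s.node is None for s in shards):
        return "yellow"   # primaries are fine, some replica is unassigned
    return "green"        # assumption: every shard is assigned


# Single-node cluster: a replica may not share a node with its primary,
# so it stays unassigned and the health is yellow.
single_node = [
    Shard("logs", 0, primary=True, node="node-1"),
    Shard("logs", 0, primary=False, node=None),
]
print(cluster_health(single_node))
```

Adding a second data node and assigning the replica to it would turn the same model green.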
張 旭

Data Sources - Configuration Language | Terraform | HashiCorp Developer - 0 views

  • Each provider may offer data sources alongside its set of resource types.
  • When distinguishing from data resources, the primary kind of resource (as declared by a resource block) is known as a managed resource.
  • Each data resource is associated with a single data source, which determines the kind of object (or objects) it reads and what query constraint arguments are available.
  • Terraform reads data resources during the planning phase when possible, but announces in the plan when it must defer reading resources until the apply phase to preserve the order of operations.
  • local-only data sources exist for rendering templates, reading local files, and rendering AWS IAM policies.
  • As with managed resources, when count or for_each is present it is important to distinguish the resource itself from the multiple resource instances it creates. Each instance will separately read from its data source with its own variant of the constraint arguments, producing an indexed result.
  • Data instance arguments may refer to computed values, in which case the attributes of the instance itself cannot be resolved until all of its arguments are defined.
張 旭

Persisting Data in Workflows: When to Use Caching, Artifacts, and Workspaces - CircleCI - 0 views

  • Repeatability is also important
  • When a CI process isn’t repeatable you’ll find yourself wasting time re-running jobs to get them to go green.
  • Workspaces persist data between jobs in a single Workflow.
  • Caching persists data between the same job in different Workflow builds.
  • Artifacts persist data after a Workflow has finished
  • When a Workspace is declared in a job, one or more files or directories can be added. Each addition creates a new layer in the Workspace filesystem. Downstream jobs can then use this Workspace for their own needs or add more layers on top.
  • Unlike caching, Workspaces are not shared between runs, as they no longer exist once a Workflow is complete.
  • Caching lets you reuse the data from expensive fetch operations from previous jobs.
  • A prime example is package dependency managers such as Yarn, Bundler, or Pip.
  • Caches are global within a project: a cache saved on one branch will be used by others, so they should only be used for data that is OK to share across branches
  • Artifacts are used for longer-term storage of the outputs of your build process.
  • If your project needs to be packaged in some form or fashion, say an Android app where the .apk file is uploaded to Google Play, that’s a great example of an artifact.
  •  
    "CircleCI 2.0 provides a number of different ways to move data into and out of jobs, persist data, and with the introduction of Workspaces, move data between jobs"
張 旭

Secrets - Kubernetes - 0 views

  • Putting this information in a secret is safer and more flexible than putting it verbatim in a Pod definition or in a container image.
  • A Secret is an object that contains a small amount of sensitive data such as a password, a token, or a key.
  • Users can create secrets, and the system also creates some secrets.
  • To use a secret, a pod needs to reference the secret.
  • A secret can be used with a pod in two ways: as files in a volume mounted on one or more of its containers, or used by kubelet when pulling images for the pod.
  • --from-file
  • You can also create a Secret in a file first, in json or yaml format, and then create that object.
  • The Secret contains two maps: data and stringData.
  • The data field is used to store arbitrary data, encoded using base64.
  • Kubernetes automatically creates secrets which contain credentials for accessing the API and it automatically modifies your pods to use this type of secret.
  • kubectl get and kubectl describe avoid showing the contents of a secret by default.
  • stringData field is provided for convenience, and allows you to provide secret data as unencoded strings.
  • where you are deploying an application that uses a Secret to store a configuration file, and you want to populate parts of that configuration file during your deployment process.
  • If a field is specified in both data and stringData, the value from stringData is used.
  • The keys of data and stringData must consist of alphanumeric characters, ‘-’, ‘_’ or ‘.’.
  • Newlines are not valid within these strings and must be omitted.
  • When using the base64 utility on Darwin/macOS users should avoid using the -b option to split long lines.
  • create a Secret from generators and then apply it to create the object on the Apiserver.
  • The generated Secrets name has a suffix appended by hashing the contents.
  • base64 --decode
  • Secrets can be mounted as data volumes or be exposed as environment variables to be used by a container in a pod.
  • Multiple pods can reference the same secret.
  • Each key in the secret data map becomes the filename under mountPath
  • each container needs its own volumeMounts block, but only one .spec.volumes is needed per secret
  • use .spec.volumes[].secret.items field to change target path of each key:
  • If .spec.volumes[].secret.items is used, only keys specified in items are projected. To consume all keys from the secret, all of them must be listed in the items field.
  • You can also specify the permission mode bits that the files in a secret will have. If you don’t specify any, 0644 is used by default.
  • JSON spec doesn’t support octal notation, so use the value 256 for 0400 permissions.
  • Inside the container that mounts a secret volume, the secret keys appear as files and the secret values are base-64 decoded and stored inside these files.
  • Mounted Secrets are updated automatically
  • Kubelet is checking whether the mounted secret is fresh on every periodic sync.
  • cache propagation delay depends on the chosen cache type
  • A container using a Secret as a subPath volume mount will not receive Secret updates.
  • env:
      - name: SECRET_USERNAME
        valueFrom:
          secretKeyRef:
            name: mysecret
            key: username
  • Inside a container that consumes a secret in environment variables, the secret keys appear as normal environment variables containing the base-64 decoded values of the secret data.
  • An imagePullSecret is a way to pass a secret that contains a Docker (or other) image registry password to the Kubelet so it can pull a private image on behalf of your Pod.
  • a secret needs to be created before any pods that depend on it.
  • Secret API objects reside in a namespace. They can only be referenced by pods in that same namespace.
  • Individual secrets are limited to 1MiB in size.
  • Kubelet only supports use of secrets for Pods it gets from the API server.
  • Secrets must be created before they are consumed in pods as environment variables unless they are marked as optional.
  • References to Secrets that do not exist will prevent the pod from starting.
  • References via secretKeyRef to keys that do not exist in a named Secret will prevent the pod from starting.
  • Once a pod is scheduled, the kubelet will try to fetch the secret value.
  • Think carefully before sending your own ssh keys: other users of the cluster may have access to the secret.
  • volumes:
      - name: secret-volume
        secret:
          secretName: ssh-key-secret
  • Special characters such as $, *, and ! require escaping. If the password you are using has special characters, you need to escape them using the \ character.
  • You do not need to escape special characters in passwords from files
  • make that key begin with a dot
  • Dotfiles in secret volume
  • .secret-file
  • a frontend container which handles user interaction and business logic, but which cannot see the private key;
  • a signer container that can see the private key, and responds to simple signing requests from the frontend
  • When deploying applications that interact with the secrets API, access should be limited using authorization policies such as RBAC
  • watch and list requests for secrets within a namespace are extremely powerful capabilities and should be avoided
  • watch and list all secrets in a cluster should be reserved for only the most privileged, system-level components.
  • additional precautions with secret objects, such as avoiding writing them to disk where possible.
  • A secret is only sent to a node if a pod on that node requires it
  • only the secrets that a pod requests are potentially visible within its containers
  • each container in a pod has to request the secret volume in its volumeMounts for it to be visible within the container.
  • In the API server, secret data is stored in etcd.
  • limit access to etcd to admin users
  • Base64 encoding is not an encryption method and is considered the same as plain text.
  • A user who can create a pod that uses a secret can also see the value of that secret.
  • anyone with root on any node can read any secret from the apiserver, by impersonating the kubelet.
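Several of the points above (base64-encoded `data`, unencoded `stringData` taking precedence, and the 0400 mode written as 256 in JSON) can be checked with a short Python sketch. The helper names `build_secret` and `effective_values` and the sample values are hypothetical; the sketch only builds plain dictionaries shaped like the manifests and does not talk to a cluster.

```python
import base64
import json


def build_secret(name, data=None, string_data=None):
    """Build a Secret manifest as a plain dict.

    Values in `data` are base64-encoded here; `string_data` is passed
    through unencoded, mirroring the two maps described in the notes.
    """
    manifest = {"apiVersion": "v1", "kind": "Secret", "metadata": {"name": name}}
    if data:
        manifest["data"] = {
            key: base64.b64encode(value.encode("utf-8")).decode("ascii")
            for key, value in data.items()
        }
    if string_data:
        manifest["stringData"] = dict(string_data)
    return manifest


def effective_values(manifest):
    """Decode `data`, then let `stringData` override keys present in both."""
    values = {
        key: base64.b64decode(encoded).decode("utf-8")
        for key, encoded in manifest.get("data", {}).items()
    }
    values.update(manifest.get("stringData", {}))
    return values


secret = build_secret(
    "mysecret",
    data={"username": "admin", "password": "placeholder"},
    string_data={"password": "override"},
)
print(json.dumps(secret, indent=2))
print(effective_values(secret))

# JSON has no octal literals, so mode 0400 is serialized as decimal 256.
volume = {
    "name": "secret-volume",
    "secret": {"secretName": "mysecret", "defaultMode": 0o400},
}
print(json.dumps(volume))
```

Because base64 is an encoding and not encryption, `effective_values` recovers the plain text with no key at all, which is why the notes treat encoded secret data as equivalent to plain text.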
張 旭

Controllers | Kubernetes - 0 views

  • In robotics and automation, a control loop is a non-terminating loop that regulates the state of a system.
  • controllers are control loops that watch the state of your cluster, then make or request changes where needed
  • Each controller tries to move the current cluster state closer to the desired state.
  • A controller tracks at least one Kubernetes resource type.
  • The controller(s) for that resource are responsible for making the current state come closer to that desired state.
  • in Kubernetes, a controller will send messages to the API server that have useful side effects.
  • Built-in controllers manage state by interacting with the cluster API server.
  • By contrast with Job, some controllers need to make changes to things outside of your cluster.
  • the controller makes some change to bring about your desired state, and then reports current state back to your cluster's API server. Other control loops can observe that reported data and take their own actions.
  • As long as the controllers for your cluster are running and able to make useful changes, it doesn't matter if the overall state is stable or not.
  • Kubernetes uses lots of controllers that each manage a particular aspect of cluster state.
  • a particular control loop (controller) uses one kind of resource as its desired state, and has a different kind of resource that it manages to make that desired state happen.
  • There can be several controllers that create or update the same kind of object.
  • you can have Deployments and Jobs; these both create Pods. The Job controller does not delete the Pods that your Deployment created, because there is information (labels) the controllers can use to tell those Pods apart.
  • Kubernetes comes with a set of built-in controllers that run inside the kube-controller-manager.
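The control-loop idea in these notes can be sketched in a few lines of Python. Everything here is a toy model under stated assumptions: `FakeCluster` is a hypothetical in-memory stand-in for the API server, and `reconcile` is one pass of an imagined controller that uses labels to find only the Pods it owns. It is not the Kubernetes controller implementation.

```python
import itertools


class FakeCluster:
    """Hypothetical in-memory stand-in for the API server."""

    def __init__(self):
        self.pods = {}  # pod name -> labels
        self._ids = itertools.count(1)

    def create_pod(self, labels):
        name = f"pod-{next(self._ids)}"
        self.pods[name] = dict(labels)
        return name

    def delete_pod(self, name):
        del self.pods[name]

    def list_pods(self, selector):
        # Return names of Pods whose labels match every selector entry.
        return sorted(
            name
            for name, labels in self.pods.items()
            if all(labels.get(k) == v for k, v in selector.items())
        )


def reconcile(cluster, selector, desired):
    """One pass of the loop: move the current state toward the desired state."""
    owned = cluster.list_pods(selector)
    # Too few Pods: create the missing ones (empty range if none are missing).
    for _ in range(desired - len(owned)):
        cluster.create_pod(selector)
    # Too many Pods: delete the surplus (empty slice if there is no surplus).
    for name in owned[desired:]:
        cluster.delete_pod(name)


cluster = FakeCluster()
cluster.create_pod({"app": "other"})      # owned by some other controller
reconcile(cluster, {"app": "web"}, 3)     # scale up to 3
reconcile(cluster, {"app": "web"}, 1)     # scale down to 1
print(cluster.list_pods({"app": "web"}), cluster.list_pods({"app": "other"}))
```

A real controller would run `reconcile` repeatedly in response to watch events; the label selector is what keeps it from touching Pods created by a different controller.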
張 旭

Container Runtimes | Kubernetes - 0 views

  • Kubernetes releases before v1.24 included a direct integration with Docker Engine, using a component named dockershim. That special direct integration is no longer part of Kubernetes
  • You need to install a container runtime into each node in the cluster so that Pods can run there.
  • Kubernetes 1.26 requires that you use a runtime that conforms with the Container Runtime Interface (CRI).
  • On Linux, control groups are used to constrain resources that are allocated to processes.
  • Both kubelet and the underlying container runtime need to interface with control groups to enforce resource management for pods and containers and set resources such as cpu/memory requests and limits.
  • When the cgroupfs driver is used, the kubelet and the container runtime directly interface with the cgroup filesystem to configure cgroups.
  • The cgroupfs driver is not recommended when systemd is the init system
  • When systemd is chosen as the init system for a Linux distribution, the init process generates and consumes a root control group (cgroup) and acts as a cgroup manager.
  • Two cgroup managers result in two views of the available and in-use resources in the system.
  • Changing the cgroup driver of a Node that has joined a cluster is a sensitive operation. If the kubelet has created Pods using the semantics of one cgroup driver, changing the container runtime to another cgroup driver can cause errors when trying to re-create the Pod sandbox for such existing Pods. Restarting the kubelet may not solve such errors.
  • The approach to mitigate this instability is to use systemd as the cgroup driver for the kubelet and the container runtime when systemd is the selected init system.
  • Kubernetes 1.26 defaults to using v1 of the CRI API. If a container runtime does not support the v1 API, the kubelet falls back to using the (deprecated) v1alpha2 API instead.
張 旭

Kubernetes Volumes Guide - Examples for NFS and Persistent Volume - 0 views

  • Persistent volumes exist beyond containers, pods, and nodes.
  • Volumes also let you share data between containers in the same pod.
  • data in that volume will be destroyed when the pod is restarted.
  • Persistent volumes are long-term storage in your Kubernetes cluster.
  • A pod uses a persistent volume claim to get read and write access to the persistent volume.
  • NFS stands for Network File System – it's a shared filesystem that can be accessed over the network.
  • The NFS must already exist – Kubernetes doesn't run the NFS, pods just access it.
  • what's already stored in the NFS is not deleted when a pod is destroyed. Data is persistent.
  • an NFS can be accessed from multiple pods at the same time. An NFS can be used to share data between pods!
  • volumes:
      - name: nfs-volume
        nfs:
          # URL for the NFS server
          server: 10.108.211.244 # Change this!
          path: /
  • volumeMounts:
      - name: nfs-volume
        mountPath: /var/nfs
  • Just add the volume to each pod, and add a volume mount to use the NFS volume from each container.
張 旭

How Percona XtraBackup Works - 0 views

  • Percona XtraBackup is based on InnoDB's crash-recovery functionality.
  • it performs crash recovery on the files to make them a consistent, usable database again
  • InnoDB maintains a redo log, also called the transaction log. This contains a record of every change to InnoDB data.
  • When InnoDB starts, it inspects the data files and the transaction log, and performs two steps. It applies committed transaction log entries to the data files, and it performs an undo operation on any transactions that modified data but did not commit.
  • Percona XtraBackup works by remembering the log sequence number (LSN) when it starts, and then copying away the data files.
  • Percona XtraBackup runs a background process that watches the transaction log files, and copies changes from it.
  • Percona XtraBackup needs to do this continually
  • Percona XtraBackup needs the transaction log records for every change to the data files since it began execution.
  • Percona XtraBackup uses Backup locks where available as a lightweight alternative to FLUSH TABLES WITH READ LOCK.
  • Locking is only done for MyISAM and other non-InnoDB tables after Percona XtraBackup finishes backing up all InnoDB/XtraDB data and logs.
  • xtrabackup tries to avoid backup locks and FLUSH TABLES WITH READ LOCK when the instance contains only InnoDB tables. In this case, xtrabackup obtains binary log coordinates from performance_schema.log_status
  • When backup locks are supported by the server, xtrabackup first copies InnoDB data, runs the LOCK TABLES FOR BACKUP and then copies the MyISAM tables.
  • the STDERR of xtrabackup is not written in any file. You will have to redirect it to a file, e.g., xtrabackup OPTIONS 2> backupout.log
  • During the prepare phase, Percona XtraBackup performs crash recovery against the copied data files, using the copied transaction log file. After this is done, the database is ready to restore and use.
  • the tools enable you to do operations such as streaming and incremental backups with various combinations of copying the data files, copying the log files, and applying the logs to the data.
  • To restore a backup with xtrabackup you can use the --copy-back or --move-back options.
  • you may have to change the files’ ownership to mysql before starting the database server, as they will be owned by the user who created the backup.
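The prepare phase described above (crash recovery against copied data files using the copied transaction log) can be illustrated with a toy Python model. The `LogRecord` layout and the `prepare` function are hypothetical and exist only for this sketch. As a simplification, the sketch replays every logged change in LSN order and then rolls back the changes of uncommitted transactions, which assumes that no two transactions touched the same key concurrently. Real InnoDB recovery works on pages and undo logs and is far more involved.

```python
from collections import namedtuple

# One logged change; a hypothetical record layout for this sketch only.
LogRecord = namedtuple("LogRecord", "lsn txn key old new")


def prepare(copied_data, log_records, committed_txns):
    """Make a copied snapshot consistent using the copied log records."""
    data = dict(copied_data)
    ordered = sorted(log_records, key=lambda r: r.lsn)
    # Redo: replay every logged change in LSN order.
    for rec in ordered:
        data[rec.key] = rec.new
    # Undo: roll back changes from transactions that never committed,
    # newest first, restoring the value each change overwrote.
    for rec in reversed(ordered):
        if rec.txn not in committed_txns:
            data[rec.key] = rec.old
    return data


# Files were copied while transactions were running: "a" is stale and
# "b" already contains a change from t2, which never committed.
copied = {"a": 1, "b": 20}
log = [
    LogRecord(101, "t1", "a", 1, 10),
    LogRecord(102, "t2", "b", 2, 20),
    LogRecord(103, "t1", "a", 10, 11),
]
print(prepare(copied, log, committed_txns={"t1"}))
```

The example shows why the log records captured during the copy are needed: without them, neither the missing committed change to "a" nor the stray uncommitted change to "b" could be corrected.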
張 旭

Volumes - Kubernetes - 0 views

  • On-disk files in a Container are ephemeral,
  • when a Container crashes, kubelet will restart it, but the files will be lost - the Container starts with a clean state
  • In Docker, a volume is simply a directory on disk or in another Container.
  • A Kubernetes volume, on the other hand, has an explicit lifetime - the same as the Pod that encloses it.
  • a volume outlives any Containers that run within the Pod, and data is preserved across Container restarts.
    • 張 旭
       
      A Kubernetes volume follows the lifecycle of its Pod
  • Kubernetes supports many types of volumes, and a Pod can use any number of them simultaneously.
  • To use a volume, a Pod specifies what volumes to provide for the Pod (the .spec.volumes field) and where to mount those into Containers (the .spec.containers.volumeMounts field).
  • A process in a container sees a filesystem view composed from their Docker image and volumes.
  • Volumes cannot mount onto other volumes or have hard links to other volumes.
  • Each Container in the Pod must independently specify where to mount each volume
  • local
  • nfs
  • cephfs
  • awsElasticBlockStore
  • glusterfs
  • vsphereVolume
  • An awsElasticBlockStore volume mounts an Amazon Web Services (AWS) EBS Volume into your Pod.
  • the contents of an EBS volume are preserved and the volume is merely unmounted.
  • an EBS volume can be pre-populated with data, and that data can be “handed off” between Pods.
  • create an EBS volume using aws ec2 create-volume
  • the nodes on which Pods are running must be AWS EC2 instances
  • EBS only supports a single EC2 instance mounting a volume
  • check that the size and EBS volume type are suitable for your use!
  • A cephfs volume allows an existing CephFS volume to be mounted into your Pod.
  • the contents of a cephfs volume are preserved and the volume is merely unmounted.
    • 張 旭
       
      Equivalent to having your own AWS EBS
  • CephFS can be mounted by multiple writers simultaneously.
  • have your own Ceph server running with the share exported
  • configMap
  • The configMap resource provides a way to inject configuration data into Pods
  • When referencing a configMap object, you can simply provide its name in the volume to reference it
  • volumeMounts:
      - name: config-vol
        mountPath: /etc/config
    volumes:
      - name: config-vol
        configMap:
          name: log-config
          items:
            - key: log_level
              path: log_level
  • create a ConfigMap before you can use it.
  • A Container using a ConfigMap as a subPath volume mount will not receive ConfigMap updates.
  • An emptyDir volume is first created when a Pod is assigned to a Node, and exists as long as that Pod is running on that node.
  • When a Pod is removed from a node for any reason, the data in the emptyDir is deleted forever.
  • By default, emptyDir volumes are stored on whatever medium is backing the node - that might be disk or SSD or network storage, depending on your environment.
  • you can set the emptyDir.medium field to "Memory" to tell Kubernetes to mount a tmpfs (RAM-backed filesystem)
  • volumeMounts:
      - mountPath: /cache
        name: cache-volume
    volumes:
      - name: cache-volume
        emptyDir: {}
  • An fc volume allows an existing fibre channel volume to be mounted in a Pod.
  • configure FC SAN Zoning to allocate and mask those LUNs (volumes) to the target WWNs beforehand so that Kubernetes hosts can access them.
  • Flocker is an open-source clustered Container data volume manager. It provides management and orchestration of data volumes backed by a variety of storage backends.
  • emptyDir
  • flocker
  • A flocker volume allows a Flocker dataset to be mounted into a Pod
  • have your own Flocker installation running
  • A gcePersistentDisk volume mounts a Google Compute Engine (GCE) Persistent Disk into your Pod.
  • Using a PD on a Pod controlled by a ReplicationController will fail unless the PD is read-only or the replica count is 0 or 1
  • A glusterfs volume allows a Glusterfs (an open source networked filesystem) volume to be mounted into your Pod.
  • have your own GlusterFS installation running
  • A hostPath volume mounts a file or directory from the host node’s filesystem into your Pod.
  • a powerful escape hatch for some applications
  • access to Docker internals; use a hostPath of /var/lib/docker
  • allowing a Pod to specify whether a given hostPath should exist prior to the Pod running, whether it should be created, and what it should exist as
  • specify a type for a hostPath volume
  • the files or directories created on the underlying hosts are only writable by root.
  • hostPath:
      # directory location on host
      path: /data
      # this field is optional
      type: Directory
  • An iscsi volume allows an existing iSCSI (SCSI over IP) volume to be mounted into your Pod.
  • have your own iSCSI server running
  • A feature of iSCSI is that it can be mounted as read-only by multiple consumers simultaneously.
  • A local volume represents a mounted local storage device such as a disk, partition or directory.
  • Local volumes can only be used as a statically created PersistentVolume.
  • Compared to hostPath volumes, local volumes can be used in a durable and portable manner without manually scheduling Pods to nodes, as the system is aware of the volume’s node constraints by looking at the node affinity on the PersistentVolume.
  • If a node becomes unhealthy, then the local volume will also become inaccessible, and a Pod using it will not be able to run.
  • PersistentVolume spec using a local volume and nodeAffinity
  • PersistentVolume nodeAffinity is required when using local volumes. It enables the Kubernetes scheduler to correctly schedule Pods using local volumes to the correct node.
  • PersistentVolume volumeMode can now be set to “Block” (instead of the default value “Filesystem”) to expose the local volume as a raw block device.
  • When using local volumes, it is recommended to create a StorageClass with volumeBindingMode set to WaitForFirstConsumer
  • An nfs volume allows an existing NFS (Network File System) share to be mounted into your Pod.
  • NFS can be mounted by multiple writers simultaneously.
  • have your own NFS server running with the share exported
  • A persistentVolumeClaim volume is used to mount a PersistentVolume into a Pod.
  • PersistentVolumes are a way for users to “claim” durable storage (such as a GCE PersistentDisk or an iSCSI volume) without knowing the details of the particular cloud environment.
  • A projected volume maps several existing volume sources into the same directory.
  • All sources are required to be in the same namespace as the Pod. For more details, see the all-in-one volume design document.
  • Each projected volume source is listed in the spec under sources
  • A Container using a projected volume source as a subPath volume mount will not receive updates for those volume sources.
  • RBD volumes can only be mounted by a single consumer in read-write mode - no simultaneous writers allowed
  • A secret volume is used to pass sensitive information, such as passwords, to Pods
  • store secrets in the Kubernetes API and mount them as files for use by Pods
  • secret volumes are backed by tmpfs (a RAM-backed filesystem) so they are never written to non-volatile storage.
  • create a secret in the Kubernetes API before you can use it
  • A Container using a Secret as a subPath volume mount will not receive Secret updates.
  • StorageOS runs as a Container within your Kubernetes environment, making local or attached storage accessible from any node within the Kubernetes cluster.
  • Data can be replicated to protect against node failure. Thin provisioning and compression can improve utilization and reduce cost.
  • StorageOS provides block storage to Containers, accessible via a file system.
  • A vsphereVolume is used to mount a vSphere VMDK Volume into your Pod.
  • supports both VMFS and VSAN datastore.
  • create VMDK using one of the following methods before using with Pod.
  • share one volume for multiple uses in a single Pod.
  • The volumeMounts.subPath property can be used to specify a sub-path inside the referenced volume instead of its root.
  • volumeMounts:
      - name: workdir1
        mountPath: /logs
        subPathExpr: $(POD_NAME)
  • env:
      - name: POD_NAME
        valueFrom:
          fieldRef:
            apiVersion: v1
            fieldPath: metadata.name
  • Use the subPathExpr field to construct subPath directory names from Downward API environment variables
  • enable the VolumeSubpathEnvExpansion feature gate
  • The subPath and subPathExpr properties are mutually exclusive.
  • There is no limit on how much space an emptyDir or hostPath volume can consume, and no isolation between Containers or between Pods.
  • emptyDir and hostPath volumes will be able to request a certain amount of space using a resource specification, and to select the type of media to use, for clusters that have several media types.
  • the Container Storage Interface (CSI) and Flexvolume. They enable storage vendors to create custom storage plugins without adding them to the Kubernetes repository.
  • all volume plugins (like volume types listed above) were “in-tree” meaning they were built, linked, compiled, and shipped with the core Kubernetes binaries and extend the core Kubernetes API.
  • Container Storage Interface (CSI) defines a standard interface for container orchestration systems (like Kubernetes) to expose arbitrary storage systems to their container workloads.
  • Once a CSI compatible volume driver is deployed on a Kubernetes cluster, users may use the csi volume type to attach, mount, etc. the volumes exposed by the CSI driver.
  • The csi volume type does not support direct reference from Pod and may only be referenced in a Pod via a PersistentVolumeClaim object.
  • This feature requires CSIInlineVolume feature gate to be enabled: --feature-gates=CSIInlineVolume=true
  • In-tree plugins that support CSI Migration and have a corresponding CSI driver implemented are listed in the “Types of Volumes” section above.
  • Mount propagation allows for sharing volumes mounted by a Container to other Containers in the same Pod, or even to other Pods on the same node.
  • Mount propagation of a volume is controlled by mountPropagation field in Container.volumeMounts.
  • HostToContainer - This volume mount will receive all subsequent mounts that are mounted to this volume or any of its subdirectories.
  • Bidirectional - This volume mount behaves the same as the HostToContainer mount. In addition, all volume mounts created by the Container will be propagated back to the host and to all Containers of all Pods that use the same volume.
  • Edit your Docker’s systemd service file. Set MountFlags as follows: MountFlags=shared
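
The subPathExpr behaviour highlighted above can be sketched as follows. This is a minimal illustration, not the kubelet's actual expansion code: the function names `expand_sub_path_expr` and `resolve_mount` are hypothetical, and the sketch assumes that references to undefined variables are simply left as written.

```python
import re

# Matches $(VAR_NAME) references. A simplified stand-in for the real expansion.
_VAR_REF = re.compile(r"\$\(([A-Za-z_][A-Za-z0-9_]*)\)")


def expand_sub_path_expr(expr: str, env: dict) -> str:
    """Replace each $(NAME) in expr with env[NAME]; unknown names stay untouched."""
    def _lookup(match):
        return env.get(match.group(1), match.group(0))
    return _VAR_REF.sub(_lookup, expr)


def resolve_mount(volume_mount: dict, env: dict) -> str:
    """Return the sub-path a mount resolves to (hypothetical helper).

    Enforces the rule that subPath and subPathExpr are mutually exclusive.
    """
    if "subPath" in volume_mount and "subPathExpr" in volume_mount:
        raise ValueError("subPath and subPathExpr are mutually exclusive")
    if "subPathExpr" in volume_mount:
        return expand_sub_path_expr(volume_mount["subPathExpr"], env)
    return volume_mount.get("subPath", "")


# Mirrors the highlighted example: POD_NAME comes from the Downward API.
mount = {"name": "workdir1", "mountPath": "/logs", "subPathExpr": "$(POD_NAME)"}
print(resolve_mount(mount, {"POD_NAME": "pod1"}))
```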
張 旭

MongoDB Performance Tuning: Everything You Need to Know - Stackify - 0 views

  • db.serverStatus().globalLock
  • db.serverStatus().locks
  • globalLock.currentQueue.total: This number can indicate a possible concurrency issue if it’s consistently high. This can happen if a lot of requests are waiting for a lock to be released.
  • globalLock.totalTime: If this is higher than the total database uptime, the database has been in a lock state for too long.
  • Unlike relational databases such as MySQL or PostgreSQL, MongoDB uses JSON-like documents for storing data.
  • Databases operate in an environment that consists of numerous reads, writes, and updates.
  • When a lock occurs, no other operation can read or modify the data until the operation that initiated the lock is finished.
  • locks.deadlockCount: Number of times the lock acquisitions have encountered deadlocks
  • Is the database frequently locking from queries? This might indicate issues with the schema design, query structure, or system architecture.
  • For version 3.2 on, WiredTiger is the default.
  • MMAPv1 locks whole collections, not individual documents.
  • WiredTiger performs locking at the document level.
  • When the MMAPv1 storage engine is in use, MongoDB will use memory-mapped files to store data.
  • All available memory will be allocated for this usage if the data set is large enough.
  • db.serverStatus().mem
  • mem.resident: Roughly equivalent to the amount of RAM in megabytes that the database process uses
  • If mem.resident exceeds the value of system memory and there’s a large amount of unmapped data on disk, we’ve most likely exceeded system capacity.
  • If the value of mem.mapped is greater than the amount of system memory, some operations will experience page faults.
  • The WiredTiger storage engine is a significant improvement over MMAPv1 in performance and concurrency.
  • By default, MongoDB will reserve 50 percent of the available memory for the WiredTiger data cache.
  • wiredTiger.cache.bytes currently in the cache – This is the size of the data currently in the cache.
  • wiredTiger.cache.tracked dirty bytes in the cache – This is the size of the dirty data in the cache.
  • we can look at the wiredTiger.cache.bytes read into cache value for read-heavy applications. If this value is consistently high, increasing the cache size may improve overall read performance.
  • check whether the application is read-heavy. If it is, increase the size of the replica set and distribute the read operations to secondary members of the set.
  • write-heavy, use sharding within a sharded cluster to distribute the load.
  • Replication is the propagation of data from one node to another
  • Replica sets handle this replication.
  • Sometimes, data isn’t replicated as quickly as we’d like.
  • a particularly thorny problem if the lag between a primary and secondary node is high and the secondary becomes the primary
  • use the db.printSlaveReplicationInfo() or the rs.printSlaveReplicationInfo() command to see the status of a replica set from the perspective of the secondary member of the set.
  • shows how far behind the secondary members are from the primary. This number should be as low as possible.
  • monitor this metric closely.
  • watch for any spikes in replication delay.
  • Always investigate these issues to understand the reasons for the lag.
  • One member of the replica set is primary. All others are secondary.
  • it’s not normal for nodes to change back and forth between primary and secondary.
  • use the profiler to gain a deeper understanding of the database’s behavior.
  • Enabling the profiler can affect system performance, due to the additional activity.
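
The lock and memory checks described in the highlights above could be automated roughly like this. It is a sketch under stated assumptions: `check_server_status` is a hypothetical helper that takes a plain dictionary shaped like the output of db.serverStatus(), and the `max_queue` threshold of 10 is an arbitrary placeholder. A single sample also cannot show that a value is "consistently" high, so in practice the check would run repeatedly.

```python
def check_server_status(status: dict, system_memory_mb: int, max_queue: int = 10) -> list:
    """Return human-readable warnings for one serverStatus-style sample."""
    warnings = []

    # globalLock.currentQueue.total: operations waiting for a lock.
    queued = status["globalLock"]["currentQueue"]["total"]
    if queued > max_queue:
        warnings.append(f"{queued} operations are queued waiting for a lock")

    # mem.resident: roughly the RAM (in MB) used by the database process.
    resident = status["mem"]["resident"]
    if resident > system_memory_mb:
        warnings.append("mem.resident exceeds system memory")

    # mem.mapped is only reported by MMAPv1, so treat it as optional.
    mapped = status["mem"].get("mapped")
    if mapped is not None and mapped > system_memory_mb:
        warnings.append("mem.mapped exceeds system memory; expect page faults")

    return warnings


# Made-up sample values, for illustration only.
sample_status = {
    "globalLock": {"currentQueue": {"total": 42}},
    "mem": {"resident": 3000, "mapped": 9000},
}
for warning in check_server_status(sample_status, system_memory_mb=8192):
    print(warning)
```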
張 旭

Incremental Backup - 0 views

  • xtrabackup supports incremental backups, which means that they can copy only the data that has changed since the last backup.
  • You can perform many incremental backups between each full backup, so you can set up a backup process such as a full backup once a week and an incremental backup every day, or full backups every day and incremental backups every hour.
  • each InnoDB page contains a log sequence number, or LSN. The LSN is the system version number for the entire database. Each page’s LSN shows how recently it was changed.
  • In full backups, two types of operations are performed to make the database consistent: committed transactions are replayed from the log file against the data files, and uncommitted transactions are rolled back.
  • You should use the --apply-log-only option to prevent the rollback phase.
  • An incremental backup copies each page whose LSN is newer than the previous incremental or full backup’s LSN.
  • Incremental backups do not actually compare the data files to the previous backup’s data files.
  • you can use --incremental-lsn to perform an incremental backup without even having the previous backup, if you know its LSN
  • Incremental backups simply read the pages and compare their LSN to the last backup’s LSN.
  • without a full backup to act as a base, the incremental backups are useless.
  • The xtrabackup binary writes a file called xtrabackup_checkpoints into the backup’s target directory. This file contains a line showing the to_lsn, which is the database’s LSN at the end of the backup.
  • from_lsn is the starting LSN of the backup and for incremental it has to be the same as to_lsn (if it is the last checkpoint) of the previous/base backup.
  • If you do not use the --apply-log-only option to prevent the rollback phase, then your incremental backups will be useless.
  • run --prepare as usual, but prevent the rollback phase
  • If you restore it and start MySQL, InnoDB will detect that the rollback phase was not performed, and it will do that in the background, as it usually does for a crash recovery upon start.
  • xtrabackup --prepare --apply-log-only --target-dir=/data/backups/base --incremental-dir=/data/backups/inc1
  • The final data is in /data/backups/base, not in the incremental directory.
  • Do not run xtrabackup --prepare with the same incremental backup directory (the value of --incremental-dir) more than once.
  • xtrabackup --prepare --target-dir=/data/backups/base --incremental-dir=/data/backups/inc2
  • --apply-log-only should be used when merging all incrementals except the last one.
  • Even if --apply-log-only was used on the last step, the backup would still be consistent, but in that case the server would perform the rollback phase.
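
The prepare order described above (base first, then each incremental, with --apply-log-only on every step except the last) can be sketched as a small command builder. `prepare_commands` is a hypothetical helper that only assembles argument lists and does not run xtrabackup. The directory names are the ones used in the highlights.

```python
def prepare_commands(base_dir: str, incremental_dirs: list) -> list:
    """Build the xtrabackup --prepare invocations for a base plus incrementals."""
    if not incremental_dirs:
        # A plain full backup: prepare normally, rollback phase included.
        return [["xtrabackup", "--prepare", f"--target-dir={base_dir}"]]

    # The base is prepared first, with the rollback phase suppressed.
    commands = [
        ["xtrabackup", "--prepare", "--apply-log-only", f"--target-dir={base_dir}"]
    ]
    last = len(incremental_dirs) - 1
    for index, inc_dir in enumerate(incremental_dirs):
        command = ["xtrabackup", "--prepare"]
        if index != last:
            # Every incremental except the last one skips the rollback phase.
            command.append("--apply-log-only")
        command += [f"--target-dir={base_dir}", f"--incremental-dir={inc_dir}"]
        commands.append(command)
    return commands


steps = prepare_commands(
    "/data/backups/base", ["/data/backups/inc1", "/data/backups/inc2"]
)
for step in steps:
    print(" ".join(step))
```

Each incremental directory appears exactly once in the result, which matches the warning against preparing the same incremental twice.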
張 旭

Replication - MongoDB Manual - 0 views

  • A replica set in MongoDB is a group of mongod processes that maintain the same data set.
  • Replica sets provide redundancy and high availability, and are the basis for all production deployments.
  • With multiple copies of data on different database servers, replication provides a level of fault tolerance against the loss of a single database server.
  • replication can provide increased read capacity as clients can send read operations to different servers.
  • A replica set is a group of mongod instances that maintain the same data set.
  • A replica set contains several data bearing nodes and optionally one arbiter node.
  • one and only one member is deemed the primary node, while the other nodes are deemed secondary nodes.
  • A replica set can have only one primary capable of confirming writes with { w: "majority" } write concern; although in some circumstances, another mongod instance may transiently believe itself to also be primary.
  • The secondaries replicate the primary’s oplog and apply the operations to their data sets such that the secondaries’ data sets reflect the primary’s data set
  • add a mongod instance to a replica set as an arbiter. An arbiter participates in elections but does not hold data
  • An arbiter will always be an arbiter whereas a primary may step down and become a secondary and a secondary may become the primary during an election.
  • Secondaries replicate the primary’s oplog and apply the operations to their data sets asynchronously.
  • These slow oplog messages are logged for the secondaries in the diagnostic log under the REPL component with the text applied op: <oplog entry> took <num>ms.
  • Replication lag refers to the amount of time that it takes to copy (i.e. replicate) a write operation on the primary to a secondary.
  • When a primary does not communicate with the other members of the set for more than the configured electionTimeoutMillis period (10 seconds by default), an eligible secondary calls for an election to nominate itself as the new primary.
  • The replica set cannot process write operations until the election completes successfully.
  • The median time before a cluster elects a new primary should not typically exceed 12 seconds, assuming default replica configuration settings.
  • Factors such as network latency may extend the time required for replica set elections to complete, which in turn affects the amount of time your cluster may operate without a primary.
  • Your application connection logic should include tolerance for automatic failovers and the subsequent elections.
  • MongoDB drivers can detect the loss of the primary and automatically retry certain write operations a single time, providing additional built-in handling of automatic failovers and elections
  • By default, clients read from the primary [1]; however, clients can specify a read preference to send read operations to secondaries.
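
One way an application might tolerate a failover and the election that follows is to retry a failed write after a pause. The sketch below is an assumption-laden illustration, not driver code: `NotPrimaryError` is a hypothetical exception standing in for whatever error a real driver raises when no primary is available, and the 12-second default wait simply echoes the median election time quoted above.

```python
import time


class NotPrimaryError(Exception):
    """Hypothetical stand-in for a driver's 'no primary available' error."""


def write_with_failover_tolerance(write, attempts=3, wait_seconds=12.0, sleep=time.sleep):
    """Call write(); on NotPrimaryError wait for an election and try again."""
    for attempt in range(1, attempts + 1):
        try:
            return write()
        except NotPrimaryError:
            if attempt == attempts:
                raise  # give up after the final attempt
            sleep(wait_seconds)


# Demonstration with a fake write that fails once, then succeeds.
demo_calls = []


def demo_write():
    demo_calls.append("call")
    if len(demo_calls) == 1:
        raise NotPrimaryError("primary stepped down")
    return "acknowledged"


# A no-op sleep keeps the demonstration instant.
print(write_with_failover_tolerance(demo_write, sleep=lambda seconds: None))
```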
張 旭

Trunk-based Development | Atlassian - 0 views

  • Trunk-based development is a version control management practice where developers merge small, frequent updates to a core “trunk” or main branch.
  • Gitflow and trunk-based development. 
  • Gitflow, which was popularized first, is a stricter development model where only certain individuals can approve changes to the main code. This maintains code quality and minimizes the number of bugs.
  • Trunk-based development is a more open model since all developers have access to the main code. This enables teams to iterate quickly and implement CI/CD.
  • Developers can create short-lived branches with a few small commits compared to other long-lived feature branching strategies.
  • Gitflow is an alternative Git branching model that uses long-lived feature branches and multiple primary branches.
  • Gitflow also has separate primary branch lines for development, hotfixes, features, and releases.
  • Trunk-based development is far more simplified since it focuses on the main branch as the source of fixes and releases.
  • Trunk-based development eases the friction of code integration.
  • trunk-based development model reduces these conflicts.
  • Adding an automated test suite and code coverage monitoring for this stream of commits enables continuous integration.
  • When new code is merged into the trunk, automated integration and code coverage tests run to validate the code quality.
  • Trunk-based development strives to keep the trunk branch “green”, meaning it's ready to deploy at any commit.
  • With continuous integration, developers perform trunk-based development in conjunction with automated tests that run after each commit to the trunk.
  • If trunk-based development was like music it would be a rapid staccato -- short, succinct notes in rapid succession, with the repository commits being the notes.
  • Instead of creating a feature branch and waiting to build out the complete specification, developers can instead create a trunk commit that introduces the feature flag and pushes new trunk commits that build out the feature specification within the flag.
  • Automated testing is necessary for any modern software project intending to achieve CI/CD.
  • Short running unit and integration tests are executed during development and upon code merge.
  • Automated tests provide a layer of preemptive code review.
  • Once a branch merges, it is best practice to delete it.
  • A repository with a large number of active branches has some unfortunate side effects
  • Merge branches to the trunk at least once a day
  • The “continuous” in CI/CD implies that updates are constantly flowing.
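
The feature-flag approach mentioned above can be illustrated with a short sketch. Everything here is hypothetical (the flag name `new_checkout`, the checkout functions, the 10 percent discount); the point is only that unfinished code can live on the trunk while the flag keeps it switched off.

```python
# Flags would normally come from configuration; a dict is enough for a sketch.
FEATURE_FLAGS = {"new_checkout": False}


def legacy_checkout(prices_in_cents):
    """Existing behaviour that stays live while the flag is off."""
    return sum(prices_in_cents)


def new_checkout(prices_in_cents):
    """Work in progress, merged to trunk in small commits behind the flag."""
    return sum(prices_in_cents) * 90 // 100  # hypothetical 10% discount


def checkout(prices_in_cents, flags=FEATURE_FLAGS):
    """Route to the new code path only when the flag is enabled."""
    if flags.get("new_checkout", False):
        return new_checkout(prices_in_cents)
    return legacy_checkout(prices_in_cents)


print(checkout([1000, 2000]))                          # flag off
print(checkout([1000, 2000], {"new_checkout": True}))  # flag on
```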
張 旭

Ingress - Kubernetes - 0 views

  • An API object that manages external access to the services in a cluster, typically HTTP.
  • load balancing
  • SSL termination
  • name-based virtual hosting
  • Edge router: A router that enforces the firewall policy for your cluster.
  • Cluster network: A set of links, logical or physical, that facilitate communication within a cluster according to the Kubernetes networking model.
  • A Kubernetes Service that identifies a set of Pods using label selectors.
  • Services are assumed to have virtual IPs only routable within the cluster network.
  • Ingress exposes HTTP and HTTPS routes from outside the cluster to services within the cluster.
  • Traffic routing is controlled by rules defined on the Ingress resource.
  • An Ingress can be configured to give Services externally-reachable URLs, load balance traffic, terminate SSL / TLS, and offer name based virtual hosting.
  • Exposing services other than HTTP and HTTPS to the internet typically uses a service of type Service.Type=NodePort or Service.Type=LoadBalancer.
  • You must have an ingress controller to satisfy an Ingress. Only creating an Ingress resource has no effect.
  • As with all other Kubernetes resources, an Ingress needs apiVersion, kind, and metadata fields
  • Ingress frequently uses annotations to configure some options depending on the Ingress controller,
  • Ingress resource only supports rules for directing HTTP traffic.
  • An optional host.
  • A list of paths
  • A backend is a combination of Service and port names
  • has an associated backend
  • Both the host and path must match the content of an incoming request before the load balancer directs traffic to the referenced Service.
  • HTTP (and HTTPS) requests to the Ingress that matches the host and path of the rule are sent to the listed backend.
  • A default backend is often configured in an Ingress controller to service any requests that do not match a path in the spec.
  • An Ingress with no rules sends all traffic to a single default backend.
  • Ingress controllers and load balancers may take a minute or two to allocate an IP address.
  • A fanout configuration routes traffic from a single IP address to more than one Service, based on the HTTP URI being requested.
  • nginx.ingress.kubernetes.io/rewrite-target: /
  • describe ingress
  • get ingress
  • Name-based virtual hosts support routing HTTP traffic to multiple host names at the same IP address.
  • route requests based on the Host header.
  • an Ingress resource without any hosts defined in the rules, then any web traffic to the IP address of your Ingress controller can be matched without a name based virtual host being required.
  • secure an Ingress by specifying a Secret that contains a TLS private key and certificate.
  • Currently the Ingress only supports a single TLS port, 443, and assumes TLS termination.
  • An Ingress controller is bootstrapped with some load balancing policy settings that it applies to all Ingress, such as the load balancing algorithm, backend weight scheme, and others.
  • Advanced load balancing concepts (e.g. persistent sessions, dynamic weights) are not yet exposed through the Ingress. You can instead get these features through the load balancer used for a Service.
  • review the controller specific documentation to see how they handle health checks
  • edit ingress
  • After you save your changes, kubectl updates the resource in the API server, which tells the Ingress controller to reconfigure the load balancer.
  • kubectl replace -f on a modified Ingress YAML file.
  • Node: A worker machine in Kubernetes, part of a cluster.
  • in most common Kubernetes deployments, nodes in the cluster are not part of the public internet.
  • Edge router: A router that enforces the firewall policy for your cluster.
  • a gateway managed by a cloud provider or a physical piece of hardware.
  • Cluster network: A set of links, logical or physical, that facilitate communication within a cluster according to the Kubernetes networking model.
  • Service: A Kubernetes Service that identifies a set of Pods using label selectors.
  • An Ingress may be configured to give Services externally-reachable URLs, load balance traffic, terminate SSL / TLS, and offer name-based virtual hosting.
  • An Ingress does not expose arbitrary ports or protocols.
  • You must have an Ingress controller to satisfy an Ingress. Only creating an Ingress resource has no effect.
  • The name of an Ingress object must be a valid DNS subdomain name
  • The Ingress spec has all the information needed to configure a load balancer or proxy server.
  • Ingress resource only supports rules for directing HTTP(S) traffic.
  • An Ingress with no rules sends all traffic to a single default backend and .spec.defaultBackend is the backend that should handle requests in that case.
  • If defaultBackend is not set, the handling of requests that do not match any of the rules will be up to the ingress controller
  • A common usage for a Resource backend is to ingress data to an object storage backend with static assets.
  • Exact: Matches the URL path exactly and with case sensitivity.
  • Prefix: Matches based on a URL path prefix split by /. Matching is case sensitive and done on a path element by element basis.
  • multiple paths within an Ingress will match a request. In those cases precedence will be given first to the longest matching path.
  • Hosts can be precise matches (for example “foo.bar.com”) or a wildcard (for example “*.foo.com”).
  • No match, wildcard only covers a single DNS label
  • Each Ingress should specify a class, a reference to an IngressClass resource that contains additional configuration including the name of the controller that should implement the class.
  • secure an Ingress by specifying a Secret that contains a TLS private key and certificate.
  • The Ingress resource only supports a single TLS port, 443, and assumes TLS termination at the ingress point (traffic to the Service and its Pods is in plaintext).
  • TLS will not work on the default rule because the certificates would have to be issued for all the possible sub-domains.
  • hosts in the tls section need to explicitly match the host in the rules section.
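
The host and path matching rules highlighted above (Exact and Prefix path types, single-label wildcards, longest match first) can be sketched as below. This is a simplified model, not an Ingress controller: the rule dictionaries and the `pick_backend` helper are hypothetical, every rule is assumed to carry a host, and ties between equally long paths are broken in favour of Exact.

```python
def _elements(path):
    """Split a URL path into its non-empty '/'-separated elements."""
    return [part for part in path.split("/") if part]


def path_matches(path_type, rule_path, request_path):
    if path_type == "Exact":
        return rule_path == request_path  # case-sensitive, whole path
    if path_type == "Prefix":
        rule = _elements(rule_path)
        # Compare element by element, so /foo does not match /foobar.
        return _elements(request_path)[: len(rule)] == rule
    raise ValueError(f"unsupported pathType: {path_type}")


def host_matches(rule_host, request_host):
    if rule_host.startswith("*."):
        suffix = rule_host[1:]  # e.g. ".foo.com"
        if not request_host.endswith(suffix):
            return False
        label = request_host[: -len(suffix)]
        # The wildcard covers exactly one DNS label.
        return bool(label) and "." not in label
    return rule_host == request_host


def pick_backend(rules, host, path):
    """Return the backend of the best matching rule, or None."""
    candidates = [
        rule for rule in rules
        if host_matches(rule["host"], host)
        and path_matches(rule["pathType"], rule["path"], path)
    ]
    if not candidates:
        return None
    best = max(candidates, key=lambda r: (len(r["path"]), r["pathType"] == "Exact"))
    return best["backend"]


demo_rules = [
    {"host": "*.foo.com", "pathType": "Prefix", "path": "/", "backend": "web"},
    {"host": "*.foo.com", "pathType": "Prefix", "path": "/api", "backend": "api"},
]
print(pick_backend(demo_rules, "bar.foo.com", "/api/v1"))
```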
張 旭

What is Data Definition Language (DDL) and how is it used? - 1 views

  • Data Definition Language (DDL) is used to create and modify the structure of objects in a database using predefined commands and a specific syntax.
  • DDL includes Structured Query Language (SQL) statements to create and drop databases, aliases, locations, indexes, tables and sequences.
  • Since DDL includes SQL statements to define changes in the database schema, it is considered a subset of SQL.
  • Data Manipulation Language (DML) commands are used to modify data in a database. DML statements control access to the database data.
  • DDL commands are used to create, delete or alter the structure of objects in a database but not its data.
  • DDL deals with descriptions of the database schema and is useful for creating new tables, indexes, sequences, stogroups, etc. and to define the attributes of these objects, such as data type, field length and alternate table names (aliases).
  • Data Query Language (DQL) is used to get data within the schema objects of a database and also to query it and impose order upon it.
  • DQL is also a subset of SQL. One of the most common commands in DQL is SELECT.
  • The most common command types in DDL are CREATE, ALTER and DROP.
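
The split between DDL, DML and DQL described above can be seen in a few statements run against an in-memory SQLite database. This is a minimal sketch; the `users` table and its columns are made up for illustration, and other database systems support a wider set of DDL objects than SQLite does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define and change structure, not data.
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("ALTER TABLE users ADD COLUMN email TEXT")

# DML: modify the data held in that structure.
conn.execute(
    "INSERT INTO users (name, email) VALUES (?, ?)", ("Ada", "ada@example.com")
)

# DQL: read the data back, imposing an order on it.
rows = conn.execute("SELECT name, email FROM users ORDER BY name").fetchall()
print(rows)

# DDL again: remove the object entirely.
conn.execute("DROP TABLE users")
remaining_tables = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
conn.close()
```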
張 旭

Considerations for large clusters | Kubernetes - 0 views

  • A cluster is a set of nodes (physical or virtual machines) running Kubernetes agents, managed by the control plane.
  • Kubernetes v1.23 supports clusters with up to 5000 nodes.
  • criteria: no more than 110 pods per node; no more than 5000 nodes; no more than 150000 total pods; no more than 300000 total containers
  • In-use IP addresses
  • run one or two control plane instances per failure zone, scaling those instances vertically first and then scaling horizontally after reaching the point of falling returns to (vertical) scale.
  • Kubernetes nodes do not automatically steer traffic towards control-plane endpoints that are in the same failure zone
  • store Event objects in a separate dedicated etcd instance.
  • start and configure additional etcd instance
  • Kubernetes resource limits help to minimize the impact of memory leaks and other ways that pods and containers can impact on other components.
  • Addons' default limits are typically based on data collected from experience running each addon on small or medium Kubernetes clusters.
  • When running on large clusters, addons often consume more of some resources than their default limits.
  • Many addons scale horizontally - you add capacity by running more pods
  • The VerticalPodAutoscaler can run in recommender mode to provide suggested figures for requests and limits.
  • Some addons run as one copy per node, controlled by a DaemonSet: for example, a node-level log aggregator.
  • VerticalPodAutoscaler is a custom resource that you can deploy into your cluster to help you manage resource requests and limits for pods.
  • The cluster autoscaler integrates with a number of cloud providers to help you run the right number of nodes for the level of resource demand in your cluster.
  • The addon resizer helps you in resizing the addons automatically as your cluster's scale changes.
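
The size criteria listed above lend themselves to a simple check. The sketch below is illustrative only: `exceeded_limits` is a hypothetical helper, and the numbers are the v1.23 figures quoted in the highlights, which may differ for other releases.

```python
# Figures quoted above for Kubernetes v1.23.
LIMITS = {
    "pods_per_node": 110,
    "nodes": 5000,
    "total_pods": 150000,
    "total_containers": 300000,
}


def exceeded_limits(nodes, total_pods, total_containers, max_pods_on_a_node):
    """Return the names of the criteria that a planned cluster would exceed."""
    observed = {
        "pods_per_node": max_pods_on_a_node,
        "nodes": nodes,
        "total_pods": total_pods,
        "total_containers": total_containers,
    }
    return [name for name, limit in LIMITS.items() if observed[name] > limit]


print(exceeded_limits(nodes=6000, total_pods=120000,
                      total_containers=200000, max_pods_on_a_node=100))
```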
張 旭

Service | Kubernetes - 0 views

  • Each Pod gets its own IP address
  • Pods are nonpermanent resources.
  • Kubernetes Pods are created and destroyed to match the state of your cluster
  • In Kubernetes, a Service is an abstraction which defines a logical set of Pods and a policy by which to access them (sometimes this pattern is called a micro-service).
  • The set of Pods targeted by a Service is usually determined by a selector
  • If you're able to use Kubernetes APIs for service discovery in your application, you can query the API server for Endpoints, that get updated whenever the set of Pods in a Service changes.
  • A Service in Kubernetes is a REST object, similar to a Pod.
  • The name of a Service object must be a valid DNS label name
  • Kubernetes assigns this Service an IP address (sometimes called the "cluster IP"), which is used by the Service proxies
  • A Service can map any incoming port to a targetPort. By default and for convenience, the targetPort is set to the same value as the port field.
  • The default protocol for Services is TCP
  • As many Services need to expose more than one port, Kubernetes supports multiple port definitions on a Service object. Each port definition can have the same protocol, or a different one.
  • Because this Service has no selector, the corresponding Endpoints object is not created automatically. You can manually map the Service to the network address and port where it's running, by adding an Endpoints object manually
  • Endpoint IP addresses cannot be the cluster IPs of other Kubernetes Services
  • Kubernetes ServiceTypes allow you to specify what kind of Service you want. The default is ClusterIP
  • ClusterIP: Exposes the Service on a cluster-internal IP.
  • NodePort: Exposes the Service on each Node's IP at a static port (the NodePort). A ClusterIP Service, to which the NodePort Service routes, is automatically created. You'll be able to contact the NodePort Service, from outside the cluster, by requesting <NodeIP>:<NodePort>.
  • LoadBalancer: Exposes the Service externally using a cloud provider's load balancer
  • ExternalName: Maps the Service to the contents of the externalName field (e.g. foo.bar.example.com), by returning a CNAME record with its value. No proxying of any kind is set up.
  • You can also use Ingress to expose your Service. Ingress is not a Service type, but it acts as the entry point for your cluster.
  • If you set the type field to NodePort, the Kubernetes control plane allocates a port from a range specified by --service-node-port-range flag (default: 30000-32767).
  • The default for --nodeport-addresses is an empty list. This means that kube-proxy should consider all available network interfaces for NodePort.
  • you need to take care of possible port collisions yourself. You also have to use a valid port number, one that's inside the range configured for NodePort use.
  • Service is visible as <NodeIP>:spec.ports[*].nodePort and .spec.clusterIP:spec.ports[*].port
  • Choosing this value makes the Service only reachable from within the cluster.
  • NodePort: Exposes the Service on each Node's IP at a static port
張 旭

Scalable architecture without magic (and how to build it if you're not Google) - DEV Co... - 0 views

  • Don’t mix up write-first and read-first databases.
  • keep them stateless.
  • you should know how to make a scalable setup on bare metal.
  • Different programming languages are for different tasks.
  • Go or C which are compiled to run on bare metal.
  • To run NodeJS on multiple cores, you have to use something like PM2, and because of this you have to keep your code stateless.
  • Python has very rich and sugary syntax that’s great for working with data while keeping your code small and expressive.
  • SQL is almost always slower than NoSQL
  • databases are often read-first or write-first
  • write-first, just like Cassandra.
  • store all of your data in your databases and leave nothing on the backend
  • Functional code is stateless by default
  • It’s better to go for stateless right from the very beginning.
  • deliver exactly the same responses for same requests.
  • Sessions? Store them in Redis and allow all of your servers to access it.
  • Only the first user will trigger a data query, and all others will be receiving exactly the same data straight from the RAM
  • never, never cache user input
  • Only the server output should be cached
  • Varnish is a great cache option that works with HTTP responses, so it may work with any backend.
  • a rate limiter – if not enough time has passed since the last request, the ongoing request will be denied.
  • different requests blasting every 10ms can bring your server down
  • Just set up entry relations and allow your database to calculate external keys for you
  • the query planner will always be faster than your backend.
  • Backend should have different responsibilities: hashing, building web pages from data and templates, managing sessions and so on.
  • For anything related to data management or data models, move it to your database as procedures or queries.
  • a distributed database.
  • your code has to be stateless
  • Move anything related to the data to the database.
  • For load-balancing a database, go for a cluster.
  • DB is balancing requests, as well as your backend.
  • Users from different continents are separated with DNS.
  • Keep it scalable, keep it stateless.
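
The rate limiter described above, which denies a request when too little time has passed since the previous one, can be sketched as follows. `MinIntervalRateLimiter` is a hypothetical class, and it keeps its state in process memory purely for illustration. To stay stateless in the sense the highlights recommend, the timestamps would have to live in a shared store such as Redis.

```python
import time


class MinIntervalRateLimiter:
    """Deny a client's request if it arrives too soon after its last allowed one."""

    def __init__(self, min_interval_seconds, clock=time.monotonic):
        self.min_interval = min_interval_seconds
        self._clock = clock      # injectable so the limiter can be tested
        self._last_allowed = {}  # client id -> time of last allowed request

    def allow(self, client_id):
        now = self._clock()
        last = self._last_allowed.get(client_id)
        if last is not None and now - last < self.min_interval:
            return False  # not enough time has passed: deny
        self._last_allowed[client_id] = now
        return True


limiter = MinIntervalRateLimiter(min_interval_seconds=0.01)
print(limiter.allow("client-a"))  # first request from this client is allowed
```

Denied requests do not reset the timer here, so a client that keeps hammering the server is still allowed through once the interval has elapsed since its last accepted request.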
張 旭

Docker for AWS persistent data volumes | Docker Documentation - 0 views

  • Cloudstor is a modern volume plugin built by Docker
  • Docker swarm mode tasks and regular Docker containers can use a volume created with Cloudstor to mount a persistent data volume.
  • Global shared Cloudstor volumes mounted by all tasks in a swarm service.
  • Workloads running in a Docker service that require access to low latency/high IOPs persistent storage, such as a database engine, can use a relocatable Cloudstor volume backed by EBS.
  • Each relocatable Cloudstor volume is backed by a single EBS volume.
  • If a swarm task using a relocatable Cloudstor volume gets rescheduled to another node within the same availability zone as the original node where the task was running, Cloudstor detaches the backing EBS volume from the original node and attaches it to the new target node automatically.
  • in a different availability zone,
  • Cloudstor transfers the contents of the backing EBS volume to the destination availability zone using a snapshot, and cleans up the EBS volume in the original availability zone.
  • Typically the snapshot-based transfer process across availability zones takes between 2 and 5 minutes unless the work load is write-heavy.
  • A swarm task is not started until the volume it mounts becomes available
  • Sharing/mounting the same Cloudstor volume backed by EBS among multiple tasks is not a supported scenario and leads to data loss.
  • a Cloudstor volume to share data between tasks, choose the appropriate EFS backed shared volume option.
  • When multiple swarm service tasks need to share data in a persistent storage volume, you can use a shared Cloudstor volume backed by EFS.
  • a volume and its contents can be mounted by multiple swarm service tasks without the risk of data loss
  • over NFS
  • the persistent data backed by EFS volumes is always available.
  • shared Cloudstor volumes only work in those AWS regions where EFS is supported.