Skip to main content

Home/ Larvata/ Group items tagged failover

Rss Feed Group items tagged

crazylion lee

Keepalived for Linux - 0 views

  •  
    Keepalived is a routing software written in C. The main goal of this project is to provide simple and robust facilities for loadbalancing and high-availability to Linux system and Linux based infrastructures. Loadbalancing framework relies on well-known and widely used Linux Virtual Server (IPVS) kernel module providing Layer4 loadbalancing. Keepalived implements a set of checkers to dynamically and adaptively maintain and manage loadbalanced server pool according their health. On the other hand high-availability is achieved by VRRP protocol. VRRP is a fundamental brick for router failover. In addition, Keepalived implements a set of hooks to the VRRP finite state machine providing low-level and high-speed protocol interactions. Keepalived frameworks can be used independently or all together to provide resilient infrastructures.
張 旭

Replication - MongoDB Manual - 0 views

  • A replica set in MongoDB is a group of mongod processes that maintain the same data set.
  • Replica sets provide redundancy and high availability, and are the basis for all production deployments.
  • With multiple copies of data on different database servers, replication provides a level of fault tolerance against the loss of a single database server.
  • ...18 more annotations...
  • replication can provide increased read capacity as clients can send read operations to different servers.
  • A replica set is a group of mongod instances that maintain the same data set.
  • A replica set contains several data bearing nodes and optionally one arbiter node.
  • one and only one member is deemed the primary node, while the other nodes are deemed secondary nodes.
  • A replica set can have only one primary capable of confirming writes with { w: "majority" } write concern; although in some circumstances, another mongod instance may transiently believe itself to also be primary.
  • The secondaries replicate the primary’s oplog and apply the operations to their data sets such that the secondaries’ data sets reflect the primary’s data set
  • add a mongod instance to a replica set as an arbiter. An arbiter participates in elections but does not hold data
  • An arbiter will always be an arbiter whereas a primary may step down and become a secondary and a secondary may become the primary during an election.
  • Secondaries replicate the primary’s oplog and apply the operations to their data sets asynchronously.
  • These slow oplog messages are logged for the secondaries in the diagnostic log under the REPL component with the text applied op: <oplog entry> took <num>ms.
  • Replication lag refers to the amount of time that it takes to copy (i.e. replicate) a write operation on the primary to a secondary.
  • When a primary does not communicate with the other members of the set for more than the configured electionTimeoutMillis period (10 seconds by default), an eligible secondary calls for an election to nominate itself as the new primary.
  • The replica set cannot process write operations until the election completes successfully.
  • The median time before a cluster elects a new primary should not typically exceed 12 seconds, assuming default replica configuration settings.
  • Factors such as network latency may extend the time required for replica set elections to complete, which in turn affects the amount of time your cluster may operate without a primary.
  • Your application connection logic should include tolerance for automatic failovers and the subsequent elections.
  • MongoDB drivers can detect the loss of the primary and automatically retry certain write operations a single time, providing additional built-in handling of automatic failovers and elections
  • By default, clients read from the primary [1]; however, clients can specify a read preference to send read operations to secondaries.
張 旭

Flynn: first preview release | Hacker News - 0 views

  • Etcd and Zookeeper provide essentially the same functionality. They are both a strongly consistent key/value stores that support notifications to clients of changes. These two projects are limited to service discovery
  • So lets say you had a client application that would talk to a node application that could be on any number of servers. What you could do is hard code that list into your application and randomly select one, in order to "fake" load balancing. However every time a machine went up or down you would have to update that list.
  • What Consul provides is you just tell your app to connect to "mynodeapp.consul" and then consul will give you the proper address of one of your node apps.
  • ...9 more annotations...
  • Consul and Skydock are both applications that build on top of a tool like Zookeeper and Etcd.
  • What a developer ideally wants to do is just push code and not have to worry about what servers are running what, and worry about failover and the like
  • What Flynn provides (if I get it), is a diy Heroku like platform
  • Another project that I believe may be similar to Flynn is Apache Mesos.
  • a self hosted Heroku
  • Google Omega is Google's answer to Apache Mesos
  • Omega would need a service like Raft to understand what services are currently available
  • Raft is a consensus algorithm for keeping a set of distributed state machines in a consistent state.
  • I want to use Docker, but it has no easy way to say "take this file that contains instructions and make everything". You can write Dockerfiles, but you can only use one part of the stack in them, otherwise you run into trouble.
  •  
    " So lets say you had a client application that would talk to a node application that could be on any number of servers. What you could do is hard code that list into your application and randomly select one, in order to "fake" load balancing. However every time a machine went up or down you would have to update that list. What Consul provides is you just tell your app to connect to "mynodeapp.consul" and then consul will give you the proper address of one of your node apps."
張 旭

MySQL on Docker: Running ProxySQL as Kubernetes Service | Severalnines - 0 views

  • Using Kubernetes ConfigMap approach, ProxySQL can be clustered with immutable configuration.
  • Kubernetes handles ProxySQL recovery and balance the connections to the instances automatically.
  • Can be used with external applications outside Kubernetes.
  • ...11 more annotations...
  • load balancing, connection failover and decoupling of the application tier from the underlying database topologies.
  • ProxySQL as a Kubernetes service (centralized deployment)
  • running as a service makes ProxySQL pods live independently from the applications and can be easily scaled and clustered together with the help of Kubernetes ConfigMap.
  • ProxySQL's multi-layer configuration system makes pod clustering possible with ConfigMap.
  • create ProxySQL pods and attach a Kubernetes service to be accessed by the other pods within the Kubernetes network or externally.
  • Default to 6033 for MySQL load-balanced connections and 6032 for ProxySQL administration console.
  • separated by "---" delimiter
  • deploy two ProxySQL pods as a ReplicaSet that matches containers labelled with "app=proxysql,tier=frontend".
  • A Kubernetes service is an abstraction layer which defines the logical set of pods and a policy by which to access them
  • The range of valid ports for NodePort resource is 30000-32767.
  • ConfigMap - To store ProxySQL configuration file as a volume so it can be mounted to multiple pods and can be remounted again if the pod is being rescheduled to the other Kubernetes node.
張 旭

An Introduction to HAProxy and Load Balancing Concepts | DigitalOcean - 0 views

  • HAProxy, which stands for High Availability Proxy
  • improve the performance and reliability of a server environment by distributing the workload across multiple servers (e.g. web, application, database).
  • ACLs are used to test some condition and perform an action (e.g. select a server, or block a request) based on the test result.
  • ...28 more annotations...
  • Access Control List (ACL)
  • ACLs allows flexible network traffic forwarding based on a variety of factors like pattern-matching and the number of connections to a backend
  • A backend is a set of servers that receives forwarded requests
  • adding more servers to your backend will increase your potential load capacity by spreading the load over multiple servers
  • mode http specifies that layer 7 proxying will be used
  • specifies the load balancing algorithm
  • health checks
  • A frontend defines how requests should be forwarded to backends
  • use_backend rules, which define which backends to use depending on which ACL conditions are matched, and/or a default_backend rule that handles every other case
  • A frontend can be configured to various types of network traffic
  • Load balancing this way will forward user traffic based on IP range and port
  • Generally, all of the servers in the web-backend should be serving identical content--otherwise the user might receive inconsistent content.
  • Using layer 7 allows the load balancer to forward requests to different backend servers based on the content of the user's request.
  • allows you to run multiple web application servers under the same domain and port
  • acl url_blog path_beg /blog matches a request if the path of the user's request begins with /blog.
  • Round Robin selects servers in turns
  • Selects the server with the least number of connections--it is recommended for longer sessions
  • This selects which server to use based on a hash of the source IP
  • ensure that a user will connect to the same server
  • require that a user continues to connect to the same backend server. This persistence is achieved through sticky sessions, using the appsession parameter in the backend that requires it.
  • HAProxy uses health checks to determine if a backend server is available to process requests.
  • The default health check is to try to establish a TCP connection to the server
  • If a server fails a health check, and therefore is unable to serve requests, it is automatically disabled in the backend
  • For certain types of backends, like database servers in certain situations, the default health check is insufficient to determine whether a server is still healthy.
  • However, your load balancer is a single point of failure in these setups; if it goes down or gets overwhelmed with requests, it can cause high latency or downtime for your service.
  • A high availability (HA) setup is an infrastructure without a single point of failure
  • a static IP address that can be remapped from one server to another.
  • If that load balancer fails, your failover mechanism will detect it and automatically reassign the IP address to one of the passive servers.
張 旭

[Elasticsearch] 分散式特性 & 分散式搜尋的機制 | 小信豬的原始部落 - 0 views

  • 水平擴展儲存空間
  • Data HA:若有 node 掛掉,資料不會遺失
  • 若是要查詢 cluster 中的 node 狀態,可以使用 GET /_cat/nodes API
  • ...39 more annotations...
  • 決定每個 shard 要被分配到哪個 data node 上
  • 為 cluster 設置多個 master node
  • 一旦發現被選中的 master node 出現問題,就會選出新的 master node
  • 每個 node 啟動時就預設是一個 master eligible node,可以透過設定 node.master: false 取消此預設設定
  • 處理 request 的 node 稱為 Coordinating Node,其功能是將 request 轉發到合適的 node 上
  • 所有的 node 都預設是 Coordinating Node
  • coordinating node 可以直接接收 search request 並處理,不需要透過 master node 轉過來
  • 可以保存資料的 node,每個 node 啟動後都會預設是 data node,可以透過設定 node.data: false 停用 data node 功能
  • 由 master node 決定如何把分片分發到不同的 data node 上
  • 每個 node 上都保存了 cluster state
  • 只有 master 才可以修改 cluster state 並負責同步給其他 node
  • 每個 node 都會詳細紀錄本身的狀態資訊
  • shard 是 Elasticsearch 分散式儲存的基礎,包含 primary shard & replica shard
  • 每一個 shard 就是一個 Lucene instance
  • primary shard 功能是將一份被索引後的資料,分散到多個 data node 上存放,實現儲存方面的水平擴展
  • primary shard 的數量在建立 index 時就會指定,後續是無法修改的,若要修改就必須要進行 reindex
  • 當 primary shard 遺失時,replica shard 就可以被 promote 成 primary shard 來保持資料完整性
  • replica shard 數量可以動態調整,讓每個 data node 上都有完整的資料
  • ES 7.0 開始,primary shard 預設為 1,replica shard 預設為 0
  • replica shard 若設定過多,會降低 cluster 整體的寫入效能
  • replica shard 必須和 primary shard 被分配在不同的 data node 上
  • 所有的 primary shard 可以在同一個 data node 上
  • 透過 GET _cluster/health/<target> 可以取得目前 cluster 的健康狀態
  • Yellow:表示 primary shard 可以正常分配,但 replica shard 分配有問題
  • 透過 GET /_cat/shards/<target> 可以取得目前的 shard 狀態
  • replica shard 無法被分配,因此 cluster 健康狀態為黃色
  • 若是擔心 reboot 機器造成 failover 動作開始執行,可以設定將 replication 延遲一段時間後再執行(透過調整 settings 中的 index.unassigned.node_left.delayed_timeout 參數),避免無謂的 data copy 動作 (此功能稱為 delay allocation)
  • 集群變紅,代表有 primary shard 丟失,這個時候會影響讀寫。
  • 如果 node 重新回來,會從 translog 中恢復沒有寫入的資料
  • 設定 index settings 之後,primary shard 數量無法隨意變更
  • 不建議直接發送請求到master節點,雖然也會工作,但是大量請求發送到 master,會有潛在的性能問題
  • shard 是 ES 中最小的工作單元
  • shard 是一個 Lucene 的 index
  • 將 Index Buffer 中的內容寫入 Segment,而這寫入的過程就稱為 Refresh
  • 當 document 被 refresh 進入到 segment 之後,就可以被搜尋到了
  • 在進行 refresh 時先將 segment 寫入 cache 以開放查詢
  • 將 document 進行索引時,同時也會寫入 transaction log,且預設都會寫入磁碟中
  • 每個 shard 都會有對應的 transaction log
  • 由於 transaction log 都會寫入磁碟中,因此當 node 從故障中恢復時,就會優先讀取 transaction log 來恢復資料
1 - 6 of 6
Showing 20 items per page