Group items tagged

[Elasticsearch] Distributed Characteristics & Distributed Search Mechanism | 小信豬的原始部落

godleon.github.io/...icsearch-distributed-mechanism

elasticsearch database

shared by 張旭 on 17 Apr 21

Scale storage capacity horizontally
Data HA: if a node goes down, no data is lost
To check the status of the nodes in a cluster, use the GET /_cat/nodes API
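A minimal sketch of reading that node list, assuming the response was requested as JSON (for example `GET /_cat/nodes?format=json&h=name,node.role,master`). The sample rows and node names are made up for illustration, and no cluster is contacted.

```python
import json

# Made-up sample of what `GET /_cat/nodes?format=json&h=name,node.role,master`
# might return; node names and roles are hypothetical.
SAMPLE_RESPONSE = """
[
  {"name": "es-node-1", "node.role": "dim", "master": "*"},
  {"name": "es-node-2", "node.role": "dim", "master": "-"},
  {"name": "es-node-3", "node.role": "d",   "master": "-"}
]
"""

def summarize_nodes(raw):
    """Split nodes into elected master, master-eligible, and data nodes."""
    rows = json.loads(raw)
    return {
        # The `master` column is "*" for the currently elected master.
        "elected_master": [r["name"] for r in rows if r["master"] == "*"],
        # In `node.role`, "m" marks a master-eligible node and "d" a data node.
        "master_eligible": [r["name"] for r in rows if "m" in r["node.role"]],
        "data": [r["name"] for r in rows if "d" in r["node.role"]],
    }

summary = summarize_nodes(SAMPLE_RESPONSE)
print(summary)
```

In a real setup the JSON string would come from an HTTP client pointed at the cluster; that part is left out here so the sketch runs offline.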
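Decides which data node each shard is allocated to
Configure multiple master nodes for the cluster
As soon as the elected master node is found to have a problem, a new master node is elected
Every node is a master-eligible node by default when it starts; this default can be turned off by setting node.master: false
The node that handles a request is called the Coordinating Node; its job is to forward the request to the appropriate nodes
Every node is a Coordinating Node by default
A coordinating node can receive and handle a search request directly; the request does not have to be relayed through the master node
A node that can store data. Every node is a data node by default after it starts; the data node role can be disabled by setting node.data: false
The master node decides how shards are distributed across the data nodes
Every node keeps a copy of the cluster state
Only the master can modify the cluster state, and it is responsible for syncing it to the other nodes
Every node records detailed information about its own state
Shards are the foundation of Elasticsearch's distributed storage; there are primary shards and replica shards
Every shard is a Lucene instance
Primary shards spread indexed data across multiple data nodes, which gives horizontal scaling of storage
The number of primary shards is specified when the index is created and cannot be changed afterwards; changing it requires a reindex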
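One way to see why the primary shard count is fixed: in its simplest form, Elasticsearch routes a document with `shard = hash(_routing) % number_of_primary_shards`. The sketch below uses `zlib.crc32` as a stand-in hash (an assumption for illustration; Elasticsearch uses a Murmur3-based hash) and made-up document IDs to count how many documents would map to a different shard if the count changed.

```python
import zlib

def route(doc_id, num_primary_shards):
    """Pick a shard for a document: hash(routing value) % number of primaries.

    zlib.crc32 is only a stand-in hash for illustration.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards

# Hypothetical document IDs.
doc_ids = [f"doc-{i}" for i in range(1000)]

# Count documents whose shard would differ between 5 and 6 primary shards.
moved = sum(1 for d in doc_ids if route(d, 5) != route(d, 6))
print(f"{moved} of {len(doc_ids)} documents would map to a different shard")
```

Because lookups use the same formula, documents written under the old count would no longer be found where the new count points, which is why a reindex is needed.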
When a primary shard is lost, a replica shard can be promoted to primary to keep the data complete
The number of replica shards can be adjusted dynamically, so that every data node can hold a complete copy of the data
Starting with ES 7.0, the default number of primary shards is 1; the default number of replica shards is 1
Too many replica shards lower the overall write performance of the cluster
A replica shard must be allocated to a different data node than its primary shard
All primary shards can sit on the same data node
GET _cluster/health/<target> returns the current health status of the cluster
Yellow: primary shards are allocated normally, but there is a problem allocating replica shards
GET /_cat/shards/<target> returns the current shard status
The replica shards cannot be allocated, so the cluster health status is yellow
If you are worried that rebooting a machine will trigger failover, you can delay replication for a while (by adjusting the index.unassigned.node_left.delayed_timeout setting) to avoid needless data copying (this feature is called delayed allocation)
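A small sketch of what the delayed-allocation request could look like, built as plain data so it runs without a cluster. The helper name is hypothetical, and `5m` is only an example value.

```python
import json

def delayed_allocation_request(index, timeout):
    """Return (method, path, body) for updating the delayed-allocation timeout."""
    body = {"settings": {"index.unassigned.node_left.delayed_timeout": timeout}}
    return "PUT", f"/{index}/_settings", json.dumps(body)

# "_all" targets every index; "5m" is an example value, not a recommendation.
method, path, body = delayed_allocation_request("_all", "5m")
print(method, path)
print(body)
```

A longer timeout gives a rebooting node more time to rejoin before its shards are copied elsewhere, at the cost of running with fewer replicas for longer.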
A red cluster means a primary shard has been lost; at that point reads and writes are affected.
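The health colours can be sketched as a function of shard allocation state. This is a toy model, not Elasticsearch code: the `ShardCopy` type and the sample data are hypothetical. The green case is not covered by the annotations; it is filled in with its usual meaning (every primary and replica assigned).

```python
from dataclasses import dataclass

@dataclass
class ShardCopy:
    index: str
    shard: int
    primary: bool      # True for a primary, False for a replica
    assigned: bool     # True if the copy is allocated to a node

def cluster_health(copies):
    """Derive a health colour from shard allocation state."""
    if any(c.primary and not c.assigned for c in copies):
        return "red"       # at least one primary shard is missing
    if any(not c.primary and not c.assigned for c in copies):
        return "yellow"    # all primaries are fine, some replica is unassigned
    return "green"         # every primary and replica is assigned

# Hypothetical single-node cluster: the replica has nowhere to go, because it
# may not share a node with its primary.
single_node = [
    ShardCopy("logs", 0, primary=True, assigned=True),
    ShardCopy("logs", 0, primary=False, assigned=False),
]
print(cluster_health(single_node))
```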
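If the node comes back, data that had not yet been written is restored from the translog
Once the index settings are in place, the number of primary shards cannot be changed freely
Sending requests directly to the master node is not recommended: it works, but a large volume of requests sent to the master can cause performance problems
A shard is the smallest unit of work in ES
A shard is a Lucene index
The contents of the Index Buffer are written to a Segment; this write is called a Refresh
Once a document has been refreshed into a segment, it can be found by search
During a refresh, the segment is first written to cache so that it is open to queries
When a document is indexed, it is also written to the transaction log, which is written to disk by default
Every shard has its own transaction log
Because the transaction log is written to disk, a node recovering from a failure reads the transaction log first to restore data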
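The refresh and transaction-log behaviour described above can be illustrated with a toy model. This is not Elasticsearch code: the class and method names are hypothetical, and as a simplification the model keeps no segments on disk, so recovery rebuilds everything from the translog.

```python
class ToyShard:
    """Toy model of one shard's write path; not Elasticsearch code."""

    def __init__(self):
        self.index_buffer = []   # indexed but not yet searchable
        self.segments = []       # each refresh produces one searchable segment
        self.translog = []       # stands in for the on-disk transaction log

    def index(self, doc):
        # A write goes to the in-memory buffer and to the translog.
        self.index_buffer.append(doc)
        self.translog.append(doc)

    def refresh(self):
        # Refresh: move the buffer into a new segment, making it searchable.
        if self.index_buffer:
            self.segments.append(list(self.index_buffer))
            self.index_buffer.clear()

    def search(self, term):
        # Only documents that are already in a segment are visible.
        return [d for seg in self.segments for d in seg if term in d]

    def crash_and_recover(self):
        # In-memory state is lost; the translog survives and is replayed.
        self.segments = []
        self.index_buffer = list(self.translog)
        self.refresh()

shard = ToyShard()
shard.index("error: disk full")
before = shard.search("error")     # nothing refreshed yet
shard.refresh()
after = shard.search("error")      # visible after the refresh
shard.index("error: timeout")      # buffered only
shard.crash_and_recover()
recovered = shard.search("error")  # replayed from the translog
print(before, after, recovered)
```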

Logstash Alternatives: Pros & Cons of 5 Log Shippers [2019] - Sematext

sematext.com/...logstash-alternatives

log monitor system logstash fluentd

shared by 張旭 on 05 Nov 19

In this case, Elasticsearch. And because Elasticsearch can be down or struggling, or the network can be down, the shipper would ideally be able to buffer and retry
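A minimal sketch of the buffer-and-retry idea, assuming an in-memory bounded buffer and exponential backoff. The class, its parameters, and the flaky destination are all hypothetical; real shippers typically also offer disk-backed buffers.

```python
import time
from collections import deque

class BufferingShipper:
    """Keeps events in a bounded in-memory buffer and retries failed sends."""

    def __init__(self, send, max_buffer=1000, max_retries=3, base_delay=0.0):
        self.send = send                        # callable that raises on failure
        self.buffer = deque(maxlen=max_buffer)  # oldest events drop when full
        self.max_retries = max_retries
        self.base_delay = base_delay

    def ship(self, event):
        self.buffer.append(event)
        self.flush()

    def flush(self):
        """Try to drain the buffer; keep the remainder on repeated failure."""
        while self.buffer:
            event = self.buffer[0]
            for attempt in range(self.max_retries):
                try:
                    self.send(event)
                    break
                except ConnectionError:
                    # Exponential backoff between attempts.
                    time.sleep(self.base_delay * (2 ** attempt))
            else:
                return False   # destination still down; events stay buffered
            self.buffer.popleft()
        return True

# Hypothetical destination that is down for the first two calls.
calls = {"n": 0}
delivered = []

def flaky_send(event):
    calls["n"] += 1
    if calls["n"] <= 2:
        raise ConnectionError("destination unavailable")
    delivered.append(event)

shipper = BufferingShipper(flaky_send, max_retries=3)
shipper.ship("line 1")
shipper.ship("line 2")
print(delivered)
```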
Logstash is typically used for collecting, parsing, and storing logs for future use as part of log management.
Logstash’s biggest con or “Achilles’ heel” has always been performance and resource consumption (the default heap size is 1GB).
This can be a problem for high traffic deployments, when Logstash servers would need to be comparable with the Elasticsearch ones.
Filebeat was made to be that lightweight log shipper that pushes to Logstash or Elasticsearch.
differences between Logstash and Filebeat are that Logstash has more functionality, while Filebeat takes less resources.
Filebeat is just a tiny binary with no dependencies.
For example, how aggressive it should be in searching for new files to tail and when to close file handles when a file didn’t get changes for a while.
For example, the apache module will point Filebeat to default access.log and error.log paths
Filebeat’s scope is very limited,
Initially it could only send logs to Logstash and Elasticsearch, but now it can send to Kafka and Redis, and in 5.x it also gains filtering capabilities.
Filebeat can parse JSON
you can push directly from Filebeat to Elasticsearch, and have Elasticsearch do both parsing and storing.
You shouldn’t need a buffer when tailing files because, just as Logstash, Filebeat remembers where it left off
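"Remembers where it left off" can be sketched as a byte offset persisted next to the log file. This is an illustration of the idea only, not Filebeat's actual registry format; the function name and JSON layout are hypothetical, and log rotation and partially written lines are ignored.

```python
import json
import os
import tempfile

def read_new_lines(log_path, registry_path):
    """Return lines appended since the last call, tracking the byte offset."""
    offset = 0
    if os.path.exists(registry_path):
        with open(registry_path) as f:
            offset = json.load(f).get(log_path, 0)

    with open(log_path, "rb") as f:
        f.seek(offset)
        data = f.read()
        new_offset = f.tell()

    # Persist the new offset so a restart resumes from here.
    with open(registry_path, "w") as f:
        json.dump({log_path: new_offset}, f)

    return data.decode("utf-8").splitlines()

with tempfile.TemporaryDirectory() as tmp:
    log = os.path.join(tmp, "app.log")
    registry = os.path.join(tmp, "registry.json")
    with open(log, "w") as f:
        f.write("first\n")
    first_read = read_new_lines(log, registry)
    with open(log, "a") as f:
        f.write("second\n")
    second_read = read_new_lines(log, registry)   # only the appended line

print(first_read, second_read)
```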
For larger deployments, you’d typically use Kafka as a queue instead, because Filebeat can talk to Kafka as well
The default syslog daemon on most Linux distros, rsyslog can do so much more than just picking logs from the syslog socket and writing to /var/log/messages.
It can tail files, parse them, buffer (on disk and in memory) and ship to a number of destinations, including Elasticsearch.
rsyslog is the fastest shipper
Its grammar-based parsing module (mmnormalize) works at constant speed no matter the number of rules (we tested this claim).
use it as a simple router/shipper, any decent machine will be limited by network bandwidth
It’s also one of the lightest parsers you can find, depending on the configured memory buffers.
rsyslog requires more work to get the configuration right
the main difference between Logstash and rsyslog is that Logstash is easier to use while rsyslog is lighter.
rsyslog fits well in scenarios where you either need something very light yet capable (an appliance, a small VM, collecting syslog from within a Docker container).
rsyslog also works well when you need that ultimate performance.
syslog-ng as an alternative to rsyslog (though historically it was actually the other way around).
a modular syslog daemon, that can do much more than just syslog
Unlike rsyslog, it features a clear, consistent configuration format and has nice documentation.
Similarly to rsyslog, you’d probably want to deploy syslog-ng on boxes where resources are tight, yet you do want to perform potentially complex processing.
syslog-ng has an easier, more polished feel than rsyslog, but likely not that ultimate performance
Fluentd was built on the idea of logging in JSON wherever possible (which is a practice we totally agree with) so that log shippers down the line don’t have to guess which substring is which field of which type.
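Logging in JSON at the source might look like the sketch below, which uses only Python's standard `logging` module. The `fields` convention passed through `extra` is a hypothetical choice for this example, not something Fluentd requires.

```python
import io
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record):
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Hypothetical convention: structured fields arrive via
        # `extra={"fields": {...}}` and keep their types in the output.
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload)

stream = io.StringIO()   # stands in for stdout or a log file
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("checkout")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.propagate = False

logger.info("payment accepted", extra={"fields": {"order_id": 42, "amount": 19.99}})
line = stream.getvalue().strip()
print(line)
```

Because `order_id` and `amount` are emitted as typed JSON fields, a downstream shipper can forward them without parsing substrings out of the message.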
Fluentd plugins are in Ruby and very easy to write.
structured data through Fluentd, it’s not made to have the flexibility of other shippers on this list (Filebeat excluded).
Fluent Bit, which is to Fluentd what Filebeat is to Logstash.
Fluentd is a good fit when you have diverse or exotic sources and destinations for your logs, because of the number of plugins.
Splunk isn’t a log shipper, it’s a commercial logging solution
Graylog is another complete logging solution, an open-source alternative to Splunk.
everything goes through graylog-server, from authentication to queries.
Graylog is nice because you have a complete logging solution, but it’s going to be harder to customize than an ELK stack.
it depends

Ruby on Rails 實戰聖經 | Website Performance

ihower.tw/performance.html

rails programming mysql performance

shared by 張旭 on 28 Nov 18

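By convention the column name ends in _count, its type is integer, and it defaults to 0.
lol_dba provides Rake tasks that help find indexes you forgot to add.
Bullet is a plugin that detects N+1 query problems during development.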
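Database access is a relatively slow I/O operation: every SQL query takes time, and the returned results are all converted to ActiveRecord objects and held in memory
If you need to fetch all the records for processing, it is strongly recommended not to use the all method, because it loads every record into memory at once; with tens of thousands of records, performance collapses.
.find_each( :batch_size => 100 )
.find_in_batches( :batch_size => 100 )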
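The idea behind batched loading can be sketched outside Rails. The example below is a Python analogue using `sqlite3`, not the ActiveRecord API: it walks a hypothetical `users` table in primary-key order and fetches one batch at a time, so only one batch is held in memory.

```python
import sqlite3

def each_in_batches(conn, batch_size=100):
    """Yield rows of `users` in primary-key order, one batch at a time."""
    last_id = 0
    while True:
        rows = conn.execute(
            "SELECT id, name FROM users WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            break
        yield rows
        last_id = rows[-1][0]   # resume after the last id seen

# Hypothetical table with 250 rows in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)",
                 [(f"user{i}",) for i in range(250)])

batch_sizes = [len(batch) for batch in each_in_batches(conn, batch_size=100)]
print(batch_sizes)
```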
SQL inside a transaction runs faster, because only a single COMMIT is needed at the end
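As a language-neutral illustration (Python with `sqlite3`, not Rails code), the sketch below groups many inserts into one transaction so they are committed once at the end. The table name and row count are made up, and no timing is measured here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")

# `with conn:` opens a transaction scope: the inserts below are committed
# together once at the end, or rolled back together if an exception is raised.
with conn:
    for i in range(1000):
        conn.execute("INSERT INTO events (payload) VALUES (?)", (f"event {i}",))

count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(count)
```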
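The Elasticsearch full-text search engine and the elasticsearch-rails gem
The QueryReviewer gem analyzes the efficiency of SQL queries through SQL EXPLAIN
When necessary, a denormalized design can be used: it sacrifices space and makes updates more troublesome, but makes reads faster and simpler.
The cost is shifted to writes, and read time is optimized
Changing code and architecture to optimize performance before performance is actually a problem only makes the code messier and harder to maintain
While performance is not yet a problem, maintainability matters more than performance
The code that slows down overall performance is only a small part of the whole program, so optimize only the code that causes problems.
Use profiling tools to find the bottlenecks; measure before optimizing, and measure again afterwards to compare.
rack-mini-profiler shows the time spent in the top-left corner of the page and provides reports; installing it is recommended
Static files that need no access control can be placed directly in the public directory for users to download.
The web server must have the x_sendfile feature installed first
To serve your assets (CSS, JavaScript, images) to users through a CDN as well, just set config.action_controller.asset_host in config/environments/production.rb to the CDN URL.
Sometimes code that "runs faster" is not code that is easy to maintain or debug
Ruby is not a cure-all; sometimes calling an external program directly is the fastest approach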

Open Source Distributed Real Time Search & Analytics | Elasticsearch

www.elasticsearch.org

search lucene engine distributed java

shared by crazylion lee on 04 Jun 14

250 GB/day of logs with Graylog: Lessons Learned - The HFT Guy

thehftguy.com/...s-with-graylog-lessons-learned

log graylog server elasticsearch fluentd

shared by 張旭 on 25 Sep 17

Best practices for building Kubernetes Operators and stateful apps | Google Cloud Blog

cloud.google.com/...es-operators-and-stateful-apps

kubernetes system operator

shared by 張旭 on 17 May 21

use the StatefulSet workload controller to maintain identity for each of the pods, and to use Persistent Volumes to persist data so it can survive a service restart.
a way to extend Kubernetes functionality with application specific logic using custom resources and custom controllers.
An Operator can automate various features of an application, but it should be specific to a single application
Kubebuilder is a comprehensive development kit for building and publishing Kubernetes APIs and Controllers using CRDs
Design declarative APIs for operators, not imperative APIs. This aligns well with Kubernetes APIs that are declarative in nature.
With declarative APIs, users only need to express their desired cluster state, while letting the operator perform all necessary steps to achieve it.
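The declarative pattern can be sketched as a pure function that compares desired and observed state and returns the steps needed to converge. This is an illustration in plain Python, not Kubernetes or Kubebuilder code; the function, pod names, and action strings are hypothetical.

```python
def reconcile(desired_replicas, observed_pods):
    """Return the actions needed to move observed state toward desired state."""
    actions = []
    diff = desired_replicas - len(observed_pods)
    if diff > 0:
        # Too few instances: create the missing ones.
        start = len(observed_pods)
        actions += [f"create app-{start + i}" for i in range(diff)]
    elif diff < 0:
        # Too many instances: remove the most recent ones (an arbitrary choice).
        actions += [f"delete {name}" for name in observed_pods[diff:]]
    return actions

observed = ["app-0", "app-1"]
print(reconcile(3, observed))   # scale up
print(reconcile(1, observed))   # scale down
print(reconcile(2, observed))   # already converged, nothing to do
```

The user only states `desired_replicas`; the same function is safe to run repeatedly, because it produces no actions once the observed state matches.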
scaling, backup, restore, and monitoring. An operator should be made up of multiple controllers that specifically handle each of those features.
the operator can have a main controller to spawn and manage application instances, a backup controller to handle backup operations, and a restore controller to handle restore operations.
each controller should correspond to a specific CRD so that the domain of each controller's responsibility is clear.
If you keep a log for every container, you will likely end up with an unmanageable amount of logs.
integrate application-specific details to the log messages such as adding a prefix for the application name.
you may have to use external logging tools such as Google Stackdriver, Elasticsearch, Fluentd, or Kibana to perform the aggregations.
adding labels to metrics to facilitate aggregation and analysis by monitoring systems.
a more viable option is for application pods to expose a metrics HTTP endpoint for monitoring tools to scrape.
A good way to achieve this is to use open-source application-specific exporters for exposing Prometheus-style metrics.
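A small sketch of what a metrics endpoint might return, rendering one labelled metric in the Prometheus text exposition format. The metric name and labels are made up, label-value escaping is omitted for brevity, and a real exporter would normally use an existing client library and serve this text over HTTP.

```python
def render_metric(name, value, labels=None, help_text=None, metric_type="gauge"):
    """Render one metric sample in Prometheus text exposition format."""
    lines = []
    if help_text:
        lines.append(f"# HELP {name} {help_text}")
    lines.append(f"# TYPE {name} {metric_type}")
    if labels:
        # Labels let the monitoring system aggregate and filter by dimension.
        pairs = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
        lines.append(f"{name}{{{pairs}}} {value}")
    else:
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"

# Hypothetical metric for an operator-managed application.
text = render_metric(
    "myapp_backups_total",
    3,
    labels={"app": "myapp", "status": "success"},
    help_text="Number of backups taken.",
    metric_type="counter",
)
print(text)
```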
