Skip to main content

Home/ Larvata/ Group items tagged service

Rss Feed Group items tagged

張 旭

Improving Kubernetes reliability: quicker detection of a Node down | Fatal failure - 0 views

  • when a Node gets down, the pods of the broken node are still running for some time and they still get requests, and those requests, will fail.
  • 1- The Kubelet posts its status to the masters using –node-status-update-frequency=10s 2- A node dies 3- The kube controller manager is the one monitoring the nodes, using –-node-monitor-period=5s it checks, in the masters, the node status reported by the Kubelet. 4- Kube controller manager will see the node is unresponsive, and has this grace period –node-monitor-grace-period=40s until it considers the node unhealthy.
  • node-status-update-frequency x (N-1) != node-monitor-grace-period
  • ...2 more annotations...
  • 5- Once the node is marked as unhealthy, the kube controller manager will remove its pods based on –pod-eviction-timeout=5m0s
  • 6- Kube proxy has a watcher over the API, so the very first moment the pods are evicted the proxy will notice and update the iptables of the node, removing the endpoints from the services so the failing pods won’t be accessible anymore.
張 旭

HowTo/LDAP - FreeIPA - 0 views

  • The basedn in an IPA installation consists of a set of domain components (dc) for the initial domain that IPA was configured with.
  • You will only ever have one basedn, the one defined during installation.
  • find your basedn, and other interesting things, in /etc/ipa/default.conf
  • ...8 more annotations...
  • IPA uses a flat structure, storing like objects in what we call containers.
  • Users: cn=users,cn=accounts,$SUFFIX Groups: cn=groups,cn=accounts,$SUFFIX
  • Do not use the Directory Manager account to authenticate remote services to the IPA LDAP server. Use a system account
  • The reason to use an account like this rather than creating a normal user account in IPA and using that is that the system account exists only for binding to LDAP. It is not a real POSIX user, can't log into any systems and doesn't own any files.
  • This use also has no special rights and is unable to write any data in the IPA LDAP server, only read.
  • When possible, configure your LDAP client to communicate over SSL/TLS.
  • The IPA CA certificate can be found in /etc/ipa/ca.crt
  • /etc/openldap/ldap.conf
張 旭

Full Cycle Developers at Netflix - Operate What You Build - 1 views

  • Researching issues felt like bouncing a rubber ball between teams, hard to catch the root cause and harder yet to stop from bouncing between one another.
  • In the past, Edge Engineering had ops-focused teams and SRE specialists who owned the deploy+operate+support parts of the software life cycle
  • hearing about those problems second-hand
  • ...17 more annotations...
  • devs could push code themselves when needed, and also were responsible for off-hours production issues and support requests
  • What were we trying to accomplish and why weren’t we being successful?
  • These specialized roles create efficiencies within each segment while potentially creating inefficiencies across the entire life cycle.
  • Grouping differing specialists together into one team can reduce silos, but having different people do each role adds communication overhead, introduces bottlenecks, and inhibits the effectiveness of feedback loops.
  • devops principles
  • develops a system also be responsible for operating and supporting that system
  • Each development team owns deployment issues, performance bugs, capacity planning, alerting gaps, partner support, and so on.
  • Those centralized teams act as force multipliers by turning their specialized knowledge into reusable building blocks.
  • Communication and alignment are the keys to success.
  • Full cycle developers are expected to be knowledgeable and effective in all areas of the software life cycle.
  • ramping up on areas they haven’t focused on before
  • We run dev bootcamps and other forms of ongoing training to impart this knowledge and build up these skills
  • “how can I automate what is needed to operate this system?”
  • “what self-service tool will enable my partners to answer their questions without needing me to be involved?”
  • A full cycle developer thinks and acts like an SWE, SDET, and SRE. At times they create software that solves business problems, at other times they write test cases for that, and still other times they automate operational aspects of that system.
  • the need for continuous delivery pipelines, monitoring/observability, and so on.
  • Tooling and automation help to scale expertise, but no tool will solve every problem in the developer productivity and operations space
張 旭

How to configure a Kubernetes Multi-Pod Deployment - Stack Overflow - 0 views

  • A Deployment is meant to represent a single group of PODs fulfilling a single purpose together.
  • Deployments are meant to contain stateless services. If you need to store a state you need to create StatefulSet instead
  •  
    "A Deployment is meant to represent a single group of PODs fulfilling a single purpose together."
張 旭

Tagging AWS resources - AWS General Reference - 0 views

  • assign metadata to your AWS resources in the form of tags.
  • a user-defined key and value
  • Tag keys are case sensitive.
  • ...17 more annotations...
  • tag values are case sensitive.
  • Tags are accessible to many AWS services, including billing.
  • personally identifiable information (PII)
  • apply it consistently across all resource types.
  • Use automated tools to help manage resource tags.
  • Use too many tags rather than too few tags.
  • Tag policies let you specify tagging rules that define valid key names and the values that are valid for each key.
  • Name – Identify individual resources
  • Environment – Distinguish between development, test, and production resources
  • Project – Identify projects that the resource supports
  • Owner – Identify who is responsible for the resource
  • Each resource can have a maximum of 50 user created tags.
  • For each resource, each tag key must be unique, and each tag key can have only one value.
  • Tag keys and values are case sensitive.
  • decide on a strategy for capitalizing tags, and consistently implement that strategy across all resource types.
  • AWS Cost Explorer and detailed billing reports let you break down AWS costs by tag.
  • An effective tagging strategy uses standardized tags and applies them consistently and programmatically across AWS resources.
  •  
    "assign metadata to your AWS resources in the form of tags."
張 旭

The Squeaky Blog | Why we don't use a staging environment - 0 views

  • Pre-live environments are never at parity with production
  • multiple people use staging to validate their changes before release.
  • Branches are then constantly out of sync with each other, and problems often surface when you merge, rebase, and backfill hotfixes.
  • ...10 more annotations...
  • Big Bang releases
  • there is a lengthy suite of tests and checks that run before it is deployed to staging. During this period, which could end up being hours, engineers will likely pick up another task. I’ve seen people merge, and then forget that their changes are on staging, more times than I can count.
  • only merge code that is ready to go live
  • written sufficient tests and have validated our changes in development.
  • All branches are cut from main, and all changes get merged back into main.
  • If we ever have an issue in production, we always roll forward.
  • Feature flags can be enabled on a per-user basis so we can monitor performance and gather feedback
  • Experimental features can be enabled by users in their account settings.
  • we have monitoring, logging, and alarms around all of our services. We also blue/green deploy, by draining and replacing a percentage of containers.
  • Dropping your staging environment in favour of true continuous integration and deployment can create a different mindset for shipping software.
  •  
    "Pre-live environments are never at parity with production "
張 旭

Configuring a cgroup driver | Kubernetes - 0 views

  • the systemd driver is recommended for kubeadm based setups instead of the cgroupfs driver, because kubeadm manages the kubelet as a systemd service.
張 旭

Installing Addons | Kubernetes - 0 views

  • Calico is a networking and network policy provider. Calico supports a flexible set of networking options so you can choose the most efficient option for your situation, including non-overlay and overlay networks, with or without BGP. Calico uses the same engine to enforce network policy for hosts, pods, and (if using Istio & Envoy) applications at the service mesh layer.
  • Cilium is a networking, observability, and security solution with an eBPF-based data plane. Cilium provides a simple flat Layer 3 network with the ability to span multiple clusters in either a native routing or overlay/encapsulation mode, and can enforce network policies on L3-L7 using an identity-based security model that is decoupled from network addressing. Cilium can act as a replacement for kube-proxy; it also offers additional, opt-in observability and security features.
  • CoreDNS is a flexible, extensible DNS server which can be installed as the in-cluster DNS for pods.
  • ...1 more annotation...
  • The node problem detector runs on Linux nodes and reports system issues as either Events or Node conditions.
張 旭

chaifeng/ufw-docker: To fix the Docker and UFW security flaw without disabling iptables - 0 views

  • It requires to disable docker's iptables function first, but this also means that we give up docker's network management function.
  • This causes containers will not be able to access the external network.
  • such as -A POSTROUTING ! -o docker0 -s 172.17.0.0/16 -j MASQUERADE. But this only allows containers that belong to network 172.17.0.0/16 can access outside.
  • ...13 more annotations...
  • Don't need to disable Docker's iptables and let Docker to manage it's network.
  • The public network cannot access ports that published by Docker.
  • In a very convenient way to allow/deny public networks to access container ports without additional software and extra configurations
  • Enable Docker's iptables feature. Remove all changes like --iptables=false , including configuration file /etc/docker/daemon.json
  • Modify the UFW configuration file /etc/ufw/after.rules
  • There may be some unknown reasons cause the UFW rules will not take effect after restart UFW, please reboot servers.
  • If we publish a port by using option -p 8080:80, we should use the container port 80, not the host port 8080
  • allow the private networks to be able to visit each other.
  • The following rules block connection requests initiated by all public networks, but allow internal networks to access external networks.
  • Since the UDP protocol is stateless, it is not possible to block the handshake signal that initiates the connection request as TCP does.
  • For GNU/Linux we can find the local port range in the file /proc/sys/net/ipv4/ip_local_port_range. The default range is 32768 60999
  • It not only exposes ports of containers but also exposes ports of the host.
  • Cannot expose services running on hosts and containers at the same time by the same command.
  •  
    "It requires to disable docker's iptables function first, but this also means that we give up docker's network management function."
張 旭

我做系统架构的一些原则 | 酷 壳 - CoolShell - 0 views

  • 如果不说收益,只是为了技术而技术,而没有任何意义。
  • 有计划和无计划的停机做相应的解决方案
  • 经常不断的 human error
  • ...35 more annotations...
  • 运维又会分成基础运维和应用运维,开发则会分成基础核心开发和业务开发。
  • 基础运维和开发的同学更多的只是关注资源的利用率和性能,而应用运维和业务开发则更多关注的是应用和服务上的东西。
  • 有一些系统已经说不清楚是基础层的还是应用层的了,比如像服务治理上的东西,里面即有底层基础技术,也需要业务的同学来配合,包括 k8s 也样,里面即有底层的如网络这样的技术,也有需要业务配合的 readniess和 liveness 这样的健康检查,以及业务应用需要 configMap 等等 ……
  • 试想一下城市交通的优化,当城市规模到一定程度的时候,整体的性能你是无法通过优化几条路或是几条街区来完成的,你需要对整个城市做整体的功能体的规划才可能达到整体效率的提升
  • 当系统越来越复杂的时候,用户把他们的  PHP,Python, .NET,或 Node.js 的架构完全都迁移到 Java + Go 的架构上来的案例不断的发生。
  • 更为工业化的技术
  • 使用更为成熟更为工业化的技术栈,而不是自己熟悉的技术栈
  • 不要自己发明轮子,更不要魔改
  • 完全没有必要。不重新发明轮子,不魔改,不是因为自己技术不能,而是因为,这个世界早已不是自己干所有事的年代了
  • 好些公司的架构都被技术负责人个人的喜好、擅长和个人经验给绑架了,完全不是从一个客观的角度来进行技术选型
  • 全中国所有的电商平台,几百家银行,三大电信运营商,所有的保险公司,劵商的系统,医院里的系统,电子政府系统,等等,基本都是用 Java 开发的,包括 AWS 的主流语言也是 Java
  • NoSQL 的数据库在 Join 上都表现的太差
  • 为了不做 Join 就开始冗余数据,然而自己又维护不好冗余数据后带来的数据一致性的问题,导致数据上的各种错乱丢失。
  • 永远使用完备支持 ACID 的关系型数据库
  • 性能上的事,总是有解的,手段也是最多的,这个比起架构的完备性和扩展性来说真的不必太过担心。
  • 很多公司的系统既没有服从业界标准,也没有形成自己公司的标准,感觉就像一群乌合之众一样。
  • 最典型的例子就是 HTTP 调用的状态返回码。业内给你的标准是 200表示成功,3xx 跳转,4xx 表示调用端出错,5xx 表示服务端出错,我实在是不明白为什么无论成功和失败大家都喜欢返回 200,然后在 body 里指出是否error
  • Restful API 的规范。我觉得是非常重要的,这里给两个我觉得写得最好的参考:Paypal 和 Microsoft 。
  • 监控系统宁可自己死了也不能干扰实际应用。
  • 一个公司至少一年要有一次软件版本升级的review,然后形成软件版本的统一和一致
  • 架构和软件不是写好就完的,是需要不断修改不断维护的,80%的软件成本都是在维护上。
  • 通过服务发现或服务网关来降低服务依赖所带来的运维复杂度
  • 一定要使用各种软件设计的原则。比如:像SOLID这样的原则(参看《一些软件设计的原则》),IoC/DIP,SOA 或 Spring Cloud 等 架构的最佳实践(参看《SteveY对Amazon和Google平台的吐槽》中的 Service Interface 的那几条军规),分布式系统架构的相关实践(参看:《分布式系统的事务处理》,或微软件的 《Cloud Design Patterns》)……等等
  • 没有自动化测试,没有好的软件文档,没有质量好的代码,没有标准和规范
  • 以前欠下的技术债,都得要还,没打好的地基要重新打,没建配套设施都要建。这些基础设施如果不按照正确科学的方式建立的话,你是不可能有一个好的的系统
  • 与其花大力气迁就技术债务,不如直接还技术债
  • 建设没有技术债的“新城区”,并通过“防腐层 ”的架构模型,不要让技术债侵入“新城区”。
  • 如果有一天你在做技术决定的时候,开始凭自己以往的经验,那么你就已经不可能再成长了。
  • 做任何决定之前,最好花上一点时间,上网查一下相关的资料,技术博客,文章,论文等 ,同时,也看看各个公司,或是各个开源软件他们是怎么做的?然后,比较多种方案的 Pros/Cons,最终形成自己的决定
  • 对于 X-Y 问题,也就是说,用户为了解决 X问题,他觉得用 Y 可以解,于是问我 Y 怎么搞,结果搞到最后,发现原来要解决的 X 问题,这个时候最好的解决方案不是 Y,而是 Z。
  • 我很喜欢追问为什么 ,这种追问,会让客户也跟着来一起重新思考。
  • 激进并不是瞎搞,也不是见新技术就上,而是积极拥抱会改变未来的新技术
  • 不是不喜欢的就不学了,我对区块链和 Rust 我一样学习,我也知道这些技术的优势,但我不会大规模使用它们。
  • 进步永远来自于探索,探索是要付出代价的,但是收益更大。
  • 不敢冒险才是最大的冒险,不敢犯错才是最大的错误,害怕失去会让你失去的更多
« First ‹ Previous 121 - 130 of 130
Showing 20 items per page