Drain means the scheduler doesn’t assign new tasks to the node. The
scheduler shuts down any existing tasks and schedules them on an available
node.
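The drain behaviour described above can be illustrated with a small simulation. This is only a sketch: the `Node` class, the `drain` and `schedule` functions, and the least-loaded placement rule are assumptions made for illustration, not how the swarm scheduler is actually implemented.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    availability: str = "active"  # "active" or "drain" (hypothetical model)
    tasks: list = field(default_factory=list)


def schedule(nodes, task):
    """Assign a new task to the least-loaded node that is still active."""
    active = [n for n in nodes if n.availability == "active"]
    if not active:
        raise RuntimeError("no active node available")
    chosen = min(active, key=lambda n: len(n.tasks))
    chosen.tasks.append(task)
    return chosen


def drain(nodes, target):
    """Mark `target` as drained and reschedule its tasks on active nodes."""
    node = next(n for n in nodes if n.name == target)
    node.availability = "drain"
    # Each existing task is shut down on the drained node and handed
    # back to the scheduler, which only considers active nodes.
    while node.tasks:
        schedule(nodes, node.tasks.pop())
```

After `drain(nodes, "a")`, node `a` holds no tasks, and later calls to `schedule` never pick it.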
Reachable means the node is a manager node participating in the Raft
consensus quorum. If the leader node becomes unavailable, the node is eligible for
election as the new leader.
If a manager node becomes unavailable, you should either join a
new manager node to the swarm or promote a worker node to be a
manager.
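The reason for replacing a lost manager follows from majority-quorum arithmetic: Raft needs more than half of the managers to agree. A minimal sketch of that arithmetic (the function names are illustrative, not part of any Docker API):

```python
def quorum_size(managers: int) -> int:
    """Smallest number of managers that forms a majority."""
    if managers < 1:
        raise ValueError("a swarm needs at least one manager")
    return managers // 2 + 1


def failures_tolerated(managers: int) -> int:
    """How many managers can be lost while a majority still remains."""
    if managers < 1:
        raise ValueError("a swarm needs at least one manager")
    return (managers - 1) // 2
```

With three managers the quorum is two, so one failure is tolerated. Going from three to four managers raises the quorum to three without tolerating any more failures, which is why odd manager counts are usually preferred.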
DevOps is a set of practices that automates the processes between software development and IT teams so that they can build, test, and release software faster and more reliably.
increased trust, faster software releases, ability to solve critical issues quickly, and better manage unplanned work.
bringing together the best of software development and IT operations.
a firm handshake between development and operations
DevOps isn’t magic, and transformations don’t happen overnight.
Infrastructure as code
Culture is the #1 success factor in DevOps.
Building a culture of shared responsibility, transparency and faster feedback is the foundation of every high performing DevOps team.
'not our problem' mentality
DevOps is that change in mindset of looking at the development process holistically and breaking down the barrier between Dev and Ops.
Speed is everything.
A lack of automated test and review cycles blocks the release to production, and poor incident response time kills velocity and team confidence.
Open communication helps Dev and Ops teams swarm on issues, fix incidents, and unblock the release pipeline faster.
Unplanned work is a reality that every team faces, a reality that most often impacts team productivity.
“cross-functional collaboration.”
All the tooling and automation in the world are useless if they aren’t accompanied by a genuine desire on the part of development and IT/Ops professionals to work together.
DevOps doesn’t solve tooling problems. It solves human problems.
Forming project- or product-oriented teams to replace function-based teams is a step in the right direction.
sharing a common goal and having a plan to reach it together
join sprint planning sessions, daily stand-ups, and sprint demos.
DevOps culture across every department
open channels of communication, and talk regularly
continuous delivery: the practice of running each code change through a gauntlet of automated tests, often facilitated by cloud-based infrastructure, then packaging up successful builds and promoting them up toward production using automated deploys.
automated deploys alert IT/Ops to server “drift” between environments, which reduces or eliminates surprises when it’s time to release.
“configuration as code.”
when DevOps uses automated deploys to send thoroughly tested code to identically provisioned environments, “Works on my machine!” becomes irrelevant.
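Drift between environments can be pictured as a diff of their configuration. The sketch below assumes each environment is described by a flat dictionary of setting names to values; the `find_drift` function and the example keys are hypothetical and are not tied to any particular deployment tool.

```python
from typing import Dict, Optional, Tuple


def find_drift(
    reference: Dict[str, str], candidate: Dict[str, str]
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return every setting whose value differs between two environments.

    A value of None means the setting is missing from that environment.
    """
    drift = {}
    for key in sorted(reference.keys() | candidate.keys()):
        ref_value = reference.get(key)
        cand_value = candidate.get(key)
        if ref_value != cand_value:
            drift[key] = (ref_value, cand_value)
    return drift


# Hypothetical environment descriptions.
staging = {"python": "3.11", "nginx": "1.25", "feature_flag": "on"}
production = {"python": "3.11", "nginx": "1.24"}
drift = find_drift(staging, production)
```

An empty result means the two environments are provisioned identically for the settings being tracked.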
A DevOps mindset sees opportunities for continuous improvement everywhere.
regular retrospectives
A/B testing
failure is inevitable. So you might as well set up your team to absorb it, recover, and learn from it (some call this “being anti-fragile”).
Postmortems focus on where processes fell down and how to strengthen them – not on which team member f'ed up the code.
Our engineers are responsible for QA, writing, and running their own tests to get the software out to customers.
How long did it take to go from development to deployment?
How long does it take to recover after a system failure?
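Both questions can be answered from timestamps a team already records. A minimal sketch, assuming commit, deploy, failure, and recovery times are available as `datetime` values (the function names are illustrative):

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def lead_time(committed_at: datetime, deployed_at: datetime) -> timedelta:
    """Time from development (commit) to deployment."""
    return deployed_at - committed_at


def mean_time_to_recover(
    incidents: List[Tuple[datetime, datetime]]
) -> timedelta:
    """Average of (recovered - failed) over all recorded incidents."""
    if not incidents:
        raise ValueError("no incidents recorded")
    total = sum(
        (recovered - failed for failed, recovered in incidents), timedelta()
    )
    return total / len(incidents)
```

Tracking these two numbers over time shows whether changes to the pipeline are shortening delivery and recovery.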
service level agreements (SLAs)
DevOps isn't any single person's job. It's everyone's job.
DevOps is big on the idea that the same people who build an application should be involved in shipping and running it.
developers and operators pair with each other in each phase of the application’s lifecycle.
In order to keep master a true record of known working production code, the actual deployment to production should happen from the feature branch before merging it into master.
This approach works well if we seldom publish results of our work (maybe once every 2 weeks).
Aside from promoting a ready-to-deploy master branch and feature branches (same as GitHub Flow), it introduces three other kinds of branches.
The generate attribute is used to inform Terragrunt to generate the Terraform code for configuring the backend.
The find_in_parent_folders() helper will automatically search up the directory tree to find the root terragrunt.hcl and inherit the remote_state configuration from it.
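The upward search can be pictured with a short sketch. This is not Terragrunt's implementation; it only illustrates the idea of walking up the directory tree from a child module until a file with the expected name is found. The Python function below borrows the helper's name purely for readability.

```python
from pathlib import Path


def find_in_parent_folders(start: Path, filename: str = "terragrunt.hcl") -> Path:
    """Return the nearest `filename` in any ancestor directory of `start`.

    `start` itself is skipped, so a child module's own terragrunt.hcl
    is never returned.
    """
    start = start.resolve()
    for parent in start.parents:  # nearest ancestor first
        candidate = parent / filename
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(f"{filename} not found above {start}")
```

Called from a hypothetical `prod/app` module directory, the function returns the path of the root `terragrunt.hcl` two levels up, provided no closer ancestor contains one.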
Unlike the backend configurations, provider configurations support variables,
if you needed to modify the configuration to expose another parameter (e.g.,
session_name), you would have to then go through each of your modules to make this change.
instructs Terragrunt to create the file provider.tf in the working directory (where Terragrunt calls terraform)
before it calls any of the Terraform commands
large modules should be considered harmful.
it is a Bad Idea to define all of your environments (dev, stage, prod, etc), or even a large amount of infrastructure (servers, databases, load balancers, DNS, etc), in a single Terraform module.
Large modules are slow, insecure, hard to update, hard to code review, hard to test, and brittle (i.e., you have all your eggs in one basket).
Terragrunt allows you to define your Terraform code once and to promote a versioned, immutable “artifact” of that exact same code from environment to environment.
You can use role-based access control
(RBAC) and other
security mechanisms to make sure that users and workloads can get access to the
resources they need, while keeping workloads, and the cluster itself, secure.
You can set limits on the resources that users and workloads can access
by managing policies and
container resources.
you need to plan how to scale to relieve increased
pressure from more requests to the control plane and worker nodes or scale down to reduce unused
resources.
Managed control plane: Let the provider manage the scale and availability
of the cluster's control plane, as well as handle patches and upgrades.
The simplest Kubernetes cluster has the entire control plane and worker node
services running on the same machine.
You can deploy a control plane using tools such
as kubeadm, kops, and kubespray.
Secure communications between control plane services
are implemented using certificates.
Certificates are automatically generated
during deployment or you can generate them using your own certificate authority.
Separate and back up etcd service: The etcd services can either run on the
same machines as other control plane services or run on separate machines.
Create multiple control plane systems: For high availability, the
control plane should not be limited to a single machine.
Some deployment tools set up the Raft
consensus algorithm to do leader election of Kubernetes services. If the
primary goes away, another service elects itself and takes over.
Groups of zones are referred to as regions.
if you installed with kubeadm, there are instructions to help you with
Certificate Management
and Upgrading kubeadm clusters.
Production-quality workloads need to be resilient and anything they rely
on needs to be resilient (such as CoreDNS).
Add nodes to the cluster: If you are managing your own cluster you can
add nodes by setting up your own machines and either adding them manually or
having them register themselves to the cluster’s apiserver.
Set up node health checks: For important workloads, you want to make sure
that the nodes and pods running on those nodes are healthy.
Authentication: The apiserver can authenticate users using client
certificates, bearer tokens, an authenticating proxy, or HTTP basic auth.
Authorization: When you set out to authorize your regular users, you will probably choose
between RBAC and ABAC authorization.
Role-based access control (RBAC): Lets you
assign access to your cluster by allowing specific sets of permissions to authenticated users.
Permissions can be assigned for a specific namespace (Role) or across the entire cluster
(ClusterRole).
Attribute-based access control (ABAC): Lets you
create policies based on resource attributes in the cluster and will allow or deny access
based on those attributes.
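The difference between namespace-scoped and cluster-wide permissions can be sketched with a toy permission check. The model below is a deliberate simplification: real Kubernetes RBAC separates rules (Role, ClusterRole) from bindings, while this hypothetical `Grant` type collapses both into one record purely for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Grant:
    user: str
    verb: str
    resource: str
    namespace: Optional[str]  # None models a cluster-wide grant


def is_allowed(
    grants: Iterable[Grant], user: str, verb: str, resource: str, namespace: str
) -> bool:
    """Allow the request if any grant matches; otherwise deny."""
    for grant in grants:
        if (grant.user, grant.verb, grant.resource) != (user, verb, resource):
            continue
        # A namespaced grant applies only inside its own namespace;
        # a cluster-wide grant applies in every namespace.
        if grant.namespace is None or grant.namespace == namespace:
            return True
    return False
```

A user with a namespaced grant on `pods` in `dev` is denied the same verb in `prod`, while a user with a cluster-wide grant is allowed in both.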
Set limits on workload resources
Set namespace limits: Set per-namespace quotas on things like memory and CPU
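A per-namespace quota amounts to an admission check: a new request is accepted only if current usage plus the request stays within the quota. A minimal sketch, assuming CPU is tracked in millicores and memory in MiB (the `Resources` type and `fits_quota` function are hypothetical, not Kubernetes API objects):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    cpu_millicores: int
    memory_mib: int


def fits_quota(quota: Resources, in_use: Resources, requested: Resources) -> bool:
    """True if the namespace can admit `requested` without exceeding `quota`."""
    return (
        in_use.cpu_millicores + requested.cpu_millicores <= quota.cpu_millicores
        and in_use.memory_mib + requested.memory_mib <= quota.memory_mib
    )
```

A request is rejected as soon as either dimension would exceed the quota, even if the other still has headroom.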
Prepare for DNS demand: If you expect workloads to massively scale up,
your DNS service must be ready to scale up as well.