Pods are the smallest deployable units of computing that you can create and manage in Kubernetes.
A Pod (as in a pod of whales or pea pod) is a group of one or more containers (such as Docker containers; a container is a lightweight and portable executable image that contains software and all of its dependencies), with shared storage/network, and a specification for how to run the containers.
A Pod’s contents are always co-located and
co-scheduled, and run in a shared context.
A Pod models an application-specific “logical host”: it contains one or more
application containers which are relatively tightly coupled. In a pre-container
world, being executed on the same physical or virtual machine would mean being
executed on the same logical host.
The shared context of a Pod is a set of Linux namespaces, cgroups, and
potentially other facets of isolation: the same things that isolate a Docker container.
Containers within a Pod share an IP address and port space, and
can find each other via localhost. Containers in different Pods have distinct
IP addresses and cannot communicate by IPC without special configuration;
they usually communicate with each other via Pod IP addresses.
Applications within a Pod also have access to shared volumes (a volume is a
directory containing data, accessible to the containers in a Pod), which are
defined as part of a Pod and are made available to be mounted into each
application’s filesystem.
In terms of Docker constructs, a Pod is modelled as a group of Docker
containers with shared namespaces and shared filesystem volumes.
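As a minimal sketch (names and images are illustrative), a two-container Pod sharing an emptyDir volume might look like this:

    apiVersion: v1
    kind: Pod
    metadata:
      name: two-containers   # illustrative name
    spec:
      volumes:
      - name: shared-data    # emptyDir lives as long as the Pod (same UID)
        emptyDir: {}
      containers:
      - name: web
        image: nginx
        volumeMounts:
        - name: shared-data
          mountPath: /usr/share/nginx/html
      - name: helper
        image: busybox
        command: ["sh", "-c", "echo 'hello from the helper' > /pod-data/index.html && sleep 3600"]
        volumeMounts:
        - name: shared-data
          mountPath: /pod-data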
Pods are considered to be relatively
ephemeral (rather than durable) entities.
Pods are created, assigned a unique ID (UID), and
scheduled to nodes where they remain until termination (according to restart
policy) or deletion.
A given Pod (as defined by its UID) is not rescheduled to a new node; instead,
it can be replaced by an identical Pod, with even the same name if desired, but with a new UID.
When something is said to have the same lifetime as a Pod, such as a volume,
that means that it exists as long as that Pod (with that UID) exists.
A multi-container Pod might, for example, use a persistent volume for shared storage between its containers.
Pods serve as units of deployment, horizontal scaling, and replication.
The applications in a Pod all use the same network namespace (same IP and port
space), and can thus “find” each other and communicate using localhost. Each
Pod, in turn, has its own IP address in a flat shared networking space, so Pods
can reach each other across nodes without NAT.
Containers within the Pod see the system hostname as being the same as the configured
name for the Pod.
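Continuing the sketch above, you could verify the shared localhost from the helper container:

    # from the helper container, the web container is reachable on localhost:
    kubectl exec two-containers -c helper -- wget -qO- http://localhost:80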
Volumes enable data to survive
container restarts and to be shared among the applications within the Pod.
Individual Pods are not intended to run multiple instances of the same
application.
The individual containers may be
versioned, rebuilt and redeployed independently.
Pods aren’t intended to be treated as durable entities.
Controllers like StatefulSet
can also provide support to stateful Pods.
When a user requests deletion of a Pod, the system records the intended grace period before the Pod is allowed to be forcefully killed, and a TERM signal is sent to the main process in each container.
Once the grace period has expired, the KILL signal is sent to those processes, and the Pod is then deleted from the API server.
At the same time, the Pod is removed from the endpoints list of any Services, and is no longer considered part of the set of running Pods for replication controllers.
When the grace period expires, any processes still running in the Pod are killed with SIGKILL.
By default, all deletes are graceful within 30 seconds.
You must specify an additional flag --force along with --grace-period=0 in order to perform force deletions.
Force deletion of a Pod is defined as deletion of a Pod from the cluster state and etcd immediately.
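For example (the Pod name is illustrative):

    # graceful deletion, with the default 30-second grace period:
    kubectl delete pod mypod
    # graceful deletion with a custom grace period:
    kubectl delete pod mypod --grace-period=60
    # force deletion: removes the Pod from the API server (and etcd) immediately:
    kubectl delete pod mypod --grace-period=0 --force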
Force deletion should be used with particular caution for StatefulSet Pods, since StatefulSets rely on at most one Pod running for a given identity.
When a container runs in privileged mode, processes within the container get almost the same privileges that are available to processes outside a container.
All containers are restarted after upgrade, because the container spec hash value is changed.
The upgrade procedure on control plane nodes should be executed one node at a time.
kubeadm upgrade also automatically renews the certificates that it manages on this node, including the client certificate embedded in /etc/kubernetes/admin.conf.
To opt out of certificate renewal, the flag --certificate-renewal=false can be used.
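A typical upgrade flow on control plane nodes might look like this (the target version is illustrative):

    # on the first control plane node:
    sudo kubeadm upgrade plan
    sudo kubeadm upgrade apply v1.28.2
    # on each remaining control plane node, one at a time:
    sudo kubeadm upgrade node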
For a lot of people, the name “Docker” itself is synonymous with the word “container”.
Docker created a very ergonomic (nice-to-use) tool for working with containers – also called docker.
docker is designed to be installed on a workstation or server and comes with a bunch of tools to make it easy to build and run containers as a developer, or DevOps person.
containerd: This is a daemon process that manages and runs containers.
runc: This is the low-level container runtime (the thing that actually creates and runs containers).
runc was built on libcontainer, a native Go-based implementation for creating containers.
Kubernetes includes a component called dockershim, which allows it to support Docker.
Kubernetes prefers to run containers through any container runtime which supports its Container Runtime Interface (CRI).
Kubernetes will remove support for Docker directly, and prefer to use only container runtimes that implement its Container Runtime Interface.
Both containerd and CRI-O can run Docker-formatted (actually OCI-formatted) images; they just do it without having to use the docker command or the Docker daemon.
Docker images are actually images packaged in the Open Container Initiative (OCI) format.
CRI is the API that Kubernetes uses to control the different runtimes that create and manage containers.
CRI makes it easier for Kubernetes to use different container runtimes, because Kubernetes only needs to speak one API.
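For example, the crictl debugging tool speaks CRI to whichever runtime is configured (the endpoint below assumes containerd; CRI-O uses a different socket path):

    # list pod sandboxes and containers through the CRI API:
    sudo crictl --runtime-endpoint unix:///run/containerd/containerd.sock pods
    sudo crictl --runtime-endpoint unix:///run/containerd/containerd.sock ps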
containerd is a high-level container runtime that came from Docker, and implements the CRI spec.
containerd was separated out of the Docker project, to make Docker more modular.
CRI-O is another high-level container runtime which implements the Container Runtime Interface (CRI).
The idea behind the OCI is that you can choose between different runtimes which conform to the spec.
runc is an OCI-compatible container runtime.
A reference implementation is a piece of software that has implemented all the requirements of a specification or standard.
runc provides all of the low-level functionality for containers, interacting with existing low-level Linux features, like namespaces and control groups.
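As a sketch of just how low-level runc is: it consumes an OCI bundle, a directory containing a config.json plus a root filesystem, rather than pulling images:

    mkdir -p mybundle/rootfs    # rootfs must be populated separately,
                                # e.g. by exporting a container image
    cd mybundle
    runc spec                   # generates a template config.json
    sudo runc run mycontainer   # creates and runs the container from the bundle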
The shipper’s destination, in this case, is Elasticsearch. And because Elasticsearch can be down or struggling, or the network can be down, the shipper would ideally be able to buffer and retry.
Logstash is typically used for collecting, parsing, and storing logs for future use as part of log management.
Logstash’s biggest con, or “Achilles’ heel”, has always been performance and resource consumption (the default heap size is 1GB).
This can be a problem for high traffic deployments, when Logstash servers would need to be comparable with the Elasticsearch ones.
Filebeat was made to be that lightweight log shipper that pushes to Logstash or Elasticsearch.
The main differences between Logstash and Filebeat are that Logstash has more functionality, while Filebeat takes fewer resources.
Filebeat is just a tiny binary with no dependencies.
Its configuration covers details such as how aggressively it should search for new files to tail, and when to close file handles for a file that hasn’t changed in a while.
For example, the apache module will point Filebeat to default access.log and error.log paths
Filebeat’s scope is very limited. Initially it could only send logs to Logstash and Elasticsearch, but now it can send to Kafka and Redis, and in 5.x it also gains filtering capabilities.
Filebeat can parse JSON. You can also push directly from Filebeat to Elasticsearch, and have Elasticsearch do both parsing and storing.
You shouldn’t need a buffer when tailing files because, just like Logstash, Filebeat remembers where it left off.
For larger deployments, you’d typically use Kafka as a queue instead, because Filebeat can talk to Kafka as well.
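A minimal filebeat.yml sketch for recent Filebeat versions (paths and hosts are illustrative):

    filebeat.inputs:
    - type: log
      paths:
        - /var/log/app/*.log
    # ship directly to Elasticsearch; point to Logstash or Kafka instead if needed
    output.elasticsearch:
      hosts: ["localhost:9200"]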
The default syslog daemon on most Linux distros, rsyslog can do so much more than just picking logs from the syslog socket and writing to /var/log/messages.
It can tail files, parse them, buffer (on disk and in memory) and ship to a number of destinations, including Elasticsearch.
rsyslog is the fastest shipper on this list. Its grammar-based parsing module (mmnormalize) works at constant speed no matter the number of rules (we tested this claim).
If you use it as a simple router/shipper, any decent machine will be limited by network bandwidth.
It’s also one of the lightest parsers you can find, depending on the configured memory buffers.
rsyslog requires more work to get the configuration right. The main difference between Logstash and rsyslog is that Logstash is easier to use, while rsyslog is lighter.
rsyslog fits well in scenarios where you either need something very light yet capable (an appliance, a small VM, collecting syslog from within a Docker container), or where you need that ultimate performance.
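A minimal rsyslog sketch that tails a file and ships it to Elasticsearch (paths and server are illustrative):

    module(load="imfile")              # file-tailing input
    module(load="omelasticsearch")     # Elasticsearch output
    input(type="imfile" File="/var/log/app.log" Tag="app:")
    action(type="omelasticsearch"
           server="localhost" serverport="9200"
           bulkmode="on")              # batch writes for throughput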
Many people use syslog-ng as an alternative to rsyslog (though historically it was actually the other way around).
It is a modular syslog daemon that can do much more than just syslog.
Unlike rsyslog, it features a clear, consistent configuration format and has nice documentation.
Similarly to rsyslog, you’d probably want to deploy syslog-ng on boxes where resources are tight, yet you do want to perform potentially complex processing.
syslog-ng has an easier, more polished feel than rsyslog, but likely not that ultimate performance.
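To illustrate that configuration format, a minimal syslog-ng sketch that reads local system logs and writes them to a file (paths are illustrative):

    source s_local { system(); internal(); };
    destination d_file { file("/var/log/all.log"); };
    log { source(s_local); destination(d_file); };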
Fluentd was built on the idea of logging in JSON wherever possible (which is a practice we totally agree with) so that log shippers down the line don’t have to guess which substring is which field of which type.
Fluentd plugins are in Ruby and very easy to write.
Since you typically push structured data through Fluentd, it’s not made to have the flexibility of the other shippers on this list (Filebeat excluded).
There’s also Fluent Bit, which is to Fluentd what Filebeat is to Logstash.
Fluentd is a good fit when you have diverse or exotic sources and destinations for your logs, because of the number of plugins.
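A minimal Fluentd sketch that tails a JSON log file and prints the parsed events (paths and tag are illustrative):

    <source>
      @type tail
      path /var/log/app.log
      pos_file /var/log/fluentd/app.log.pos   # remembers where it left off
      tag app.logs
      <parse>
        @type json
      </parse>
    </source>
    <match app.logs>
      @type stdout
    </match>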
Splunk isn’t a log shipper; it’s a commercial logging solution.
Graylog is another complete logging solution, an open-source alternative to Splunk.
everything goes through graylog-server, from authentication to queries.
Graylog is nice because you have a complete logging solution, but it’s going to be harder to customize than an ELK stack.
In rootless mode, both the daemon and the container are running without
root privileges.
Rootless mode does not use binaries with SETUID bits or file capabilities,
except newuidmap and newgidmap, which are needed to allow multiple
UIDs/GIDs to be used in the user namespace.
To expose privileged ports (< 1024), add net.ipv4.ip_unprivileged_port_start=0 to /etc/sysctl.conf (or a file under /etc/sysctl.d) and run sudo sysctl --system.
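For example (the file name under /etc/sysctl.d is illustrative):

    echo "net.ipv4.ip_unprivileged_port_start=0" | sudo tee /etc/sysctl.d/99-rootless.conf
    sudo sysctl --system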
dockerd-rootless.sh uses slirp4netns (if installed) or VPNKit as the network
stack by default. These network stacks run in userspace and might have
performance overhead compared to a kernel-mode network stack.
This error occurs when the number of available entries in /etc/subuid or
/etc/subgid is not sufficient.
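Each line in those files grants a user a range of subordinate IDs; rootless mode needs a large range, for example (user name and range are illustrative):

    # /etc/subuid and /etc/subgid:
    testuser:231072:65536    # user : start of range : number of IDs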
This error occurs mostly when the host is running in cgroup v2. See the section
Fedora 31 or later for information on switching the host
to use cgroup v1.
With --net=host, ports are not actually listened on in the host’s network
namespace. This is expected behavior, as the daemon is namespaced inside
RootlessKit’s network namespace. Use docker run -p instead.
By default, rootless Podman runs as root within the container.
The processes in the container have the default list of namespaced capabilities, which allow the processes to act like root inside of the user namespace.
the directory is owned by UID 26, but UID 26 is not mapped into the container and is not the same UID that Postgres runs with while in the container.
Podman launches a container inside of the user namespace, which is mapped with the range of UIDs defined for the user in /etc/subuid and /etc/subgid
The easy solution to this problem is to chown the html directory to match the UID that PostgreSQL runs with inside of the container.
Alternatively, you can use the podman unshare command, which drops you into the same user namespace that rootless Podman uses.
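For example, to fix the ownership from the host (UID 26 matches the example above; the directory name is illustrative):

    # chown the volume directory to a UID that is mapped into the container;
    # podman unshare runs the command inside rootless Podman's user namespace:
    podman unshare chown -R 26:26 ./html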
This setup also means that the processes inside of the container are running as the user’s UID. If the container process escaped the container, the process would have full access to files in your home directory based on UID separation.
SELinux would still block the access, but I have heard that some people disable SELinux.
If you run the processes within the container as a different non-root UID, however, then those processes will run as that UID. If they escape the container, they would only have world access to content in your home directory.
To grant such access, run a podman unshare command, or set up the directories’ group ownership as owned by your UID (root inside of the container).
Running containers as non-root should always be your top priority for security reasons.
Calico is a networking and network policy provider. Calico supports a flexible set of networking options so you can choose the most efficient option for your situation, including non-overlay and overlay networks, with or without BGP. Calico uses the same engine to enforce network policy for hosts, pods, and (if using Istio & Envoy) applications at the service mesh layer.
Cilium is a networking, observability, and security solution with an eBPF-based data plane. Cilium provides a simple flat Layer 3 network with the ability to span multiple clusters in either a native routing or overlay/encapsulation mode, and can enforce network policies on L3-L7 using an identity-based security model that is decoupled from network addressing. Cilium can act as a replacement for kube-proxy; it also offers additional, opt-in observability and security features.
CoreDNS is a flexible, extensible DNS server which can be installed as the in-cluster DNS for pods.
Verify that the MAC address and product_uuid are unique for every node; Kubernetes uses these values to uniquely identify the nodes in the cluster.
Make sure that the br_netfilter module is loaded, and ensure
net.bridge.bridge-nf-call-iptables is set to 1 in your sysctl config.
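For example (the file name under /etc/sysctl.d is illustrative):

    sudo modprobe br_netfilter
    # persist the setting across reboots:
    cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
    net.bridge.bridge-nf-call-iptables = 1
    EOF
    sudo sysctl --system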
kubeadm will not install or manage kubelet or kubectl for you, so you will
need to ensure they match the version of the Kubernetes control plane you want
kubeadm to install for you.
One minor version of skew between the kubelet and the control plane is
supported, but the kubelet version may never exceed the API server version.
Both the container runtime and the kubelet have a property called
"cgroup driver", which is important
for the management of cgroups on Linux machines.
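For example, to use the systemd cgroup driver consistently on both sides (a common setup; file locations may vary by distribution):

    # containerd, in /etc/containerd/config.toml:
    [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
      SystemdCgroup = true

    # kubelet, via its KubeletConfiguration:
    apiVersion: kubelet.config.k8s.io/v1beta1
    kind: KubeletConfiguration
    cgroupDriver: systemd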