For a lot of people, the name “Docker” itself is synonymous with the word “container”.
Docker created a very ergonomic (nice-to-use) tool for working with containers – also called docker.
docker is designed to be installed on a workstation or server and comes with a bunch of tools to make it easy to build and run containers as a developer or DevOps person.
containerd: This is a daemon process that manages and runs containers.
runc: This is the low-level container runtime (the thing that actually creates and runs containers).
libcontainer, a native Go-based implementation for creating containers.
Kubernetes includes a component called dockershim, which allows it to support Docker.
Kubernetes prefers to run containers through any container runtime which supports its Container Runtime Interface (CRI).
Kubernetes will remove direct support for Docker and will use only container runtimes that implement its Container Runtime Interface.
Both containerd and CRI-O can run Docker-formatted (actually OCI-formatted) images; they just do it without having to use the docker command or the Docker daemon.
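For example, on a host with containerd installed you can pull and run an OCI image with containerd's ctr client alone; a rough sketch (the image name is just an example, and ctr is a debugging tool rather than a polished docker replacement):

ctr image pull docker.io/library/nginx:latest        # fetch the OCI image into containerd's store
ctr run --rm -t docker.io/library/nginx:latest demo  # create and start a container named "demo", with runc underneath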
Docker images are actually images packaged in the Open Container Initiative (OCI) format.
CRI is the API that Kubernetes uses to control the different runtimes that create and manage containers.
CRI makes it easier for Kubernetes to use different container runtimes
containerd is a high-level container runtime that came from Docker, and implements the CRI spec
containerd was separated out of the Docker project, to make Docker more modular.
CRI-O is another high-level container runtime which implements the Container Runtime Interface (CRI).
The idea behind the OCI is that you can choose between different runtimes which conform to the spec.
runc is an OCI-compatible container runtime.
A reference implementation is a piece of software that has implemented all the requirements of a specification or standard.
runc provides all of the low-level functionality for containers, interacting with existing low-level Linux features, like namespaces and control groups.
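As a rough sketch of how low-level that is (the paths and container name here are illustrative): runc only needs an OCI bundle, i.e. a root filesystem plus a config.json, and it sets up the namespaces and control groups itself.

mkdir -p bundle/rootfs
docker export $(docker create busybox) | tar -C bundle/rootfs -xf -   # any root filesystem will do
cd bundle
runc spec            # writes a default config.json for the bundle
sudo runc run demo   # creates and starts a container called "demo"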
"Duet is self hosted so your data is always private, and it's completely brandable so that it matches your business. Best of all, its low one time fee means you will save hundreds over similar software
"
Keepalived is routing software written in C. The main goal of the project is to provide simple and robust facilities for load balancing and high availability to Linux systems and Linux-based infrastructures. The load-balancing framework relies on the well-known and widely used Linux Virtual Server (IPVS) kernel module, which provides Layer-4 load balancing. Keepalived implements a set of checkers to dynamically and adaptively maintain and manage the load-balanced server pool according to the health of its servers. High availability, on the other hand, is achieved by the VRRP protocol. VRRP is a fundamental brick for router failover. In addition, Keepalived implements a set of hooks into the VRRP finite state machine, providing low-level and high-speed protocol interactions. The Keepalived frameworks can be used independently or all together to provide resilient infrastructures.
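A minimal keepalived.conf sketch for the VRRP side (the interface name, router ID, and virtual IP are illustrative; a BACKUP peer would use state BACKUP and a lower priority):

vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 100        # the BACKUP peer advertises a lower priority, e.g. 90
    advert_int 1        # VRRP advertisement interval in seconds
    virtual_ipaddress {
        192.168.1.100/24    # the floating VIP that fails over between peers
    }
}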
Serverless was first used to describe applications that significantly or fully depend on 3rd party applications / services (‘in the cloud’) to manage server-side logic and state.
‘rich client’ applications (think single page web apps, or mobile apps) that use the vast ecosystem of cloud accessible databases (like Parse, Firebase), authentication services (Auth0, AWS Cognito), etc.
Serverless can also mean applications where some amount of server-side logic is still written by the application developer but, unlike traditional architectures, is run in stateless compute containers that are event-triggered, ephemeral (may only last for one invocation), and fully managed by a 3rd party.
‘Functions as a Service’
AWS Lambda is one of the most popular implementations of FaaS at present.
A good example is Auth0 - they started initially with BaaS ‘Authentication as a Service’, but with Auth0 Webtask they are entering the FaaS space.
a typical ecommerce app
a backend data-processing service
with zero administration.
FaaS offerings do not require coding to a specific framework or library.
Horizontal scaling is completely automatic, elastic, and managed by the provider.
Functions in FaaS are triggered by event types defined by the provider.
a FaaS-supported message broker
from a deployment-unit point of view FaaS functions are stateless.
allowed the client direct access to a subset of our database
deleted the authentication logic in the original application and have replaced it with a third party BaaS service
The client is in fact well on its way to becoming a Single Page Application.
implement a FaaS function that responds to http requests via an API Gateway
port the search code from the Pet Store server to the Pet Store Search function
replaced a long lived consumer application with a FaaS function that runs within the event driven context
server applications - is a key difference when comparing with other modern architectural trends like containers and PaaS
the only code that needs to change when moving to FaaS is the ‘main method / startup’ code, in that it is deleted, and likely the specific code that is the top-level message handler (the ‘message listener interface’ implementation), but this might only be a change in method signature
With FaaS you need to write the function ahead of time to assume parallelism
Most providers also allow functions to be triggered as a response to inbound http requests, typically in some kind of API gateway
you should assume that for any given invocation of a function none of the in-process or host state that you create will be available to any subsequent invocation.
FaaS functions are either naturally stateless
store state across requests or for further input to handle a request.
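To make the statelessness point concrete, here is a minimal sketch of an event-triggered function in the AWS Lambda handler style (the event fields and the idea of an order total are illustrative, not taken from the source); anything that must survive between invocations would have to live in an external database or object store, never in process memory:

import json

def handler(event, context):
    # 'event' carries the trigger payload, e.g. an API Gateway request body.
    # No module-level or in-process state is relied on between invocations.
    order = json.loads(event.get("body", "{}"))
    total = sum(item["price"] * item["qty"] for item in order.get("items", []))
    # Response shape expected by an API Gateway proxy integration.
    return {"statusCode": 200, "body": json.dumps({"total": total})}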
certain classes of long lived task are not suited to FaaS functions without re-architecture
if you were writing a low-latency trading application you probably wouldn’t want to use FaaS systems at this time
An API Gateway is an HTTP server where routes / endpoints are defined in configuration and each route is associated with a FaaS function.
An API Gateway will allow mapping from http request parameters to input arguments for the FaaS function.
API Gateways may also perform authentication, input validation, response code mapping, etc.
the Serverless Framework makes working with API Gateway + Lambda significantly easier than using the first principles provided by AWS.
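A rough sketch of what that looks like with the Serverless Framework (the service name, runtime, and route are hypothetical); running serverless deploy would then create the Lambda function and the API Gateway route together:

service: pet-store-search        # illustrative name
provider:
  name: aws
  runtime: python3.9
functions:
  search:
    handler: handler.search      # module.function containing the FaaS code
    events:
      - http:                    # wires an API Gateway route to this function
          path: search
          method: get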
Apex - a project to ‘Build, deploy, and manage AWS Lambda functions with ease.'
'Serverless' to mean the union of a couple of other ideas - 'Backend as a Service' and 'Functions as a Service'.
The broker and MQTT act as a simple, common interface for everything to connect to
Messages in MQTT are published on topics
no need to configure a topic, publishing on it is enough
Topics are treated as a hierarchy, using a slash (/) as a separator.
Clients can receive messages by creating subscriptions
A subscription may be to an explicit topic
Two wildcards are available, + or #.
# can be used as a wildcard for all remaining levels of hierarchy
+ can be used as a wildcard for a single level of hierarchy
Zero length topic levels are valid, which can lead to some slightly non-obvious behaviour.
The QoS defines how hard the broker/client will try to ensure that a message is received.
Messages may be sent at any QoS level, and clients may attempt to subscribe to topics at any QoS level
the client chooses the maximum QoS it will receive
if a client is subscribed with QoS 2 and a message is published on QoS 0, the client will receive it on QoS 0.
1: The broker/client will deliver the message at least once, with confirmation required.
All messages may be set to be retained.
the broker will keep the message even after sending it to all current subscribers
useful as a "last known good" mechanism
If clean session is set to false, then the connection is treated as durable
when the client disconnects, any subscriptions it has will remain and any subsequent QoS 1 or 2 messages will be stored until it connects again in the future
If clean session is true, then all subscriptions will be removed for the client when it disconnects
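A small Python sketch with the paho-mqtt client (1.x API) tying these pieces together; the broker hostname and topic names are made up, and the example uses a + wildcard subscription at QoS 1, a retained publish, and a durable session via clean_session=False:

import paho.mqtt.client as mqtt

def on_message(client, userdata, msg):
    # msg.qos is the QoS the message was actually delivered at
    print(msg.topic, msg.payload.decode(), msg.qos, msg.retain)

client = mqtt.Client(client_id="dashboard-1", clean_session=False)  # durable session
client.on_message = on_message
client.connect("broker.example.com", 1883)
client.subscribe("sensors/+/temperature", qos=1)   # + matches a single topic level
client.publish("sensors/livingroom/temperature", "21.5", qos=1, retain=True)
client.loop_forever()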
A container is an instance of the Docker image you specify, and the first image listed in your configuration is the primary container image in which all steps run.
In this example, all steps run in the container created by the first image listed under the build job
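A minimal .circleci/config.yml of that shape might look like this (the images and steps are illustrative); the first image under docker: is the primary container where the steps run, and any further images run as service containers alongside it:

version: 2
jobs:
  build:
    docker:
      - image: circleci/python:3.7     # primary container: all steps run here
      - image: circleci/postgres:9.6   # secondary/service container
    steps:
      - checkout
      - run: pip install -r requirements.txt
      - run: pytest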
If you experience increases in your run times due to installing additional tools during execution, it is best practice to follow the Building Custom Docker Images documentation and create a custom image with the tools your job requires pre-loaded in the container.
Workloads running in a Docker service that require access to low-latency / high-IOPS persistent storage, such as a database engine, can use a relocatable Cloudstor volume backed by EBS.
Each relocatable Cloudstor volume is backed by a single EBS volume.
If a swarm task using a relocatable Cloudstor volume gets rescheduled to another node within the same availability zone as the original node where the task was running, Cloudstor detaches the backing EBS volume from the original node and attaches it to the new target node automatically.
If the task gets rescheduled to a node in a different availability zone, Cloudstor transfers the contents of the backing EBS volume to the destination availability zone using a snapshot, and cleans up the EBS volume in the original availability zone.
Typically the snapshot-based transfer process across availability zones takes between 2 and 5 minutes unless the workload is write-heavy.
A swarm task is not started until the volume it mounts becomes available.
Sharing/mounting the same Cloudstor volume backed by EBS among multiple tasks is not a supported scenario and leads to data loss.
If you need a Cloudstor volume to share data between tasks, choose the appropriate EFS-backed shared volume option.
When multiple swarm service tasks need to share data in a persistent storage volume, you can use a shared Cloudstor volume backed by EFS.
a volume and its contents can be mounted by multiple swarm service tasks without the risk of data loss
over NFS
the persistent data backed by EFS volumes is always available.
shared Cloudstor volumes only work in those AWS regions where EFS is supported.
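A sketch of how the two flavours are created and consumed (the volume, service, and image names are made up, and the option names should be checked against the Docker for AWS Cloudstor documentation for your version):

# Relocatable volume backed by a single EBS volume (one task at a time)
docker volume create -d "cloudstor:aws" --opt backing=relocatable --opt size=25 --opt ebstype=gp2 dbdata

# Shared volume backed by EFS (safe to mount from multiple tasks)
docker volume create -d "cloudstor:aws" --opt backing=shared shareddata

# Mount the volume into a swarm service
docker service create --name db \
  --mount type=volume,volume-driver=cloudstor:aws,source=dbdata,target=/var/lib/db \
  mydb-image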
globalLock.currentQueue.total: This number can indicate a possible concurrency issue if it’s consistently high. This can happen if a lot of requests are waiting for a lock to be released.
globalLock.totalTime: If this is higher than the total database uptime, the database has been in a lock state for too long.
Unlike relational databases such as MySQL or PostgreSQL, MongoDB uses JSON-like documents for storing data.
Databases operate in an environment that consists of numerous reads, writes, and updates.
When a lock occurs, no other operation can read or modify the data until the operation that initiated the lock is finished.
locks.deadlockCount: Number of times the lock acquisitions have encountered deadlocks
Is the database frequently locking from queries? This might indicate issues with the schema design, query structure, or system architecture.
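These fields all live in the serverStatus document, so they can be checked directly from the mongo shell, for example:

db.serverStatus().globalLock.currentQueue   // total, readers, writers waiting on locks
db.serverStatus().globalLock.totalTime
db.serverStatus().locks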
From version 3.2 on, WiredTiger is the default.
MMAPv1 locks whole collections, not individual documents.
WiredTiger performs locking at the document level.
When the MMAPv1 storage engine is in use, MongoDB will use memory-mapped files to store data.
All available memory will be allocated for this usage if the data set is large enough.
db.serverStatus().mem
mem.resident: Roughly equivalent to the amount of RAM in megabytes that the database process uses
If mem.resident exceeds the value of system memory and there’s a large amount of unmapped data on disk, we’ve most likely exceeded system capacity.
If the value of mem.mapped is greater than the amount of system memory, some operations will experience page faults.
The WiredTiger storage engine is a significant improvement over MMAPv1 in performance and concurrency.
By default, MongoDB will reserve 50 percent of the available memory for the WiredTiger data cache.
wiredTiger.cache.bytes currently in the cache – This is the size of the data currently in the cache.
wiredTiger.cache.tracked dirty bytes in the cache – This is the size of the dirty data in the cache.
we can look at the wiredTiger.cache.bytes read into cache value for read-heavy applications. If this value is consistently high, increasing the cache size may improve overall read performance.
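Because these WiredTiger metric names contain spaces, they are easiest to read from the mongo shell with bracket notation, for example:

var cache = db.serverStatus().wiredTiger.cache
cache["maximum bytes configured"]          // configured cache size
cache["bytes currently in the cache"]
cache["tracked dirty bytes in the cache"]
cache["bytes read into cache"]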
check whether the application is read-heavy. If it is, increase the size of the replica set and distribute the read operations to secondary members of the set.
If it is write-heavy, use sharding within a sharded cluster to distribute the load.
Replication is the propagation of data from one node to another
Replica sets handle this replication.
Sometimes, data isn’t replicated as quickly as we’d like.
This becomes a particularly thorny problem if the lag between a primary and secondary node is high and the secondary becomes the primary
use the db.printSlaveReplicationInfo() or the rs.printSlaveReplicationInfo() command to see the status of a replica set from the perspective of the secondary member of the set.
shows how far behind the secondary members are from the primary. This number should be as low as possible.
monitor this metric closely.
watch for any spikes in replication delay.
Always investigate these issues to understand the reasons for the lag.
One member of a replica set is primary. All others are secondary.
it’s not normal for nodes to change back and forth between primary and secondary.
use the profiler to gain a deeper understanding of the database’s behavior.
Enabling the profiler can affect system performance, due to the additional activity.
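For example, from the mongo shell the profiler can be switched on for slow operations only and its output inspected afterwards (the 100 ms threshold is just an illustration):

db.setProfilingLevel(1, 100)                                   // profile operations slower than 100 ms
db.system.profile.find().sort({ ts: -1 }).limit(5).pretty()    // most recent profiled operations
db.setProfilingLevel(0)                                        // switch the profiler back off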
"globalLock.currentQueue.total: This number can indicate a possible concurrency issue if it's consistently high. This can happen if a lot of requests are waiting for a lock to be released."
NodePort, by design, bypasses almost all network security in Kubernetes.
NetworkPolicy resources can currently only control NodePorts by allowing or disallowing all traffic on them.
put a network filter in front of all the nodes
if a NodePort-ranged Service is advertised to the public, it may serve as an invitation to black-hats to scan and probe
When Kubernetes creates a NodePort service, it allocates a port from a range specified in the flags that define your Kubernetes cluster. (By default, these are ports ranging from 30000-32767.)
By design, Kubernetes NodePort cannot expose standard low-numbered ports like 80 and 443, or even 8080 and 8443.
A port in the NodePort range can be specified manually, but this would mean the creation of a list of non-standard ports, cross-referenced with the applications they map to
if you want the exposed application to be highly available, everything contacting the application has to know all of your node addresses, or at least more than one.
non-standard ports.
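For reference, this is what a minimal NodePort Service looks like (names and ports are illustrative); omitting nodePort lets Kubernetes pick a port from the 30000-32767 range for you:

apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  type: NodePort
  selector:
    app: web
  ports:
    - port: 80          # ClusterIP port inside the cluster
      targetPort: 8080  # container port on the pods
      nodePort: 30080   # must fall within the NodePort range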
Ingress resources use an Ingress controller (the nginx one is common but not by any means the only choice) and an external load balancer or public IP to enable path-based routing of external requests to internal Services.
With a single point of entry to expose and secure, you get simpler TLS management!
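A sketch of such an Ingress (the hostname, TLS secret, and backend Service names are made up, and the apiVersion depends on your cluster version; older clusters use extensions/v1beta1):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web
spec:
  tls:
    - hosts: [shop.example.com]
      secretName: shop-tls        # TLS terminated once, at the edge
  rules:
    - host: shop.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: web         # routes to the internal Service on a standard port
                port:
                  number: 80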
consider putting a real load balancer in front of your NodePort Services before opening them up to the world
Google very recently released an alpha-stage bare-metal load balancer that, once installed in your cluster, will load-balance using BGP
NodePort Services are easy to create but hard to secure, hard to manage, and not especially friendly to others
Kubernetes is all about sharing machines between applications.
sharing machines requires ensuring that two applications do not try to use the same ports.
Dynamic port allocation brings a lot of complications to the system
Every Pod gets its own IP address
do not need to explicitly create links between Pods
almost never need to deal with mapping container ports to host ports.
Pods can be treated much like VMs or physical hosts from the perspectives of port allocation, naming, service discovery, load balancing, application configuration, and migration.
pods on a node can communicate with all pods on all nodes without NAT
agents on a node (e.g. system daemons, kubelet) can communicate with all pods on that node
pods in the host network of a node can communicate with all pods on all nodes without NAT
If your job previously ran in a VM, your VM had an IP and could talk to other VMs in your project. This is the same basic model.
containers within a Pod share their network namespaces - including their IP address
containers within a Pod can all reach each other’s ports on localhost
containers within a Pod must coordinate port usage
“IP-per-pod” model.
request ports on the Node itself which forward to your Pod (called host ports), but this is a very niche operation
The Pod itself is blind to the existence or non-existence of host ports.
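A small Pod sketch showing what that sharing means in practice (image names are illustrative): both containers share one network namespace, so the second container reaches the first on localhost without any port mapping.

apiVersion: v1
kind: Pod
metadata:
  name: shared-netns-demo
spec:
  containers:
    - name: web
      image: nginx:1.25              # listens on port 80 inside the pod
    - name: probe
      image: curlimages/curl:8.5.0
      # Same network namespace as "web", so localhost:80 hits nginx.
      command: ["sh", "-c", "while true; do curl -s http://localhost:80 > /dev/null; sleep 10; done"]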
AOS is an Intent-Based Networking system that creates and manages complex datacenter environments from a simple integrated platform.
Cisco Application Centric Infrastructure offers an integrated overlay and underlay SDN solution that supports containers, virtual machines, and bare metal servers.
AOS Reference Design currently supports Layer-3 connected hosts that eliminate legacy Layer-2 switching problems.
The AWS VPC CNI offers integrated AWS Virtual Private Cloud (VPC) networking for Kubernetes clusters.
users can apply existing AWS VPC networking and security best practices for building Kubernetes clusters.
Using this CNI plugin allows Kubernetes pods to have the same IP address inside the pod as they do on the VPC network.
The CNI allocates AWS Elastic Network Interfaces (ENIs) to each Kubernetes node and uses the secondary IP range from each ENI for pods on the node.
Big Cloud Fabric is a cloud native networking architecture, designed to run Kubernetes in private cloud/on-premises environments.
Cilium is L7/HTTP aware and can enforce network policies on L3-L7 using an identity based security model that is decoupled from network addressing.
CNI-Genie is a CNI plugin that enables Kubernetes to simultaneously have access to different implementations of the Kubernetes network model in runtime.
CNI-Genie also supports assigning multiple IP addresses to a pod, each from a different CNI plugin.
cni-ipvlan-vpc-k8s contains a set of CNI and IPAM plugins to provide a simple, host-local, low latency, high throughput, and compliant networking stack for Kubernetes within Amazon Virtual Private Cloud (VPC) environments by making use of Amazon Elastic Network Interfaces (ENI) and binding AWS-managed IPs into Pods using the Linux kernel’s IPvlan driver in L2 mode.
The plugins are designed to be straightforward to configure and deploy within a VPC.
Contiv provides configurable networking
Contrail, based on Tungsten Fabric, is a truly open, multi-cloud network virtualization and policy management platform.
DANM is a networking solution for telco workloads running in a Kubernetes cluster.
Flannel is a very simple overlay network that satisfies the Kubernetes requirements.
Any traffic bound for that subnet will be routed directly to the VM by the GCE network fabric.
sysctl net.ipv4.ip_forward=1
Jaguar provides an overlay network using VXLAN, and the Jaguar CNIPlugin provides one IP address per pod.
Knitter is a network solution which supports multiple networks in Kubernetes.
Kube-OVN is an OVN-based kubernetes network fabric for enterprises.
Kube-router provides a Linux LVS/IPVS-based service proxy, a Linux kernel forwarding-based pod-to-pod networking solution with no overlays, and iptables/ipset-based network policy enforcer.
If you have a “dumb” L2 network, such as a simple switch in a “bare-metal” environment, you should be able to do something similar to the above GCE setup.
Multus is a Multi CNI plugin to support the Multi Networking feature in Kubernetes using CRD based network objects in Kubernetes.
NSX-T can provide network virtualization for a multi-cloud and multi-hypervisor environment and is focused on emerging application frameworks and architectures that have heterogeneous endpoints and technology stacks.
NSX-T Container Plug-in (NCP) provides integration between NSX-T and container orchestrators such as Kubernetes
Nuage uses the open source Open vSwitch for the data plane along with a feature rich SDN Controller built on open standards.
Open vSwitch is a somewhat more mature but also complicated way to build an overlay network
OVN is an open source network virtualization solution developed by the Open vSwitch community.
Project Calico is an open source container networking provider and network policy engine.
Calico provides a highly scalable networking and network policy solution for connecting Kubernetes pods based on the same IP networking principles as the internet
Calico can be deployed without encapsulation or overlays to provide high-performance, high-scale data center networking.
Calico can also be run in policy enforcement mode in conjunction with other networking solutions such as Flannel, aka canal, or native GCE, AWS or Azure networking.
Romana is an open source network and security automation solution that lets you deploy Kubernetes without an overlay network
Weave Net runs as a CNI plug-in or stand-alone. In either version, it doesn’t require any configuration or extra code to run, and in both cases, the network provides one IP address per pod - as is standard for Kubernetes.
The network model is implemented by the container runtime on each node.