CloudProfs #15: Kubernetes webinars and best practice

Welcome to issue 15 of CloudProfs, sent to subscribers on November 5. If you want to subscribe, you can do so here.


WHAT’S BEEN SAID AND DONE IN CLOUD THIS WEEK

Microsoft has launched Azure Container Apps, a new fully managed serverless container service. The service enables users to work without seeing or managing any underlying VMs, orchestrators, or other cloud infrastructure. The goal is for developers to build apps in the language and framework of their choosing, with deployment taken care of through Azure Container Apps. The service ‘addresses specific requirements for microservices including encrypted service to service communication and the independent versioning and scaling of services’. Source.

Knative has released its 1.0 iteration to help developers deploy, run, and manage serverless, cloud-native applications on Kubernetes. The open source project brings various capabilities to its 1.0 version, from pluggable components that let users bring their own logging and monitoring, networking, and service mesh, to the ability to run Knative anywhere Kubernetes runs, avoiding vendor lock-in. TriggerMesh, which readers may remember from CloudProfs #12 when it became open source, runs on Knative. Source.

Neo4j has launched a free version of its fully managed graph database, Neo4j Aura. The free tier is limited to one database, with a maximum graph size of 50,000 nodes and 175,000 relationships. The tier is therefore aimed at developers who want to get started learning graph databases, prototyping, and doing early development without handing over any credit card details. Neo4j Aura offers various cloud-native functions, including always-on availability and on-demand scalability. Source.


WEBINAR OF THE WEEK 1: OBSERVING KUBERNETES WITH GRAFANA STACK

By Apurva Kadam

TL;DR: When containers and their orchestration tools came into being, Kubernetes emerged as the clear winner, surpassing all competitors. Its declarative model, which keeps clusters of containers in a desired state without much intervention, is its modus operandi. But this also makes monitoring Kubernetes that much more difficult, as it continues to operate even when something is wrong within. This webinar emphasizes monitoring Kubernetes and making the most of container telemetry data.

Dashboards are one of the most coveted business intelligence tools. Maybe it is to do with the power a user feels sitting behind a screen, monitoring even the slightest changes, and making a difference with the click of a button. Perhaps the most valuable thing that dashboards afford us is observability. In Observing Kubernetes with Grafana Stack, released last month, Catherine Johnson and Eamon Ryan present an observability offering from the Grafana stack. This monitoring tool gives users a bird's-eye view of all the containers running on Kubernetes.

The webinar starts with an introduction to the Grafana dashboard, which provides users with metrics, logs, and traces capabilities, with flexible plugins for each. “Grafana is more than just a dashboarding tool, it is a full open-source and composable observability stack”, explains Johnson. Grafana enables users to pull data from different sources while allowing them to deep-dive with component-level dashboards visualized as ‘the first pane of glass’. To let users trace data back to its origin, source links are embedded in the dashboards, keeping open source at the heart of its services.

Why is it even important to monitor Kubernetes? Most users know that Kubernetes is difficult to monitor because once clusters are built, they stay active. So it becomes easy to miss errors, leaks, and problems that may arise within the containers. The cost of not monitoring Kubernetes includes log file buildup and parsing, unrealized optimizations, and limited cloud cost benefits. Managing increasingly complicated applications, along with containers that emit huge amounts of data, is an overwhelming task. Even simple Kubernetes clusters generate hundreds of gigabytes of logs, metrics, and traces per day; making sense of all of this is where the real challenge begins. Further complications arise when different teams pick different plugins for telemetry. Dashboards have made controlling and managing this container data possible.

To capture and assess data from Kubernetes, Grafana offers cloud solutions that not only capture metrics, logs, and traces in dashboards but also boast synthetic monitoring and machine learning features. Alongside these, the Grafana agent – a bot that provides telemetry support – can easily be set up by users to monitor Kubernetes effectively. To help with setup, the company adds an open source Kubernetes monitoring mixin that provides a baseline of dashboards, recording rules, and alerting rules. This, accompanied by quick-start guides for the agent, helps expedite the process.
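To give a flavour of what such a baseline looks like, here is a minimal sketch of a Prometheus alerting rule in the spirit of the Kubernetes monitoring mixin. The metric comes from kube-state-metrics; the group name, window, and threshold are illustrative assumptions, not the mixin's exact values:

```yaml
groups:
  - name: kubernetes-baseline          # illustrative group name
    rules:
      - alert: KubePodCrashLooping
        # kube_pod_container_status_restarts_total is exported by kube-state-metrics;
        # a sustained non-zero restart rate suggests a crash loop
        expr: rate(kube_pod_container_status_restarts_total[15m]) > 0
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Pod {{ $labels.namespace }}/{{ $labels.pod }} is restarting repeatedly"
```

Rules like this catch exactly the failure mode the webinar highlights: a cluster that keeps running while something inside it is quietly wrong.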

All these offerings are well demonstrated in the webinar to help potential users understand and appreciate them. And to depict the importance of correct setup, the video captures a real-life example of a hiccup in the company's demo environment. The example demonstrates the value of the machine learning feature, which allows for adaptive learning, anomaly detection, and capacity planning. It helps solve real-world problems with simple machine learning models.

Through the glitch in the demo environment, the webinar attempts to help the user understand the importance of monitoring Kubernetes. Even with agents and baselining, it is easy to overlook a component. Having access to all the data in one place to assess any problems with K8s clusters gives users a real fighting chance.

TIMESTAMPS
00:00 – Introduction
01:28 – Grafana Introduction
04:39 – Kubernetes Observability Landscape
10:07 – Grafana field engineering environment
15:41 – Adding observability with Grafana cloud and demo
28:06 – Real-life example
37:31 – Next steps
38:21 – Q&A

RESEARCH REPORT: CLOUD SPECTATOR BLOCK STORAGE BENCHMARK

A new report from Cloud Spectator has assessed Amazon Web Services, DigitalOcean, Google Cloud Platform, Linode, Microsoft Azure and Vultr in a cloud block storage benchmark report.

The report concluded that NVMe-based block storage ‘provided the best price-performance ratio across a wide range of infrastructure providers when compared to traditional SSD-based block storage.’

Cloud Spectator has predominantly focused its research on price-performance comparison. In this report, it offers a comprehensive block storage performance assessment, as well as including a general CPU performance overview. The report tested two block storage volume sizes for each VM to get a detailed look at storage performance. The testing was performed in a North American data center for each provider.

All VMs went through the same setup process: updating all packages and then rebooting, followed by partitioning the entire disk where needed. Each VM was left alone for one hour after mounting each storage device, and subsequently tested ‘as is’, with no kernel or OS optimizations applied.

For 4-CPU Dedicated Block Storage IOPS Analysis (Read), GCP scored the best in the 500GB volume group with a rating of 8.60 – a lower score means more consistency. Linode (34.79), DigitalOcean (36.22) and Azure (59.98) also scored well, ahead of AWS (1262.05). At 1TB the results were a lot closer – GCP again topped the list at 8.48, ahead of AWS (16.75). For Write, where a higher score was better, Linode was top in both 500GB (9.332) and 1TB (9.299).
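Cloud Spectator's summary does not publish the scoring formula behind these numbers, but "lower score means more consistency" suggests a dispersion metric: variability relative to the mean. A minimal sketch, assuming the coefficient of variation as the metric and made-up IOPS samples:

```python
from statistics import mean, pstdev

def consistency_score(samples):
    """Lower is better: IOPS variability as a percentage of the mean
    (coefficient of variation). Illustrative only; Cloud Spectator's
    exact formula is not given in the report summary."""
    return 100 * pstdev(samples) / mean(samples)

steady = [9800, 10000, 10200, 9900, 10100]   # tight IOPS spread
bursty = [2000, 18000, 5000, 15000, 10000]   # wide IOPS spread

print(round(consistency_score(steady), 2))   # → 1.41
print(round(consistency_score(bursty), 2))   # → 59.67
```

On a metric like this, two volumes with the same average IOPS can score very differently, which is why a provider can look competitive on headline throughput yet poor on consistency.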

For CPU consistency, DigitalOcean came out on top with a score of 2.84. This was ahead of AWS (6.18), GCP (9.76), Linode (11.68) and Azure (16.06).

When it came to 1-CPU Shared Block Storage IOPS Analysis (Read), looking at random read performance consistency, GCP and Linode offered the best performance across both volume size groups (100GB and 500GB). At 100GB, GCP scored best at 15.42, followed by Linode at 24.25. At 500GB, Vultr's shared 1-2 500GB NVMe storage improved to 22.12, but still trailed Linode (18.87) and GCP (20.85).

For Write, Vultr and GCP did extremely well when measuring consistency for write performance. Vultr scored 108.46 for its NVMe, compared with GCP at 110.11. In comparison, AWS and Azure scored 1344.89 and 1364.25 respectively. Regarding database performance consistency, unsurprisingly the three largest providers offered the most consistent results. GCP topped the bill scoring 53, ahead of AWS (69) and Azure (87).

“Some of the smaller, alternative providers delivered nearly double the performance per dollar compared to the larger, well-known clouds,” Cloud Spectator concluded. Linode and DigitalOcean delivered roughly twice the database performance per dollar spent of AWS, Azure, and GCP. “Amazon and Microsoft block storage offerings consistently underperformed all others in this benchmark cohort,” the report added.

You can read the full benchmark analysis here (no email signup required).


WEBINAR OF THE WEEK 2: BULLETPROOFING YOUR K8S BUILD

By Apurva Kadam

TL;DR: Containers make the deployment of granular resources possible, but if resource specifications are not considered carefully, they can add to costs as well as risks. This webinar aims to educate the audience about the effect of resource specification in containers and cloud environments.

The game of Tetris teaches an important lesson on the principle of optimization. Kelsey Hightower agrees, as exhibited in this PuppetConf webinar. In the context of containers, small decisions made while deploying Kubernetes may have adverse effects on the success of a cloud deployment. Choices made during the initial phases trickle down into glitches in functionality, higher costs at scale, and lower productivity.

In the Bulletproofing Your K8s Build webinar, broadcast on October 20, Andrew Hillier and Chuck Tatham provide insight on how to set the resources in the right way. This online seminar is part of Densify’s offering for optimizing your cloud.

Hillier, the CTO and co-founder of Densify, explains why it is important to start managing your K8s CPU and memory quotas, limits, and requests operationally.

The webinar starts with a macro-level view of resource optimization in the move from a virtual to a cloud environment. In this shift, the customer's focus is on the bill, as containers in the cloud deploy granular resources. The elasticity within the structure has given rise to a micro-purchasing phenomenon. “As we move from virtual to cloud and containers, the complexity increases and the number of things that need to be managed go up”, says Hillier.

In turn, there needs to be a change in the way that resources are managed. The capacity management model must move from being static and periodic to dynamic, effectively responding to all granular operations requests. In that view, Densify estimates that most resourcing decisions will be automated within the next five years.

App owners and cloud engineers build and deploy code with the utmost due diligence. Many steps in this phase involve verification and version management. But the same attention to detail is not paid to the resource specifications within the containers. In most cases, resource requests and limits are left to estimation. Even the policies and software requirements for resources are not specific, and users end up requesting more resources than required. When deploying Kubernetes, resource allocation is particularly flawed: utilization patterns and resource request values do not match. In virtually every environment, 80-90% of node resources are allocated but only 7% are utilized, leaving clusters largely idle. This stranded capacity leads to higher costs for companies. Customers are led to believe that running containers is expensive, but the cost is a result of mismanagement.

Risk is another aspect worth considering. Under-deploying resources may trigger a container malfunction, and containers will continue to run while exhibiting hard-to-detect erratic behavior. Hillier details the impact of incorrect container resource specifications, considering K8s resources such as CPU and memory requests and CPU and memory limits, and the effects of setting them too small, too large, or not at all. The adverse effects include CPU throttling, pod termination, OOM kills, process termination, noisy neighbor risk, and stranded resources.
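The four values Hillier discusses map directly onto a pod spec's resources block. A minimal sketch for orientation (the pod name, image, and numbers are illustrative, not recommendations from the webinar):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-app          # illustrative name
spec:
  containers:
    - name: web
      image: nginx:1.21
      resources:
        requests:            # what the scheduler reserves on a node
          cpu: "250m"        # set too high: capacity is stranded and clusters sit idle
          memory: "256Mi"    # left unset: the scheduler packs nodes blindly (noisy neighbors)
        limits:              # hard ceilings enforced at runtime
          cpu: "500m"        # set too low: the container is CPU-throttled
          memory: "512Mi"    # set too low: the container is OOM-killed
```

Because these values live in version-controlled manifests, they are exactly the kind of infrastructure-as-code setting that can be analyzed and corrected automatically, which is the case Densify goes on to make.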

Since container environments run on infrastructure as code, fundamental changes happen at the code level. The app owners and cloud engineers must therefore agree on the specifications. Densify generates reports that inform users about policy requirements, utilization, and the financial impact of a proposed change, while automating optimization at the code level. In conclusion, a machine learning, policy-driven analysis similar to the one Densify offers is necessary to make the most out of your containers.

Analysis-powered optimization of container resources will not only bring down costs but mitigate unforeseen risks. This webinar is effective in educating the audience about risks and how a provider such as Densify can help them resolve resource optimization issues.

TIMESTAMPS

00:12 – 04:50 Macro-view of resource optimization, app deployment
04:51 – 07:53 Deploying containers
07:54 – 09:05 Specifying container resources
09:06 – 10:45 Cost problems of incorrect specifications
10:46 – 12:14 Viewer poll on costs
12:15 – 14:16 Container optimization example
14:17 – 17:04 Memory risk example
17:05 – 22:33 Impact of incorrect container resource specifications
22:34 – 24:11 Vendor policies
24:12 – 29:50 Bulletproofing your K8s with Densify
29:51 – 31:02 Viewer Poll 2
31:03 – 37:12 Summary


SECRET KNOWLEDGE

Fn: An event-driven, open source, functions as a service (FaaS) compute platform that you can run anywhere. Primary language: Go (97.3%)

GCP Sketch Note: Every product in the Google Cloud family described in the visual sketchnote format to grasp the capability of the tools quickly and easily.

Kubernetes Marketplace: A marketplace of Kubernetes applications available for quick and easy installation into Civo Kubernetes clusters. Civo is a cloud-native service provider powered only by Kubernetes.

Puccini: Deliberately stateless cloud topology management and deployment tools based on TOSCA. Latest release: v0.19.1 (Nov 1). Primary language: Go (91.5%)

Pulumi: Simply write code in your favorite language and Pulumi automatically provisions and manages your AWS, Azure, Google Cloud Platform, and/or Kubernetes resources, using an infrastructure as code approach. Latest release: v3.17.0 (Nov 3).

Rover: Interactive Terraform visualization. State and configuration explorer. Latest release: v0.2.2 (Oct 2). Primary language: Go (50.7%), Vue (43.3%)

Traefik: The cloud-native application proxy. A modern HTTP reverse proxy and load balancer that makes deploying microservices easy. Traefik integrates with existing infrastructure components (Docker, Kubernetes, Consul, Amazon ECS) and configures itself automatically and dynamically. Latest release: v1.7.33 (Oct 7). Primary language: Go (91.8%)
