CloudProfs 6: Azure containers, kubelet and multi-cloud goliaths

Welcome! This is the sixth edition of CloudProfs, sent to subscribers on September 3. See the email in-browser here.

If you enjoyed this newsletter, why not sign up to receive it in your inbox every week? Or if you have any feedback, email the editor.


What’s Been Said and Done in Cloud This Week

Organisations still face a steep learning curve in their understanding of, and expertise in, continuous integration and delivery (CI/CD). The findings come from a study by CloudBolt Software, which polled more than 200 senior executives who had ‘specific knowledge’ of CI/CD and infrastructure as code (IaC) within their organisations. Only 4% of companies polled considered themselves experts in CI/CD, while more than two thirds (69%) admitted to taking days or weeks to deploy a single CI/CD pipeline. With regard to IaC tooling, only 2% of respondents said they deployed the majority of their infrastructure with Terraform. Why? Complexity: 90% of those polled said Terraform drives up the need for custom integrations with other tools and technologies. Read the full report here.

Google has committed to a €1 billion investment in Germany, including a new cloud region in Berlin. The new build adds to Google Cloud’s presence in Frankfurt, and to the 27 Google Cloud regions worldwide. In Frankfurt itself, Google Cloud is committing to another cloud facility, which is promised to be fully operational in 2022. Key customers based in Germany include BMG, Delivery Hero, and Deutsche Bank. Elsewhere, AWS suffered a six-hour partial outage in its Tokyo region this week, causing issues for banks and airlines among other businesses, as reported by the Japan Times. The fault related to the AWS Direct Connect networking service. “Between 3:30 PM and 9:42 PM PDT we experienced elevated packet loss for customers connecting to AWS services within AP-NORTHEAST-1 Region through their Direct Connect connections,” an AWS status update read (RSS feed). “This was caused by the loss of several core networking devices that are used to connect Direct Connect network traffic to all Availability Zones in the AP-NORTHEAST-1 Region.”

In a Day 2 Cloud podcast this week, Chris Oliver, network architect at NI, explains what you need to do as an architect if you’re exploring multi-cloud. “It’s a great learning experience to dig into each cloud’s native offerings,” he said. “At least keep your finger on what’s going on there. But if you just need to get things online quickly, and if you don’t really have a decent control of what cloud you might be in, then definitely go to the third-party market and look at the different offerings out there. That [will] help you stitch and manage the pieces together, as you’re really going to be chasing a lot, trying to figure out some of these pieces in it, besides your traditional firewalls and Cisco CSR routers and SD-WAN platforms.

“To stitch cloud-to-cloud, the native function, I don’t know how you would actually achieve it. [It’s a] chicken before the egg problem, you usually have to know the destination, but you can’t really build an IPsec tunnel between two clouds, because both ends have a ‘what’s the destination?’ It doesn’t tell you the destination until you start the process.”

Read a transcript of the podcast here (.txt file) or if you prefer listening to the podcast, find it here. Note there may be transcription errors in the document.

BONUS RESOURCE: Take a look at this new interactive cloud data map from venture capital firm Greylock. The map, called Castles in the Cloud, collates the number of cloud services providers, the power of AWS, Microsoft and Google Cloud, and opportunities for innovation. Unsurprisingly, AI/ML is the most densely populated cloud services market, ahead of analytics, developer tools, and management and governance. Greylock adds it will periodically publish essays on tactics to compete with the hyperscalers. Who is this resource useful for? C-suite, startup/sole traders, or anyone with an interest in the business of cloud computing.


An Overview of Azure Containers

By Sjoukje Zaal

“It works on my machine” – a phrase that every developer has uttered at least once in their career.

Containers are the solution to the problem of how to get software to run reliably when it moves between computing environments: from a developer’s laptop to production, or from on-premises environments to the cloud. By packaging all the required dependencies together with the application, you ensure that it can run in every environment. That is one of the reasons container deployment is so successful today. Containers also give you the portability to run exactly the same workloads across different cloud providers, which is why so many enterprises are investing heavily in multi-cloud environments. Gartner forecasts that by 2022, more than 75% of global organizations will be running containerized applications in production.

Microsoft Azure provides several solutions for deploying and running applications. In this article I will dive briefly into the various offerings Microsoft provides for running container deployments, and when to use which for your workloads.

Read the full article ‘An Overview of Azure Containers’ here.


The Week in K8s: The Mystery of kubelet Eating CPU and IOPS

Quote of the week:

“Kubernetes with static pod IPs is like peanut butter and potatoes. Potatoes resemble apples in some ways, but they are not apples. Using them like apples is likely to disappoint.” Tim Hockin, principal software engineer, Google. Source

An interesting blog post from Thomas Dullien at Prodfiler (not heard of Prodfiler? Find out more in Secret Knowledge, below) on kubelet eating CPU and IOPS. The team noticed, when running Prodfiler on Kubernetes clusters of more than 100 nodes, ‘jumps’ in CPU utilization persisting over hours and even days. The increase in CPU consumption came from path/filepath.Walk() code.

The biggest increase was found in the cadvisor/container/common.(*realFsHandler).trackUsage() function, which jumped from approximately 16 samples involving this stack frame in one hour to almost 15,000 during the spike in CPU utilization.

The solution? From looking at the cadvisor source code, the team found that in order to track the disk usage of local containers, cadvisor performs a recursive directory walk of the r/w layer of every container once a minute. Not only does kubelet/cadvisor burn significant CPU walking the directory tree again and again, the periodic file system walk can also start consuming things like EBS credits – which is where the IOPS come in, as performance for the underlying volume deteriorates. “Once the issue was diagnosed, it was fixed relatively easily by reducing the size and depth of the directory structure the container created on startup, and kubelet CPU (and IOPS) consumption reverted back to the expected norm,” Dullien wrote.

You can look at the following Dockerfile (top) and source code (bottom) to reproduce the issue on your own cluster:

FROM ubuntu:bionic

RUN chmod 777 /tmp
RUN apt-get update && apt-get -y upgrade
RUN apt-get install -y git wget cmake sudo gcc-7 g++-7 python3-pip zlib1g-dev g++

RUN mkdir /code
COPY ./main.cpp /code
RUN g++ /code/main.cpp -o /code/a.out
RUN chmod +x /code/a.out

WORKDIR /code
ENTRYPOINT ["/code/a.out"]

Source code:

// Creates a deep directory tree (256^3 leaf directories, roughly 16.8 million
// in total) in the container's r/w layer, then idles forever. cadvisor's
// once-a-minute disk-usage walk of this tree is what drives up kubelet CPU
// and IOPS.
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <cstring>
#include <sys/stat.h>
#include <sys/types.h>

int base = 256;

int main(int argc, char **argv) {
  char buf[256];
  memset(buf, 0, sizeof(buf));
  sprintf(buf, "garbage");
  int res = mkdir(buf, 0700);
  if (res != 0) {
    printf("Failed to create directory %s\n", buf);
  }
  for (int i = 0; i < base; ++i) {
    sprintf(buf, "garbage/%d", i);
    int res = mkdir(buf, 0700);
    if (res != 0) {
      printf("Failed to create directory %s\n", buf);
    }
    for (int j = 0; j < base; ++j) {
      sprintf(buf, "garbage/%d/%d", i, j);
      res = mkdir(buf, 0700);
      if (res != 0) {
        printf("Failed to create directory %s\n", buf);
      }
      for (int k = 0; k < base; ++k) {
        sprintf(buf, "garbage/%d/%d/%d", i, j, k);
        res = mkdir(buf, 0700);
        if (res != 0) {
          printf("Failed to create directory %s\n", buf);
        }
      }
    }
  }
  // Keep the container alive so the periodic walks keep happening.
  while (1) {
    printf("Printing a message, then sleeping a bit.\n");
    sleep(1);
  }
}
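If you want a feel for what each walk over a tree like that costs, the following Go sketch is a rough, standalone approximation – it is not cadvisor’s actual code, and the root path is just an assumption to replace with wherever your container runtime keeps the r/w layer. It uses path/filepath.Walk, the same library function flagged in the profile, and reports how many entries it touched and how long that took:

package main

import (
	"fmt"
	"os"
	"path/filepath"
	"time"
)

func main() {
	// Root to walk; an assumption – point it at the container r/w layer
	// for your runtime (e.g. somewhere under /var/lib/docker).
	root := "."
	if len(os.Args) > 1 {
		root = os.Args[1]
	}

	var dirs, files int64
	start := time.Now()

	// The same recursive traversal primitive the profile pointed at.
	err := filepath.Walk(root, func(path string, info os.FileInfo, err error) error {
		if err != nil {
			return nil // skip entries we cannot stat, keep walking
		}
		if info.IsDir() {
			dirs++
		} else {
			files++
		}
		return nil
	})
	if err != nil {
		fmt.Fprintln(os.Stderr, "walk error:", err)
	}

	fmt.Printf("walked %d directories and %d files in %s\n", dirs, files, time.Since(start))
}

Running it once by hand and multiplying by cadvisor’s per-minute cadence gives a rough sense of the steady-state CPU and I/O overhead before you ever reach for a profiler.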

OTHER RESOURCES THIS WEEK:

– Amazon ECS vs EKS: The best AWS Container Service (beginner, 7 min read)
– Kubernetes Logging: Using kubectl & Best Practices (intermediate, 10 min read)
– Deploying Kubernetes-Based HPC Clusters in a Multi-Cloud Environment (advanced, 8 min read)


Thoughts on Taming the Multi-Cloud Goliath

By Shammy Narayanan

The pandemic has further accelerated the pace of cloud adoption, yet businesses have yet to come to terms with these rapid changes. Before we could catch our breath, the tidal wave of multi-cloud started rising. What should today’s businesses do to ride this crest? Let’s dive in.

Data management and governance strategy

“Data is the new oil” – but handle it with care, as it can catch fire! As a first step, before embarking on a cloud journey, establish a sound data governance and management strategy. It should cover the basics: the data elements that are essential to the business, the applications that own that data, guidelines for reconciling conflicts (especially when data originates from multiple sources), the dissemination methodology, the security and compliance framework, and toolkits. Never yield to the alluring temptation to capture all possible data: it will result in a data deluge and will quickly spiral out of control. Once the base strategy is established, extend it to include cloud parameters such as preferred regions, storage tiers, levels of encryption, and authorization and authentication policies. A disaster recovery plan with a well-documented drill frequency across hybrid clouds is the icing on the cake.

Training strategy

An oft-repeated fallacy about cloud adoption is that you must double up on resources: one team to maintain the legacy estate and another to migrate to the cloud. That is not true; with a customized training plan and a transparent communication and execution framework, you can embark on a cloud journey that not only transforms your systems but also turns your on-prem team into the strongest proponents of the move. Spotify’s experience of migrating a system supporting a 170-million-strong user base to the cloud is a strong vindication of this. Let’s face it: your applications were not built overnight; they were hardened over multiple deployments spread across years. No one understands them better than your in-house subject matter experts, so engaging a vendor to support the cloud move is just like hiring “packers and movers”.

Once the job is completed, the load of maintenance, new builds and upgrades shifts back to your team. So make learning and development an indispensable ingredient of your cloud recipe. Training should start as soon as you have firmed up the choice of primary and secondary cloud. An ideal training plan, besides covering the basics of the chosen clouds, must include the shortlisted toolkits and provide a sandbox for practice and assignments.

Time to market

Once a team is assembled, it will be tempting to attempt a big-bang “lift and shift” approach. While this looks convenient, it is equally risky, not merely because of the high cost of failure but also because of conflicting business priorities. At any point in time there will be multiple large-scale programs in different stages of execution, driven by business and regulatory demands, and such programs cannot be abruptly paused. A cloud strategy should factor in these shifting architectural and organizational goalposts. Besides, embracing the easier lift and shift means you are not maximizing the potential of the cloud, which is as good as driving in the slowest gear in the fastest lane. The best bet is an incremental approach backed by a “value vs. risk” grid.

Budget

Business class may cost more than economy, but it comes with its own conveniences and comforts. Similarly, a multi-cloud approach will cost you more, but it can bring the comfort of an airtight business continuity plan (remember the infamous Netflix outage in 2012 during the peak holiday season). It can also insulate you from the fear of vendor lock-in and add the flexibility of choosing tools from multiple marketplaces. In addition, you can cut costs further if your team does solid homework on predicting the types of resources required and their durations, and negotiating the porting of your on-prem licenses can save more. In essence, the greater the clarity, the lower the cost – so never venture into a cloud negotiation without completing a whiteboarding exercise with your architects.

Hidden costs

Continuing on cost, multi-cloud has a few hidden costs, such as data egress, over- or under-provisioning of resources, unused resources, the choice of support tiers, and the cost of additional DR drills. Consider data egress: while there are no charges for bringing data into the cloud, there is a cost when it leaves. Conventionally these charges are ignored when estimating the annual cloud budget, but they come back to haunt you when the actual bill arrives and dents the ROI promises made to the board. Even the brightest talents at NASA weren’t immune to this trap: with an estimate of $65 million in annual cloud provider charges, they were off the mark by more than 50%, solely due to data egress.

Let’s simplify this concept with an example. Assume your architecture stores data in AWS while your machine learning jobs run in GCP. For every model build, gigabytes of data are transferred from AWS to GCP (egress 1), and on completion the transformed data is written back from GCP to AWS (egress 2). Besides the regular compute and storage costs, this sample work packet will incur additional egress charges. With cloud, you should be aware of such hidden costs and design your systems to minimize unplanned spikes.
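To put rough numbers on that round trip, here is a back-of-the-envelope sketch in Go. Every figure in it – data volume per build, build frequency, per-GB rates, and the relative size of the transformed output – is a hypothetical placeholder rather than published pricing; the point is only that the two egress legs form a line item of their own.

package main

import "fmt"

func main() {
	// All values below are hypothetical placeholders, NOT published pricing.
	const (
		gbPerModelBuild = 500.0 // GB pulled from AWS per model build (assumption)
		buildsPerMonth  = 120.0 // model builds per month (assumption)
		awsEgressPerGB  = 0.09  // $/GB leaving AWS (placeholder rate)
		gcpEgressPerGB  = 0.12  // $/GB leaving GCP for results written back (placeholder rate)
		resultFraction  = 0.2   // transformed output ~20% the size of the input (assumption)
	)

	egress1 := gbPerModelBuild * buildsPerMonth * awsEgressPerGB                  // egress 1: AWS -> GCP
	egress2 := gbPerModelBuild * resultFraction * buildsPerMonth * gcpEgressPerGB // egress 2: GCP -> AWS

	fmt.Printf("Monthly egress, AWS -> GCP: $%.2f\n", egress1)
	fmt.Printf("Monthly egress, GCP -> AWS: $%.2f\n", egress2)
	fmt.Printf("Total monthly egress:       $%.2f\n", egress1+egress2)
}

Even with these modest assumptions, the egress legs add up to several thousand dollars a month that appear on neither provider’s compute nor storage line – exactly the kind of charge that gets missed at budgeting time.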

In conclusion, a multi-cloud approach does bring its own complexity, training demands and budgetary constraints; however, with a structured strategy and meticulous execution, the benefits far outweigh the challenges and it is well worth the effort.

About the author: Shammy Narayanan is an 8x cloud-certified, distinguished IT specialist/architect for cloud, data transformation, and automation, with 21+ years of experience providing technical leadership for complex business problems in the IT services industry. Shammy has worked in various leadership roles with service industry giants such as Cognizant, TCS and HCL, and has led major client engagements across the globe spanning technical leadership, architectural governance and discipline, technical risk management, services delivery, technology roadmaps, IT strategy, design, methodology, knowledge management, architectural reviews, client briefings, and leading diversified, distributed global teams.


Ask the CloudProfs Community

This week, the first in a new series! We asked CloudProfs readers: “What is the most exciting thing in cloud right now for you and why should other community members be inspired to learn it?”

Take a look at the answers below and see if you get inspired. Thank you to everyone who responded! Want to be part of the CloudProfs Community Content Builders? Email the editor!

“In my opinion, Kubernetes is probably one of the most exciting cloud computing technologies right now. It has generated an important shift of paradigm in the way we used to deliver software, it comes with fantastic possibilities, and its success in a few years is a confirmation of there being so much software derived on Kubernetes, to try to enhance or automate it in so many ways. Since GitOps was coined as a terminology, Kubernetes is the perfect fit for that better DevOps approach”

“Lambda is cool, right now serverless programming is also in trend”

“For me the most exciting thing right now is chaos engineering, because it allows us to ensure the resiliency of the system. I also find interesting a way to ensure the SLA/SLO – for instance with keptn, and detecting infrastructure drifting, especially with Terraform, for instance, with driftctl”

“For me, personally, Functions as a Service is currently the most exciting aspect of cloud. In my opinion, as enterprise and hobbyist coders migrate to a more cloud-facing stance, FaaS skills are going to be as crucial as Docker containers are today”

“The most exciting thing for me is the evolution of containerized cloud-native apps, which brings us to the huge usage of Kubernetes-based orchestration tools. This is only a focus on infrastructure, otherwise the world of cloud workloads is very very vast!”

“The speed to develop new applications and the scalability of the infrastructure is very impressive”

“Definitely edge computing for me, and how it will change our lives with everything being connected with each other. The influence automatization, machine learning and data analysis will have in our daily lives and the Internet of Everything”

“I think that AI and machine learning are fields with great opportunities in cloud field, also data science. In my personal case, architecting and security are my primary focus topics”

Learn more about these technologies with Packt!


Secret Knowledge

A cool selection of recent (or recently updated) cloud repositories and tools across vendors and languages. Got a tip or are you working on a project you want the world to know about? Email the editor today!

bellsoft-liberica: A cloud-native buildpack that provides the Bellsoft Liberica implementations of JREs and JDKs. Paketo BellSoft Liberica Buildpack 8.5.0 released this week (September 3). Main language: Go (98.1%).

Cloud-PAW-Management (Microsoft): This application automates processes to reduce human error and simplify the required security expertise to deploy and manage PAWs and SPA architectures, specifically from deployment to lifecycle management. Main language: TypeScript (100%)

Collie-cli: Allows you to manage your AWS, Azure and GCP cloud landscape through a single view. V0.6.0 released this week (August 27). Main language: TypeScript (98.2%).

Prodfiler: First whole-system multi-language continuous profiling platform that does not require recompilation, on-host debug symbols, or service restarts. Launched this week (Aug 31). Compatible with C/C++, Java, Go, Rust, Perl, PHP, Python, Ruby, Scala. Main language: Shell (100%)

Self-taught-guide-to-cloud-computing: Topics covered: Linux and networking essentials (6 weeks), learning scripting and code (6 weeks), learning a cloud platform (8 weeks), learning DevOps practices (4 weeks).
