CloudProfs Issue 3: K8s, EC2, Boto3, and MLOps

Welcome! This is the third edition of CloudProfs, sent to subscribers on August 13. See the email in-browser here.

If you enjoyed this newsletter, why not sign up here? Or if you have any feedback, email the editor here.

News

HashiCorp’s first ever State of Cloud Strategy survey has underlined what many cloud developers already know: multi-cloud rules supreme. The survey, which polled 3,200 responses across industries and countries, found more than three quarters (76%) already work in multi-cloud environments. Yet more than half (57%) of respondents added that a shortage of proper skills was a major hindrance to their organisation’s ability to operationalise multi-cloud. The report also found that, for almost half of those polled (46%), Covid-19 had not affected their shift to multi-cloud adoption. You can read the full analysis here (no sign-up required).

Workday, the provider of finance and HR-based enterprise apps, has selected Google Cloud as its ‘preferred cloud provider’. The companies promise an “exceptional public cloud experience” in the canned quote of co-CEO Chano Fernandez. This development may raise an eyebrow or two given Business Insider reported last month (subscription required) that Workday’s relationship with Amazon Web Services had ended. Workday clarified the situation in a blog post, saying it and Amazon ‘mutually agreed to discontinue Amazon’s Workday HCM deployment, with the potential to revisit it in the future.’ An AWS spokesperson is on record that a ‘number of significant teams within Amazon continue to use Workday.’

Stripe has, once again, hit the top of the Forbes Cloud 100. The list ranks the best privately-held companies across cloud platforms, infrastructure and software from expert opinion. Last year’s list saw data warehouse provider Snowflake take top spot – ending Stripe’s three-year stay at the summit – before weeks later whisking off to its IPO. The top five companies run the gamut of cloud applications; big data darling Databricks (#2), design software Canva (#3), Terraform arbiter HashiCorp (#4), and restaurant software provider Toast (#5). Take a look at the full list here.

Understanding… MLOps

MLOps is an emerging term which defines how organisations can run AI successfully through various cloud services and other software. As with DevOps, MLOps aims to apply DevOps principles to machine learning delivery, enabling collaboration between data science and operations departments. From a hyperscale cloud perspective, the most notable MLOps offerings today are AWS SageMaker, Azure ML and Google Vertex AI.

Two research reports explore the understanding of MLOps from both the developer/data scientist and the vendor perspective.

The first report comes from Valohai (link, no email required), an MLOps platform provider. The majority of the 100 respondents (33%) said their primary role was to build both models and infrastructure.

The key stats in understanding how engineers and organisations were utilising MLOps were around focus areas and tooling. The area with the biggest growth from this time last year was monitoring models in production. 31% of respondents cited it this time around, compared with 13% last time. This represents the most advanced capability; approximately 40% of those polled are interested in developing models for production use, and then deploying them to production, a little further back.

On the flip side, fewer respondents said they need to prove the potential of machine learning. ‘While being by no means exhaustive, the results support that teams have made strides towards MLOps in the past year, and implementing machine learning systems (as opposed to projects) is top of mind.’

So how are organisations productionising machine learning? Perhaps unsurprisingly, code repositories are the most popular area where tooling has been established, cited by more than 75% of respondents. 50% of those polled believe they have adequate tooling for machine learning pipelines, with 33% looking for a solution.

Elsewhere, a report from GigaOm (link, no client access required) concludes that Azure ML is the most enterprise-ready MLOps offering, ahead of AWS SageMaker and Google Vertex AI. The research explored enterprise time-to-value as the key metric, and tested each solution on ease of setup and use, MLOps workflow, security, governance, and automation. Overall, Azure ML scored 2.9 out of 3, SageMaker 2.5 and Vertex AI 1.9.

BONUS LINKS:

How to build a Kubernetes platform on AWS

Garreth Davies, cloud consultant at Mobilise Cloud, talks through the considerations needed to build Kubernetes platforms, the components involved, and common misconceptions in a recent webinar (30 mins, direct link, no signup required).

Kelsey Hightower wrote in 2017 that Kubernetes ‘is a platform for building platforms. It’s a better place to start; not the endgame.’ This quote in itself is an important place to begin understanding Kubernetes. “If you’re serious about building a standardised, containerised platform to host your enterprise applications, then you’re going to need a lot more help from other services to deploy and run the K8s cluster,” noted Davies.

The example architecture, with AWS EKS (Elastic Kubernetes Service) in the middle and various services outside, included:

AWS Best Practices. This firstly rests on issues such as laying out account structures to segregate workflows, or ensuring MFA (multi-factor authentication) was enabled. As per the diagram, there are four key accounts: 1) a management account which houses Git and other tooling; 2) an audit account which looks after audit trails from other accounts and limits access to specific users, such as security engineers; 3) & 4) non-production and production accounts. This is where the segregation of workloads comes in. Guard Rails and Landing Zones are also used here so that when an account is created, best practices and security features are baked in so features and configurations do not need to be set each time.
CI/CD Architecture. Why have a central CI server? “What we want to move away from is a pattern of having developers building, testing and deploying their applications from laptops or other devices,” said Davies. “Why? There’s a good chance if developers are building locally, that they’re introducing unknown artefacts into their builds. It’s all about security and governance in terms of how we control our builds so they are reliable and repeatable, and how we secure our deployments.”

Security Best Practices. There are three different levels of security to consider: the AWS platform itself (primary level), EC2 infrastructure (secondary level), and the K8s platform (third level). The first level will see elements such as MFA introduced, per the abovementioned AWS Best Practices. At the EC2 level, these can be implementations such as EC2 security groups. “We may remove the SSH access and use things like AWS Systems Manager so we can SSH into the boxes through the AWS console, rather than manage users and keys for the EC2 instance,” said Davies. The Kubernetes level will include things like pod security policies, network security policies, and resource and usage limits.

Cost Control. This is where Spot Instances can come in. Spot Instances, whereby up to 70% of EC2 costs can be saved, work better in test environments than in production, as AWS is able to reclaim them with only a two-minute warning. This is less of an issue for K8s however, Davies noted: as workloads deployed to K8s should be stateless and built for fault tolerance, when an instance is taken away and a new one comes up, the workload should recover and – hopefully – act as if nothing has happened in the background.

Other parts of the architecture, which are explained further in the webinar, are CI/CD Application Pipelines, Testing, Logging, Monitoring and Alerting.

Azure user? Read Quickstart: Deploy an Azure Kubernetes Service (AKS) cluster using the Azure portal (July)

MORE RESOURCES FROM THIS WEEK

The week’s top podcasts

Ravi Lachhman, evangelist at Harness, a self-service CI/CD platform for developers, explains the role of security in the deployment process and being ‘good enough’ in the Wild West of CD. “I hate semantics,” said Lachhman. “If you look at the true definition of microservices, that’s very strict. You must be using messaging, in a synchronous fashion, you have to have multiple copies of the data. It’s a very strict definition. We’re just being more efficient.

“You know the appetite for risk, the appetite for automation, you make incremental changes. If you were deploying every six months, and you’re now deploying every three months, you made a 100% gain. Engineers – we’re natural optimisers.” (What the Dev?)

—

“Chaos engineering is a little bit more than a tough piece meant for super SREs. Now, chaos engineering is more of a good, and easy, and must-have tool for DevOps, as long as you’re trying to improve something on reliability.” Uma Mukkara, CEO and co-founder of ChaosNative. Mukkara and co-founder and CTO Karthik Satchitanand, discuss the evolution of chaos engineering and a good way to discover inconvenient truths about that beautiful code you wrote. (The Changelog).

—

Communication and coordination best practices to deliver great software. “We’ve started to go in that direction. Calendar awareness is definitely on our short term roadmap,” notes Yishai Beeri, growth technologies lead at LinearB. “We’re already providing technology that helps the communication between developers around the coding cycle, reviews, mergers. So helping devs be effective to make that multi-developer interaction or dance happen more efficiently. The ability to improve that process for the developer is already a step in that direction, and being deliberate about conflict switching is key.” (Adventures in DevOps)

Get the best cloud content in a 10-minute digest

News

Understanding… MLOps

Top tutorials: EC2/Boto3 and Kubernetes

How to build a Kubernetes platform on AWS

The week’s top podcasts

Leave a Reply Cancel reply