Monthly Cloudy Updates, July 2024

Hello World!

First, let me share some interesting news and articles I found over the past month.

If you think reserving compute is the best way to save money in the cloud, you might want to check this: How we saved $5k a month with a single Grafana Query.

Ever wondered what it feels like to do everything yourself? The Architecture Behind A One-Person Tech Startup

No intro for this one, just read :) A Note on Essential Complexity.

Every day we learn how powerful Postgres is: Postgres is all you need, even for vectors.

And last but not least, a shameless self-plug: Simplifying Event-Driven Architectures: KEDA Scaling with Managed Identity in Azure Container Apps.




AWS

AWS Managed Services (AMS) Accelerate now includes Trusted Remediator

AWS Managed Services (AMS) Accelerate customers can now use Trusted Remediator to automatically remediate recommendations based on Trusted Advisor checks. More info here.

Elastic Fabric Adapter (EFA) now supports cross-subnet communication

AWS now supports cross-subnet communication between Elastic Fabric Adapter (EFA) interfaces for Amazon EC2 instances within the same Availability Zone (AZ). This enhancement makes it possible to communicate with EC2 instances across subnets while benefiting from the low latency and high throughput provided by EFA. More info here.

AWS Lambda introduces new controls to make it easier to search, filter, and aggregate Lambda function logs

You can now capture your Lambda logs in JSON structured format without having to bring your own logging libraries. JSON format allows logs to be structured as a series of key-value pairs, enabling you to quickly search, filter, and analyze your function logs. More info here.
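To illustrate why key-value logs are easier to work with than free text, here is a minimal Python sketch that filters JSON log lines by field. The field names (`level`, `message`, `orderId`) and the sample lines are assumptions made up for this example, not Lambda's exact output schema.

```python
import json


def filter_logs(lines, **criteria):
    """Return parsed JSON log records whose fields match every criterion."""
    matches = []
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if not isinstance(record, dict):
            continue  # skip JSON values that are not key-value objects
        if all(record.get(key) == value for key, value in criteria.items()):
            matches.append(record)
    return matches


# Hypothetical log lines, for illustration only.
sample_lines = [
    '{"level": "INFO", "message": "order received", "orderId": "A1"}',
    '{"level": "ERROR", "message": "payment failed", "orderId": "A1"}',
    "a plain text line that cannot be parsed",
]

errors = filter_logs(sample_lines, level="ERROR")
print(errors)
```

The same idea applies when querying the logs with a log analytics tool: once each entry is a set of named fields, filtering and aggregating no longer depend on fragile string matching.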

Amazon SQS introduces new Amazon CloudWatch metrics for FIFO queues

  • NumberOfDeduplicatedSentMessages - The number of messages sent to a queue that were deduplicated.
  • ApproximateNumberOfGroupsWithInflightMessages - The approximate number of message groups with in-flight messages, where a message is considered to be in flight after it is received from a queue by a consumer, but not yet deleted from the queue.

More info here.
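As a rough sketch of how these metrics could be queried, the Python snippet below builds the parameters for a CloudWatch GetMetricStatistics request. The queue name `orders.fifo`, the one-hour window, and the choice of statistic are assumptions for illustration; the snippet only builds the request and does not call AWS.

```python
from datetime import datetime, timedelta, timezone


def fifo_metric_query(queue_name, metric_name, statistic="Sum",
                      hours=1, period_seconds=300):
    """Build GetMetricStatistics parameters for an SQS queue metric."""
    end = datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/SQS",
        "MetricName": metric_name,
        "Dimensions": [{"Name": "QueueName", "Value": queue_name}],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": period_seconds,
        "Statistics": [statistic],
    }


# "orders.fifo" is a hypothetical queue name.
params = fifo_metric_query("orders.fifo", "NumberOfDeduplicatedSentMessages")
print(params["Namespace"], params["MetricName"])

# With boto3 installed and credentials configured, this could be sent as:
#   boto3.client("cloudwatch").get_metric_statistics(**params)
```

For ApproximateNumberOfGroupsWithInflightMessages, a statistic such as "Average" or "Maximum" is probably more meaningful than "Sum", since it is a point-in-time approximation rather than a counter.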

AWS announces the general availability of vector search for Amazon MemoryDB

This capability helps you to store, index, retrieve, and search vectors. Amazon MemoryDB delivers the fastest vector search performance at the highest recall rates among popular vector databases on AWS. More info here.

AWS App Studio preview

App Studio opens up application development to technical professionals without software development skills (such as IT project managers, data engineers, and enterprise architects), empowering them to quickly build business applications, eliminating the need for operational expertise. More info here.

Amazon ECS now provides enhanced stopped task error messages for easier troubleshooting

ECS now makes it easier to troubleshoot task launch failures with enhanced stopped task error messages. When your Amazon ECS task fails to launch, you see the stopped task error messages in the AWS Management Console or in the ECS DescribeTasks API response. With this launch, Amazon ECS stopped task error messages are now more specific and actionable. More info here.

AWS Batch now supports gang-scheduling on Amazon EKS using multi-node parallel jobs

With AWS Batch MNP jobs you can run tightly-coupled High Performance Computing (HPC) applications like training multi-layer AI/ML models. More info here.

Amazon ECS now enforces software version consistency for containerized applications

Amazon ECS now resolves container image tags to the image digest (the SHA256 hash of the image manifest) when you deploy an update to your Amazon ECS service, and enforces that all tasks in the service are identical and launched with that digest. More info here.
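To see why pinning to a digest gives consistency while a tag does not, here is a small Python sketch. The manifests are hypothetical and heavily simplified, and the `tags` dictionary merely stands in for a registry; this is not how ECS itself is implemented.

```python
import hashlib
import json


def manifest_digest(manifest_bytes: bytes) -> str:
    """Compute a digest in the usual 'sha256:<hex>' form from manifest bytes."""
    return "sha256:" + hashlib.sha256(manifest_bytes).hexdigest()


# Hypothetical, simplified manifests for two builds of the same image.
build_1 = json.dumps({"layers": ["layer-a"]}).encode()
build_2 = json.dumps({"layers": ["layer-a", "layer-b"]}).encode()

# A tag is a mutable pointer: it can be moved to a different image later.
tags = {"latest": manifest_digest(build_1)}

# Resolving the tag once at deployment time pins the exact image content.
pinned = tags["latest"]

# Someone pushes a new build under the same tag.
tags["latest"] = manifest_digest(build_2)

# Tasks launched from the pinned digest still get build 1,
# whereas tasks that re-resolved the tag would now get build 2.
print(pinned == manifest_digest(build_1))
print(pinned == tags["latest"])
```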

AWS Secrets Manager announces open source release of Secrets Manager Agent

The Secrets Manager Agent open source code is available on GitHub and can be used in all AWS Regions where AWS Secrets Manager is available.

AWS AppConfig announces feature flag targets, variants, and splits

AWS AppConfig has new capabilities for feature flags. A common use case for feature flag targets is an allow list, where a customer specifies user IDs or customer tiers and enables a new or premium feature only for those segments. More info here.
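The allow-list idea can be sketched in a few lines of Python. This is a generic illustration, not the AppConfig API: the function name, the user dictionary shape, and the sample IDs and tiers are all hypothetical.

```python
def is_feature_enabled(user, allowed_user_ids=frozenset(), allowed_tiers=frozenset()):
    """Enable the feature only for users on the ID or tier allow list."""
    return user.get("id") in allowed_user_ids or user.get("tier") in allowed_tiers


# Hypothetical allow lists and users.
beta_users = {"user-42"}
premium_tiers = {"enterprise"}

users = [
    {"id": "user-42", "tier": "free"},        # allowed by user ID
    {"id": "user-7", "tier": "enterprise"},   # allowed by tier
    {"id": "user-9", "tier": "free"},         # not allowed
]

results = [is_feature_enabled(u, beta_users, premium_tiers) for u in users]
print(results)
```

In a managed feature flag service the rules live in configuration rather than in code, so the segments can be changed without redeploying the application.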

AWS Graviton-based EC2 instances now support hibernation

More info here.

Azure

Migrate to continuous backup to optimize costs and unlock point-in-time restores. More info here.

Run Azure Load Testing on Azure Functions

You can now create and run load tests directly from Azure Functions in the Azure portal. To load test a function, select the function and key, then specify the request parameters and load configuration. More info here.

Databricks Foundation Model Training

With Foundation Model Training, Azure Databricks users can use their own data to customize a foundation model to optimize performance for their specific application. Azure Databricks users have everything in a single platform: their own data to use for training, the foundation model to train, checkpoints saved to MLflow, and the model registered in Unity Catalog ready to deploy. More info here.

Run load tests in debug mode on Azure Load Testing

Azure Load Testing now supports running low-scale test runs in debug mode, which provides enhanced logging for easier debugging. More info here.

az aks command invoke in AKS

The AKS run command feature lets you remotely invoke commands in an AKS cluster through the AKS API. For example: az aks command invoke --resource-group <resource-group> --name <cluster-name> --command "kubectl get nodes".
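If you want to script this, a minimal Python sketch could assemble the full command line as follows. The resource group and cluster names (`my-rg`, `my-cluster`) are placeholders; the snippet only prints the command and does not run it.

```python
import shlex


def aks_invoke_argv(resource_group, cluster_name, command):
    """Build the argument list for 'az aks command invoke'."""
    return [
        "az", "aks", "command", "invoke",
        "--resource-group", resource_group,
        "--name", cluster_name,
        "--command", command,
    ]


# Placeholder names; replace them with your own resource group and cluster.
argv = aks_invoke_argv("my-rg", "my-cluster", "kubectl get nodes")
print(shlex.join(argv))

# To actually run it (requires the Azure CLI and an authenticated session):
#   import subprocess
#   subprocess.run(argv, check=True)
```

Passing the arguments as a list avoids shell quoting problems when the kubectl command itself contains spaces or quotes.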

Public Preview: Geo-Replication for Azure Service Bus Premium

This feature ensures that the metadata and data of a namespace are continuously replicated from a primary region to a secondary region. More info here.

Public Preview: Windows Server 2025 now available

Read about the enhancements here.

Public Preview: ED25519 SSH key support for Linux virtual machines

More info here.

Databricks serverless compute

Delta Live Tables (DLT) with serverless compute allows you to run your streaming data pipelines without configuring and deploying infrastructure. More info here.

Serverless compute for notebooks allows you to run your notebooks quickly without configuring and deploying infrastructure. More info here.

Public Preview: Advanced Network Observability for your Azure Kubernetes Service clusters through Azure Monitor

Advanced Network Observability is the first feature of the Advanced Container Networking Services (ACNS) suite. It provides pod-level metrics, DNS metrics, L4 connection metrics, and enhanced debugging capabilities for troubleshooting AKS cluster networks. More info here.

Encryption using Customer Managed Keys for Backup Vaults

You can use Customer Managed Keys (CMK) when creating a new backup vault or update the encryption settings for an existing vault to use CMK. More info here.

Generally Available: Azure Virtual Network Manager mesh and direct connectivity

Azure Virtual Network Manager’s mesh connectivity configuration and direct connectivity option in the hub and spoke connectivity configuration are generally available in all public regions. More info here.

Generally Available: Support for Azure Key Vault certificates in Azure Container Apps

You can now use Azure Key Vault to store and manage your own TLS/SSL certificates for Azure Container Apps at the environment level. You can do so using the Azure portal as well as the Azure CLI. More info here.

Generally Available: Azure Container Apps support for peer-to-peer encryption

Azure Container Apps support for environment-level peer-to-peer Transport Layer Security (TLS) encryption is now generally available. More info here.

Public Preview: Managed identity support for scaling rules in Azure Container Apps

You can now use managed identities in your scale rules to authenticate with supported Azure services. More info here.

Public Preview: Managed Java component (Admin for Spring) in Azure Container Apps

You can use managed Java components to access platform features for your apps that you would otherwise have to manage yourself. More info here.

Generally Available: Redis extension for Azure Functions

The Redis extension for Azure Functions is now generally available. The extension can be used as a trigger in Azure Functions, allowing Redis to initiate a serverless workflow. More info here.

GCP

Cloud Monitoring

You can now create private uptime checks that issue TCP requests. For more information, see Create private uptime checks.
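Conceptually, a TCP uptime check just verifies that a connection to a host and port can be established within a timeout. Here is a minimal Python sketch of that idea; it is not how Cloud Monitoring implements its checks, and the local listener exists only so the example runs without external network access.

```python
import socket


def tcp_check(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Demo against a throwaway local listener on an OS-assigned port.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
demo_port = listener.getsockname()[1]

is_up = tcp_check("127.0.0.1", demo_port)
listener.close()
print(is_up)
```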

View your Carbon Footprint in the FinOps hub

In the FinOps hub, you can now view the estimated greenhouse gas emissions for your Google Cloud usage by visiting the Carbon Footprint dashboard.

Capacity Planner

Capacity Planner displays GPU usage and forecasts of the GPUs in your Google Cloud project or organization.

VPC Service Controls feature

Support to programmatically retrieve the list of services that are supported by VPC Service Controls is generally available. Using this feature, you also can retrieve the list of methods and permissions supported by VPC Service Controls for a service. More info here.

Google Kubernetes Engine

Ray Operator on GKE is now generally available in the Rapid channel. Ray Operator is a GKE add-on that allows you to manage and scale Ray applications.

Compute Engine

You can create GPU VMs in a managed instance group (MIG) by using resize requests. Resize requests help you create VMs all at once and give you higher chances to obtain highly demanded resources such as GPUs.

Hyperdisk ML is block storage designed specifically for high-performance AI workloads. Each Hyperdisk ML volume can achieve up to 1,200,000 MBps of throughput. For large-scale training and inference workloads, you can attach a single Hyperdisk ML volume to up to 2,500 VM instances.

Hyperdisk Balanced High Availability provides cross-zonal, synchronous replication for your disk data, offering the best set of options for RPO, RTO, and performance. More info here.

You can limit the run time of VMs, which automatically stops or deletes a VM after a specific time or duration. Limiting your VMs’ run times can help you optimize temporary workloads by minimizing costs and releasing quota. More info here.

Cloud Run

Compute flexible committed use discounts are now available for Cloud Run services with CPU always allocated, and Cloud Run jobs. More info here.

Apigee X

Operations Anomalies now supports data residency. Data residency helps you meet compliance and regulatory requirements by allowing you to specify the geographic locations (regions) where Operations Anomalies data is stored.