Monthly Cloudy Updates, July 2024
Table of Contents
Hello World!
First, let me share with you some interesting news and articles I found this last month.
If you think reserving compute is the best way to save money in the cloud, you might want to check this: How we saved $5k a month with a single Grafana Query.
Ever wondered what it feels like to do everything yourself? The Architecture Behind A One-Person Tech Startup
No intro for this one, just read :) A Note on Essential Complexity.
Postgres, every day we learn how powerful it is Postgres is all you need, even for vectors.
And last but not least, a shameless self-plug Simplifying Event-Driven Architectures: KEDA Scaling with Managed Identity in Azure Container Apps.
AWS
AWS Managed Services (AMS) Accelerate now includes Trusted Remediator
AWS Managed Services (AMS) Accelerate customers can now use Trusted Remediator to automatically remediate recommendations based on Trusted Advisor checks. More info here.
Elastic Fabric Adapter (EFA) now supports cross-subnet communication
AWS now supports cross-subnet communication between Elastic Fabric Adapter (EFA) interfaces for Amazon EC2 instances within the same Availability Zone (AZ). This enhancement makes it possible to communicate with EC2 instances across subnets while benefiting from the low latency and high throughput provided by EFA. More info here.
AWS Lambda introduces new controls to make it easier to search, filter, and aggregate Lambda function logs
You can now capture your Lambda logs in JSON structured format without having to bring your own logging libraries. JSON format allows logs to be structured as a series of key-value pairs, enabling you to quickly search, filter, and analyze your function logs. More info here.
Amazon SQS introduces new Amazon CloudWatch metrics for FIFO queues
- NumberOfDeduplicatedSentMessages - The number of messages sent to a queue that were deduplicated.
- ApproximateNumberOfGroupsWithInflightMessages - The approximate number of message groups with inflight messages, where a message is considered to be in flight after it is received from a queue by a consumer, but not yet deleted from the queue. More info here.
AWS announces the general availability of vector search for Amazon MemoryDB
This capability helps you to store, index, retrieve, and search vectors. Amazon MemoryDB delivers the fastest vector search performance at the highest recall rates among popular vector databases on AWS. More info here.
AWS App Studio preview
App Studio opens up application development to technical professionals without software development skills (such as IT project managers, data engineers, and enterprise architects), empowering them to quickly build business applications, eliminating the need for operational expertise. More info here.
Amazon ECS now provides enhanced stopped task error messages for easier troubleshooting
ECS now makes it easier to troubleshoot task launch failures with enhanced stopped task error messages. When your Amazon ECS task fails to launch, you see the stopped task error messages in the AWS Management Console or in the ECS DescribeTasks API response. With this launch, Amazon ECS stopped task error messages are now more specific and actionable. More info here.
AWS Batch now supports gang-scheduling on Amazon EKS using multi-node parallel jobs
With AWS Batch MNP jobs you can run tightly-coupled High Performance Computing (HPC) applications like training multi-layer AI/ML models. More info here.
Amazon ECS now enforces software version consistency for containerized applications
Amazon ECS now resolves container image tags to the image digest (SHA256 hash of the image manifest) when you deploy an update to your Amazon ECS service and enforces that all tasks in the service are identical and launched with this image digest(s). More info here.
AWS Secrets Manager announces open source release of Secrets Manager Agent
The Secrets Manager Agent open source code is available on GitHub and can be used in all AWS Regions where AWS Secrets Manager is available.
AWS AppConfig announces feature flag targets, variants, and splits
AWS AppConfig has new capabilities for feature flags. A common use-case for feature flag targets include allow lists, where a customer can specify user IDs or customer tiers, and only enable a new or premium feature for those segments. More info here.
AWS Graviton-based EC2 instances now support hibernation
More info here.
Azure
Public Preview: Azure Cosmos DB continuous backup for accounts using Azure Synapse Link
Migrate to continuous backup to optimize costs and unlock point-in-time restores. More info here.
Run Azure Load Testing on Azure Functions
You can now create and run load tests directly from Azure Functions in Azure portal. Load test your functions by simply selecting the function, key and specifying request parameters and load configuration. More info here.
Databricks Foundation Model Training
With Foundation Model Training, Azure Databricks users can use their own data to customize a foundation model to optimize performance for their specific application. Azure Databricks users have everything in a single platform: their own data to use for training, the foundation model to train, checkpoints saved to MLflow, and the model registered in Unity Catalog ready to deploy. More info here.
Run load tests in debug mode on Azure Load Testing
Azure Load Testing now supports running low scale test runs in Debug mode enabling better debuggability with enhanced logging. More info here.
Azure Virtual Network Manager mesh and direct connectivity
Azure Virtual Network Manager mesh connectivity configuration, direct connectivity in the hub, and spoke connectivity configuration are generally available in all public regions. More info here.
az command invoke in AKS
AKS run command allows users to remotely invoke commands in an AKS cluster through the AKS API. For example,
az aks command invoke kubectl get nodes
.
Public Preview: Geo-Replication for Azure Service Bus Premium
This feature ensures that the metadata and data of a namespace are continuously replicated from a primary region to a secondary region. More info here.
Public Preview: Windows Server 2025 now available
Read about the enhancements here.
Public Preview: ED25519 SSH key support for Linux virtual machines
More info here.
Databricks serverless compute
DLT with serverless compute allows you to run your streaming data pipelines without configuring and deploying infrastructure. More info here.
Serverless compute for notebooks allows you to run your notebooks quickly without configuring and deploying infrastructure. More info here.
Public Preview: Advanced Network Observability for your Azure Kubernetes Service clusters through Azure Monitor
Advanced Network Observability is the inaugural feature of the ACNS suite that provides pod level metrics, DNS metrics, L4 connections, and enhanced debug capability for troubleshooting AKS cluster networks. More info here.
Encryption using Customer Managed Keys for Backup Vaults
You can use Customer Managed Keys (CMK) when creating a new backup vault or update the encryption settings for an existing vault to use CMK. More info here.
Generally Available: Azure Virtual Network Manager mesh and direct connectivity
Azure Virtual Network Manager’s mesh connectivity configuration and direct connectivity option in the hub and spoke connectivity configuration are generally available in all public regions. More info here.
Generally Available: Support for Azure Key Vault certificates in Azure Container Apps
You can now use Azure Key Vault to store and manage your own TLS/SSL certificates for Azure Container Apps at the environment level. You can do so using the Azure portal as well as the Azure CLI. More info here.
Generally Available: Azure Container Apps support for peer-to-peer encryption
Azure Container Apps support for environment level peer-to-peer Transport Level Security (TLS) encryption is now generally available. More info here.
Public Preview: Managed identity support for scaling rules in Azure Container Apps
You can now use managed identities in your scale rules to authenticate with supported Azure services. More info here.
Public Preview: Managed Java component (Admin for Spring) in Azure Container Apps
You can use managed Java components to access platform features for your apps that you would otherwise have to manage yourself. More info here.
Generally Available: Redis extension for Azure Functions
The Redis extension for Azure Functions is now generally available. The extension can be used as a trigger in Azure Functions, allowing Redis to initiate a serverless workflow. More info here.
GCP
Cloud Monitoring
You can now create private uptime checks that issue TCP requests. For more information, see Create private uptime checks.
View your Carbon Footprint in the FinOps hub
In the FinOps hub, you can now view the estimated greenhouse gas emissions for your Google Cloud usage by visiting the Carbon Footprint dashboard.
Capacity Planner
Capacity Planner displays GPU usage and forecasts of the GPUs in your Google Cloud project or organization.
VPC Service Controls feature
Support to programmatically retrieve the list of services that are supported by VPC Service Controls is generally available. Using this feature, you also can retrieve the list of methods and permissions supported by VPC Service Controls for a service. More info here.
Google Kubernetes Engine
Ray Operator on GKE is now generally available in the Rapid channel. Ray Operator is a GKE add-on that allows you to manage and scale Ray applications.
Compute Engine
You can create GPU VMs in a managed instance group (MIG) by using resize requests. Resize requests help you create VMs all at once and give you higher chances to obtain highly demanded resources such as GPUs.
Hyperdisk ML, block storage designed specifically for high-performance AI workloads. Each Hyperdisk ML volume can achieve up to 1,200,000 MBps of throughput. For large-scale training and inference workloads, you can attach a single Hyperdisk ML volume to up to 2,500 VM instances.
Hyperdisk Balanced High Availability provides cross-zonal, synchronous replication for your disk data, offering the best set of options for RPO, RTO, and performance. More info here.
You can limit the run time of VMs, which automatically stops or deletes a VM after a specific time or duration. Limiting your VMs’ run times can help you optimize temporary workloads by minimizing costs and releasing quota, more info here.
Cloud Run
Compute flexible committed use discounts are now available for Cloud Run services with CPU always allocated, and Cloud Run jobs. More info here.
Apigee X
Operations Anomalies now supports data residency. Data residency meets compliance and regulatory requirements by allowing you to specify the geographic locations (regions) where Operations Anomalies data is stored.