Take SNAT, says Microsoft, to improve Azure load balancing
When too many cloudy ports are barely enough
Microsoft has reworked how Azure allocates ports for Source Network Address Translation (SNAT) in a bid to improve load balancing performance.
Source Network Address Translation is the special sauce that lets servers sitting behind an Azure load balancer share a public IP address for outbound connections: the source address and port of each outgoing packet are rewritten on the way out. In the TCP/IP stack, the “port” is a 16-bit field in the TCP or UDP header that identifies one end of a connection on a host; because replies are addressed to the shared public address, not the server, the port number is what gets them back to the right instance. In load balancing, the ports allocated to a customer identify the server instances associated with that customer's traffic.
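As a rough illustration of that rewriting step, here is a minimal sketch of a SNAT mapping table. The class name `SnatTable`, its methods, the starting port and the example addresses are all hypothetical; this is not how Azure implements it, just the general idea of mapping a private address and port to a port on a shared public address.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class SnatTable:
    """Hypothetical SNAT table: private (ip, port) -> port on one public IP."""
    public_ip: str
    first_port: int = 1024  # assumed start of the usable port range
    mappings: dict = field(default_factory=dict)

    def __post_init__(self):
        # Hand out public ports sequentially, starting at first_port.
        self._next_port = count(self.first_port)

    def translate(self, private_ip, private_port):
        """Return the (public_ip, public_port) an outbound packet would carry."""
        key = (private_ip, private_port)
        if key not in self.mappings:
            self.mappings[key] = next(self._next_port)
        return self.public_ip, self.mappings[key]

    def reverse(self, public_port):
        """Find which private endpoint a reply to public_port belongs to."""
        for key, port in self.mappings.items():
            if port == public_port:
                return key
        return None
```

Two backend instances using the same private source port end up with different public ports, which is how a reply finds its way back to the right one.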
Originally, SNAT worked with a pre-allocated set of 160 dynamic ports, with extra ports handed out on demand if the customer's traffic exhausted that allocation.
According to this post by Raman Deep Singh, a program manager in Azure's software-defined networking operation, Microsoft has found use-cases where that doesn't hold up.
If a service runs flows to lots of external endpoints, Singh wrote, the existing SNAT model works well to create uniform flows.
But when there are lots of flows to few external destinations, SNAT has a problem: “the initial port allocation gets exhausted in a short period”, and the connection becomes intermittent.
“With the on-demand model, the ports are not evenly distributed. This results in longer pending state for SNAT port allocation for some of the instances in the pool”, the post continues.
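The difference between the two traffic patterns can be sketched with a toy model. The 160-port budget comes from the figure above; the rule that a port can be reused for different destinations but not for the same one is an assumption about how SNAT port reuse generally works, and the function `open_flows` and the example addresses are hypothetical.

```python
def open_flows(destinations, port_budget=160):
    """Count flows opened before some destination runs out of SNAT ports.

    Assumption: a SNAT port may be reused across different destinations,
    but every concurrent flow to the same destination needs its own port.
    """
    used = {}  # destination -> number of SNAT ports currently in use
    opened = 0
    for dst in destinations:
        if used.get(dst, 0) >= port_budget:
            break  # initial allocation exhausted for this destination
        used[dst] = used.get(dst, 0) + 1
        opened += 1
    return opened


# 1,000 flows spread across 100 destinations versus two destinations.
many = [("198.51.100.%d" % (i % 100), 443) for i in range(1000)]
few = [("198.51.100.%d" % (i % 2), 443) for i in range(1000)]
```

Under these assumptions, `open_flows(many)` opens all 1,000 flows, while `open_flows(few)` stalls after 320, at which point the instance would be waiting on extra ports.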
Hence the revised SNAT, designed to make port allocation more predictable: “all the available ports are pre-allocated, and evenly distributed amongst the backend pool of the Load Balancer depending on the pool size”.
Here are the available port allocations:
|Pool size|Pre-allocated ports|
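A tiered scheme of this kind boils down to a lookup on pool size. The sketch below shows the shape of that lookup only: the tier boundaries and port counts in `ASSUMED_TIERS` are illustrative assumptions, not figures taken from this article, and should be checked against Microsoft's post before anyone relies on them. The function name `preallocated_ports` is likewise hypothetical.

```python
# Assumed tiers for illustration only: (largest pool size, ports per instance).
ASSUMED_TIERS = [
    (50, 1024),
    (100, 512),
    (200, 256),
    (400, 128),
    (800, 64),
    (1000, 32),
]


def preallocated_ports(pool_size, tiers=ASSUMED_TIERS):
    """Return the ports pre-allocated per instance for a given pool size."""
    if pool_size < 1:
        raise ValueError("pool size must be at least 1")
    for max_size, ports in tiers:
        if pool_size <= max_size:
            return ports
    raise ValueError("pool size exceeds the largest assumed tier")
```

The point of the design is predictability: every instance in a pool of a given size gets the same fixed share up front, so the larger the pool, the smaller each instance's share.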
Existing deployments will be moved to the new SNAT model by the northern summer of 2018.
New customers, whether they use the Azure Standard SKU Load Balancer, the Basic SKU Load Balancer or Classic cloud deployments, will be assigned to the new model immediately.
Microsoft doesn't anticipate problems in the new model, except in the case of a service that demands lots of SNAT connections from many individual instances. In that case, Singh points to this paper about managing SNAT exhaustion. ®