IPv6 Part 9: “NAT breaks applications”

One of the biggest objections to NAPT (or in fact to any form of NAT) is that it can break certain applications. This is usually because source or destination IP addresses are referenced within the application data. In order to avoid breaking these applications, a NAT device has to be able to recognise them, reach into their data and modify it to be consistent with NAT; it does this using an Application Layer Gateway (ALG).

The best known example of an application that requires this treatment is File Transfer Protocol (FTP): the FTP control protocol transmits IP addresses as ASCII text. In conventional (active) FTP, the client opens a control connection to the server, then issues a PORT command telling the server to connect back to the client’s own IP address and port to open a data transfer connection. In fact the data transfer connection doesn’t have to be back to the client: FTP originally supported direct transfers between two remote systems without the data having to go via the client, but this File eXchange Protocol (FXP) functionality is nearly always disabled now for security reasons.
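To make this concrete, here is a minimal Python sketch of the text rewriting an FTP ALG performs on an active-mode PORT command (whose arguments encode the address as four decimal octets and the port as two bytes). The addresses and ports are illustrative; a real ALG also has to fix up TCP sequence numbers when the rewritten command changes the segment length.

    def rewrite_port_command(line, new_ip, new_port):
        """Rewrite 'PORT h1,h2,h3,h4,p1,p2' to carry new_ip and new_port."""
        assert line.upper().startswith("PORT ")
        p1, p2 = divmod(new_port, 256)            # port as two 8-bit halves
        return "PORT " + ",".join(new_ip.split(".") + [str(p1), str(p2)])

    # The client at 10.0.0.1 asks the server to connect back to port 5001:
    original = "PORT 10,0,0,1,19,137"             # 19*256 + 137 = 5001
    # The ALG substitutes the gateway's global address and a mapped port:
    print(rewrite_port_command(original, "192.0.2.1", 20231))
    # -> PORT 192,0,2,1,79,7                      (79*256 + 7 = 20231)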

Another well-known application protocol that can have problems with NAT is Session Initiation Protocol (SIP). A SIP user agent runs both a client and a server, so that it can both initiate and receive calls. The SIP protocol itself only handles call setup and teardown: the media travels over separate protocols (typically RTP, one stream in each direction), and the addresses and ports for those streams are embedded in the session descriptions that SIP messages carry. All this sounds complex, and it is, so a SIP ALG has to be pretty intelligent to identify which connections are related and translate all the traffic correctly.
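The leakage happens inside that session description: the c= line names the address, and the m= line the port, at which the agent expects to receive media. A hedged sketch of the rewriting a SIP ALG has to do, assuming a port_map of bindings the gateway has already allocated (the addresses are illustrative):

    import re

    def rewrite_sdp(sdp, global_ip, port_map):
        """Rewrite SDP connection and media lines to the gateway's view."""
        out = []
        for line in sdp.splitlines():
            if line.startswith("c=IN IP4 "):
                line = "c=IN IP4 " + global_ip        # connection address
            m = re.match(r"m=(\w+) (\d+) (.*)", line)
            if m:                                     # media port line
                kind, port, rest = m.groups()
                line = "m=%s %d %s" % (kind, port_map[int(port)], rest)
            out.append(line)
        return "\n".join(out)

    sdp = "v=0\nc=IN IP4 10.0.0.1\nm=audio 49170 RTP/AVP 0"
    print(rewrite_sdp(sdp, "192.0.2.1", {49170: 30000}))

And of course a real ALG must also open and track the media pinholes it has just promised in that rewritten description.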

As I pointed out in my last post, most NAT devices at the enterprise perimeter are also firewalls, and modern next-generation firewalls are reaching deeper into the application data anyway, in order to enforce enterprise policies. Such firewalls will already have intelligent ALGs for such applications. Things get trickier when applications encrypt their traffic; if the application is proxy-aware, then it may be possible to insert an ALG into the architecture, but otherwise the application traffic will be opaque to the ALG on the NAT device/firewall.

What these problematic applications usually have in common is that they are not conventional client-server applications. FTP, for example, needed to include IP addresses within its protocol because the FTP “client” could be managing a transfer between two remote systems: in that case, which party was the client? In general, with client-server applications, it’s clear which is the client and which is the server, and the application shouldn’t need to include IP addresses within its data. Moreover, most enterprises have up to now used a pretty simple network architecture in relation to the Internet: clients are allowed to make outbound connections from the corporate network, and inbound connections are usually only permitted to a DMZ. It may be that peer-to-peer Internet applications will become more important in the enterprise world in the future, in which case such architectures may have to change, but at the moment peer-to-peer applications seem to me more relevant to home or mobile networks. Client-server applications fit the centralised enterprise model better.

Tom Coffeen (see IPv6 Address Planning, chapter 2) argues that NAT emphasises the perimeter model of security, and that in a world of pervasive malware that model is no longer relevant anyway. I would argue that although there’s no longer an absolute distinction between untrusted and trusted networks, the enterprise perimeter hasn’t disappeared, it’s just that there are now different levels of trust. We need perimeters within perimeters; defence in depth is all the more necessary.

Once you start to unpick the question of NAT’s impact on applications, it’s clear that much wider issues are at stake than NAT alone, although that’s rarely spelt out by the critics of NAT. It opens up the whole question of what a relevant security architecture looks like today. What is clear is that, given the current level of external threat, enterprises will need a compelling reason (a killer app?) to move away from their conventional application and network architectures.

In the next post I will look at NAT’s impact on IPsec.

IPv6 Part 8: “State in the network is bad”

One of the most common objections to NAPT is that NAPT devices maintain state, and that this violates the “end-to-end principle” (see for example Tom Coffeen’s book IPv6 Address Planning: Designing an Address Plan for the Future, chapter 1).

Now the end-to-end principle is fundamental to the architecture of the Internet. It owes a lot to the work of Louis Pouzin on the CYCLADES network: his insight was that functions like reliability of transmission and virtual circuits were best handled at the endpoints of a connection, leaving the network to simply shift packets around, without worrying about reliability or even the order in which the packets arrived at their destination. You might summarise this as “smart hosts, dumb networks”. To illustrate this, let’s imagine a set of ping-pong balls that spell out the word “hello”. If I drop them through a wooden box that splits them up and sends them through different paths before they fall out of the bottom (a bit like a bean machine, except that the balls come out in a single sequence), then I might end up with a set that says “loleh”. I would then put them back in the correct order (perhaps by using a sequence number on the back of each ball). If the box lost a ball I would arrange for retransmission of the lost ball. That’s basically how TCP/IP creates virtual connections over an unreliable datagram network.
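Here is a toy Python sketch of that analogy. Nothing in it is real TCP; it just shows the receiving endpoint, not the network, restoring order and recovering a loss by sequence number.

    import random

    MESSAGE = "hello"
    segments = list(enumerate(MESSAGE))          # [(0, 'h'), (1, 'e'), ...]

    random.shuffle(segments)                     # the box scrambles the balls
    segments.pop()                               # ...and loses one of them

    received = dict(segments)
    for seq in range(len(MESSAGE)):              # the endpoint spots the gap
        if seq not in received:                  # and asks for retransmission
            received[seq] = MESSAGE[seq]

    print("".join(received[seq] for seq in sorted(received)))   # -> hello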

The end-to-end principle has been very important in allowing the Internet to scale up to the size it is today: routers are (relatively) simple devices that can be added into the network to form a mesh, through which individual datagrams (the segments of a conversation) can take various paths to their destination. Crucially it has also made the Internet much more robust: if there is a problem at any point in the path between two hosts, then datagrams can route themselves around the obstacle and find an alternative path.

To quote RFC 1958:

This principle has important consequences if we require applications to survive partial network failures. An end-to-end protocol design should not rely on the maintenance of state (i.e. information about the state of the end-to-end communication) inside the network. Such state should be maintained only in the endpoints, in such a way that the state can only be destroyed when the endpoint itself breaks (known as fate-sharing). An immediate consequence of this is that datagrams are better than classical virtual circuits. The network’s job is to transmit datagrams as efficiently and flexibly as possible. Everything else should be done at the fringes.

If a particular node or set of nodes on a link has to maintain the state of the connection, then datagrams can’t route themselves round trouble. However this argument only really applies in the Default-Free Zone (DFZ), the meshed heart of the Internet where there is no default route and datagrams have a multiplicity of routes to their destination. Stub networks in general (and enterprise networks in particular) are more like the branch of a tree than a mesh: there is generally only one well-defined path out to the Internet. That path out to the Internet typically goes through an NAPT device; if that NAPT device fails, then packets have no alternative path to take anyway.

Now it’s true that if the link to the Internet went through a stateless device, that device could fail and then recover, and the endpoints could continue their conversation where it had been interrupted, assuming that the application hadn’t timed out in the meantime. When a NAPT device fails and then recovers, its translation state has been lost, and all the connections going through that device are broken (assuming the NAPT device is not clustered in a way that maintains NAPT state during a failover). However, this argument holds true of any network device that holds connection state, stateful-inspection firewalls for example, although the anti-NATters rarely make this explicit.

In fact (at the enterprise level at least) NAPT devices are nearly always stateful-inspection firewalls as well. The anti-NAPT argument often refers to the performance overhead of maintaining NAPT state, but stateful-inspection firewalls have to maintain the state of permitted connections anyway, and it would surprise me if the internal architecture didn’t combine the two functions. The whole point of stateful-inspection firewalls is that they improve performance, by avoiding the need to test every datagram against the firewall’s ruleset.

In reality there are many stateful network devices at the modern enterprise perimeter: not just firewalls/NAPT devices, but intrusion prevention systems, web proxies, load balancers and other reverse proxies. They all violate the end-to-end principle, and they all have to devote resources to maintaining state, but the security and performance benefits that they provide outweigh the loss of resilience. It’s a pragmatic compromise: architectural principles are fine as long as you don’t lose sight of the bigger picture.

In the next post I’ll look at the impact of NAT on applications.

IPv6 Part 7: No more NAT

One of the biggest culture shocks for me as a security professional is the assumption that IPv6 addressing is end-to-end; in other words, no more network address translation (NAT). Untranslated addresses expose both the host identity and the topology of the local network to the outside world. If an auto-configured IPv6 address is based on the interface MAC address (see IPv6 Part 3: Address auto-configuration) then the hardware vendor of the interface is exposed too (there are alternatives to this as we shall see). However, what I find even more shocking is the level of hostility to NAT within the IPv6 world: it seems to be an article of IPv6 faith that NAT is bad, often without giving any real case against it. I want here to take a good look at NAT and the arguments for and against, without prejudice.

Much of the confusion and ignorance about NAT stems from muddled terminology, so I’ll start off by trying to cut through that in this post. It’s important to understand that there are two main types of NAT. The first is usually referred to as one-to-one NAT (bi-directional NAT according to RFC 2663, Static NAT in the Check Point world). As the name suggests, there is a one-to-one mapping between addresses in the public domain and the private domain. As datagrams pass through the NAT gateway that lies between the two, the source or destination address is rewritten accordingly, and the checksums that depend on it are recalculated: the IPv4 header checksum, plus the TCP or UDP checksum, which covers a pseudo-header containing the addresses (ICMP errors, which quote the original IP header, need similar surgery). IPv6 will simplify this slightly because the IPv6 header no longer carries a checksum of its own. One-to-one NAT was first used in the early days of the Internet to handle cases where end-users had changed providers and hadn’t completed the process of readdressing all their hosts using the new (provider-assigned) prefix. An experimental form of prefix translation has now been defined for IPv6 (RFC 6296).
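To make the checksum work concrete, here is a minimal Python sketch of the RFC 791 header checksum a gateway must recompute after rewriting an address. The header fields and addresses are illustrative; real gateways typically use the incremental update technique of RFC 1624 rather than recomputing from scratch, and must patch the transport checksums as well.

    import struct

    def ipv4_checksum(header: bytes) -> int:
        """One's-complement sum of 16-bit words (checksum field zeroed)."""
        total = sum(word for (word,) in struct.iter_unpack("!H", header))
        while total > 0xFFFF:                    # fold carries back in
            total = (total >> 16) + (total & 0xFFFF)
        return ~total & 0xFFFF

    # Illustrative 20-byte header, checksum field zeroed; fields are
    # version/IHL, TOS, length, ID, flags/frag, TTL, proto, cksum, src, dst:
    hdr = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20, 0, 0, 64, 6, 0,
                      bytes([10, 0, 0, 1]), bytes([198, 51, 100, 1]))
    print(hex(ipv4_checksum(hdr)))

    # After the gateway rewrites the source address to its global one,
    # the checksum has to be recomputed:
    translated = hdr[:12] + bytes([192, 0, 2, 1]) + hdr[16:]
    print(hex(ipv4_checksum(translated)))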

If the NAT gateway uses static rules to map between private and global addresses then the process is stateless: the gateway simply translates addresses packet by packet. However RFC 3022 defines a method called Basic NAT, where bindings between private and global addresses are set up dynamically, from a pool of global addresses; this means that the NAT gateway has to manage the state of these bindings.

The other type of NAT is what RFC 3022 refers to as Network Address Port Translation (NAPT; known as overloading or Port Address Translation in the Cisco world, Hide NAT in the Check Point world). This allows multiple hosts on a private network to connect to the Internet using one global address: very attractive in the IPv4 world with the increasing shortage of globally routable addresses. A private network using NAPT will typically use RFC 1918 addresses internally, which are not globally routable. When a host on the private network initiates a connection to the Internet, the NAPT gateway will typically translate the source address of the initial datagram to the global address of the gateway. It then dynamically translates the source TCP or UDP port of the initial datagram to an available port on the gateway itself. TCP and UDP ports are 16-bit numbers, and outbound source ports are generally allocated in the range above 1023, so this will scale up to a maximum of about 64,000 simultaneous connections. ICMP query datagrams have to be handled analogously, with the ICMP query identifier standing in for a port.

To take an example, say the NAPT gateway has a global address of 192.0.2.1, and there are two hosts with RFC 1918 addresses, 10.0.0.1 and 10.0.0.2, on the private side. Host 10.0.0.1 initiates a TCP connection to port 80 on 198.51.100.1 with a randomly selected source port of 7680. As the first datagram of the connection passes through the NAT gateway, the gateway translates the source address to 192.0.2.1, and the TCP port to a spare port on itself, let’s say 20231. Then host 10.0.0.2 makes a TCP connection to another destination, port 80 on 203.0.113.1, with the source port set to 1818. The gateway will translate port 1818 to another spare port, in this case 10434. The gateway needs to maintain the state of these mappings, so that when a datagram comes in on the global interface with a destination of 192.0.2.1 and TCP port 20231, it knows that this needs to be translated to 10.0.0.1 and port 7680 in order to reach its destination.
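The state the gateway keeps amounts to a pair of lookup tables, one for each direction. Here is a toy Python sketch of the bindings in the example above; the class name, the naive sequential port allocator and the resulting ports are purely illustrative, not any particular vendor’s implementation, and a real gateway would also time bindings out.

    class NaptGateway:
        """Toy model of NAPT binding state; no timeouts or error handling."""

        def __init__(self, global_ip, first_port=20231):
            self.global_ip = global_ip
            self.next_port = first_port          # naive sequential allocator
            self.out_map = {}                    # (priv_ip, priv_port) -> global port
            self.in_map = {}                     # global port -> (priv_ip, priv_port)

        def outbound(self, src_ip, src_port):
            """Translate the source of a datagram leaving the private side."""
            key = (src_ip, src_port)
            if key not in self.out_map:          # first datagram: new binding
                self.out_map[key] = self.next_port
                self.in_map[self.next_port] = key
                self.next_port += 1
            return self.global_ip, self.out_map[key]

        def inbound(self, dst_port):
            """Translate the destination of a reply arriving from outside."""
            return self.in_map[dst_port]

    gw = NaptGateway("192.0.2.1")
    print(gw.outbound("10.0.0.1", 7680))         # -> ('192.0.2.1', 20231)
    print(gw.outbound("10.0.0.2", 1818))         # -> ('192.0.2.1', 20232)
    print(gw.inbound(20231))                     # -> ('10.0.0.1', 7680)

Multiply this by tens of thousands of simultaneous connections and you can see both the resource cost and, as we’ll come to below, why the table has to be replicated for failover.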

NAPT gateways have even more work to do than one-to-one NAT gateways. Not only must their checksum recalculations include the modified port numbers, but they must also now maintain the state of every connection that passes through, and clean this up when the connection is closed. If there are multiple NAPT gateways for redundancy, then this state will need to be replicated between them to keep connections up after a gateway failure.

NAPT has the magical property of allowing a private network to be much larger than it appears from the Internet: a bit like Doctor Who’s TARDIS, which is much larger than it appears to be from the outside. It has come to be almost synonymous with NAT, as it’s by far the most prevalent form of NAT today. However it’s important to remember that the different types of NAT have different properties, and different goals. In the following posts I’ll be keeping this in mind as I go through the objections to NAT and assess their validity.