The printer goes brrrrr!!!

Rédigé par Mehdi Talbi , Rémi Jullian , Thomas Jeunet - 25/05/2022 - dans Exploit , Reverse-engineering - Téléchargement
Network printers have been featured for the first time at Pwn2Own competition in Austin 2021. Three popular LaserJet printers were included in the completion: HP, Lexmark and Canon. During the event, we (Synacktiv) managed to compromise all of them allowing us to win the whole competition. In this post, we will focus on how we achieved code execution on the Canon printer.

Network printers are good target candidate from an attacker perspective since they are rarely reinstalled or supervised and thus constitutes a perfect place to hide on a network. Moreover, they provide the attackers with persistent access to sensitive documents that may be scanned or printed. The Canon ImageCLASS MF644Cdw printer was one of three printers that could be targeted during the Pwn2Own competition in Austin 2021.

Each team willing to take part in the Pwn2Own registers for one or several devices which they want to compromise during the competition. Then, if they succeed during one of three attempts, they earn "master of pwn" points, a cash prize (that will vary depending on the targeted equipment), and the equipment itself. The "master of pwn" points are used to establish an overall ranking which allows at the end of the competition to attribute the title of "Master of Pwn" to the winner. We managed to compromise the Canon printer during our first attempt, and thus won a $20,000 cash prize as well as 2 "master of pwn" points.

 

Bootloader analysis

The first step to bootstrap the research is to obtain the binary executed by the device. To do so, an approach is to dump the memory storage of the device. Looking at the printed circuit board, we can identify some interesting integrated circuits and a potential UART connector:

printable circuit board

According to what is printed on it, the Flash circuit is a W25Q16JV, a 16Mbit serial NOR Flash. Using the pin configuration described in its datasheet available on the  winbond website, it is straightforward to extract the Flash content using a SOP8 clip and a CH341A, using flashrom.

W25Q16JV pin configuration

The dumped binary file does not contain a filesystem nor ELF or PE executable, but might contain a compiled binary anyway as stated by this binwalk output:

$ binwalk flash.bin

DECIMAL       HEXADECIMAL     DESCRIPTION
--------------------------------------------------------------------------------
336448        0x52240         Zlib compressed data, default compression
337968        0x52830         Zlib compressed data, default compression
341524        0x53614         Zlib compressed data, default compression
348537        0x55179         Copyright string: "Copyright (C) 1997-2015 by CANON Inc."
350068        0x55774         Certificate in DER format (x509 v3), header length: 4, sequence length: 859
351488        0x55D00         SHA256 hash constants, little endian
361256        0x58328         Unix path: /home/nca/workspace/RB-EMERALD/printing_pf/release/modules/dryos/../dryos/src/mohacs_boot/device/lcd_7line.c
361468        0x583FC         Unix path: /home/nca/workspace/RB-EMERALD/printing_pf/release/modules/dryos/../dryos/src/mohacs_boot/device/emmc.c
361748        0x58514         Unix path: /home/nca/workspace/RB-EMERALD/printing_pf/release/modules/dryos/../dryos/src/mohacs_boot/device/sicdlIntegritycheck.c
362256        0x58710         Unix path: /home/nca/workspace/RB-EMERALD/printing_pf/release/modules/dryos/../dryos/src/mohacs_boot/device/sicdlRomData.c
365496        0x593B8         Unix path: /home/nca/workspace/RB-EMERALD/printing_pf/release/modules/dryos/../dryos/src/oscore/stdlib/iobuf.c
365712        0x59490         Unix path: /home/nca/workspace/RB-EMERALD/printing_pf/release/modules/dryos/../dryos/src/mohacs_boot/drysh/cmd_chkcachelib.c
444441        0x6C819         Certificate in DER format (x509 v3), header length: 4, sequence length: 1288
[…]

Then, we loaded the dumped binary file in IDA Pro in order to reverse the binary. After finding the absolute address, we deduced the loading address of this binary as 0x10000000. However, it was not the firmware running on the device but only the bootloader to start it, while the actual firmware was stored on the eMMC at address 0x1500000 and mapped at address 0x40b00000.

The bootloader is able to "download" a firmware to the eMMC. According to the decompiled code, the format of the firmware is composed of a header starting with a 4 bytes' magic and, depending on this magic, the content is obfuscated.

The routine used to deobfsucate the firmware is the following:

Firmware analysis

Several methods can be used in order to obtain the firmware. While it should be possible to retrieve it by dumping the eMMC, we managed to obtain it in an easier way, simply by setting an HTTP proxy, with the printer's administration panel web interface, and then to check for updates. As plain-text HTTP is used for firmware updates, one can easily get the firmware download URL if an update is available. Another method is to extract the firmware from the "MF643Cdw/ MF641Cw Firmware Update Tool", which is available on Canon support website. This tool is provided as either a PE file (e.g win-mf643-641-fw-v1005.exe) or a macOS disk image file (e.g mac-mf643-641-fw-v1005-64.dmg), and allows firmware update using either a Windows or MacOS workstation. The PE file is also a self-extracting archive, containing another PE (e.g mf643c_mf642c_mf641c_v1005_typea_w.exe), from which three NCFW packages can be extracted as they are simply appended as raw files. These NCFW packages can thus be extracted as following:

user@debian:~/ 7z x win-mf643-641-fw-v1005.exe
user@debian:~/ grep --byte-offset --only-matching --text 'NCFW' mf643c_mf642c_mf641c_v1005_typea_w.exe
470016:NCFW
140927837:NCFW
152682373:NCFW
user@debian:~ dd if=mf643c_mf642c_mf641c_v1005_typea_w.exe of=package1.bin bs=1 skip=470016 count=$((140927837-470016))
user@debian:~ dd if=mf643c_mf642c_mf641c_v1005_typea_w.exe of=package2.bin bs=1 skip=140927837 count=$((152682373-140927837))
user@debian:~ dd if=mf643c_mf642c_mf641c_v1005_typea_w.exe of=package3.bin bs=1 skip=152682373

Three NCFW packages can be identified, the firmware itself (134MB), a language package (12MB), and a DCON package (506K) related to the DC controller. It should be noted that the CEFW package exists only for firmware updates coming from the internet.

The package format is depicted by the following figure:

Package Format
  1. The CEFW package is only present when the printer downloads directly its upgrade from one of Canon websites. The content is gzipped and the header contains in particular: header size, uncompressed size and actual package size. The uncompressed data holds one to many NCFW packages.
  2. The NCFW package has a header containing in particular header size and package size, with the content obfuscated with the previously shown routine. The deobfuscated data holds one to many NCA packages.
  3. The NCA package represents a block of data written on the eMMC. Its header contains the eMMC address where the package should be written to, its size and apparently its release date (directly in hex, 0x20220222 for the 22nd of February 2022 for instance) and its version. In most cases, the first NCA package in a NCFW package is special and contains a SIG package and one to many Mm packages:
  4. The Sig package holds cryptographic signature of the data of the different NCA packages.
  5. The Mm package is only a header containing in particular the eMMC address of the others NCA package in the same NCFW package. Thus, it allows to know how many NCA package remain.

An IDA loader has been implemented in order to load a Canon firmware in IDA. The loader is available in the Synacktiv's Github repository.

In this post, we will focus on the firmware in version 10.02, which was running on the targeted device during the contest (the latest available firmware at that time).

The operating system on the printer is based on a custom Real Time Operating System named “DryOS”:

DRYOS version 2.3, release #0059

We also identified this operating system during a previous work, on a quite old Canon based printer (MX920 series), which is running a former version (release #0049). This system is itself based on µITRON, a Japanese RTOS specification. DryOs is used by Canon not only for their printers, but also for DSL cameras.

As we knew this operating system provides a debug shell called DryShell, we tried to find the UART on the main board, using a Saleae logic analyzer. We also found that several debug messages were sent by the bootloader on the UART:

Logic analyzer capture using Saleae

After identifying the UART, we made simple soldering points (TX, RX and GND) in order to use a USB to serial adapter. This allowed using tools like pyserial in order to benefit from the debug shell. Once the printer boot initialization process is finished, the DryOs shell prompt is displayed and several commands can be used:

Dry> vers
DRYOS version 2.3, release #0059
  Dry-MK 2.66
  Dry-DM 1.21
  Dry-FSM 0.10
  Dry-EFAT 1.22
  Dry-stdlib 1.57
  Dry-PX 1.15
  Dry-drylib 1.22
  Dry-shell 1.19
  Dry-command alpha 065

Commands like xd or xm can be used for respectively dump or modify the memory, and thus were quite useful during the exploitation of our vulnerability.

Hunting for bugs

The firmware is quite big since it contains more than 100k functions, automatically discovered by IDA. As the firmware doesn't contain any symbol information, a common task when analyzing such firmware is to identify strings that may be used for debug purposes. By briefly looking at a few functions, we found a logging function used more than 19k times in the firmware. Here are few examples where this function is called:

While this logging function is not always used with a fixed pattern, the third argument often matches a string like "[PREFIX]  %s", with the first variadic function argument being the name of the function. We thus developed an IDA Python script, based on BIP, to take profit of HexRay's API in order to rename some functions automatically.

In order to identify the attack surface and choose a network service that could be targeted, we ran a quick nmap scan to find all UDP and TCP ports that were exposed in the default configuration. We then tried to find, for each network service, the related DryOs task and its entry-point:

Service Port Task name Notes
HTTP/HTTPS 80/TCP, 443/TCP HtpInit Canon HTTP Server
LPD 515/TCP LPDCtrl Line Printer Daemon Protocol
IPP/IPPS 631/TCP,10443/TCP IPP_INIT Internet Printing Protocol
Jetdirect 9100/TCP RAWctrl Allow printing using PDL (Page Description Language)
Canon MFNP 8610/TCP
8610/UDP
pscan_TCP_Task
pscan_UDP_Task
Print/Scan jobs over the network (based on BJNP protocol)
Canon CADM 9007/TCP
9013/TCP
47545/TCP
47547/TCP (SSL)
47545/UDP
cadm_tcp_res
cadm_tcp_cal
cadm_tcp_adm
cadm_tcp_sec
cadm_udp_adm
Canon administration proprietary protocol
NetBIOS 137/UDP, 138/UDP smbinit NetBIOS
SNMP 161/UDP SNAgent Simple Network Management Protocol
SLP 427/UDP SLPSvcAgent Service Location Protocol
WSD 3702/UDP WSINinit Web Services Dynamic Discovery
Zeroconf 5353/UDP BN_BNSet Multicast DNS (Bonjour Apple)

A custom HTTP server runs in order to implement a web administration panel, based on CGI scripts. Most of the CGI scripts can be reached post authentication, and thus were excluded from our analysis. The HTTP server is also used to handle IPP requests (Internet Printing Protocol) as IPP is based on HTTP. Unsurprisingly, the HTTP server also handles TLS and thus allows HTTPS and IPPS. Such as the HTTP server, the TLS stack doesn't seem to be based on an open-source project. In addition to IPP, other standard printing-related protocols such as LDP or Jetdirect are implemented. In order to allow device and service discovery in typical office computer network, protocols like SLP, WSD and Zeroconf are also supported. MFNP is a Canon custom protocol, based on BJNP, which allows for instance print and scan job creation over the network. Finally, another Canon custom protocol named CADM allows printer administration over the network either over TCP (47545), TCP+TLS (47547) or UDP (47545). We decided to focus on this protocol as analyzing the packet flow processing and reverse engineering the command handlers was straightforward.

The vulnerability

The vulnerability is present in the CADM service. The following figure illustrates the format of CADM messages:

CADM Message Format

The header starts with the magic value 0xCDCA. The param len field encodes the length of the payload and the operation code field specifies the type of operation. CADM supports several operations such as adding a new user (operation code = 12), changing the password of a user (operation code = 15) or simply echoing back submitted payload (operation code = 1). There are 41 available handlers defined at offset 0x44557b58 through the following structure:

The vulnerability is present in the decoding function of the handler responsible for checking the password (operation code = 0x83), namely, the function pjcc_act_checkUserPassword2 (0x4198ecf0).

The payload is depicted by the following figure. The payload is made of 3 distinct buffers along with their size encoded into 1 byte field.

Checkpass Message Format

 

As shown by the following snippet of code, the vulnerable function allocates a structure of 428 bytes and copies inside its inlined buffers the data from the packet without checking the size:

The vulnerable object pjcc_checkpass_obj has the following structure:

struct pjcc_checkpassword_payload
{
    unsigned uint8_t type;
    unsigned uint8_t field_4;
    unsigned uint8_t buffer_len;
    unsigned uint8_t buffer[256];
    unsigned uint8_t salt[32];
    unsigned uint8_t salt_len;
    unsigned uint8_t hash[128];
    unsigned uint8_t hash_len;
};

The buffers salt and hash are vulnerable to heap-based overflow are depicted by the following figure:

Overflow

Exploitation

The exploitation of this vulnerability requires to dig into the allocator internals.

DryOs Allocator

The DryOs Allocator is simply a "best-fit” allocator that maintains a singly linked list of free chunks. This freelist is stored at address 0x45f17540. As shown by the following figure, it is a linked list of free chunks:

DryOS Allocator

The allocation function iterates over the freelist and returns the first chunk that fulfills the requested size. The current chunk is fragmented and a new chunk is created if the remaining space (chunk_size - request_size > metadata_size) is larger than the size of the chunk’s metadata (40 bytes). The allocated chunk is then unlinked from the freelist.

Malloc


Chunks in the freelist are ordered by their address and when a chunk is freed, it is inserted back in the freelist. The chunk is merged with adjacent free chunks.

In order to track allocations, we have implemented a DryShell command that dumps the freelist.

Exploitation Scenario

Our exploitation strategy is to overflow the hash buffer and corrupt the next field of the chunk that is adjacent in memory to our vulnerable object. Our goal is to force a subsequent allocation at an arbitrary address.

The first step of the exploitation is to fragment the heap in order to insert large chunks. The goal is to allocate our vulnerable object from a large chunk in order to prevent our fake chunk to be served at an early stage once the vulnerable chunk is inserted back in the freelist. Requesting the UI using HTTPS is sufficient to create the desired heap state:

DryOs > !hd
magic = 0x0, size = 0x5ff930, next = 0x49c1dc88
magic = 0x46524545, size = 0x48, next = 0x49c1e7c0
magic = 0x46524545, size = 0x78, next = 0x49c30e50
magic = 0x46524545, size = 0x30, next = 0x49c30f10
magic = 0x46524545, size = 0x60, next = 0x49c35c98
magic = 0x46524545, size = 0x48, next = 0x49d0b578
magic = 0x46524545, size = 0x60, next = 0x49d14c70
magic = 0x46524545, size = 0x60, next = 0x49d15a18
magic = 0x46524545, size = 0x240, next = 0x49d22268
magic = 0x46524545, size = 0x2848, next = 0x49d24b68
magic = 0x46524545, size = 0x9198, next = 0x49d2ddd8
magic = 0x46524545, size = 0x292140, next = 0x0

The second step is to trigger the overflow and corrupt the next field of a freed chunk. In the exploit we set the next pointer to the address 0x44557b14. This address has been selected because it precedes several CADM structures (state machine, handlers, etc.) that contain multiple function pointers. Moreover, this address is a good candidate for a fake chunk since the memory at 0x44557b14 + 4 contains a very large value (size field). A large size allows us to request a large allocation that cannot be fulfilled by the previous free chunks in the freelist. The memory at 0x44557b14 + 8 contains a NULL pointer (next field) that allows to close the freelist. Please note that there is no need to have the string ‘FREE’ at the selected address since there are no security checks made by the allocator.

Overflow

The final step is to send a large CADM echo packet in order to get our fake chunk and overlap CADM data structures with controlled data. More precisely, we rewrite these data structures with the original content in order to not break the CADM state machine. We only corrupt the handler responsible for processing the echo request (pjcc_act_echo) such that it points to our shellcode. The shellcode is also embedded in the echo’s payload. It gets immediately executed after processing the CADM echo packet according to the CADM state machine.

Post Exploitation

In order to illustrate a successful exploitation of the vulnerability, we decided to write a shellcode that displays a picture (Synacktiv's logo) on the LCD printer's screen. Therefore, the first step was to understand how the screen's framebuffer works. We identified that it is mapped at 0x40900000. Then, we used the xm (modify memory) command of the DryShell in order to write a green pixel (0x00FF00) as a long value:

Dry> xm --help
usage: xm [-|addr [access [e_addr value flag]]]
      - : usage
   addr : [0-9a-fA-F]*
 access : b(byte:default) | w(word) | l(long word)
 e_addr : [0-9a-fA-F]*
  value : [0-9a-fA-F]*
   flag : f(fill) | s(search) | u(unmatch)
   -> + : auto increment
Dry> xm 0x40900000 l 0x40900000 0x00ff00 f

We noticed that the first pixel from the LCD printer's screen changed to green and was quickly restored to its previous value. Thus, we understood that the frame buffer is simply an array where 3 bytes are used to store a single pixel value. Each of these 3 bytes defines respectively the Red, Green and Blue intensity for a pixel. As the screen resolution is 800x480 pixels, the framebuffer size is thus 800x480x3.

Framebuffer

In order to implement a quite small shellcode, a TCP socket is used to read each pixel value from a picture, sent by a Python script, using PIL (Python Imaging Library) and implementing the server side.

The shellcode implements this by calling the following network-related functions:

  •  netSocket (0x424bbfb4)
  •  netConnect (0x424bc298)
  •  netRecv (0x424bc554)

It is worthwhile to note that the sin_family field specified in the sockaddr_in structure must be set to 0x100 to specify AF_INET (i.e it changes from Linux where AF_INET is 0x02). As the shellcode calls netRecv in an infinite loop, DryOS is able to perform tasks context switches, and only the CADM service is impacted by the exploitation of the vulnerability. Another approach could have been chosen, by creating a proper DryOs task, and restoring the hijacked function context.

The exploit code is available in the Synacktiv's Github repository. You can also take a look at our Ninja beeing displayed on the printer's screen:

Video file