Investigating IDA Lumina feature

Written by Johan Bonvicini - 15/12/2020 - in Tools, Reverse-engineering
Lumina is a built-in function recognition feature of the well-known IDA Pro disassembler that relies on an online signature database. Unfortunately, the database server is not available for local private use. Have you ever raged at a mistyped hotkey that sent your database content to the Lumina servers, wondered how it works and what kind of data is sent, and wished for a local server under your control? This blog post might answer some of your questions.

Overview

Introduced in IDA 7.2, Lumina is an online function recognition feature that Hex-Rays describes in these terms:

“The Lumina server is currently very simple: it holds metadata (function names, prototypes, comments, operand types, and other info) about well-known functions. Any user can send or receive metadata from Lumina.

When using Lumina, IDA does not send byte patterns to the server. Instead, it sends some hash values and this is enough for Lumina to find the corresponding metadata. If metadata is found, it will be downloaded and applied to the current database. This is a great way to improve the disassembly listing. It is possible to configure IDA to automatically request metadata at the end of analysis.”

source: https://www.hex-rays.com/products/ida/lumina

In a nutshell, Lumina is the evolution of the good old FLIRT (Fast Library Identification and Recognition Technology) mechanism, with some improvements:

  • the feature is embedded in the IDA GUI: external tools to generate signatures are no longer needed
  • end users can select which functions they want to generate a signature for (one, all or a user selection)
  • unlike FLIRT, all signatures and metadata are stored in a single database, which avoids loading each signature file individually
  • additional metadata is stored, whereas FLIRT only kept the function name and comment.
IDA Lumina menu

Protocol

Hex-Rays developers have implemented a custom TCP-based RPC protocol for communication between IDA instances and the servers. This protocol is simple: for each client request, a TLS/TCP session to lumina.hex-rays.com:443 is established. The client initiates an RPC handshake, sends its request, receives the server response, and the session is then terminated.

TLS handshake

During a TLS handshake, the server certificate is checked against two hardcoded CA certificates:

ecdsa-with-SHA256:

-----BEGIN CERTIFICATE-----
MIIBwTCCAWigAwIBAgIUTywOBIR2odB59aEjU981FBmOi+AwCgYIKoZIzj0EAwIw
UzELMAkGA1UEBhMCQkUxDzANBgNVBAcMBkxpw6hnZTEVMBMGA1UECgwMSGV4LVJh
eXMgU0EuMRwwGgYDVQQDDBNsdW1pbmEuaGV4LXJheXMuY29tMB4XDTE5MTAwODE0
MTg1OFoXDTIwMTAwNzE0MTg1OFowUzELMAkGA1UEBhMCQkUxDzANBgNVBAcMBkxp
w6hnZTEVMBMGA1UECgwMSGV4LVJheXMgU0EuMRwwGgYDVQQDDBNsdW1pbmEuaGV4
LXJheXMuY29tMFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAEbZMvGlWyAOKOLcXk
6VglBuWCPyNgdNVaSkXEl0gpBdcRa3QCZIkQeu1YaCdBY8v7y+G7YljzvmWx+S4V
qg6XFqMaMBgwFgYDVR0lAQH/BAwwCgYIKwYBBQUHAwEwCgYIKoZIzj0EAwIDRwAw
RAIgB6B+bFSXowi5wV0xJXsCyyR/EjKg1OIHlFbDW9SHCRoCIH+b7xguFt0IptGV
qx1spjBjuLXas8sMFJKDqheggBl3
-----END CERTIFICATE-----

sha512WithRSAEncryption:

-----BEGIN CERTIFICATE-----
MIIF0TCCA7mgAwIBAgIULzKtEOP9Q7V/L/G4Rnv4L3vq/hEwDQYJKoZIhvcNAQEN
BQAwVDELMAkGA1UEBhMCQkUxDzANBgNVBAcMBkxpw6hnZTEVMBMGA1UECgwMSGV4
LVJheXMgU0EuMR0wGwYDVQQDDBRIZXgtUmF5cyBTQS4gUm9vdCBDQTAeFw0yMDA1
MDQxMTAyMDhaFw00MDA0MjkxMTAyMDhaMFQxCzAJBgNVBAYTAkJFMQ8wDQYDVQQH
DAZMacOoZ2UxFTATBgNVBAoMDEhleC1SYXlzIFNBLjEdMBsGA1UEAwwUSGV4LVJh
eXMgU0EuIFJvb3QgQ0EwggIiMA0GCSqGSIb3DQEBAQUAA4ICDwAwggIKAoICAQDB
rsEh48VNyjCPROSYzw5viAcfDuBoDAHe3bIRYMaGm2a6omSXSzT02RAipSlO6nJZ
/PgNEipaXYLbEXmrrGdnSdBu8ub51t17AdGcGYzzPjSIpIVH5mX2iObHdS3gyNzp
JKJQUCDM6FdJa8ZcztKw+bXsN1ftKaZCzHcuUBc8P5lkiRGcuYfbiHri5C02pGo1
3y4Oz99Sot8KUfwNhByOOGOweYyfn9NgmhqhkBu27+6rxpmuR7mHyOhfnLs+psQ0
yjE6bzul2ilWFrOSaLAxKbhBLLQDWCYeBvXmE0IzmZVbo2DqTU+NWREU6avmRRBz
6RnZHFUhl2LVbJ5Ar45BawR38bRNro6VNCTq89rBXVFeCnk9Ja6v4ZAoWmjJupHC
pXTIxoebkoeWAwICuz63cWsRh1y2aqdgQ6v9yVErA64GhgCkpJO82HDtA9Siqge3
T+rgUnj1pcllGKgxAFYcKhlCLl4+bm0ohlxF0WF8VMhG/TBLNH3MlJFjlMoBwQnl
APheEgZWoQSEjAkzRLUrRw7kVk/Qt8G5hFGLb3UjE8SKDPKRYSBAUN/uP8YHKFqo
2arpTCi1DO4SqX8r6zqzslVTf6uWTiq8MNkZ/+7NYr1/JPT25iMlw6sa6g4GUPpQ
zhRaPy19obGe43u4vjpyse9g5vqX9p3u9MI14x3k6QIDAQABo4GaMIGXMB0GA1Ud
DgQWBBQaxNacfM7XKjKIutIHrc6tjiE9DTAfBgNVHSMEGDAWgBQaxNacfM7XKjKI
utIHrc6tjiE9DTAPBgNVHRMBAf8EBTADAQH/MA4GA1UdDwEB/wQEAwIBhjA0BgNV
HR8ELTArMCmgJ6AlhiNodHRwOi8vY3JsLmhleC1yYXlzLmNvbS9yb290X2NhLmNy
bDANBgkqhkiG9w0BAQ0FAAOCAgEAdKp4InpRk5z0BjPs6CcJSgKbCH0MXZqbt/EM
/4dJPvmA6tAexJpv9e9BmT/DOB84QB2xzQlEiNOB7/V4j3oDij5mMwRyqYL24l3g
HAavwc+dLrpzX/54uZmH9bKs7yj3fk/vU3e7th720ArL2/YZjHV2Wx0BMcs+YVit
phvG2mxu16DTpidms3pCj25eEISJvXfe8XEfKOP1FxGCpmKxx6qPHlNASOp5zdwV
iEimkguUwzCsmmPI5rEWLXdLRxc0CkffmbsNmsF8SZz38CiwuRlichDDdZuJXji7
jnZF7h04Mo2AKPt6wJ9+66rYqDigvP9sHGKpQp5hr1DMukFGnei3S9h5Kp8eDhRX
Y24y/CJVNO0rxYoFPUnOwbSUF3Fwu4fX3Ezq5eW7N0Nl7s0XHExb/P9fmhPxQBV1
gwr665inq5ZwD8H9uwGEVp3IBT9cHRu8ieZrQDMI1UqPOy+2EWNPtY4KxmgerTbc
N0VH4BuE8tdxTGUckg4JTbsNRUbqxSXmSL9jA1dLBT63lbMLIU06dIdqNbpxE4GV
MgOLwqwx/BF+FZgQTttdjmpexml6NIDVGDBxfyECJ5vdwxbKMIRfo7fp0jRpjZpP
8bw4BPnx0Y4NpMzKxiWS0i7re9iEafdh6GtpNynKU0JFSKrIwmIecKF+Z4ZUE/1K
+t/FOgI=
-----END CERTIFICATE-----

Please note that TLS parameters (server, port and certificates) can be configured or disabled. More about that later.
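To illustrate what checking a server against a fixed set of CA certificates looks like, here is a minimal Python sketch of a client-side TLS context that trusts only the PEM certificates it is given. This is not IDA's implementation; the function name make_pinned_context is ours, and the certificates would be the PEM blocks listed above.

```python
import ssl


def make_pinned_context(ca_pems):
    """Build a client TLS context that trusts only the given PEM CA certificates."""
    # PROTOCOL_TLS_CLIENT turns on hostname checking and CERT_REQUIRED,
    # and does not load the system trust store.
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    for pem in ca_pems:
        # Each PEM string becomes a trusted root for this context only.
        ctx.load_verify_locations(cadata=pem)
    return ctx
```

A client would then wrap its TCP socket with ctx.wrap_socket(sock, server_hostname="lumina.hex-rays.com"); the handshake fails if the server certificate does not chain to one of the supplied roots.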

RPC Protocol

This protocol seems to be a work in progress, as many packet handlers are not fully implemented yet (spoiler: are they there for debugging purposes? Or are new IDA features ahead, like a local server or P2P communication for IDA Teams ☺️?).

RPC message creation is handled by the undocumented function new_packet:

RPCMessage* new_packet(RPC_PACKET_TYPE pkt_type,
                       const char* data,
                       size_t size,
                       int version = 2);

// Each packet type is handled by its own class for (de)serializing network data:
class RPCMessage {
public:
    virtual ~RPCMessage();
    virtual void serialize(bytevec_t* output);
    virtual void deserialize(bytevec_t* input, size_t offset);
    RPC_PACKET_TYPE type;
};

// List of the available RPC codes for Lumina
enum RPC_PACKET_TYPE : uint8_t
{
    OK = 0xa,       // Server ack
    FAIL,           // Server ack with error message
    NOTIFY,         // Server ack with error message
    HELO,           // Client HELLO (handshake)
    PULL_MD,        // Pull metadata request
    PULL_MD_RESULT, // Pull metadata response
    PUSH_MD,        // Push metadata request 
    PUSH_MD_RESULT, // Push metadata response
    // below packet types are not handled by the server
    GET_POP,
    GET_POP_RESULT,
    LIST_PEERS,
    LIST_PEERS_RESULT,
    KILL_SESSIONS,
    KILL_SESSIONS_RESULT,
    DEL_ENTRIES,
    DEL_ENTRIES_RESULT,
    SHOW_ENTRIES,
    SHOW_ENTRIES_RESULT,
    DUMP_MD, // :'(
    DUMP_MD_RESULT,
    CLEAN_DB,
    DEBUGCTL,
};
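For readers writing tooling in Python, the numeric codes can be spelled out explicitly. The values below simply follow C++ enum auto-increment from OK = 0xa, so they assume the reversed enum has no gaps; the class name RpcPacketType and the HANDLED_REQUESTS set are ours.

```python
from enum import IntEnum


class RpcPacketType(IntEnum):
    """Packet codes derived from the C++ enum (auto-increment from 0x0a)."""
    OK = 0x0A
    FAIL = 0x0B
    NOTIFY = 0x0C
    HELO = 0x0D
    PULL_MD = 0x0E
    PULL_MD_RESULT = 0x0F
    PUSH_MD = 0x10
    PUSH_MD_RESULT = 0x11
    # Not handled by the server according to our analysis
    GET_POP = 0x12
    GET_POP_RESULT = 0x13
    LIST_PEERS = 0x14
    LIST_PEERS_RESULT = 0x15
    KILL_SESSIONS = 0x16
    KILL_SESSIONS_RESULT = 0x17
    DEL_ENTRIES = 0x18
    DEL_ENTRIES_RESULT = 0x19
    SHOW_ENTRIES = 0x1A
    SHOW_ENTRIES_RESULT = 0x1B
    DUMP_MD = 0x1C
    DUMP_MD_RESULT = 0x1D
    CLEAN_DB = 0x1E
    DEBUGCTL = 0x1F


# Client packets the online server actually answers
HANDLED_REQUESTS = {RpcPacketType.HELO, RpcPacketType.PULL_MD, RpcPacketType.PUSH_MD}
```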

As mentioned above, the RPC protocol relies on a single handshake followed by one request/response exchange:

  • handshake:
    • the client sends an RPC_HELO packet to authenticate to the server with a valid IDA license
    • the server responds with an RPC_OK (or an RPC_NOTIFICATION with an "Invalid license" error message)
  • the request can be one of the following:
    • a PULL_MD message that contains a list of function signatures to look up. The server replies with a PULL_MD_RESULT containing the list of results (function metadata) for the signatures it found.
    • a PUSH_MD message that contains a list of function signatures and metadata to push. The server replies with a PUSH_MD_RESULT containing the push "ack" and the popularity of each function's information. Popularity might be the number of (unique?) pushed signatures for this function.
  • other message types are not handled, and the server responds with an RPC_NOTIFICATION carrying an "Unknown command" message
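The exchange described above can be sketched as a tiny state machine. This is a simplification under stated assumptions: packet types are plain strings matching the enum names (NOTIFY standing for RPC_NOTIFICATION), license_is_valid is a hypothetical callback, payloads are not parsed, and the reply to a non-HELO packet sent before the handshake is our guess.

```python
class LuminaSession:
    """One RPC session: a HELO handshake, then a single request."""

    def __init__(self, license_is_valid):
        self.license_is_valid = license_is_valid  # hypothetical callback
        self.authenticated = False

    def handle(self, pkt_type, payload=b""):
        """Return (response_type, message) for one incoming packet."""
        if not self.authenticated:
            # Assumption: anything but HELO before the handshake is rejected.
            if pkt_type != "HELO":
                return ("NOTIFY", "Unknown command")
            if not self.license_is_valid(payload):
                return ("NOTIFY", "Invalid license")
            self.authenticated = True
            return ("OK", "")
        if pkt_type == "PULL_MD":
            return ("PULL_MD_RESULT", "")
        if pkt_type == "PUSH_MD":
            return ("PUSH_MD_RESULT", "")
        # Every other packet type is unhandled by the server.
        return ("NOTIFY", "Unknown command")
```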

Lumina internals

Note: the Lumina protocol and implementation are not yet "documented" in the IDA SDK. We reverse engineered most of the type structures and functions, some of them being exported by the IDA core (Windows: ida(64).dll, Linux: libida(64).so) or available in the idapython repository.

Hex-Rays states that "It is like FLIRT but it is dynamic, stores more information, and can recognize functions that FLIRT cannot".

The signature generation algorithm is based on the MD5 hash of "cleaned" function code: for each basic block in a function, IDA applies a bitmask on each opcode to zero out any variant information such as relative and absolute addresses or nops. To get rid of variant bits, IDA combines get_wide_byte (which retrieves a byte from the database) with the undocumented method processor_t::calcrel, which provides a bitmask to "clean" the current instruction or data. Both the bitmask and the masked data of each chunk are kept, and an MD5 hash computed over all blocks becomes the function signature. We won't go into further detail on this process, as it is a well-known problem already discussed in a Hex-Rays blog post about the FLIRT signature mechanism. The signature generation thus seems to be the same as for FLIRT.
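The masking idea can be illustrated with a few lines of Python. This is only a sketch of the principle, not IDA's algorithm: we assume that set bits in the mask mark variant bits, and that the hash input is the cleaned bytes followed by the mask, whereas the real ordering and mask polarity were not verified here. The function name masked_signature is ours.

```python
import hashlib


def masked_signature(code: bytes, variant_mask: bytes) -> bytes:
    """MD5 over code with variant bits zeroed, plus the mask itself."""
    if len(code) != len(variant_mask):
        raise ValueError("code and mask must have the same length")
    # Zero every bit flagged as variant (assumed polarity: 1 = variant).
    cleaned = bytes(b & ~m & 0xFF for b, m in zip(code, variant_mask))
    # Assumed layout: cleaned bytes first, then the mask.
    return hashlib.md5(cleaned + variant_mask).digest()
```

With this scheme, two x86 call instructions such as E8 11 22 33 44 and E8 55 66 77 88, masked with 00 FF FF FF FF, hash to the same value because only the relative target differs.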

The generated function signature is stored in a func_sig_t structure:

struct func_sig_t {
    uint32_t version;  // current version == 1
    qstring signature; // md5 digest
};

The func_info_t structure represents the function metadata:

struct func_info_t {
    qstring func_name;
    uint32_t func_size;
    bytevec_t serialized_data;  // serialized metadata
    uint32_t popularity;        // returned by the server
};

Signature and metadata are stored in a func_md_t structure:

struct func_md_t {
    func_info_t metadata;
    func_sig_t signature;
};
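When prototyping a server in Python, the three structures can be mirrored with dataclasses. The class names below are ours and the field types are a loose translation of the reversed C++ types (qstring and bytevec_t become str and bytes); this says nothing about the on-wire encoding.

```python
from dataclasses import dataclass


@dataclass
class FuncSig:
    signature: bytes          # 16-byte MD5 digest
    version: int = 1          # current version == 1


@dataclass
class FuncInfo:
    func_name: str
    func_size: int
    serialized_data: bytes = b""  # serialized metadata blob
    popularity: int = 0           # filled in by the server


@dataclass
class FuncMd:
    metadata: FuncInfo
    signature: FuncSig
```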

Metadata for a function is generated by ida!calc_func_metadata:

// Exported by ida core
int /*func_size*/ calc_func_metadata(
    char signature[16], // output MD5 signature
    func_info_t *,      // output info
    func_t* pfn,        // input function
    void (__thiscall *callback)(void*, func_t*) = NULL);

Once metadata and signature are computed, they are serialized into an RPC packet to be sent.

Privacy considerations

Quoting the official documentation, metadata consists of the following information:

  • function address, name, prototype
  • function frame layout
  • stack variables
  • user-defined sp change points
  • representation of instruction operands
  • function and instruction comments

This statement is almost accurate: this is indeed the metadata sent for a given function, but more information is sent alongside it:

  • on every request (in the HELO packet):
    • the entire content of the current ida.key file
    • some unique identifier (or watermark) stored in the hexrays module file
  • on push request
    • the computer's hostname (or Windows machine name)
    • the idb full file path (yes, absolute path)
    • the corresponding absolute input file path (obtained via idaapi.get_input_file_path())
    • the MD5 hash of the input file
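To get a feel for what the push-request context reveals, the following standalone sketch gathers equivalent values with the Python standard library. The dictionary keys are hypothetical labels, not the wire format, and inside IDA the input path would come from idaapi.get_input_file_path() rather than from an argument.

```python
import hashlib
import socket
from pathlib import Path


def push_context(idb_path: str, input_path: str) -> dict:
    """Collect the kind of context that accompanies a push request."""
    return {
        "hostname": socket.gethostname(),
        # Absolute paths typically expose user names and project folders.
        "idb_path": str(Path(idb_path).resolve()),
        "input_path": str(Path(input_path).resolve()),
        "input_md5": hashlib.md5(Path(input_path).read_bytes()).hexdigest(),
    }
```

Printing this dictionary for one of your own projects is a quick way to judge whether the paths and machine name are something you are comfortable sharing with a third-party server.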

We strongly advise you to take into account this potential information leak, especially when using non-official servers like https://lumen.abda.nl. If you need more control over the Lumina feature, or want to disable it, apply the following modifications.

Customize Lumina feature

The first thing you might want to do is to (partially) disable the Lumina client and its hotkeys from the IDA settings.

Note: as mentioned in the documentation, we recommend modifying settings in the IDAUSR directory rather than in IDADIR, since settings placed there override the defaults for every installed IDA version, including future ones.

Edit or create the ida.cfg file:

  • Windows: %APPDATA%\Hex-Rays\IDA Pro\cfg\ida.cfg
  • Linux/OSX: $HOME/.idapro/cfg/ida.cfg
  • from Python API: get IDAUSR value by calling idaapi.get_user_idadir()
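Outside of IDA, the same location can be computed with a few lines of Python. This sketch assumes the default directories listed above and that the IDAUSR environment variable, when set, takes precedence; if it lists several directories, only the first one is used here. The function name user_ida_cfg is ours.

```python
import os
import sys
from pathlib import Path


def user_ida_cfg(env=None) -> Path:
    """Return the expected path of the per-user ida.cfg file."""
    env = os.environ if env is None else env
    idausr = env.get("IDAUSR")
    if idausr:
        # Assumption: keep only the first directory of a multi-path value.
        base = Path(idausr.split(os.pathsep)[0])
    elif sys.platform.startswith("win"):
        base = Path(env["APPDATA"]) / "Hex-Rays" / "IDA Pro"
    else:
        base = Path(env["HOME"]) / ".idapro"
    return base / "cfg" / "ida.cfg"
```

Calling user_ida_cfg() with no argument uses the current process environment; passing a dictionary makes the function easy to try out without touching real settings.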

Lumina related settings are listed below. You can copy/paste the following recommended values:

// Lumina related parameters
LUMINA_HOST = "localhost";  // Lumina server url (default = "lumina.hex-rays.com")
                            // Warning: don't forget the semicolon or file parsing would fail
LUMINA_MIN_FUNC_SIZE = 32   // minimum function size (in bytes) to trigger a query (default = 32)
LUMINA_PORT = 4443          // TCP port (default = 443)
LUMINA_TLS = YES            // enable TLS (YES) or use plaintext TCP (NO)  (default = YES)

The default Lumina hotkeys are F12, Ctrl+F12 and Alt+F12, (in)conveniently very close to the well-known Shift+F12 (Open strings window) shortcut.
You might also want to disable those hotkeys in $IDAUSR/cfg/idagui.cfg in the same way:

// Lumina commands
// Some commands are visible from Lumina menu under certain conditions
// Make sure to disable all hotkeys
"LuminaApplyMdFromList"  	= 0
"LuminaFunctionsPullMd"  	= 0
"LuminaFunctionsPushMd"  	= 0
"LuminaFunctionsRevertMd"	= 0
"LuminaFunctionsViewMd"  	= 0
"LuminaIDAViewPullMd"    	= 0
"LuminaIDAViewPushMd"    	= 0
"LuminaIDAViewRevertMd"  	= 0
"LuminaPullAllMds"       	= 0 // default: "F12"
"LuminaPushAllMds"       	= 0 // default: "Ctrl-F12"
"LuminaRestoreMdFromList"	= 0
"LuminaViewAllMds"       	= 0 // default: "Alt-F12"

Lastly, you should also disable the AutoUseLumina option (automatic request for metadata at the end of analysis) by unticking the "Automatically use Lumina server" option in the Help > Check for free updates menu:

Wink wink Bruno

IDA uses hardcoded CA certificates for certificate pinning, but the root CA can also be overridden by adding a custom hexrays.crt file to any of the config folders. This comes in handy when implementing our own server.

POC||GTFO

Having analyzed the Lumina protocol and pinpointed the lack of privacy and of an offline mode, we wrote a simple PoC Lumina server for private/offline use, available on GitHub: https://github.com/synacktiv/lumina_server. This is a simple Python script that implements a server for pushing and querying signatures, stored in a JSON "database" file. The server does not (yet) handle signature collisions, metadata extraction or merging, or proxying to the Hex-Rays servers (hey, it's a PoC!).

The script can be used as a regular offline Lumina server or as a cheap synchronization method between multiple users. Just keep in mind that the nominal behaviour is that colliding signatures are appended to existing ones without merging, and that the last pushed signature is the one fetched on a query. Don't forget to pull before pushing, and use your favourite hotkey (Ctrl-Z) or the revert functionality in the Lumina menu if you are not satisfied with the results.
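The append-then-return-latest behaviour can be modelled in a few lines. The sketch below is not the code of lumina_server; the class name SignatureStore, its methods and the JSON layout are hypothetical, and returning the number of stored entries as a "popularity" is only our guess at what that value could mean.

```python
import json


class SignatureStore:
    """In-memory model of a push/pull store keyed by function signature."""

    def __init__(self):
        self.entries = {}  # hex signature -> list of metadata dicts

    def push(self, signature: str, metadata: dict) -> int:
        # Colliding signatures are appended, never merged.
        bucket = self.entries.setdefault(signature, [])
        bucket.append(metadata)
        return len(bucket)  # assumed stand-in for popularity

    def pull(self, signature: str):
        # A query returns the most recently pushed entry, if any.
        bucket = self.entries.get(signature)
        return bucket[-1] if bucket else None

    def dumps(self) -> str:
        """Serialize the whole store as a JSON document."""
        return json.dumps(self.entries, indent=2)
```

Because the latest push wins on a query, two analysts pushing different names for the same function silently shadow each other, which is why pulling before pushing matters.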

You can also use the server as a one-shot instance to dump all your database functions and share the json file, to be applied to a fresh idb (without leaking information stored in the idb).

The protocol is implemented using the Construct package, so you can reuse it to implement your own server. Note that implementing your own proxying method would allow you to strip the private information sent to Hex-Rays, but the servers always check the validity of the license and metadata sent in the HELO packet before granting query access to the database. It is also worth mentioning that, due to the protocol's limitations, you cannot dump the entire online database for your own private use.

Conclusion

In this blog post, we investigated the Lumina protocol internals and provided an open source implementation of an offline server. The Lumina (ex-FLIRT) feature is a good way to get fast function recognition, as it is natively implemented. With some improvements, it can also be used as a cheap database export and collaboration tool.

Still, it is only useful for recognizing common functions and is not efficient for advanced identification, bin-diffing or collaborative work. Tools like Diaphora and BinDiff for diffing, polichombr and FIRST for advanced function identification, and IDArling or YaCo for collaborative work would be more suitable. Unfortunately, most of these tools are no longer maintained and were broken by the IDA 7 API transition, like the majority of the plugin ecosystem.