Google Casts a New Spell: Introducing Magik(a)

Renmarc Andrada 07/06/2024

How often do we witness the blend of innovation and generosity in the tech world? Google's latest contribution to the open-source community, Magika, serves as a fascinating case study in this regard.

This tool not only highlights Google's commitment to fostering creativity and collaboration among the community worldwide but also sets a precedent for how large corporations can contribute significantly to the technological ecosystem.

Ready? Let’s start…

What’s the magic behind Magika? And how does its spell work?

Google's Magika is the newest open-source entry among file type identification tools, a field previously dominated by utilities such as the GNU/Linux 'file' command and the Windows tool 'TrID'. What sets Magika apart is its use of a deep learning model, which allows it to detect file types with high accuracy.

What makes Magika truly magical, in keeping with its name, is its efficiency and minimal resource requirements: it is designed to run smoothly on a single CPU. It can identify more than 100 different file types, including PDF, ZIP, ASP, PowerShell, and TIFF. By combining a deep learning model with a user-centric design, Magika offers a sophisticated yet accessible answer to the challenges of file identification.

A Quick Recap of File Type Identification Techniques

Tool-Based File Type Identification

GNU/Linux ‘file’ command

Before the advent of Magika, particularly in the field of Malware Analysis, analysts predominantly relied on the GNU/Linux 'file' command.

To illustrate how this tool operates, consider the following example:

On Linux, the 'file' command takes the path of the file to examine: file <file_name>

As the output shows, 'file' reports the detected type of the file it examined.

Figure: Output of the GNU/Linux 'file' command

Windows ‘TrID’

TrID is a Windows utility that identifies file types from their binary signatures. Rather than relying on the file extension, TrID matches patterns inside the file against its database of definitions, which makes it a powerful tool for identifying unknown or mislabeled files, especially in fields such as digital forensics and malware analysis.

On Windows, TrID is run as follows: trid.exe <file_name>

Figure: Output of the Windows 'TrID' tool

Manual File Type Identification

Another method of file type identification is to examine the file manually in hexadecimal format. For instance, a Windows Portable Executable (PE) file begins with the 2-byte magic number "4D5A", which decodes to the ASCII characters 'MZ', the initials of Mark Zbikowski, a developer who helped design the DOS executable file format. Finding "4D5A" at the start of a file is therefore a strong sign that it is a PE executable, the format used for .EXE and .DLL files on Windows. Strictly speaking, 'MZ' marks the legacy DOS header that every PE file starts with; the "PE\0\0" signature itself sits at the offset stored at 0x3C of that header (the e_lfanew field).

Figure: Manually examining a Windows Portable Executable (PE) file in hexadecimal format
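To make the manual check above repeatable, here is a minimal Python sketch, assuming the start of the file has already been read into memory as bytes. Beyond the 'MZ' bytes, it follows the little-endian 32-bit offset stored at 0x3C of the DOS header (e_lfanew) to confirm the "PE\0\0" signature, since 'MZ' on its own only proves a DOS-style header. The function name looks_like_pe is hypothetical, and this is a triage heuristic rather than a full PE parser.

```python
import struct


def looks_like_pe(data: bytes) -> bool:
    """Return True if `data` starts with MZ and e_lfanew points to a PE signature."""
    # Bytes 0-1: DOS header magic 4D 5A ("MZ"); the DOS header is 64 bytes long
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    # Offset 0x3C (e_lfanew): little-endian 32-bit offset of the PE header
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    # The PE header starts with the 4-byte signature 50 45 00 00 ("PE\0\0")
    return data[e_lfanew:e_lfanew + 4] == b"PE\x00\x00"
```

In practice you would pass it the first few kilobytes of a file opened in binary mode ("rb"), since the PE header usually sits within the first few hundred bytes.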

Let's consider another example, this time a PDF file. Here, the magic number is the byte sequence "2550 4446" in hexadecimal, which decodes to "%PDF" as an ASCII string. This signature, normally found at the very beginning of the file, is the hallmark of PDF files: it marks the start of a document formatted according to the Portable Document Format standard.

Figure: Manually examining a PDF file in hexadecimal format
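Both manual checks boil down to comparing a file's first bytes against known signatures. The sketch below assumes only signatures located at offset 0 and uses a small, hypothetical table and hypothetical function names; real tools such as 'file' and TrID rely on far larger databases and also match patterns at other offsets.

```python
# Hypothetical signature table: (magic bytes expected at offset 0, label)
SIGNATURES = [
    (b"MZ", "MZ/PE executable"),
    (b"%PDF", "PDF document"),
    (b"PK\x03\x04", "ZIP archive"),
]


def identify_by_magic(data: bytes) -> str:
    """Return the label of the first signature that `data` starts with."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return "unknown"


def identify_file(path: str) -> str:
    """Read just enough bytes from the start of `path` to match the table."""
    with open(path, "rb") as f:
        return identify_by_magic(f.read(8))


if __name__ == "__main__":
    import sys

    # Mirrors `file <file_name>`: one line of output per path given
    for name in sys.argv[1:]:
        print(f"{name}: {identify_file(name)}")
```

As a side note, bytes.fromhex("2550 4446") returns b'%PDF', which is a quick way to turn a hex dump like the one above back into ASCII.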

Using Magika’s Spell for File Identification

To understand the functionality of Magika and see its magic in action, let's explore how it can be utilized.

There are two primary ways to use Magika, and we will cover both.

First, if you have Python 3 installed on your system, you can get started with Magika using pip, the package installer for Python. Run the command pip install magika in your terminal or command prompt. This downloads and installs the Magika package, including its magika command-line tool, making it ready for use in your Python environment.

Figure: Installing Magika with pip

Second, Google also hosts a web demo, so you can explore Magika's capabilities firsthand in your browser without installing anything. Access the web-based version of Magika by visiting the following link: Magika.

Below is a brief overview of what you can expect from the Magika user interface on the web.

Figure: Magika user interface on the web

Single File Analysis

You can analyze a single file with Magika either from the command line or through its web user interface (UI). To analyze a file from the command line, use the following syntax: magika <file_name>

Figure: Single file analysis with the Magika command line

Alternatively, if you prefer a graphical interface, Magika's web UI lets you upload files directly through your browser. Navigate to the Magika web demo linked earlier, where you will find an option to upload the file you wish to analyze. This makes it straightforward to identify a file's type without using any command-line tools.

Both methods offer a convenient way to utilize Magika's file analysis capabilities, catering to different preferences for interacting with the tool.

Figure: Uploading a file through Magika's web UI

Multiple File Analysis

You can use Magika to analyze multiple files within a directory by passing a wildcard (*) pattern, which is particularly useful for batch processing. Here's how you can do it: magika Directory/*

The shell expands the wildcard into the list of files in Directory, and Magika then processes each one and prints its identified type. For nested directories, Magika also offers a recursive option (magika -r Directory). This makes it efficient when you need to triage a large number of files in a short time.

Figure: Multiple file analysis with the Magika command line

Magika's web UI also supports dragging and dropping, or uploading, multiple files at once. Once the files are uploaded, Magika processes them and displays the results in a separate table for each file, making it easy to review the file type and other analysis details for every file in the batch.

Figure: Drag-and-drop upload of multiple files in Magika's web UI

Case Study: Addressing Magika’s Gap as a Malware Analysis Tool

Although Magika is celebrated for being lightweight and user-friendly, it has limitations like any tool, especially for specialized tasks such as Malware Analysis. Even the most advanced tools have their specific weaknesses.

In the 'tool-based' section of this blog, we covered 'file' and 'TrID'. To see how these tools measure up against Magika in practice, let's compare them on a real example: a sample of Ardamax malware known to be packed with UPX. We will see how each tool performs in identifying this particular piece of malware.

Magika

In our analysis, Magika identified the malware sample as a 'PE executable'. This classification is consistent with the magic number '4D5A' (MZ) visible on the right side of the analysis output.

Figure: Magika identifying the sample as a 'PE executable', alongside the analysis output

TrID

Moving on to TrID, this tool provided further insight into the file's identity. TrID not only recognized the file type but also quantified its result, identifying the file as an '.EXE file, a Win32 Executable,' with a score of '52.9%'. The percentage comes from matching the file against TrID's database of file-type definitions. Because TrID typically lists several matching candidates whose percentages add up to roughly 100%, the figure is best read as the likelihood of this match relative to the other candidates, not as an absolute confidence that the file is a Win32 executable.

Figure: TrID output for the Ardamax sample
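A toy model helps explain why a correct top candidate can still score well below 100%. The Python sketch below assumes a simplified scheme in which each definition awards points for patterns found at fixed offsets, and every fully matching definition is reported as its share of the total points. It is not TrID's actual algorithm; the definitions, labels, and point values are invented for illustration.

```python
# Toy definitions: label -> list of (offset, pattern, points).
# Labels, patterns, and point values are invented; they are not TrID's.
# The fixed offset 0x80 keeps the toy simple; real PE headers are found via e_lfanew.
DEFINITIONS = {
    "Win32 executable": [(0, b"MZ", 10), (0x80, b"PE\x00\x00", 30)],
    "Win/DOS executable": [(0, b"MZ", 10), (0x80, b"PE\x00\x00", 10)],
    "DOS executable": [(0, b"MZ", 10)],
}


def rank_candidates(data: bytes):
    """Return (label, percentage) pairs for every definition that fully matches."""
    points = {}
    for label, patterns in DEFINITIONS.items():
        # A definition counts only if all of its patterns match at their offsets
        if all(data[off:off + len(pat)] == pat for off, pat, _ in patterns):
            points[label] = sum(p for _, _, p in patterns)
    total = sum(points.values())
    # Each candidate's share of the total points, so the percentages add up to 100
    ranked = [(label, 100 * p / total) for label, p in points.items()]
    return sorted(ranked, key=lambda item: item[1], reverse=True)
```

For a buffer with 'MZ' at offset 0 and 'PE\0\0' at 0x80, rank_candidates puts the Win32 executable first at about 57.1%, because the two more generic definitions match the same bytes and take the rest of the total.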

GNU/Linux File

Compared with the results from Magika and TrID, the 'file' command offers a slightly different and more detailed perspective.

In our analysis of the Ardamax sample, which is UPX packed, both Magika and TrID identified the file as a 'PE' or 'EXE' file but said nothing about its packing. The 'file' command was more informative: it labeled the file as 'UPX compressed', revealing that the sample is not merely a 'PE' or 'EXE' file but a UPX-packed PE file.

This distinction is crucial in malware analysis, because packing can be used to obfuscate a sample's true nature and make it harder for antivirus programs to detect. By surfacing this extra layer, the 'file' command tells analysts up front that the sample will need to be unpacked before it can be analyzed further.

Figure: Output of the GNU/Linux 'file' command for the UPX-packed sample
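To show what the extra layer reported by 'file' looks like at the byte level, here is a minimal Python heuristic, assuming the sample was packed with default UPX settings, which name the PE sections UPX0, UPX1, and so forth and embed a "UPX!" marker. It walks the PE section table (located via e_lfanew and the COFF header) and reports any UPX-style indicators. This is not necessarily how 'file' detects UPX, the function name is hypothetical, and a modified packer with renamed sections will evade it, which is another reason to cross-check several tools.

```python
import struct


def upx_indicators(data: bytes) -> list:
    """Collect simple signs that a PE file was packed with UPX (default settings)."""
    found = []
    # Must start with the DOS header magic "MZ" and contain the e_lfanew field
    if len(data) < 0x40 or data[:2] != b"MZ":
        return found
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    coff = e_lfanew + 4  # the 20-byte COFF header follows the "PE\0\0" signature
    if data[e_lfanew:coff] != b"PE\x00\x00" or len(data) < coff + 20:
        return found
    (num_sections,) = struct.unpack_from("<H", data, coff + 2)
    (opt_header_size,) = struct.unpack_from("<H", data, coff + 16)
    # The section table follows the optional header; each entry is 40 bytes
    # and starts with an 8-byte, NUL-padded section name
    table = coff + 20 + opt_header_size
    for i in range(num_sections):
        entry = table + 40 * i
        name = data[entry:entry + 8].rstrip(b"\x00")
        if name.startswith(b"UPX"):
            found.append("section " + name.decode("ascii", "replace"))
    # UPX normally embeds its "UPX!" marker in the packed file
    if b"UPX!" in data:
        found.append("UPX! marker")
    return found
```

Run against a file's full contents, a non-empty list suggests the sample should be unpacked (for example with the UPX tool's own -d option) before deeper analysis.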

Summary

Google has released numerous open-source products to the community over the years. Tools such as OSS-Fuzz, GRR Rapid Response, and Tsunami have had a big impact on the security industry. Magika, Google's latest release, shows the potential of applying AI and deep learning to established security tasks such as file type identification. By automating identification with a deep learning model, Magika lets analysts work more easily and quickly. Still, our case study demonstrates that relying on a single tool is impractical. In data analysis and forensic inquiry, the principles of "triangulation" or "multi-source verification" are critical for producing thorough and dependable results.

If you found this topic interesting and you don’t have any exposure to Malware Analysis, Reverse Engineering and Incident Response, why not take a look at our gamified blue team labs.

References:

Magika

GitHub - google/magika: Detect file content types with deep learning

google/magika: Supported content types list (https://github.com/google/magika/blob/main/docs/supported-content-types-list.md)

Marco Pontello's Home - Software - TrID

Security Blue Team: Training for all Stages of your Cybersecurity Career

Security Blue Team is a leading online defensive cybersecurity training provider with over 100,000 students worldwide, and training security teams across governments, military units, law enforcement agencies, managed security providers, and many more industries.

Renmarc Andrada

Renmarc is an avid fan of the phrase 'sharing is the new learning'. As a content developer with years of experience under his belt, he dedicates most of his time to researching both old and new TTPs in broad areas such as DFIR, CTI, threat hunting and malware analysis.

