Too Much Data
Millions of files are transferred through networks every day ranging from documents to executables to compressed files and countless more. Data traverses the network without providing much awareness or context for analysts - eventually requiring incident responders to connect to hosts or capture network traffic to dig into file content. Assuming files are still available at the time of request, this activity requires analysts to perform data collection activities and potentially basic file or malware analysis.
Strelka is a modular active data collection platform, allowing analysts to observe file content for any defined file types without ever needing to perform active file collection. Data is submitted by an endpoint (e.g., network sensor, host) to a Strelka cluster, analyzed, normalized into (JSON), and passed on to another endpoint. Coupled with a security information and event management system (SIEM), Strelka can aggregate, alert, and provide analysts with the capability to better understand their environment without having to perform initial data gathering or file analysis.
Active File Analysis
The easiest way to understand Strelka is by observing its foundation: scanners. Scanners are Python scripts whose role is to perform data analysis per filetype. These scanners are provided files by Strelka based on the taste, or signature, of an identified file. For example, if Strelka has a signature for docx file identification, it executes the Strelka docx scanner. This specific scanner handles the extraction of data such as: author, comments, last modified timestamps, and more before passing it back the results back to Strelka. Here are a few more predefined scanners and some of the data they collect:
|HTML||hta_file, text/html, html_file||Hyperlinks, Title, Forms, Frames, Scripts|
|ZIP||application/zip, application/vnd.openxmlformats-officedocument, ooxml_file||In addition to collecting the quantity and size of the compressed file, Strelka will decompress the file and pass the contents back into the Strelka pipeline.|
|application/pdf, pdf_file||Annotated URIs, Objects|
|YARA||All||Scans all files for any provided YARA files and returns matches.|
Table 1: Scanner Examples
At a high level, Strelka is a mostly simple process. Figure 1: Strelka Overview visualizes the data flow for files in the Strelka pipeline. A go application is used to upload or stream files into the Strelka cluster where a backend server performs file tasting, or type identification. After that, relevant scanners, like those defined above, are executed against the files. Execution of analysis is defined in a configuration file on the server that allows administrators to set size and timeout requirements. Upon completion, the data is packaged into a JSON file and shipped to an additional endpoint or back to the submitter. For more detailed information on the design of Strelka and component-to-component details, please see the Readme.
Figure 1: Strelka Overview
The following, also available in the Readme, is a response example from Strelka showing how a typical response object is generated. This example shows a scan result for a text file that appears to be a shell script containing an IP address. The IP address is redacted to prevent accidental navigation.
"header": "cd /tmp || cd /var/run || cd /mnt || cd /root || c"
Since Strelka is an open source tool available on GitHub, the community can utilize this high-speed, modular, file analysis platform either in response, investigations, or hunting. A Readme is available to provide users with a detailed understanding of getting started, installation, architecture, modules, how to contribute, and more. The best way to get started with contributing to the Strelka project is through scanner development. These scanners are written in Python 3 and if you have any ideas on new types of scanners (or want to have a go at writing your own) we encourage you to submit those as an Issue or Pull Request.
This post has only highlighted a foundational overview of Strelka and we recommend you give it a test with the quickstart. While we only touched on a few concepts and scanners, Strelka isn’t defined only by filetypes and metadata. Scanners can be developed to perform low level analysis as well as pass files to other tools (e.g., sandboxes). If you have any questions, concerns, or feedback about Strelka, contact us via GitHub.