YARA规则

Ghostasky 收录于 Technology

2022-08-13 约 3500 字预计阅读 7 分钟

[toc]

1.YARA简介

YARA 是一个旨在（但不限于）帮助恶意软件研究人员识别和分类恶意软件样本的开源工具。

YARA的每一条描述、规则都由一系列字符串和一个布尔型表达式构成，并阐述其逻辑。YARA规则可以与文件或在运行的进程，以帮助研究人员识别其是否属于某个已进行规则描述的恶意软件等。

项目地址：https://github.com/VirusTotal/yara，（yara64.exe ， yarac64.exe ）

python：https://github.com/VirusTotal/yara-python

官方文档：https://yara.readthedocs.io/

2.YARA示例

rule silent_banker : banker
{
    meta:
        description = "This is just an example"
        thread_level = 3
        in_the_wild = true
    strings:
        $a = {6A 40 68 00 30 00 00 6A 14 8D 91}
        $b = {8D 4D B0 2B C1 83 C0 27 99 6A 4E 59 F7 F9}
        $c = "UVODFRYSIHLNWPEJXQZAKCBGMT"
    condition:
        $a or $b or $c
}

以上规则：

名为rule silent_banker的规则，其中banker是规则的tag字段(可以有多个tag)
meta字段是规则的描述信息，比如可以有规则说明、作者信息、威胁等级、在野情况、文件MD5、来源等内容；
strings是规则字段
condition则是条件判断的字段

3.yara规则

官方文档：https://yara.readthedocs.io/en/v4.2.3/writingrules.html

yara中的规则都以rule开头，后面跟着的是identifier（标识符），标识符和编程中的变量命名差不多，部分保留的关键字不能用作标识符：

all	and	any	ascii	at	base64	base64wide	condition
contains	endswith	entrypoint	false	filesize	for	fullword	global
import	icontains	iendswith	iequals	in	include	int16	int16be
int32	int32be	int8	int8be	istartswith	matches	meta	nocase
none	not	of	or	private	rule	startswith	strings
them	true	uint16	uint16be	uint32	uint32be	uint8	uint8be
wide	xor	defined

strings的部分由$后跟一系列字母数字字符和下划线组成，strings可以以文本或十六进制形式定义，示例：

rule ExampleRule
{
    strings:
        $my_text_string = "text here"
        $my_hex_string = { E2 34 A1 C8 23 FB }
    condition:
        $my_text_string or $my_hex_string
}

文本使用引号括起来
十六进制使用大括号（只能是16进制

3.1 注释

跟c语言一样，行注释和段注释都一样。

3.2 Strings

3.2.1 Hex

这里其实有三种，文本，十六进制，还有一种是正则。

十六进制可以使用占位符?，通配符使用[]

$hex_string = { E2 34 ?? C8 A? FB }
$hex_string = { F4 23 [4-6] 62 B4 }

yara2.0之后可以使用无界跳转：

FE 39 45 [10-] 89 00
FE 39 45 [-] 89 00

还有一种给定部分的替换方案：

$hex_string = { F4 23 ( 62 B4 | 56 ) 45 }

他会匹配：F42362B445 or F4235645

3.2.2 Text strings

不区分大小写：nocase

不区分大小写：

$text_string = "foobar" nocase

宽字节：wide

如果字符串“Borland”的编码为每个字符两个字节（即B\x00o\x00r\x00l\x00a\x00n\x00d\x00），则以下规则将匹配：

rule WideCharTextExample1
{
    strings:
        $wide_string = "Borland" wide

    condition:
        $wide_string
}

如果要同时搜索ASCII和wide格式的字符串，可以将ASCII修饰符与wide结合使用，先后顺序无所谓：

$wide_and_ascii_string = "Borland" wide ascii

默认情况下Text就是ascii的。

XOR：xor

以下规则将搜索应用于字符串“This program cannot”（包括明文字符串）的每个单字节XOR：

rule XorExample1
{
    strings:
        $xor_string = "This program cannot" xor

    condition:
        $xor_string
}

等价于：

rule XorExample2
{
    strings:
        $xor_string_00 = "This program cannot"
        $xor_string_01 = "Uihr!qsnfs`l!b`oonu"
        $xor_string_02 = "Vjkq\"rpmepco\"acllmv"
        // Repeat for every single byte XOR
    condition:
        any of them
}

同样可以结合wide和ascii使用

base64：base64

以下规则将搜索字符串“此程序无法”的三个base64排列：

rule Base64Example1
{
    strings:
        $a = "This program cannot" base64

    condition:
        $a
}
/*
VGhpcyBwcm9ncmFtIGNhbm5vd
RoaXMgcHJvZ3JhbSBjYW5ub3
UaGlzIHByb2dyYW0gY2Fubm90
*/

base64wide修改器的工作方式与base64修改器类似，但base64调整器的结果将转换为wide。

base64和base64宽修饰符还支持自定义字母表，当然字母表长度必须为64

$a = "This program cannot" base64("!@#$%^&*(){}[].,|ABCDEFGHIJ\x09LMNOPQRSTUVWXYZabcdefghijklmnopqrstu")

Searching for full words：fullword

比如说匹配domain的话，www.mydomain.com这样不会被匹配，但：www.my-domain.com and www.domain.com这样会被匹配

3.2.3 正则

正则同样可以后面跟nocase，ascii等修饰符

rule RegExpExample1
{
    strings:
        $re1 = /md5: [0-9a-fA-F]{32}/
        $re2 = /state: (on|off)/

    condition:
        $re1 and $re2
}

可以在后面结束的斜杠后面加i或者s，用于指定正则表达式不区分大小写。

.可以匹配新一行的字符：

rule RegExpExample2
{
    strings:
        $re1 = /foo/i    // This regexp is case-insentitive
        $re2 = /bar./s   // In this regexp the dot matches everything, including new-line
        $re3 = /baz./is  // Both modifiers can be used together
    condition:
        any of them
}

YARA正则表达式识别以下元字符：

`\`	引用下一个元字符
`^`	匹配文件的开头，或在用作左括号后的第一个字符时，对字符类求反
`$`	匹配文件的结尾
`.`	匹配除换行符以外的任何单个字符
**`	`**
`()`	Grouping
`[]`	Bracketed character class

以下量词也可以识别：

*``**	Match 0 or more times
`+`	Match 1 or more times
`?`	Match 0 or 1 times
`{n}`	Match exactly n times（精确匹配）
`{n,}`	Match at least n times（至少匹配n次）
`{,m}`	Match at most m times（最多匹配n次）
`{n,m}`	Match n to m times

以下转义字符可识别：

`\t`	Tab (HT, TAB)
`\n`	New line (LF, NL)
`\r`	Return (CR)
`\f`	Form feed (FF)换页
`\a`	Alarm bell
`\xNN`	序号为给定十六进制数的字符

公认字符类：

`\w`	匹配单词字符 (alphanumeric plus “_”)
`\W`	匹配非单词字符
`\s`	匹配空白字符
`\S`	Match a non-whitespace character
`\d`	匹配十进制数字字符
`\D`	Match a non-digit character

3.3 Conditions

conditions就是布尔表达式，and or not ，关系运算符，算数运算符，位运算等。

整数的长度始终为64位，使用位运算符时（例如，~0x01不是0xFE，而是0xFFFFFFFFFE）。

3.3.1 Counting strings

如题，就是计算string的次数，这里使用的是井号

rule CountExample
{
    strings:
        $a = "dummy1"
        $b = "dummy2"

    condition:
        #a == 6 and #b > 10
}

yara4.2.0后，

##a in (filesize-500..filesize) == 2
//文件最后500个字节中的“a”字符串数必须正好等于2。

3.3.2 String offsets or virtual addresses

需要知道字符串是否位于文件上的某个特定偏移量或进程地址空间中的某个虚拟地址，使用at。

rule AtExample
{
    strings:
        $a = "dummy1"
        $b = "dummy2"

    condition:
        $a at 100 and $b at 200
}

at的优先级高于and

at运算符允许在文件或进程内存空间中的虚拟地址的某个固定偏移量处搜索字符串，而in运算符允许在偏移量或地址范围内搜索字符串。

rule InExample
{
    strings:
        $a = "dummy1"
        $b = "dummy2"

    condition:
        $a in (0..100) and $b in (100..filesize)
}

可以使用@a[i]获得字符串$a第i次出现的偏移量或虚拟地址。索引是基于1的，因此第一次出现是@a[1]，第二次出现是@a[2]，依此类推。如果提供的索引大于字符串的出现次数，则结果将是一个NaN（不是数字）值。

3.3.3 File size

直接就是filesize

rule FileSizeExample
{
    condition:
        filesize > 200KB
}

3.3.4 Executable entry point

另个一特殊的变量就是entrypoint，当然，使用这个的前提必须是pe文件

rule EntryPointExample1
{
    strings:
        $a = { E8 00 00 00 00 }

    condition:
        $a at entrypoint
}

rule EntryPointExample2
{
    strings:
        $a = { 9C 50 66 A1 ?? ?? ?? 00 66 A9 ?? ?? 58 0F 85 }

    condition:
        $a in (entrypoint..entrypoint + 10)
}

3.3.5 给定位置的数据访问

int8(<offset or virtual address>)
int16(<offset or virtual address>)
int32(<offset or virtual address>)

uint8(<offset or virtual address>)
uint16(<offset or virtual address>)
uint32(<offset or virtual address>)

int8be(<offset or virtual address>)
int16be(<offset or virtual address>)
int32be(<offset or virtual address>)

uint8be(<offset or virtual address>)
uint16be(<offset or virtual address>)
uint32be(<offset or virtual address>)

举例：

rule IsPE
{
    condition:
        // MZ signature at offset 0 and ...
        uint16(0) == 0x5A4D and
        // ... PE signature at offset stored in MZ header at 0x3C
        uint32(uint32(0x3C)) == 0x00004550
}

3.3.6 字符串集

使用of，至少存在字符串集中的一部分。

rule OfExample1
{
    strings:
        $a = "dummy1"
        $b = "dummy2"
        $c = "dummy3"

    condition:
        2 of ($a,$b,$c)
}

也可以这样：

rule OfExample2
{
    strings:
        $foo1 = "foo1"
        $foo2 = "foo2"
        $foo3 = "foo3"

    condition:
        2 of ($foo*)  // equivalent to 2 of ($foo1,$foo2,$foo3)
}

rule OfExample3
{
    strings:
        $foo1 = "foo1"
        $foo2 = "foo2"

        $bar1 = "bar1"
        $bar2 = "bar2"

    condition:
        3 of ($foo*,$bar1,$bar2)
}

也可以直接使用($*)引用所有字符串，或者可以使用them

rule OfExample4
{
    strings:
        $a = "dummy1"
        $b = "dummy2"
        $c = "dummy3"

    condition:
        1 of them // equivalent to 1 of ($*)
}

还有以下几种：

all of them       // all strings in the rule
any of them       // any string in the rule
all of ($a*)      // all strings whose identifier starts by $a
any of ($a,$b,$c) // any of $a, $b or $c
1 of ($*)         // same that "any of them"
none of ($b*)     // zero of the set of strings that start with "$b"

3.3.7 对多个字符串应用相同的条件

使用的是for...of操作符：

for expression of string_set : ( boolean_expression )

从string_ set中的这些字符串中，至少它们的expression必须满足boolean_expression

可以使用$来做占位符：

for any of ($a,$b,$c) : ( $ at pe.entry_point  )

3.3.8 使用带of和for..of的匿名字符串

rule AnonymousStrings
{
    strings:
        $ = "dummy1"
        $ = "dummy2"

    condition:
        1 of them
}

3.3.9 迭代字符串出现次数

rule Occurrences
{
    strings:
        $a = "dummy1"
        $b = "dummy2"

    condition:
        for all i in (1,2,3) : ( @a[i] + 10 == @b[i] )
    	//也可以这样写
    	//for all i in (1..3) : ( @a[i] + 10 == @b[i] )
}

#a表示$a的出现次数

for all i in (1..#a) : ( @a[i] < 100 )
//每次出现都在前100字节内

3.3.10 Iterators

yara4.0后for...of 得到了改善

for any section in pe.sections : ( section.name == ".text" )
//等价
for any i in (0..pe.number_of_sections-1) : ( pe.sections[i].name == ".text" )

3.4 More rules

3.4.1 Global rules

全局规则允许您同时在所有规则中施加限制。全局规则可以有很多

global rule SizeLimit
{
    condition:
        filesize < 2MB
}

3.4.2 Rule tags

这些tag稍后可以用于过滤YARA的输出

3.4.3 Metadata

没啥说的，上面写了。

4.Modules

4.1 PE

https://yara.readthedocs.io/en/v4.2.3/modules/pe.html

有点多，，没啥写的

后面那个elf也是

4.2 Magic

就是文件的 magic，可以file命令查看

Magic Module一共两个函数：

type()：returning a string with the type of the file.

magic.type() contains "PDF"
mime_type()：returning a string with the MIME type of the file.

magic.mime_type() == "application/pdf"

4.3 Hash

不用解释了，，有：MD5，sha1，sha256

4.4 Math

允许您从文件的某些部分计算某些值，并基于这些结果创建签名。

4.5 Console module

log(string)：console.log(pe.imphash())
log(message,string)：console.log("The imphash is: ", pe.imphash())

后面的参数不止可以string，int，float也可