Concordance and term frequency in analyzing API calls for malware behavior detection

Nur Hilda Amira Abd Wahab, Masnizah Mohd, Ravie Chandren Muniyandi, Balaji Rajendran, Gopinath Palaniappan

Research output: Contribution to journal, Article

Abstract

An Application Programming Interface (API) lets software interact with an operating system to perform tasks such as opening or deleting a file. Programmers use APIs so that their programs can communicate with the operating system without requiring knowledge of the target system's hardware. A malware author is an attacker who may belong to an organization or work independently. Malware authors who are capable of writing their own malware use the same kinds of APIs that are used to create normal programs. Much research has been done in this field; however, most researchers used n-grams to detect sequences of API calls, and although this gave good results, processing all of the output is time-consuming. This paper therefore proposes using concordance to search for the API call sequences of malware, because it uses KWIC (Key Word in Context) and thus displays only the output related to the queried keyword. Term Frequency (TF) is then used to find the most commonly used APIs in the dataset. The experimental results show that concordance can be used to search for API call sequences: using this method, we identified six malicious behaviors (Install Itself at Startup, Enumerate All Processes, Privilege Escalation, Terminate Process, Process Hollowing and Anti-debugging). Based on the TF score, the most commonly used API in the dataset is RegCloseKey (TF: 1.388), which on its own is not a dangerous API; hence we can infer that most APIs are not malicious in nature, and it is how they are used that makes them dangerous.
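The two techniques named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `kwic` and `term_frequency`, the context window size, and the sample trace are all hypothetical, and TF is assumed here to be a plain relative frequency (count divided by trace length). The abstract does not state how its TF score of 1.388 was weighted, so this sketch does not try to reproduce that value.

```python
from collections import Counter


def kwic(calls, keyword, window=2):
    """Key Word in Context: return (left, keyword, right) for each hit.

    `calls` is an ordered list of API names from one trace; `window`
    is how many neighbouring calls to show on each side (assumed value).
    """
    hits = []
    for i, call in enumerate(calls):
        if call == keyword:
            left = calls[max(0, i - window):i]
            right = calls[i + 1:i + 1 + window]
            hits.append((left, call, right))
    return hits


def term_frequency(calls):
    """Relative frequency of each API name in the trace (an assumption;
    the paper's exact TF weighting is not given in the abstract)."""
    counts = Counter(calls)
    total = len(calls)
    return {api: n / total for api, n in counts.items()}


# Hypothetical trace, made up for illustration only.
trace = [
    "RegOpenKeyExA", "RegSetValueExA", "RegCloseKey",
    "CreateToolhelp32Snapshot", "Process32First", "Process32Next",
    "OpenProcess", "TerminateProcess", "RegCloseKey",
]

# Show only the calls surrounding the queried keyword.
for left, key, right in kwic(trace, "TerminateProcess"):
    print(" ".join(left), f"[{key}]", " ".join(right))

# Rank APIs by how often they occur in the trace.
tf = term_frequency(trace)
most_common = max(tf, key=tf.get)
print(most_common, round(tf[most_common], 3))
```

Unlike an n-gram pass, which emits every window in the trace, the concordance query returns only the contexts around the keyword the analyst asked for, which is the time saving the abstract points to.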

Original language: English
Pages (from-to): 1307-1319
Number of pages: 13
Journal: Lubricants
Volume: 7
Issue number: 12
DOIs: https://doi.org/10.3844/jcssp.2019.1307.1319
Publication status: Published - 1 Jan 2019

Fingerprint

Application programming interfaces (API)
Malware
Computer operating systems
Computer hardware
Computer systems

Keywords

  • API call sequence
  • Concordance
  • Dynamic analysis
  • KWIC
  • Malware behaviors

ASJC Scopus subject areas

  • Mechanical Engineering
  • Surfaces, Coatings and Films

Cite this

Wahab, N. H. A. A., Mohd, M., Muniyandi, R. C., Rajendran, B., & Palaniappan, G. (2019). Concordance and term frequency in analyzing API calls for malware behavior detection. Lubricants, 7(12), 1307-1319. https://doi.org/10.3844/jcssp.2019.1307.1319

@article{e913578bed4145089d4f624eb152c173,
title = "Concordance and term frequency in analyzing {API} calls for malware behavior detection",
abstract = "An Application Programming Interface (API) lets software interact with an operating system to perform tasks such as opening or deleting a file. Programmers use APIs so that their programs can communicate with the operating system without requiring knowledge of the target system's hardware. A malware author is an attacker who may belong to an organization or work independently. Malware authors who are capable of writing their own malware use the same kinds of APIs that are used to create normal programs. Much research has been done in this field; however, most researchers used n-grams to detect sequences of API calls, and although this gave good results, processing all of the output is time-consuming. This paper therefore proposes using concordance to search for the API call sequences of malware, because it uses KWIC (Key Word in Context) and thus displays only the output related to the queried keyword. Term Frequency (TF) is then used to find the most commonly used APIs in the dataset. The experimental results show that concordance can be used to search for API call sequences: using this method, we identified six malicious behaviors (Install Itself at Startup, Enumerate All Processes, Privilege Escalation, Terminate Process, Process Hollowing and Anti-debugging). Based on the TF score, the most commonly used API in the dataset is RegCloseKey (TF: 1.388), which on its own is not a dangerous API; hence we can infer that most APIs are not malicious in nature, and it is how they are used that makes them dangerous.",
keywords = "API call sequence, Concordance, Dynamic analysis, KWIC, Malware behaviors",
author = "Wahab, {Nur Hilda Amira Abd} and Masnizah Mohd and Muniyandi, {Ravie Chandren} and Balaji Rajendran and Gopinath Palaniappan",
year = "2019",
month = "1",
day = "1",
doi = "10.3844/jcssp.2019.1307.1319",
language = "English",
volume = "7",
pages = "1307--1319",
journal = "Lubricants",
issn = "2075-4442",
publisher = "MDPI AG",
number = "12",

}

TY - JOUR

T1 - Concordance and term frequency in analyzing API calls for malware behavior detection

AU - Wahab, Nur Hilda Amira Abd

AU - Mohd, Masnizah

AU - Muniyandi, Ravie Chandren

AU - Rajendran, Balaji

AU - Palaniappan, Gopinath

PY - 2019/1/1

Y1 - 2019/1/1

AB - An Application Programming Interface (API) lets software interact with an operating system to perform tasks such as opening or deleting a file. Programmers use APIs so that their programs can communicate with the operating system without requiring knowledge of the target system's hardware. A malware author is an attacker who may belong to an organization or work independently. Malware authors who are capable of writing their own malware use the same kinds of APIs that are used to create normal programs. Much research has been done in this field; however, most researchers used n-grams to detect sequences of API calls, and although this gave good results, processing all of the output is time-consuming. This paper therefore proposes using concordance to search for the API call sequences of malware, because it uses KWIC (Key Word in Context) and thus displays only the output related to the queried keyword. Term Frequency (TF) is then used to find the most commonly used APIs in the dataset. The experimental results show that concordance can be used to search for API call sequences: using this method, we identified six malicious behaviors (Install Itself at Startup, Enumerate All Processes, Privilege Escalation, Terminate Process, Process Hollowing and Anti-debugging). Based on the TF score, the most commonly used API in the dataset is RegCloseKey (TF: 1.388), which on its own is not a dangerous API; hence we can infer that most APIs are not malicious in nature, and it is how they are used that makes them dangerous.

KW - API call sequence

KW - Concordance

KW - Dynamic analysis

KW - KWIC

KW - Malware behaviors

UR - http://www.scopus.com/inward/record.url?scp=85077848340&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85077848340&partnerID=8YFLogxK

U2 - 10.3844/jcssp.2019.1307.1319

DO - 10.3844/jcssp.2019.1307.1319

M3 - Article

AN - SCOPUS:85077848340

VL - 7

SP - 1307

EP - 1319

JO - Lubricants

JF - Lubricants

SN - 2075-4442

IS - 12

ER -