Lexical criminal identification for chatting corpus

Siti Hanom Marjuni, Abdullah Mohd. Zin, Ramlan Mahmod, Aida Mustapha, Abd Azim Abd Ghani

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

2 Citations (Scopus)

Abstract

This paper aims to identify lexical items that signal criminal elements in a chat corpus made up of suspect and victim conversation utterances. Lexical criminal identification requires three processes. The first is tokenization, which automatically assigns a serial number to each lexical item in every suspect and victim utterance. The second is tagging each lexical item with its part of speech to identify the verbs and nouns in the utterances. The third is identifying and analyzing the interrogative criminal construct to obtain the criminal evidence. The chat corpus consists of 3,067 suspect and victim utterances totaling 16,278 words, collected from 9 criminal chat cases. The results indicate that verbs and nouns are the most important parts of speech for representing criminal constructs in chat utterances.
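The first two processes in the abstract (serial-numbered tokenization, then part-of-speech tagging to pick out verbs and nouns) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the toy lexicon, the tag names, the `UNK` fallback, and the sample utterance are hypothetical and are not taken from the paper, whose actual tagger and tag set are not described in this record. The third process, analysis of the interrogative criminal construct, is not attempted.

```python
import re

# Hypothetical toy lexicon for illustration only; the paper's real
# tagger, tag set, and vocabulary are not given in this record.
TOY_POS_LEXICON = {
    "send": "VERB",
    "meet": "VERB",
    "money": "NOUN",
    "photo": "NOUN",
    "me": "PRON",
    "the": "DET",
    "tonight": "ADV",
}


def tokenize(utterance):
    """Split an utterance into lowercase tokens, each paired with a serial number starting at 1."""
    words = re.findall(r"[a-z0-9']+", utterance.lower())
    return list(enumerate(words, start=1))


def tag(tokens):
    """Attach a part-of-speech tag to each (serial, word) pair; unknown words get 'UNK'."""
    return [(serial, word, TOY_POS_LEXICON.get(word, "UNK")) for serial, word in tokens]


def verbs_and_nouns(tagged):
    """Keep only the tokens tagged as verbs or nouns."""
    return [item for item in tagged if item[2] in ("VERB", "NOUN")]


if __name__ == "__main__":
    utterance = "Send me the money tonight"  # invented sample, not from the corpus
    for serial, word, pos in verbs_and_nouns(tag(tokenize(utterance))):
        print(serial, word, pos)
```

Keeping the serial number attached to each token means a verb or noun flagged later can still be traced back to its position in the original utterance.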

Original language: English
Title of host publication: Proceedings - 2009 2nd IEEE International Conference on Computer Science and Information Technology, ICCSIT 2009
Pages: 360-364
Number of pages: 5
DOI: https://doi.org/10.1109/ICCSIT.2009.5234700
Publication status: Published - 2009
Event: 2009 2nd IEEE International Conference on Computer Science and Information Technology, ICCSIT 2009 - Beijing
Duration: 8 Aug 2009 to 11 Aug 2009

Other

Other: 2009 2nd IEEE International Conference on Computer Science and Information Technology, ICCSIT 2009
City: Beijing
Period: 8/8/09 to 11/8/09

Keywords

  • Chatting
  • Criminal construct
  • Criminal evidence
  • Lexicon
  • Part of speech
  • Tagging

ASJC Scopus subject areas

  • Computer Science Applications
  • Information Systems
  • Software

Cite this

Marjuni, S. H., Mohd. Zin, A., Mahmod, R., Mustapha, A., & Ghani, A. A. A. (2009). Lexical criminal identification for chatting corpus. In Proceedings - 2009 2nd IEEE International Conference on Computer Science and Information Technology, ICCSIT 2009 (pp. 360-364). [5234700] https://doi.org/10.1109/ICCSIT.2009.5234700

@inproceedings{0042dd2a26b24b3b9d557836ebb14529,
title = "Lexical criminal identification for chatting corpus",
abstract = "This paper aims to identify lexical of criminal elements for chatting corpus, which involved suspect and victim conversation utterances. Lexical criminal identification requires three processes. The first is tokenization to automatically assign each lexical with a corresponding serial number in every suspect and victim utterance. The second is tagging the lexical with parts of speech to identify verbs and nouns in the utterances. The third is to identify and analyze the interrogative criminal construct to get the criminal evidence. The chatting corpus consists of 3,067 suspect and victim utterances with 16,278 words, collected from 9 criminal chatting cases. The results indicate that both verb and noun are the most important part of speech elements that represent the criminal constructs in chat utterances.",
keywords = "Chatting, Criminal construct, Criminal evidence, Lexicon, Part of speech, Tagging",
author = "Marjuni, {Siti Hanom} and {Mohd. Zin}, Abdullah and Ramlan Mahmod and Aida Mustapha and Ghani, {Abd Azim Abd}",
year = "2009",
doi = "10.1109/ICCSIT.2009.5234700",
language = "English",
isbn = "9781424445196",
pages = "360--364",
booktitle = "Proceedings - 2009 2nd IEEE International Conference on Computer Science and Information Technology, ICCSIT 2009",

}

TY - GEN

T1 - Lexical criminal identification for chatting corpus

AU - Marjuni, Siti Hanom

AU - Mohd. Zin, Abdullah

AU - Mahmod, Ramlan

AU - Mustapha, Aida

AU - Ghani, Abd Azim Abd

PY - 2009

Y1 - 2009

AB - This paper aims to identify lexical of criminal elements for chatting corpus, which involved suspect and victim conversation utterances. Lexical criminal identification requires three processes. The first is tokenization to automatically assign each lexical with a corresponding serial number in every suspect and victim utterance. The second is tagging the lexical with parts of speech to identify verbs and nouns in the utterances. The third is to identify and analyze the interrogative criminal construct to get the criminal evidence. The chatting corpus consists of 3,067 suspect and victim utterances with 16,278 words, collected from 9 criminal chatting cases. The results indicate that both verb and noun are the most important part of speech elements that represent the criminal constructs in chat utterances.

KW - Chatting

KW - Criminal construct

KW - Criminal evidence

KW - Lexicon

KW - Part of speech

KW - Tagging

UR - http://www.scopus.com/inward/record.url?scp=70449096182&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=70449096182&partnerID=8YFLogxK

U2 - 10.1109/ICCSIT.2009.5234700

DO - 10.1109/ICCSIT.2009.5234700

M3 - Conference contribution

SN - 9781424445196

SP - 360

EP - 364

BT - Proceedings - 2009 2nd IEEE International Conference on Computer Science and Information Technology, ICCSIT 2009

ER -