Kelantan and Sarawak Malay Dialects: Parallel Dialect Text Collection and Alignment Using Hybrid Distance-Statistical-Based Phrase Alignment Algorithm

Journal
Turkish Journal of Computer and Mathematics Education (TURCOMAT)
ISSN
1309-4653
Date Issued
2021-04-10
Author(s)
Jasmina Khaw Yen Min
Faculty of Information and Communication Technology
Tan Tien Ping
Ranaivo Malancon Bali
DOI
https://doi.org/10.17762/turcomat.v12i3.1160
Abstract
Parallel text corpora are essential resources, especially in translation and multilingual information retrieval. However, publicly available parallel text corpora are limited to certain types and domains. Moreover, Malay dialects are not standardized in terms of writing, and the existing alignment algorithms used to analyze such writing require a large amount of training data to obtain good results. This paper first describes our methodology for acquiring a parallel text corpus of Standard Malay (SM) and Malay dialects, particularly Kelantan Malay and Sarawak Malay. Second, we propose a hybrid of distance-based and statistical-based alignment algorithms to align words and phrases of the parallel text. The proposed approach achieves better precision and recall than the state-of-the-art GIZA++. The alignments obtained were also compared to identify the lexical similarities and differences between SM and the two dialects.