Building a Libyan Dialect Lexicon-Based Sentiment Analysis System Using Semantic Orientation of Adjective-Adverb Combinations

Twitter is a social media platform where users post their opinions and sentiments about issues, objects, services, places or people in short text messages called tweets. The sentiment information extracted from tweets is useful in many areas, such as business and government. Although sentiment analysis of Arabic dialects on social media has attracted several studies, there has been almost no work on the Libyan dialect. In this research, an adjective priority scoring algorithm, which calculates the sentiment orientation of adjective-adverb combinations, is used to build a fine-grained sentiment analysis system that classifies Libyan dialect tweets into seven categories. We exploit a freely available Libyan dialect Twitter corpus containing 5000 tweets, which were divided equally into two data sets (study and test). Adjectives and adverbs in the study data set were manually collected to construct sentiment lexicons. Approximately 108 adjectives were stored in an adjective dictionary, and their semantic orientation scores were manually assigned by two annotators in the range [-2, +2]. Likewise, each adverb of degree was scored in the range from 0 to 1 and stored in a separate dictionary containing 27 adverbs in total. Our system yields an F-score of 82.19% on the test data set.


I. INTRODUCTION
Sentiment analysis is an automatic process for extracting sentiments or opinions from text written by individuals. In this research, we aim to build a fine-grained Twitter sentiment analysis system for the Libyan dialect that classifies tweets or sentences into several sentiment categories. Sentiment analysis is generally performed at three levels: document, sentence, and entity or aspect. At the sentence level, it consists of two tasks. The first task distinguishes subjective from objective sentences; if a sentence is subjective, the system determines whether it expresses a positive, negative or neutral sentiment. In the second task, the system identifies the intensity of these sentiments or opinions, that is, whether a positive or negative opinion is strong, moderate or weak, usually by assigning a value on a scale. Fine-grained sentiment analysis systems commonly use such a scoring system to classify text into several categories. There are three approaches to sentiment analysis: lexicon-based [1], [2], machine learning [3], [4] and hybrid [5], [6]. Lexicon-based systems perform sentiment analysis using a set of manually crafted rules, whereas machine learning systems learn to classify opinions from a large annotated training data set. Hybrid systems combine the two approaches in one method. All of these approaches need natural language processing (NLP) tools and resources, such as corpora, training data sets, stemmers and dictionaries. Unfortunately, the Libyan dialect lacks such tools and resources, for example part-of-speech (POS) taggers and parsers, which are used to understand the context of a text and are very important for NLP research. The lexicon-based approach requires fewer resources and NLP tools than the other approaches, which is why we adopted it. Adjectives convey sentiment information and much of the subjective content of a text, so they can be used as features for sentiment analysis. Our system therefore uses a lexicon-based approach with adjective-adverb combinations as features. It requires a dictionary of adjectives with their semantic orientations and a dictionary of adverbs with the modifier values used to adjust adjective scores.
Manuscript received May 9, 2020; revised August 1, 2020. Husien A. Alhammi is with the Department of Electrical and Electronic, Higher Institute of Science and Technology of Zawia, Libya (e-mail: h1974hami@gmail.com).
Our system exploits a freely available Libyan dialect corpus extracted from Twitter [7]. The corpus was divided into two data sets: a study data set and a test data set. The system is based on a linguistic study carried out on the study data set, in which the adjectives and adverbs that appear in it were manually identified and collected to build sentiment lexicons. The study covered all morphological forms of these features and their combinations that occur in the study data set. The test data set was used to evaluate the system. The goal of this work is to introduce a lexicon-based system that classifies Libyan dialect tweets into seven categories based on the semantic orientation scores of adjective-adverb combinations.

II. RELATED WORK
Much research across many languages uses sentiment-bearing parts of speech (POS) as features for sentiment analysis. In 2011, [1] developed a lexicon-based system for text sentiment analysis called the Semantic Orientation CALculator (SO-CAL). They also described the development of sentiment dictionaries containing adjectives, nouns, verbs, adverbs, intensifiers and negation with their semantic orientations or polarities. The idea of SO-CAL is to assign a positive or negative polarity to a text. [8] used the semantic orientation of adjectives and adverbs to classify opinions in product reviews. In that work, a search engine was used to treat the internet as a very large corpus for estimating the semantic orientation of pairs of words or phrases with the PMI-IR method, which combines Pointwise Mutual Information (PMI) and Information Retrieval (IR). The system achieved an average accuracy of 74% on 410 reviews collected from Epinions in different domains. [9] focused on the effect of adverbs of degree on adjectives to measure the overall sentiment of sentences. They proposed three adverb-adjective combination (AAC) scoring algorithms for sentiment analysis: variable scoring, adjective priority scoring and adverb first scoring. They also defined a general set of axioms for this purpose. Their results showed that the adjective priority scoring algorithm outperformed the others, and they stated that using adverb-adjective combinations is better than using adjectives alone to compute sentiment. [10] introduced a lexicon-based method for classifying the sentiment of Facebook comments written in the Malay language. Two lexicon-based techniques, term counting and term counting average, were implemented to classify the comments. Verbs, adverbs and negations were taken into account to create a list of POS combinations used in the term counting (TC) and term counting average (TCAvg) scoring methods. The work showed that term counting works better for adjectives and adverbs, while term counting average performs better for verbs and negation words. [11] developed an algorithm for sentiment analysis based on adverb-adjective-noun combinations (AANC). Their work is based on a linguistic analysis of adverbs of degree, domain-specific adjectives and abstract nouns. They defined a set of general axioms based on a classification of adverbs of degree into five categories, adjectives into ten specific domains and abstract nouns into two categories. The main algorithm consists of two proposed algorithms: the unary AANC algorithm and the binary AANC algorithm.

III. LINGUISTIC STUDY
In our linguistic study, the adjectives and adverbs that appear in the study data set, which consists of 2500 tweets, were identified and collected by the authors. About 108 adjectives and 27 adverbs were found. The study included a detailed linguistic analysis of adverbs of degree and of the syntactic constructions of adverb-adjective combinations across the entire study data set. The study data set contains several morphological forms of adjectives. For example, the singular feminine adjective "قنينة", which means "beautiful" in English, also appears in the plural feminine form "قنينات". Our linguistic study takes all morphological forms of adjectives into account. The study also showed that adverbs are usually placed after adjectives. For example, in the Libyan dialect phrase "غالي هلبة", which is equivalent to the English "very expensive", the adverb "هلبة" follows the adjective "غالي". In many cases, an adverb is repeated within the same sentence to increase the intensity of the adjective. One example from the study data set is "غالي هلبة هلبة", which means "very very expensive" in English; the twice-repeated adverb "very" clearly increases the intensity of the adjective "expensive". The structure "adjective + adverb + conjunction + adjective" also occurs frequently in the study data set, for example in the phrase "مليح و يمحن", which is equivalent to the English "it is nice and fantastic". Adjectives also frequently appear twice in a row, for example "دانداني دانداني", which is equivalent to the English "good good". Arabic differs from English in this respect: a twice-repeated adjective is not used in English and makes no sense grammatically.

A. Adverbs Classification
An adverb is a part of speech that modifies or qualifies a verb, adjective, adverb, phrase, clause or sentence. In our linguistic study, the adverbs that appear in the study data set were collected and classified semantically into six categories based on distinct conceptual notions [12], [13]. First, adverbs of time express when an action or event takes place, for instance "غدوة", which means "tomorrow" in English. Second, adverbs of frequency express how often an action or event takes place, such as "ديمة", which means "always". Third, adverbs of location express where an action or event takes place, for example "البرا", which means "abroad" or "outside". Fourth, adverbs of manner express how an action or event takes place, for example "فيسع", which means "quickly". Fifth, conjunctive adverbs link two sentences, for instance "بعتالي", which means "then". Finally, adverbs of degree express the intensity of something, for example "واجد", which means "much" or "very". In this work, all types of adverbs other than adverbs of degree were ignored because they have no impact on sentiment words, whereas adverbs of degree play a major role in computing sentiment. Adverbs of degree can be classified as follows [14], [15].
• Adverbs of affirmation: adverbs used to affirm that a sentence is true, such as "بزبط", which means "exactly". The sentence "هدا شنو قلت بزبط", which means "that is exactly what I said", shows the effect of an affirmation adverb on a sentence.
• Adverbs of doubt: adverbs that convey a lack of absolute certainty about something, such as "مرات", which means "possibly". For example, the phrase "مرات حق", which means "it is possibly true", expresses uncertainty.
• Strong intensifying adverbs: adverbs that emphasize or amplify another word or phrase, also known as boosters or amplifiers, such as "هلبة", which means "exceedingly" or "very". For example, the phrase "متكنطي هلبة", which means "he is very angry", shows the effect of a strong intensifying adverb.
• Weak intensifying adverbs: adverbs that tone down the strength of another word in the sentence, such as "اش ماله", which means "slightly". In the phrase "الاسعار اش ماله تزيد", which means "prices slightly increase", it tones down the strength of the sentence.
• Negation adverbs: adverbs that state that a fact is not true, corresponding to negative words such as "no", "not" and "never". The adverb "مش", which means "not" or "no", is used to make negative statements, for example in the phrase "مش قنينة", which means "she is not beautiful".
• Minimizers: adverbs that reduce the degree of both positive and negative sentiment, such as "hardly". The adverb "شوية", which means "a bit", is used in the phrase " ‫منه‬ ‫نتك‬ ‫شوي‬ ‫ة‬ ", which means "I'm feeling a bit tired", to reduce the negative degree of the sentence.
In the study data set, the most frequent adverbs were "هلبة", which means "very" or "much", and "شوية", which means "a little"; both are widely used to express sentiment.
International Journal of Computer Theory and Engineering, Vol. 12, No. 6, December 2020

IV. SENTIMENT SCORES GRANULARITY
We propose a general scoring axiom that assigns each adjective a sentiment score corresponding to one of seven polarities: weak positive, moderate positive, strong positive, weak negative, moderate negative, strong negative or neutral. The axiom maps between polarities and sentiment scores in both directions. For example, an adjective assigned a score of +2 is considered to belong to the strong positive category. The same axiom is used to assign the final sentiment of a tweet or text: a text assigned a score of -2 is considered strong negative. Fig. 1 shows the scale of sentiment scores in our axiom.
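The mapping from scores to categories can be sketched as follows. The scale in Fig. 1 is not reproduced in the text, so the band edges used here (0.5 and 1.5) are assumptions chosen only to agree with the facts stated in the paper: +2 is strong positive, -2 is strong negative, and the scores 1.18 and 0.90 from the agreement study both fall in moderate positive. The function name `score_to_category` is hypothetical.

```python
def score_to_category(score: float) -> str:
    """Map a sentiment score in [-2, +2] to one of seven categories.

    The band edges 0.5 and 1.5 are assumed for illustration; the actual
    scale is the one defined in Fig. 1 of the paper.
    """
    if not -2.0 <= score <= 2.0:
        raise ValueError(f"score {score} is outside [-2, +2]")
    if score == 0:
        return "neutral"
    sign = "positive" if score > 0 else "negative"
    magnitude = abs(score)
    if magnitude <= 0.5:        # assumed upper edge of the weak band
        strength = "weak"
    elif magnitude <= 1.5:      # assumed upper edge of the moderate band
        strength = "moderate"
    else:
        strength = "strong"
    return f"{strength} {sign}"
```

With different band edges only the two threshold constants would change; the sign and strength logic stays the same.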

V. SENTIMENT LEXICONS GENERATION
Adjectives can be assigned scores on a scale from negative to positive numbers to indicate the strength of their polarity. Sentiment dictionaries or lexicons can be constructed either manually [16] or automatically [17]. Automatic methods fall into two categories. (i) Lexicon-based approaches start with a small list of positive and negative seed words and expand it into a sentiment dictionary. For example, [18] use machine learning-based approaches to assign scores in the interval [-1, +1] to adjectives, using the WordNet synonymy relation between adjectives to expand their seed sets of opinion words. (ii) Corpus-based approaches use a large corpus or collection of data, such as the web, to extract semantic relations between words, and use statistical measures to calculate the sentiment orientation of words from their syntactic or co-occurrence relationships. The Pointwise Mutual Information (PMI) technique has been used for this purpose. For example, [19], [20] computed the semantic orientation of target adjectives from their mutual information with a set of seed adjectives. In our work, the dictionary of adjective semantic orientations for the Libyan dialect was built manually, because the resources required by both lexicon-based and corpus-based approaches are lacking, such as WordNet, which is a lexical database for the English language.
To assign a semantic orientation or sentiment score to each adjective, two annotators were recruited to score adjectives in the range [-2, +2]. The annotators determined which category each adjective belongs to (weak positive, moderate positive, strong positive, weak negative, moderate negative, strong negative or neutral). These adjectives were stored in a lexicon called the adjective sentiment dictionary. A score of +2 denotes that the adjective is maximally positive, while a score of -2 denotes that it is maximally negative. Likewise, the authors assigned scores between 0 and 1 to adverbs of degree. A score of 1 indicates that the adverb has maximum impact on adjectives, whereas a score of 0 indicates that it has no impact. These adverbs were stored in a lexicon called the adverb sentiment dictionary.
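One possible in-memory layout for the two lexicons is sketched below. The class name `SentimentLexicons` and its methods are hypothetical, and the example scores are invented for illustration; only the two value ranges ([-2, +2] for adjectives, [0, 1] for adverbs of degree) come from the text.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class SentimentLexicons:
    """Two lookup tables: adjective polarity scores and adverb-of-degree weights."""
    adjectives: Dict[str, float] = field(default_factory=dict)
    adverbs: Dict[str, float] = field(default_factory=dict)

    def add_adjective(self, word: str, score: float) -> None:
        # Adjective scores must lie in [-2, +2], as stated in the paper.
        if not -2.0 <= score <= 2.0:
            raise ValueError(f"adjective score {score} is outside [-2, +2]")
        self.adjectives[word] = score

    def add_adverb(self, word: str, weight: float) -> None:
        # Adverb weights must lie in [0, 1], as stated in the paper.
        if not 0.0 <= weight <= 1.0:
            raise ValueError(f"adverb weight {weight} is outside [0, 1]")
        self.adverbs[word] = weight


lexicons = SentimentLexicons()
lexicons.add_adjective("غالي", -1.0)  # "expensive"; the score is hypothetical
lexicons.add_adverb("هلبة", 0.8)      # "very"; the weight is hypothetical
```

Validating the range at insertion time is a design choice of this sketch; it keeps manually entered scores from silently leaving the scale that the scoring axiom expects.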

A. Agreement Study
To measure the reliability of the manually annotated test data set and of the semantic orientations of adjectives in the sentiment dictionaries, an inter-annotator agreement study was conducted on the annotated data using Cohen's Kappa [21]. The axiom in Section IV is used to decide agreement for both the test data set and the sentiment dictionary entries. For example, if annotator 1 assigns a score of 1.18 to the adjective "good" and annotator 2 assigns a score of 0.90 to the same adjective, the two annotators agree, because both scores belong to the same sentiment polarity, moderate positive. The overall Kappa values were K=0.814 for the test data set and K=0.781 for the semantic orientation scores of adjectives, which indicate reliable data annotations [22]. The two annotators annotated approximately 2500 tweets in the test data set and about 108 adjectives in the sentiment dictionary.
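For reference, Cohen's Kappa compares the observed agreement p_o with the agreement p_e expected by chance: K = (p_o - p_e) / (1 - p_e). The following is a minimal sketch of that computation on category labels, that is, after each annotator's scores have been mapped to polarities. The function name `cohen_kappa` and the toy labels in the usage note are hypothetical and are not the paper's data.

```python
from collections import Counter
from typing import Sequence


def cohen_kappa(labels_a: Sequence[str], labels_b: Sequence[str]) -> float:
    """Cohen's Kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: sum over labels of the product of marginal proportions.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1.0 - expected)
```

For instance, with annotator labels `["pos", "pos", "neg", "neg"]` and `["pos", "pos", "neg", "pos"]`, the observed agreement is 0.75, the chance agreement is 0.5, and the function returns 0.5.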

VI. ADJECTIVE-ADVERB COMBINATIONS SCORING ALGORITHMS
The main idea of the adjective-adverb combination (AAC) sentiment analysis technique is to calculate a sentiment value based on the effect of adverbs on adjectives. Although adverbs do not have a prior polarity, they can play a major role in determining the sentiment of a text. For example, if the polarity strength of the adjective "good" is 1 and the score of the adverb "very" is 0.8, then the semantic orientation of the adjective-adverb combination "very good" is calculated as follows: 1 × (100% + 80%) = 1.8. The intensifying adverb "very" amplifies the intensity of the adjective "good". As this example shows, adjective-adverb combinations give better results than adjectives alone, because adverbs of degree determine the intensity of sentiment-bearing adjectives.
Three alternative AAC scoring algorithms were presented by [9]. These algorithms compute the score of an adjective according to the type of adverb of degree that is adjacent to it. The first algorithm, variable scoring, modifies adjective scores in different ways depending on the score of the adjective. The second, adjective priority scoring, scores an AAC by modifying the adjective score, assigning a fixed weight to the relevance of adverbs. The third, adverb first scoring, scores an AAC by modifying the score of the adverb, assigning a relevance to each adjective. Their experiments showed that the adjective priority scoring algorithm performs best. Consequently, we chose the adjective priority scoring algorithm for our work.
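The worked example in this section, 1 × (100% + 80%) = 1.8, can be written as a small function. This is only a sketch built from that example and is not the algorithm of Fig. 3: the function name `score_combination`, the treatment of weak intensifiers as a factor of (1 - weight), and the clamping of results to [-2, +2] are all assumptions. The sign flip for negation follows the statement that negation reverses the sentiment score.

```python
def clamp(value: float, low: float = -2.0, high: float = 2.0) -> float:
    """Keep a score inside the [-2, +2] scale (an assumed safeguard)."""
    return max(low, min(high, value))


def score_combination(adjective_score: float,
                      adverb_weight: float = 0.0,
                      adverb_kind: str = "strong",
                      negated: bool = False) -> float:
    """Score one adjective-adverb combination.

    "strong" adverbs amplify the adjective, as in the paper's worked example;
    "weak" adverbs tone it down (assumed rule, not given in the paper).
    """
    if adverb_kind == "strong":
        factor = 1.0 + adverb_weight
    elif adverb_kind == "weak":
        factor = 1.0 - adverb_weight
    else:
        raise ValueError(f"unknown adverb kind: {adverb_kind!r}")
    score = clamp(adjective_score * factor)
    # Negation reverses the sentiment score of the combination.
    return -score if negated else score
```

Calling `score_combination(1.0, 0.8)` reproduces the "very good" example, and adding `negated=True` gives the score for "not very good" under these assumptions.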

VII. SYSTEM OVERVIEW
Our proposed sentiment analysis system is divided into two major phases, as shown in Fig. 2: data preprocessing and sentiment assignment. The first phase involves two processes, tokenization and identification. In the tokenization process, the tweet is broken into keywords called tokens. Prepositions, punctuation and stop words are discarded, and the remaining tokens are stored in a list called a bag of words. In the identification process, only adjectives that bear positive or negative sentiment, and the adverbs that modify them, are identified and extracted.
The sentiment dictionaries, which contain lists of adjectives and adverbs with their sentiment scores, are used to identify the adjectives and adverbs in the bag of words of a tweet by dictionary lookup. The process matches the words of the target tweet against the entries of the sentiment dictionaries. If a word is found in a dictionary, it is extracted and treated as an adjective or an adverb depending on which dictionary the matching entry belongs to. The semantic orientation value of the matching entry is then assigned to the word. In the sentiment assignment phase, the adjective priority scoring (APS) algorithm is used to obtain the final sentiment score of a tweet by calculating the score of the adjective-adverb combination that appears in it. We also extended the algorithm to handle binary APS and negated APS. Negation always reverses the sentiment score of a sentiment word. Finally, the sentiment score obtained from the APS algorithm is mapped to the corresponding polarity according to our axiom in Section IV. The APS algorithm was implemented in PHP on a local server. Fig. 3 shows the adjective priority scoring algorithm.
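The two phases described above can be sketched end to end as follows. The original system was written in PHP; this Python sketch is an illustration under stated assumptions, not a port of it. The lexicon entries use words that appear in the paper, but their scores, the stop-word list, the treatment of every adverb as a strong intensifier, the successive application of repeated adverbs, the clamping to [-2, +2] and the averaging over several combinations are all assumptions.

```python
import re
from typing import List, Optional

# Hypothetical lexicon entries: the words appear in the paper, the scores do not.
ADJECTIVES = {"غالي": -1.0, "حلو": 1.0}
ADVERBS = {"هلبة": 0.8, "واجد": 0.8}
NEGATIONS = {"مش"}
STOP_WORDS = {"انا"}  # hypothetical stop-word list


def tokenize(tweet: str) -> List[str]:
    """Split a tweet into word tokens, dropping punctuation and stop words."""
    return [t for t in re.findall(r"\w+", tweet) if t not in STOP_WORDS]


def clamp(value: float) -> float:
    """Keep a score inside the [-2, +2] scale."""
    return max(-2.0, min(2.0, value))


def score_tweet(tweet: str) -> Optional[float]:
    """Return the mean score of all adjective-adverb combinations, or None."""
    tokens = tokenize(tweet)
    scores = []
    for i, token in enumerate(tokens):
        if token not in ADJECTIVES:
            continue  # dictionary lookup failed: not a known adjective
        score = ADJECTIVES[token]
        # Adverbs directly after the adjective (the usual position).
        j = i + 1
        while j < len(tokens) and tokens[j] in ADVERBS:
            score *= 1.0 + ADVERBS[tokens[j]]
            j += 1
        # Adverbs directly before the adjective are accepted as well.
        k = i - 1
        while k >= 0 and tokens[k] in ADVERBS:
            score *= 1.0 + ADVERBS[tokens[k]]
            k -= 1
        score = clamp(score)
        # A negation word in front of the combination reverses its score.
        if k >= 0 and tokens[k] in NEGATIONS:
            score = -score
        scores.append(score)
    return sum(scores) / len(scores) if scores else None
```

A return value of `None` corresponds to a tweet in which no dictionary adjective was found, which is the situation the evaluation attributes to limited lexicon coverage. The returned score would then be mapped to one of the seven polarities by the scoring axiom.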

VIII. EVALUATION AND DISCUSSION
About 2500 tweets in our test data set were used to evaluate the performance of the system. The tweets were manually annotated with one of the seven polarities, and the accuracy of the system was examined manually against this annotated test data set, which served as the baseline. In this section, we review some examples produced by our system to discuss its classification accuracy. We first present a few examples that were classified correctly, and then focus on examples that were classified wrongly and the likely reasons. For example, the tweet " ‫جوك‬ ‫اليو‬ ‫م‬ ‫بكل‬ ‫بلهون‬ ", which means "you are in a bad mood today" in English, was assigned to the correct sentiment category by the system. Another example is the tweet " ‫مش‬ ‫واجد‬ ‫"حلو‬, which means "it is not very good" and contains two adverbs and one adjective. The tweet was handled by the binary algorithm, which first computed the semantic orientation score of the intensifying adverb "very" and the adjective "good", and then negated the score because of the negation adverb "not". On the other hand, tweets that contain a repeated adjective were not handled by the system. For example, the tweet "انا متكنطي متكنطي هلبة", which contains a twice-repeated adjective and means "I am very upset upset", was not processed even though it carries a strong negative sentiment; a repeated adjective is allowed in Arabic. The system also failed to handle some compound sentences that contain other adverb-adjective syntactic constructions with a conjunction. Conjunctions play a very important role in computing the overall sentiment of compound sentences. For instance, the tweet " 4 ‫جي‬ ‫غالي‬ ‫لكن‬ ‫فت‬ ‫"خت‬, which means "4G internet is very fast but expensive", has the structure adjective + adverb + conjunction + adjective. Sentiment analysis of compound sentences is difficult because they consist of several clauses linked by conjunctions, and the clauses may not have the same polarity. Another common drawback of the lexicon-based approach is its lack of scalability: the lexicons may be incomplete, and updating or expanding their entries requires tedious manual work. We use the F1-score metric to evaluate the system. In this work, recall considers only tweets that contain adjectives and adverbs. The system's recall, precision and F1-score reached 78.3%, 86.5% and 82.19% respectively. The relatively low recall is due to the limited coverage of the sentiment dictionaries, which can cause the identification of adjectives and adverbs to fail.
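The F1-score is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The short check below applies that standard formula to the reported precision and recall; the function name `f1_score` is hypothetical. The result, about 82.2%, agrees with the reported 82.19% up to the rounding of the published precision and recall values.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given as fractions)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Reported values from the evaluation: precision 86.5%, recall 78.3%.
reported_f1 = f1_score(0.865, 0.783)
print(f"F1 = {reported_f1:.4f}")
```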
Beyond the scope of our study, many tweets convey sentiment through other parts of speech (POS), such as verbs, adverbs and nouns; our system does not use verbs and nouns to calculate sentiment. Consider the tweet " ‫ال‬ ‫فاز‬ ‫االتحاد‬ ‫ي‬ ‫و‬ ‫م‬ ", which means "Aletihad team has won today" in English. It contains a sentiment-bearing word from another part of speech, the verb "won". Although the tweet clearly conveys a positive sentiment, the system failed to classify it.

IX. CONCLUSION AND FUTURE WORK
In this work, we used a lexicon-based approach to build a fine-grained sentiment analysis system that classifies Libyan dialect tweets into seven categories using adjective-adverb combinations as features. Two data sets (study and test) were used. The study data set was used for the linguistic analysis, which covered adverbs of degree, morphological forms of adjectives and syntactic constructions of adjective-adverb combinations. We also defined a scoring axiom that maps between a sentiment score and its polarity in both directions. Two annotators assigned each adjective a score from -2 to +2 to denote its sentiment strength, and the authors assigned scores between 0 and 1 to adverbs. These adjectives and adverbs, with their scores, were used to construct the sentiment lexicons. The agreement study showed that the annotations of the test data set and of the semantic orientation of adjectives are reliable, with K=0.814 and K=0.781 respectively. An adjective priority scoring algorithm was implemented to compute the final sentiment of tweets. The proposed system was tested on the 2500 annotated tweets of the test data set and achieved an F-score of 82.19%, which is a reasonable overall accuracy. More NLP tools and resources are crucial for research on the Libyan dialect; their absence makes such research more difficult to conduct. Finally, from a linguistic point of view, the method of our system could be adapted and applied to the other Maghreb dialects (Morocco, Algeria and Tunisia), because their structure is similar to that of the Libyan dialect.
In future work, we plan to extend the entries of our existing dictionaries. We also intend to use other POS syntactic constructions, such as verb, adverb and noun combinations, rather than only adverb-adjective combinations, to improve overall sentiment analysis accuracy. In the short term, we also aim to create some essential NLP tools for the Libyan dialect, such as a POS tagger.