KNOWLEDGE EXTRACTION IN BIG DATA: LITERATURE REVIEW & SUMMARY MATRIX
TABLE OF CONTENTS
ARTICLE SUMMARIES
A Survey of Text Mining Techniques and Applications
Implementing WEKA for medical data classification and early disease prevention
Measuring Post Traumatic Stress Disorder in Twitter
Perception Differences between the Depressed and Non-Depressed Users in Twitter
Predicting Depression via Social Media
Predicting Depression Levels Using Social Media Posts
Identifying Depression on Twitter
Analyzing Clinical Depressive Symptoms in Twitter
SUMMARY MATRIX
This paper reviews eight published papers and presentations that focus on Knowledge Extraction, a term that generally refers to gleaning information, intelligence, and insight from large datasets. Most of the articles examined focus on depression and draw on user profile information, depression surveys and screening tools, and Twitter posts, although some use other Social Networking Sites (SNS), including Facebook and LiveJournal. Most concentrate on text processing, contextual analysis, and classifiers, since tweets are text-based and linguistic in nature. Various statistical and machine learning techniques are explored, and some articles compare predictive results across methodological approaches and computational or statistical techniques.
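Comparing predictive results across approaches usually comes down to a few standard metrics. The following is a minimal Python sketch, not drawn from any of the reviewed articles, that computes accuracy, precision, recall, and F1 for a binary depressed/control labelling. The function name `binary_metrics`, the label strings, and the example predictions are all hypothetical.

```python
def binary_metrics(y_true, y_pred, positive="depressed"):
    """Return accuracy, precision, recall and F1 for one positive class label.

    Labels are plain strings; `positive` marks the class of interest
    (a hypothetical "depressed" label in this sketch).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    # Guard against division by zero when a model never predicts (or the data
    # never contains) the positive class.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical gold labels and predictions from a single model.
gold = ["depressed", "depressed", "control", "control"]
pred = ["depressed", "control", "control", "control"]
print(binary_metrics(gold, pred))
# {'accuracy': 0.75, 'precision': 1.0, 'recall': 0.5, 'f1': 0.6666666666666666}
```

Accuracy alone can look good when the control class dominates a dataset. Precision and recall show separately how many flagged users are actually in the positive class and how many positive users are missed, which is why F1 is often used to compare approaches.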
Social media posts provide an extremely rich source of actual behavioral information, together with attitudinal and emotional cues. Most of the authors discussed believe that the mental state and condition of individuals can be determined and classified from this information, and some research aims not only at a binary classification of depressive disorder but also at indicating illness severity. Taken together, the articles point to weaknesses in how mental illness, specifically depression, is diagnosed today: diagnosis relies on self-reporting and generally occurs only after onset. The authors assert that data mining, machine learning, predictive modelling, pattern recognition, and language processing all present opportunities for better prognosis, diagnosis, monitoring, and treatment of mental illness.
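The binary classification these studies describe can be illustrated with a minimal multinomial naive Bayes classifier over word counts. This Python sketch rests on stated assumptions: naive Bayes is used only because it is compact, not because it is the method of any particular reviewed article, and the training posts and the `depressed`/`control` labels are invented toy examples, not study data.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes as tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        # Log prior: share of training documents in each class.
        self.log_priors = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.vocab = set().union(*self.word_counts.values())
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def log_scores(self, doc):
        v = len(self.vocab)
        scores = {}
        for c in self.classes:
            score = self.log_priors[c]
            for tok in tokenize(doc):
                if tok not in self.vocab:
                    continue  # words never seen in training are ignored
                score += math.log((self.word_counts[c][tok] + 1) / (self.totals[c] + v))
            scores[c] = score
        return scores

    def predict(self, doc):
        scores = self.log_scores(doc)
        return max(scores, key=scores.get)


# Invented toy posts for illustration only.
train_docs = [
    "i feel hopeless and tired all the time",
    "can't sleep again, everything feels empty",
    "i feel so alone and worthless",
    "had a great run this morning",
    "excited for the weekend with friends",
    "lunch was delicious today",
]
train_labels = ["depressed"] * 3 + ["control"] * 3
model = NaiveBayes().fit(train_docs, train_labels)
print(model.predict("so tired and empty lately"))   # depressed
print(model.predict("great weekend with friends"))  # control
```

Because the classes are taken from the training labels, the same code accepts more than two labels, so graded severity levels could be used in place of a binary depressed/control split.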
A Survey of Text Mining Techniques and Applications is discussed first, since it gives the broadest overview of the semantic and linguistic processing used in most of the following articles. The final article, Analyzing Clinical Depressive Symptoms in Twitter, is placed last because it is the most high-level and presents the least new information.
A Survey of Text Mining Techniques and Applications - Vishal Gupta and Gurpreet S. Lehal
The purpose of this paper is to present a broad overview of data mining techniques and applications with respect to textual analysis. While computers today are good at analyzing structured data, humans are more adept at interpreting unstructured, linguistic information.
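The gap between structured and unstructured data can be illustrated by converting free-text posts into a document-term matrix, a common first step in text mining that gives each document a row and each vocabulary term a column. The Python sketch below assumes simple lowercase regex tokenization and a small, hand-picked stop-word list; both are illustrative choices, not techniques taken from Gupta and Lehal, and the function names are hypothetical.

```python
import re
from collections import Counter

# Illustrative stop-word list (an assumption; real pipelines use larger curated lists).
STOP_WORDS = {"a", "an", "the", "and", "or", "to", "of", "i", "is", "it", "so", "my", "in"}


def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes as tokens."""
    return re.findall(r"[a-z']+", text.lower())


def to_term_counts(text):
    """Turn one unstructured post into a bag of term counts, minus stop words."""
    return Counter(tok for tok in tokenize(text) if tok not in STOP_WORDS)


def document_term_matrix(docs):
    """Return a sorted vocabulary and one row of term counts per document."""
    counts = [to_term_counts(d) for d in docs]
    vocab = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocab] for c in counts]
    return vocab, matrix


posts = ["I can't sleep and I feel so tired", "Great run in the park today"]
vocab, matrix = document_term_matrix(posts)
print(vocab)   # ["can't", 'feel', 'great', 'park', 'run', 'sleep', 'tired', 'today']
print(matrix)  # [[1, 1, 0, 0, 0, 1, 1, 0], [0, 0, 1, 1, 1, 0, 0, 1]]
```

Once posts are in this tabular form, the statistical and machine learning methods used for structured data can be applied to them.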
The paper begins by exploring the differences between traditional data mining and text data mining, also called Knowledge Discovery in Text (KDT). It highlights difficulties in machine processing related to semantic relations between concepts; slang, spelling variations, and contextual meaning; and variations in document types and formats. It then discusses eight areas of textual analysis:
• Initial information extraction, which covers pattern matching and categorization methods such as identifying keywords, removing fillers, and creating a more structured database that can be used further along the analysis chain....