Mining Question and Answer Archives for Relation Discovery


Information extraction from unstructured text is widely used to construct new knowledge bases and extend existing ones. Such systems typically operate over individual sentences and can extract facts about entities only when those facts are stated in a single, well-structured sentence. However, much of the knowledge that users care about is not stated this way or is otherwise hard to extract. Community question answering (CQA) websites, on the other hand, contain millions of question and answer (QnA) pairs that reflect real users’ interests, which makes this data especially attractive for information extraction. However, answer text is sometimes hard to interpret without the question; for example, the answer may not name the subject or the relation the question asks about. In these cases, existing information extraction methods cannot learn anything, because they ignore the context and focus on individual sentences. In this project we developed a novel model for relation extraction from CQA data that uses the discourse of QnA pairs to predict relations between entities mentioned in the question and answer sentences. Experiments on two publicly available datasets demonstrated that the model extracts roughly 20% to 40% additional relation triples that are not extracted by existing sentence-based models. The results of this work were published in the paper “Relation Extraction from Community Generated Question-Answer Pairs,” D. Savenkov et al., NAACL 2015 Student Research Workshop.