BetterSearch Demo: Interactive Query Generation Interface using LLM-based Prompt Modification and User Feedback


We present BetterSearch, a cross-lingual search interface that supports automatic and interactive query generation over the BETTER dataset. BetterSearch provides a simple document search interface that displays documents in their original language alongside their English translations, making it easy for researchers to navigate and analyze search results. The tool also supports diverse query generation, allowing users to explore search results more comprehensively. Most importantly, it combines search with a prompting-based query generation interface that lets users refine their queries and prompts with retrieval information. The BetterSearch interface can serve as an effective starting template for qualitative analysis of other information retrieval experiments and datasets, as well as a tool for incorporating retrieval feedback and conducting Human-in-the-Loop (HITL) studies.

Kaustubh Dhole, Ramaraj Chandradevan, Eugene Agichtein (Emory IR Lab) and the JHU Team

Our system and interface were designed for the BETTER dataset, a collection of natural language processing resources developed by the US Intelligence Advanced Research Projects Activity (IARPA) to help its intelligence analysts process and analyze large amounts of unstructured, multilingual information quickly and effectively. The collection also contains ancillary information such as event spans from text across many languages and topics. In particular, search systems are expected to accurately retrieve Arabic, Persian, Chinese, Korean, and Russian documents when queried in English.

Our lab developed the BetterSearch user interface, which is made up of three subsystems, each described below. The interface is built using HuggingFace's Gradio platform. Check the following video for a complete demonstration:

  • Cross-Lingual Event-Based Retrieval
    • This tab is the simplest interface, letting users write search queries. The user is presented with the output of a cross-lingual event-based retriever, which returns the most relevant documents, their English translations, and the highlighted events in each document.
    • The documents are retrieved using the cross-lingual retrieval model ColBERT (Khattab and Zaharia, 2020) and further reranked using event features. All documents are translated with Google Translate before being indexed.
  • Automatic Query Generation
    • The BETTER task benchmarks systems on finding documents in specified target languages that are similar to a user's example document. Our system attempts this by generating intermediate, human-understandable, editable queries from the example document and performing retrieval with them. Diverse queries are generated using Hamming Diversity Beam Search. The effectiveness of the generated queries is crucial for retrieving relevant documents while also ensuring query interpretability.
    • Inspired by the recent success of pre-trained generation models, we fine-tune a T5 (Raffel et al, 2020) model on (document, query) pairs. To evaluate our approach, we compare the original T5 model with a docT5query (Nogueira and Lin, 2019) model, which has already been fine-tuned on the MS MARCO dataset. Our results indicate that the docT5query model outperforms the original T5 model, so we use it in our demonstration.
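To give a rough sense of how Hamming Diversity Beam Search encourages distinct queries, the toy sketch below decodes one beam per group and penalizes any token already chosen by an earlier group at the same timestep. The vocabulary, scoring function, and penalty weight are invented for illustration; they are not our model's.

```python
# Toy sketch of diverse (Hamming-penalty) group decoding, in the spirit of
# Hamming Diversity Beam Search. One beam per group for simplicity; the
# vocabulary, scorer, and penalty weight are illustrative only.

def diverse_group_search(vocab, score, steps, n_groups=3, lam=2.5):
    """Decode `n_groups` sequences greedily. At each timestep, a token
    already chosen by an earlier group is penalised by `lam` per use,
    pushing later groups toward different (more diverse) tokens."""
    groups = [[] for _ in range(n_groups)]
    for _ in range(steps):
        chosen_at_t = []                      # tokens picked this timestep
        for g in range(n_groups):
            best_tok, best_score = None, float("-inf")
            for tok in vocab:
                s = score(groups[g], tok) - lam * chosen_at_t.count(tok)
                if s > best_score:
                    best_tok, best_score = tok, s
            groups[g].append(best_tok)
            chosen_at_t.append(best_tok)
    return groups

VOCAB = ["attack", "protest", "flood", "embassy", "strike"]

def toy_score(prefix, tok):
    # Stand-in for a language model's log-probability: earlier vocabulary
    # entries are simply preferred.
    return -VOCAB.index(tok)

queries = diverse_group_search(VOCAB, toy_score, steps=2)
# Each group settles on a different head token (attack / protest / flood),
# whereas plain greedy decoding would give three identical sequences.
```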

  • Prompting-Based Interactive Query Generation & Feedback
    • In recent years, large language models have made excellent strides in multi-task and few-shot learning. With just a handful of examples, they have shown impressive generative capabilities, albeit with the risk of hallucination. Prompting has proven an effective and seemingly natural way to interact with such models.
    • While few-shot prompting is a powerful way to teach models new tasks, models generally produce different outputs as prompting parameters vary: the types of examples in the prompt, their order, and their number all strongly influence the generations. We use this sensitivity to our advantage for query generation by letting users edit their prompts, either directly or through relevance feedback, to improve subsequent query generations and the corresponding retrieval.
    • We choose Flan-T5 (Chung et al, 2022), as it has already been fine-tuned on a large number of tasks, making it arguably well suited to learning new ones. By default, the interface prompts Flan-T5 with two editable (document, query) pairs along with an instruction. We present users with a choice of multiple instructions and of the document to generate a query from. The interface is shown below.
    • The generated queries can be used individually or together to retrieve documents. Each retrieved document comes with a checkbox that adds it, along with the query that retrieved it, directly to the prompt. This feeds user search feedback straight back into the prompt, making it more consistent with the user's requests. In few-shot prompting, when models generate responses from a limited set of examples, the quality of the generations depends on the quality and relevance of those examples, and models are known to be less robust to prompt perturbations.
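The prompt assembly and checkbox feedback loop described above can be sketched as plain string manipulation. The helper names and example texts below are hypothetical, not our interface code.

```python
# Sketch of few-shot prompt assembly and relevance-feedback incorporation;
# function names and example texts are hypothetical.

def build_prompt(instruction, examples, target_doc):
    """Concatenate the instruction, the editable (document, query) pairs,
    and the document we want a new query for."""
    parts = [instruction]
    for doc, query in examples:
        parts.append(f"Document: {doc}\nQuery: {query}")
    parts.append(f"Document: {target_doc}\nQuery:")
    return "\n\n".join(parts)

def add_feedback(examples, retrieved_doc, its_query):
    """Ticking a retrieved document's checkbox adds it, together with the
    query that retrieved it, as a new in-context example."""
    return examples + [(retrieved_doc, its_query)]

examples = [
    ("Floods displaced thousands in the region.", "flood displacement"),
    ("Protests erupted outside the embassy.", "embassy protests"),
]
prompt = build_prompt("Generate a search query for the document.",
                      examples, "An earthquake struck the coastal city.")
examples = add_feedback(examples,
                        "Rescue teams reached the quake zone.",
                        "earthquake rescue")
```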

The primary objective of BetterSearch is to give researchers a tool to qualitatively investigate cross-lingual retrieval. Researchers and practitioners can quickly and easily perform qualitative analysis with the tool's search interface and query generation features, allowing them to evaluate search systems more thoroughly. The prompting-based search interface also provides an avenue for human-in-the-loop (HITL) studies. Beyond qualitative studies, we believe BetterSearch could serve as an effective starting template for more sophisticated information retrieval experiments, as well as a tool for incorporating retrieval feedback.

The project is funded by the Intelligence Advanced Research Projects Activity (IARPA).