- NLP
Visual Question Answering
Visual Question Answering (VQA) is a challenging multimodal task that requires not only semantic understanding of both images and questions, but also sound perception of the step-by-step reasoning process that leads to the correct answer. Our best model significantly outperforms the previous state of the art on the GQA dataset, achieving 92.7% on the validation set and 73.1% on the test-dev set.
- NLP
Incremental Learnable Dialog System
Intelligent dialog systems, such as Apple Siri and Microsoft Xiaoice, are now widely used to assist people with many everyday tasks. However, when users give feedback about errors the system has made, most dialog systems cannot update themselves automatically and must instead be corrected manually. We therefore build a hybrid, incrementally learnable dialog system named Cassandra that combines rule-based and deep learning components. First, the system allows its rules to be updated manually. Second, using deep learning, the system can update those rules automatically by analyzing users' feedback. We intend to deploy the system for university course selection first and to explore further applications in the future.
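The two update paths described above (manual rule edits and feedback-driven rule updates) can be sketched as follows. This is a minimal illustration under stated assumptions, not Cassandra's implementation: the names `RuleStore`, `set_rule_manually`, `update_from_feedback`, and `interpret_feedback` are hypothetical, and the learned feedback analysis is replaced by a simple string pattern so the example stays self-contained.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


def interpret_feedback(feedback: str) -> Optional[str]:
    """Stand-in for the learned component (an assumption of this sketch).

    A real system would use a trained model to analyze free-form feedback.
    Here we only recognize feedback containing "correct answer: <reply>".
    """
    marker = "correct answer:"
    idx = feedback.lower().find(marker)
    if idx == -1:
        return None
    corrected = feedback[idx + len(marker):].strip()
    return corrected or None


@dataclass
class RuleStore:
    """Toy rule base mapping a normalized user utterance to a reply."""
    rules: Dict[str, str] = field(default_factory=dict)
    fallback: str = "Sorry, I cannot answer that yet."

    @staticmethod
    def _key(utterance: str) -> str:
        # Lowercase and collapse whitespace so near-identical inputs match.
        return " ".join(utterance.lower().split())

    def set_rule_manually(self, utterance: str, reply: str) -> None:
        """Manual path: a maintainer adds or overwrites a rule."""
        self.rules[self._key(utterance)] = reply

    def respond(self, utterance: str) -> str:
        return self.rules.get(self._key(utterance), self.fallback)

    def update_from_feedback(self, utterance: str, feedback: str) -> bool:
        """Automatic path: rewrite the rule if the feedback yields a correction."""
        corrected = interpret_feedback(feedback)
        if corrected is None:
            return False
        self.rules[self._key(utterance)] = corrected
        return True
```

For example, after `store.set_rule_manually("When does enrolment open?", "Check the handbook.")`, a later call to `store.update_from_feedback("When does enrolment open?", "That is wrong. Correct answer: It opens in week 1.")` replaces the stored reply without a maintainer editing the rule by hand.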
- NLP
Iterative Text-Image Generation with Dialogue
Iterative Contextual Image Generation (ICIG) research focuses on generating and modifying images from stepwise text descriptions using a time-series neural network (NN) model. We implemented an image-generation system, IterDraw, that follows text instructions to generate images of multiple objects. The system draws the objects iteratively with accurate spatial relationships and temporal order, since such information is critical when presenting ideas, providing instructions, and illustrating specific procedures. IterDraw takes natural-language input and, via a recurrent generative network, generates images with multiple objects and specific details, which has been challenging for non-iterative networks. We demonstrated the system on the Iterative Compositional Language and Elementary Visual Reasoning (i-CLEVR) dataset and the Collaborative Drawing (CoDraw) dataset and received positive feedback from users.
- NLP
In-Game Toxicity Detection
Traditional toxicity detection models have focused on the single-utterance level without a deeper understanding of context. We introduce CONDA, a new dataset for in-game toxic language detection that enables joint intent classification and slot filling analysis, a core task of Natural Language Understanding (NLU). The dataset consists of 45K utterances from 12K conversations taken from the chat logs of 1.9K completed Dota 2 matches. We propose a robust dual semantic-level toxicity framework that handles utterance-level and token-level patterns together with rich contextual chat history.
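To make the two semantic levels concrete, the sketch below produces a token-level label sequence (slots), an utterance-level label (intent), and a simple context feature for each turn of a conversation. It is an illustration only: the lexicon, the label names `"T"`, `"O"`, `"toxic"`, and `"other"`, and the lookup-based tagger are assumptions of this example, not the CONDA annotation scheme or the proposed framework, which would use learned models.

```python
from typing import Dict, List

# Hypothetical placeholder lexicon; not taken from the dataset.
TOXIC_LEXICON = {"noob", "trash"}


def tag_tokens(utterance: str) -> List[str]:
    """Token level: label each whitespace token "T" (toxic) or "O" (other)."""
    return [
        "T" if tok.lower().strip(".,!?") in TOXIC_LEXICON else "O"
        for tok in utterance.split()
    ]


def analyse(conversation: List[str]) -> List[Dict]:
    """Run both levels over a conversation, carrying context forward."""
    results = []
    toxic_so_far = 0  # context feature: toxic turns seen before this one
    for utterance in conversation:
        slots = tag_tokens(utterance)
        # Utterance level: here derived directly from the token labels.
        intent = "toxic" if "T" in slots else "other"
        results.append({
            "utterance": utterance,
            "slots": slots,
            "intent": intent,
            "prior_toxic_turns": toxic_so_far,
        })
        if intent == "toxic":
            toxic_so_far += 1
    return results
```

Calling `analyse(["gg wp", "you are trash!", "report him"])` returns one record per turn. The `prior_toxic_turns` field shows how chat history can be exposed to a downstream classifier, even though this toy version does not use it to change the label.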