
Honours & Special Research Topics

Information for University of Sydney Students

If you are interested in our Capstone or SSP, please have a look at the USydNLP_2021 projects.

The following list shows the honours research topics that our group is interested in. If you would like to work on another research topic related to NLP and deep learning, please contact us.

Desirable: The student has completed COMP3X08 (Introduction to Artificial Intelligence) or equivalent, and is prepared to take COMP5046 (Natural Language Processing).

Explainable or Interpretable Neural Networks (in Natural Language Processing)

According to Wikipedia, “… Explainable AI (XAI) or Transparent AI is an artificial intelligence (AI) whose actions can be easily understood by humans. It contrasts with the concept of the “black box” in machine learning, meaning the “interpretability” of the workings of complex algorithms, where even their designers cannot explain why the AI arrived at a specific decision…” With the advance and popularity of deep learning, it is even more important (1) to know why a system arrives at a decision and (2) for humans to learn from the system. This is an exploratory research project. We will use the health domain for testing.

Incrementally Learnable Dialog System

Intelligent dialog systems, such as Apple Siri and Microsoft Xiaoice, are now widely used to help people with many kinds of tasks. However, in most dialog systems, when users give feedback about errors the system has made, the system cannot update itself automatically and has to be updated manually. Therefore, the primary goal of this project is to build a hybrid, incrementally learnable dialog system that combines rules with deep learning. First, the system allows the rules to be updated manually. Moreover, by using deep learning, the system can update those rules automatically by analyzing users’ feedback. We intend to implement this system for university course selection first and explore further applications in the future.

Cross-Lingual Knowledge Transfer for Low-Resource Languages

Most languages have no established writing system and minimal written records. However, textual data is essential for natural language processing, and particularly important for training language models to support speech recognition. Even where text data is missing, bilingual lexicons are available for some languages, since creating lexicons is a fundamental task of documentary linguistics. We investigate the use of such lexicons to improve language models when textual training data is limited to as few as a thousand sentences. The method involves learning cross-lingual word embeddings as a preliminary step in training monolingual language models. In recent years, various models for learning cross-lingual representations have been proposed, such as monolingual mapping, pseudo-cross-lingual training, cross-lingual training, and joint optimization.
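As a rough illustration of the monolingual mapping family mentioned above, the sketch below aligns two separately trained embedding spaces using a bilingual lexicon. It uses an orthogonal Procrustes solution, which is one common choice and not necessarily the method this project will adopt. The function names, the dictionary-of-vectors layout, and the assumption that both spaces have the same dimensionality are illustrative only.

```python
import numpy as np


def learn_mapping(src_vecs, tgt_vecs):
    """Return the orthogonal matrix W minimizing ||src_vecs @ W - tgt_vecs||_F.

    Rows of src_vecs and tgt_vecs are embeddings of translation pairs.
    """
    u, _, vt = np.linalg.svd(src_vecs.T @ tgt_vecs)
    return u @ vt


def align_with_lexicon(src_emb, tgt_emb, lexicon):
    """Map every source-language embedding into the target-language space.

    src_emb, tgt_emb: dict mapping word -> 1-D numpy vector (same dimension).
    lexicon: iterable of (source_word, target_word) translation pairs.
    """
    # Keep only lexicon entries for which both words have an embedding.
    pairs = [(s, t) for s, t in lexicon if s in src_emb and t in tgt_emb]
    src_matrix = np.stack([src_emb[s] for s, _ in pairs])
    tgt_matrix = np.stack([tgt_emb[t] for _, t in pairs])
    mapping = learn_mapping(src_matrix, tgt_matrix)
    return {word: vec @ mapping for word, vec in src_emb.items()}
```

The mapped source vectors can then be compared directly with target-language vectors, for example by cosine similarity. With very small lexicons the estimated mapping may be unreliable, which is part of what makes the low-resource setting challenging.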

Fake News Detection Using Deep Learning

Social media is becoming popular for news consumption due to its fast dissemination, easy access, and low cost. However, it also enables the wide propagation of fake news, i.e., news with intentionally false information. Detecting fake news is an important task, which not only ensures users receive authentic information but also helps maintain a trustworthy news ecosystem. The majority of existing detection algorithms focus on finding clues in the news content, which is generally not effective because fake news is often intentionally written to mislead users by mimicking true news. In this research, we will explore different types of deep learning techniques and embeddings.

Iterative Text-Image Generation with Dialogue

Iterative Contextual Image Generation (ICIG) research focuses on image generation and modification based on stepwise text descriptions, using a time-series Neural Network (NN) model. We aim to connect language and images computationally in a better way, in this case by achieving finer control over image content generation and modification through natural language. The model uses NLP techniques to interpret the user’s text instructions, then encodes the information and passes it through a generative NN architecture for image generation. Compared to traditional NN-based image generation models, the ICIG framework enables us to control the contents of generated images more interactively. The core of the ICIG model can also be used in other related research fields.

Mental Health and Suicide Ideation Assessment with Social Media

Suicide is among the leading causes of death in many countries around the world; however, little progress has been made in improving current prevention systems. The primary goal of this project is to leverage the large volume of social media data to create a usable decision support system that assists in detecting mental health and suicide ideation risk. The project first establishes the feasibility of using only textual embeddings, to minimize any dependency on subjective feature selection. Multiple datasets are examined to provide a comparative study across different social media platforms (currently Reddit and Twitter) and to establish some degree of generalisation. Various deep learning architectures are used, including CNN, LSTM, Bi-LSTM and Transformer.

Extracting Information from Tables (or PDF)

Tables of information are everywhere. They can be found in academic literature, business reports, news articles or even sport webpages. The numbers in a table are useful for analysis, but extracting them is a tedious and error-prone manual process. Of course, the task is not simply to copy and paste numbers; it also requires acquiring the semantics of the value in each cell from its column and row labels. This turns out to be easier said than done. There have been some attempts at this task, but they all relied on many restrictions and assumptions. A successful project would not only benefit the world; it could also become the basis for a company selling this service.
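To make the idea of cell semantics concrete, here is a minimal sketch that pairs each value with its row and column labels. It assumes the simplest possible layout, where the first row holds column labels and the first column holds row labels; real tables (merged cells, multi-level headers, tables embedded in PDFs) are much harder, which is the point of the project. The function name and the sample data are made up for illustration.

```python
def cells_with_labels(table):
    """Return (row_label, column_label, value) triples for each data cell.

    Assumes table is a list of rows, the first row holds the column labels,
    and the first column of every later row holds the row label.
    """
    column_labels = table[0][1:]
    records = []
    for row in table[1:]:
        row_label, values = row[0], row[1:]
        for column_label, value in zip(column_labels, values):
            records.append((row_label, column_label, value))
    return records


# Made-up sample table, for illustration only.
sample_table = [
    ["Team", "Wins", "Losses"],
    ["Lions", "12", "4"],
    ["Tigers", "9", "7"],
]

for record in cells_with_labels(sample_table):
    print(record)
```

Each triple carries enough context to interpret a number on its own, for example that "12" is the number of wins for the "Lions" row.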