IQ-VQA: Intelligent Visual Question Answering

2021
Despite tremendous progress in the field of Visual Question Answering (VQA), models today still tend to be inconsistent and brittle. We therefore propose a model-independent cyclic framework that increases the consistency and robustness of any VQA architecture. We train our models to answer the original question, generate an implication based on the answer, and then learn to answer the generated implication correctly. As part of the cyclic framework, we propose a novel implication generator that produces implied questions from any question-answer pair. As a baseline for future work on consistency, we provide a new human-annotated VQA-Implications dataset. The dataset consists of 30k implications of 3 types - Logical Equivalence, Necessary Condition and Mutual Exclusion - made from the VQA validation dataset. We show that our framework improves the consistency of VQA models on both the rule-based dataset and the VQA-Implications dataset, and also improves their robustness, without degrading their performance.
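The answer, imply, re-answer cycle described above can be illustrated with a minimal sketch. This is not the authors' implementation: `cyclic_check`, `answer_fn` and `implication_fn` are hypothetical names, the VQA model and the implication generator are stood in for by plain callables, and feeding the predicted answer to the generator is an assumption. The sketch only measures how many implied questions a model answers as expected; it does not perform any training.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical type: an implied question paired with its expected answer.
Implication = Tuple[str, str]


def cyclic_check(
    image: object,
    question: str,
    answer_fn: Callable[[object, str], str],
    implication_fn: Callable[[str, str], List[Implication]],
) -> Dict[str, object]:
    """Answer a question, derive implications, then answer those too.

    answer_fn stands in for any VQA model; implication_fn stands in for
    an implication generator. Both are assumptions for illustration.
    """
    # Step 1: answer the original question.
    predicted = answer_fn(image, question)

    # Step 2: generate implied questions from the question-answer pair.
    implications = implication_fn(question, predicted)

    # Step 3: answer each implication and compare with the expected answer.
    checks = []
    for implied_question, expected in implications:
        got = answer_fn(image, implied_question)
        checks.append((implied_question, expected, got, got == expected))

    # Fraction of implications answered as expected; 1.0 if there are none.
    if checks:
        consistency = sum(1 for c in checks if c[3]) / len(checks)
    else:
        consistency = 1.0

    return {"answer": predicted, "checks": checks, "consistency": consistency}
```

In a training setting, the mismatches recorded in `checks` would be the cases that contribute an additional loss term, so the model learns to answer its own implications correctly.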