Benchmark Study #4: AI2 Reasoning Challenge (Task(s), MCQ)

post by Bruce W. Lee (bruce-lee) · 2024-01-07T17:13:00.209Z · LW · GW · 0 comments

Contents

  TL;DR
  Timeline Note: Everything below is written from the perspective of 2018, when the latest version (at the time of writing) of "Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge" was published
  Section: Abstract
  Section: Introduction
    Introduction to the AI2 Reasoning Challenge (ARC)
    Structure and Composition of the ARC Dataset
    Additional Resources Released with ARC
    Distinction from Previous Challenges
    Organization and Content Analysis of the Paper
  Section: ARC Dataset
    Overview of ARC Dataset
    Question Characteristics
    Identifying Challenge Questions
    Question Types in ARC
    Methodology for ARC Challenge Set Definition
    Knowledge and Reasoning Styles in ARC
  Section: ARC Corpus
    ARC Corpus
    Creation of ARC Corpus
    Characteristics of ARC Corpus
    Coverage and Relevance
    Challenges
    Utility
  Section: Baseline / Results
    Overview of Baseline Systems
    IR and PMI Performance
    Advanced Neural Models
    Results on Challenge and Easy Sets
    Limitations and Observations

Background Note: Benchmark Study is a blog post series recording and studying benchmark papers. I am in the process of developing a new LLM evaluation framework with more flexibility than EleutherAI's LM Harness. For the initial release, I'm only adding benchmarks that I've studied. All study notes are meant to be read within 10 minutes. I will receive GPT assistance here and there while writing these blog posts. I'm sharing these study notes publicly partly to keep myself going and partly to help anyone who hasn't read the paper yet.

@misc{clark2018think,
     title={Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge}, 
     author={Peter Clark and Isaac Cowhey and Oren Etzioni and Tushar Khot and Ashish Sabharwal and Carissa Schoenick and Oyvind Tafjord},
     year={2018},
     eprint={1803.05457},
     archivePrefix={arXiv},
     primaryClass={cs.AI}
}

TL;DR

Timeline Note: Everything below is written from the perspective of 2018, when the latest version (at the time of writing) of "Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge" was published.


Section: Abstract

Section: Introduction

Introduction to the AI2 Reasoning Challenge (ARC)

Structure and Composition of the ARC Dataset

Additional Resources Released with ARC

Distinction from Previous Challenges

Organization and Content Analysis of the Paper

Section: ARC Dataset

Overview of ARC Dataset

Question Characteristics

Identifying Challenge Questions

Question Types in ARC

Methodology for ARC Challenge Set Definition

A snippet of the paper, which I think contains a piece of logic that is impressively intuitive yet effective.
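The paper's partitioning rule is simple: a question lands in the Challenge Set only if both retrieval-style baselines (an IR solver and a PMI solver) answer it incorrectly; otherwise it goes to the Easy Set. A minimal sketch of that filter (the function and field names here are hypothetical, not the paper's code):

```python
def in_challenge_set(question, ir_answer, pmi_answer):
    """A question is 'Challenge' iff BOTH retrieval-style baselines fail.

    ir_answer / pmi_answer are the option labels chosen by the IR and
    PMI solvers; 'answer_key' holds the gold label. Names are
    illustrative placeholders, not the paper's actual identifiers.
    """
    correct = question["answer_key"]
    return ir_answer != correct and pmi_answer != correct

# Toy example: both solvers miss -> Challenge; one gets it right -> Easy.
q = {"answer_key": "C"}
print(in_challenge_set(q, ir_answer="A", pmi_answer="B"))  # True  -> Challenge Set
print(in_challenge_set(q, ir_answer="C", pmi_answer="B"))  # False -> Easy Set
```

The appeal is that "hard" is defined operationally, by the failure of surface-level retrieval methods, rather than by human judgment.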

Knowledge and Reasoning Styles in ARC

Section: ARC Corpus

ARC Corpus

Creation of ARC Corpus

Characteristics of ARC Corpus

Coverage and Relevance

Challenges

Utility

Section: Baseline / Results

Overview of Baseline Systems

IR and PMI Performance
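PMI here refers to pointwise mutual information between question terms and answer-option terms, estimated from co-occurrence in a large corpus. A minimal sketch of the underlying quantity, using toy co-occurrence counts (this is the textbook formula, not the paper's actual estimator or corpus):

```python
import math

def pmi(count_xy, count_x, count_y, total):
    """Pointwise mutual information: log( p(x, y) / (p(x) * p(y)) )."""
    p_xy = count_xy / total
    p_x = count_x / total
    p_y = count_y / total
    return math.log(p_xy / (p_x * p_y))

# Toy counts over a hypothetical corpus of 10,000 text windows:
# term x appears in 200 windows, term y in 300, and they co-occur in 60.
score = pmi(count_xy=60, count_x=200, count_y=300, total=10_000)
print(round(score, 3))  # 2.303 -> the terms co-occur ~10x more than chance
```

A PMI solver scores each answer option by how strongly its terms associate with the question's terms and picks the highest-scoring option, which is why it succeeds on questions answerable by word association alone.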

Advanced Neural Models

Results on Challenge and Easy Sets

Limitations and Observations
