From Tables to Knowledge: Recent Advances in Table Understanding

Jay Pujara, Pedro Szekely, Huan Sun, Muhao Chen

2021

A wealth of human knowledge is expressed in structured tables across web pages, scientific articles, spreadsheets, and databases. This wealth is matched by the diversity of layout structures, content types, formats, and surface forms used to express tables. Recent advances in representation learning and knowledge representation have made progress in exploiting structural regularities in tabular data to unlock this knowledge. In this tutorial, we survey these advances across a host of table understanding tasks, including table segmentation, semantic typing of cells, transforming tables to knowledge graphs, entity linking, and table retrieval for question answering. The tutorial will comprise three major modules. The first will introduce attendees to the seminal work on the organization of data in tables and cover the major goals and approaches of computational systems that undertake table understanding. The second will cover specific models used for table understanding tasks, such as table discovery, table segmentation and layout detection, cell classification and semantic typing, mapping tables to knowledge graphs and linking to known entities, and table retrieval in search and question answering. The final module will provide a primer for researchers who want to get involved with the table understanding community, offering a guide to the most commonly used benchmark datasets and models, downstream applications and evaluations, and a sketch of the open problems in table understanding. Our tutorial is designed to be approachable to many different audiences and will serve as a timely resource in a quickly progressing field. To engage audiences, we will incorporate opportunities to ask questions and discuss, as well as small-group activities and exercises with other participants.
When possible, we will incorporate practical demos that illustrate the operation of the tools and models we discuss, and we will point out where existing state-of-the-art systems can be improved. Additionally, we will introduce participants to many of the open-source tools and datasets that the tutors have built and curated, helping new researchers begin working on these problems quickly and effectively.
