Our research aims to develop methods and software for extracting data from arbitrary tables in weakly structured and semi-structured documents. It covers the following topics:

  • Heuristic- and deep-learning-based table extraction from untagged PDF documents
  • Rule-based transformation of spreadsheet data from an arbitrary form to the relational form
  • Semantic interpretation of tabular data using Linked Open Data

A presentation on this topic is available on Figshare:
Shigarov, A. (2021). Table Understanding: Rethinking of the Problem. https://doi.org/10.6084/m9.figshare.14836122.v1

This research was supported by the Russian Science Foundation (grant no. 18-71-10001). Earlier work was supported by the Russian Foundation for Basic Research (grants no. 12-07-31051 and no. 15-37-20042) and the Council for Grants of the President of the Russian Federation (scholarship no. SP-3387.2013.5).