Our research aims to develop methods and software for extracting data from arbitrary tables in weakly structured and semi-structured documents. It covers the following topics:

  • Heuristic- and deep-learning-based table extraction from untagged PDF documents
  • Rule-based transformation of spreadsheet data from an arbitrary form to a relational form
  • Semantic interpretation of tabular data using Linked Open Data

The research is currently supported by the Russian Science Foundation (grant no. 18-71-10001). Earlier work was supported by the Russian Foundation for Basic Research (grants no. 12-07-31051 and no. 15-37-20042) and the Council for Grants of the President of the Russian Federation (scholarship no. SP-3387.2013.5).