TabbyDOC
Table Understanding Research
About
This research project aims to develop methods and software for extracting entities and their relationships from tables represented in unstructured and semi-structured data formats.
This work was supported by the Russian Science Foundation (grant no. 18-71-10001). Our prior work was supported by the Russian Foundation for Basic Research (grants no. 12-07-31051 and no. 15-37-20042) and the Council for Grants of the President of the Russian Federation (scholarship no. SP-3387.2013.5).
Source code
- tabbypdf, Rule-based PDF table extraction
- tabbypdf2, Deep-learning-based PDF table extraction
- tabbyxl, Rule-based spreadsheet data extraction
- tabbyld, Semantic table interpretation using open knowledge graphs
Publications
2022
- Shigarov A. (2022). Table understanding: Problem overview. WIREs Data Mining and Knowledge Discovery, 13(1), e1482. https://doi.org/10.1002/widm.1482
- Kostyleva O., Paramonov V., Shigarov A., Vetrova V. (2022). Towards comparison of table type taxonomies. 45th Jubilee Int. Conv. on Information, Communication and Electronic Technology (MIPRO), 1461-1465. https://doi.org/10.23919/MIPRO55190.2022.9803520
2021
- Dorodnykh N., Yurin A., Shigarov A., Turdakov D. (2021). Ontology engineering at the assertion level based on semantic annotation of tabular data. 2021 Ivannikov Memorial Workshop (IVMEM). 28-34. https://doi.org/10.1109/IVMEM53963.2021.00011
- Yurin A., Dorodnykh N., Shigarov A. (2021). Semi-automated formalization and representation of the engineering knowledge extracted from spreadsheet data. IEEE Access. 9, 157468-157481. https://doi.org/10.1109/ACCESS.2021.3130172
- Paramonov V., Shigarov A., Vetrova V. (2021). Rule-driven spreadsheet data extraction from statistical tables: case study. Information and Software Technologies. ICIST 2021. CCIS 1486, 84-95. https://doi.org/10.1007/978-3-030-88304-1_7
- Dorodnykh N., Yurin A. (2021). TabbyLD: a tool for semantic interpretation of spreadsheets data. Modelling and Development of Intelligent Systems. MDIS 2020. CCIS 1341, 315-333. https://doi.org/10.1007/978-3-030-68527-0_20
- Dorodnykh N., Shigarov A., Yurin A. (2022). Using the semantic annotation of web table data for knowledge base construction. Proc. 4th Artificial Intelligence and Cloud Computing Conference. AICCC’21, 122-129. https://doi.org/10.1145/3508259.3508277
- Dorodnykh N., Yurin A. (2022). Extraction of facts from web-tables based on semantic interpretation tabular data. 2022 Ivannikov Memorial Workshop (IVMEM), 7-17. https://doi.org/10.1109/IVMEM57067.2022.9983959
- Mikhailov A., Shigarov A. (2021). Page layout analysis for refining table extraction from PDF documents. 2021 Ivannikov ISPRAS Open Conference (ISPRAS), 114-119. https://doi.org/10.1109/ISPRAS53967.2021.00021
2020
- Mikhailov A., Shigarov A., Rozhkov E., Cherepanov I. (2020). On graph-based verification for PDF table detection. 2020 Ivannikov ISPRAS Open Conference (ISPRAS). 91-95. https://doi.org/10.1109/ISPRAS51486.2020.00020
- Cherepanov I., Mikhailov A., Shigarov A., Paramonov V. (2020). On automated workflow for fine-tuning deep neural network models for table detection in document images. 2020 43rd International Convention on Information, Communication and Electronic Technology. 1130-1133. https://doi.org/10.23919/MIPRO48935.2020.9245241
- Dorodnykh N. & Yurin A. (2020). Towards a universal approach for semantic interpretation of spreadsheets data. Proc. 24th Symposium on International Database Engineering & Applications. Article 22, 1-9. https://doi.org/10.1145/3410566.3410609
- Paramonov V., Shigarov A., Vetrova V. (2020). Table header correction algorithm based on heuristics for improving spreadsheet data extraction. Information and Software Technologies. 1283 CCIS, 147-158. https://doi.org/10.1007/978-3-030-59506-7_13
- Yurin A. & Dorodnykh N. (2020). Experimental evaluation of a spreadsheets transformation in the context of domain model engineering. Ural Symposium on Biomedical Engineering, Radioelectronics and Information Technology (USBEREIT). 0388-0391. https://doi.org/10.1109/USBEREIT48449.2020.9117674
- Dorodnykh N., Yurin A., Shigarov A. (2020). Conceptual model engineering for industrial safety inspection based on spreadsheet data analysis. Modelling and Development of Intelligent Systems. 1126 CCIS, 51-65. https://doi.org/10.1007/978-3-030-39237-6_4
- Paramonov, V., Shigarov, A., Vetrova, V., Mikhailov, A. (2020). Heuristic algorithm for recovering a physical structure of spreadsheet header. Information Systems Architecture and Technology. 1050 AISC, 140-149. https://doi.org/10.1007/978-3-030-30440-9_14
2019
- Yurin A. & Dorodnykh N. (2019). A reverse engineering process for inferring conceptual models from canonicalized tables. 2019 Int. Multi-Conf. on Engineering, Computer and Information Sciences (SIBIRCON). 0485-0490. https://doi.org/10.1109/SIBIRCON48586.2019.8958458
- Shigarov, A., Khristyuk, V., Mikhailov, A., Paramonov, V. (2019). TabbyXL: rule-based spreadsheet data extraction and transformation. Information and Software Technologies. 1078 CCIS, 59-75. https://doi.org/10.1007/978-3-030-30275-7_6
Preprint
Presentation
- Shigarov, A., Khristyuk, V., Mikhailov, A. (2019). TabbyXL: software platform for rule-based spreadsheet data extraction and transformation. SoftwareX, 10. https://doi.org/10.1016/j.softx.2019.100270
Preprint
- Shigarov, A., Cherepanov, I., Cherkashin, E., Dorodnykh, N., Khristyuk, V., Mikhailov, A., Paramonov, V., Rozhkov, E., Yurin, A. (2019). Towards end-to-end transformation of arbitrary tables from untagged portable documents (PDF) to linked data. CEUR-WS Proc. 2463, 1-12.
Article
- Shigarov, A., Khristyuk, V., Mikhailov, A., Paramonov, V. (2019). Software development for rule-based spreadsheet data extraction and transformation. Proc. 42nd Int. Convention on Information and Communication Technology, Electronics and Microelectronics. 1132-1137. https://doi.org/10.23919/MIPRO.2019.8756829
Preprint
- Cherkashin, E., Shigarov, A., Paramonov, V., Mikhailov, A. (2019). Digital archives supporting document content inference. Proc. 42nd Int. Convention on Information and Communication Technology, Electronics and Microelectronics. 1037-1042. https://doi.org/10.23919/MIPRO.2019.8757196
Preprint
- Dorodnykh, N., Yurin, A. (2019). Towards ontology engineering based on transformation of conceptual models and spreadsheet data: a case study. Intelligent Systems Applications in Software Engineering. 1046 AISC, 233-247. https://doi.org/10.1007/978-3-030-30329-7_22
Preprint
- Paramonov, V., Shigarov, A., Ruzhnikov, G., Cherkashin, E. (2019). Phonetic string matching for languages with Cyrillic alphabet. Information Systems Architecture and Technology. 852 AISC, 301-311. https://doi.org/10.1007/978-3-319-99981-4_28
Preprint
2018
- Shigarov, A., Altaev, A., Mikhailov, A., Paramonov, V., Cherkashin, E. (2018). TabbyPDF: web-based system for PDF table extraction. Information and Software Technologies. 920 CCIS, 257-269. https://doi.org/10.1007/978-3-319-99972-2_20
Preprint
- Yang, S., Wei, R., Shigarov, A. (2018). Semantic interoperability for electronic business through a novel cross-context semantic document exchange approach. Proc. 18th ACM Symposium on Document Engineering. 28:1-28:10. https://doi.org/10.1145/3209280.3209523
- Cherkashin, E., Kopaygorodsky, A., Kazi, L., Shigarov, A., Paramonov, V. (2018). Model driven architecture implementation using linked data. Information and Software Technologies. 920 CCIS, 412-423. https://doi.org/10.1007/978-3-319-99972-2_34
Preprint
2017
- Shigarov, A., Mikhailov, A. (2017). Rule-based spreadsheet data transformation from arbitrary to relational tables. Information Systems. 71, 123-136. https://doi.org/10.1016/j.is.2017.08.004
Preprint
2016
- Shigarov, A., Mikhailov, A., Altaev, A. (2016). Configurable table structure recognition in untagged PDF documents. Proc. 16th ACM Symposium on Document Engineering. 119-122. https://doi.org/10.1145/2960811.2967152
Preprint
Poster
- Shigarov, A., Paramonov, V., Belykh, P., Bondarev, A. (2016). Rule-based canonicalization of arbitrary tables in spreadsheets. Information and Software Technologies. 639 CCIS, 78-91. https://doi.org/10.1007/978-3-319-46254-7_7
Preprint
- Paramonov, V., Shigarov, A., Ruzhnikov, G., Belykh, P. (2016). Polyphon: an algorithm for phonetic string matching in Russian language. Information and Software Technologies. 639 CCIS, 568-579. https://doi.org/10.1007/978-3-319-46254-7_46
Preprint
- Shigarov, A. (2016). Methodology and software for transforming tabular data from arbitrary to relational form. Scientific session of the meeting of the Joint Scientific Council of SB RAS on Nanotechnology and Information Technology (in Russian).
Presentation
2015
- Shigarov, A. (2015). Table understanding using a rule engine. Expert Systems with Applications. 42(2), 929-937. https://doi.org/10.1016/j.eswa.2014.08.045
Preprint
Presentation
- Shigarov, A. (2015). Rule-based table analysis and interpretation. Information and Software Technologies. 538 CCIS, 175-186. https://doi.org/10.1007/978-3-319-24770-0_16
Preprint
- Shigarov, A. O., Bychkov, I. V., Paramonov, V. V., Belykh, P. V. (2015). Analysis and interpretation of arbitrary tables based on the execution of CRL rules. Computational Technologies. 20(6), 87-112 (in Russian).
Preprint
- Shigarov, A., Paramonov, V. (2015). CRL: a rule language for analysis and interpretation of arbitrary tables. CEUR-WS Proc. 1536, 22-29.
Article
Presentation
2014
- Shigarov, A. O. (2014). Recovering the logical structure of tables from unstructured texts based on logical inference. Computational Technologies. 19(1), 87-99 (in Russian).
Preprint
- Shigarov, A. (2014). Automated table understanding using a rule engine. CEUR-WS Proc. 1297, 216-223.
Article
2013
- Shigarov, A. O., Bychkov, I. V., Ruzhnikov, G. M., Hmelnov, A. E., Fedorov, R. K. (2013). A table transformation system. Information Technologies and Computing Systems. 3, 15-26 (in Russian).
Preprint
2011
- Shigarov, A., Fedorov, R. (2011). Simple algorithm for page layout analysis. Pattern Recognition and Image Analysis. 21(2), 324-327. https://doi.org/10.1134/S1054661811021008
Preprint
2009
- Shigarov, A. O. (2009). A technology for extracting tabular information from electronic documents of different formats. Candidate of Technical Sciences dissertation (in Russian).
PhD Thesis
PhD Abstract
Presentation
- Shigarov, A., Bychkov, I., Hmelnov, A., Ruzhnikov, G. (2009). A method for table detection in metafiles. Pattern Recognition and Image Analysis. 19(4), 693-697. https://doi.org/10.1134/S1054661809040191
Preprint
Poster
- Bychkov, I. V., Ruzhnikov, G. M., Hmelnov, A. E., Shigarov, A. O. (2009). A heuristic method for detecting tables in documents of different formats. Computational Technologies. 14(2), 58-73 (in Russian).
Preprint
2008
- Hmelnov, A. E., Shigarov, A. O. (2008). A method for extracting tables from unformatted text. Computational Technologies. 13(1), 93-101 (in Russian).
Preprint
Contacts
Office 222, Block EVM, Lermontov st. 134, Irkutsk, Russia, 664033
Department of Information Technology and Systems, Matrosov Institute for System Dynamics and Control Theory, Siberian Branch of the Russian Academy of Sciences
Alexey Shigarov (e-mail: shigarov@gmail.com)