Column Type Annotation by DBpedia (CTA-DBP)
This is a task of the ISWC 2021 “Semantic Web Challenge on Tabular Data to Knowledge Graph Matching”. The goal is to annotate an entity column (i.e., a column composed of phrases) in a table with classes of the DBpedia (2016-10) ontology. Click here for the official challenge website.
Task Description
The task is to annotate each of the given entity columns with classes of the DBpedia ontology. Annotation classes must come from the DBpedia ontology classes (excluding owl:Thing and owl:Agent). A column can be annotated with multiple classes. The class that is as fine-grained as possible while still correct for all cells of the column is regarded as the perfect annotation; any ancestor of the perfect annotation is regarded as an okay annotation; all other classes are regarded as wrong annotations. Class matching is NOT case-sensitive.
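The perfect / okay / wrong distinction above can be sketched in a few lines of Python. This is only an illustration, not the official evaluator: the `PARENT` map is a tiny hand-written fragment of a class hierarchy, and the names `ancestors` and `judge` are hypothetical. A real system would load the subclass relations from the DBpedia 2016-10 ontology.

```python
# Toy child -> parent fragment (assumption, for illustration only).
PARENT = {
    "http://dbpedia.org/ontology/Country": "http://dbpedia.org/ontology/PopulatedPlace",
    "http://dbpedia.org/ontology/PopulatedPlace": "http://dbpedia.org/ontology/Place",
}


def ancestors(cls, parent=PARENT):
    """Return all ancestors of `cls` (lower-cased), nearest first."""
    lowered = {child.lower(): par.lower() for child, par in parent.items()}
    found = []
    key = cls.strip().lower()
    while key in lowered:
        key = lowered[key]
        if key in found:  # guard against accidental cycles in the toy data
            break
        found.append(key)
    return found


def judge(annotation, perfect, parent=PARENT):
    """Classify one annotation against the column's perfect annotation."""
    a = annotation.strip().lower()  # matching is not case-sensitive
    p = perfect.strip().lower()
    if a == p:
        return "perfect"
    if a in ancestors(p, parent):
        return "okay"
    return "wrong"
```

For a column whose perfect annotation is `Country`, this sketch rates `Place` as okay, while for a column whose perfect annotation is `Place`, it rates `Country` as wrong.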
As in the CTA tasks of SemTab 2019 and 2020, each submission should be a CSV file. Each line should contain a column, identified by table ID and column ID, together with its class annotation. That is, each line should have three fields: “Table ID”, “Column ID” and “DBpedia class IRI” (these field headers should be excluded from the submission file). Multiple annotation classes should be separated by spaces, and their order does not matter. Here is an example line: "9206866_1_8114610355671172497","0","http://dbpedia.org/ontology/Country http://dbpedia.org/ontology/PopulatedPlace http://dbpedia.org/ontology/Place"
In Round #1 of SemTab 2021, only one annotation (the perfect annotation) is scored, so each line in the submission file should contain exactly one class annotation. Both perfect and okay annotations may be considered in later rounds.
Notes:
1) Table ID does not include the file name extension; make sure you remove the .csv extension from the filename.
2) Column ID is the position of the column in the input, starting from 0, i.e., first column’s ID is 0.
3) A submission file should contain NO duplicate annotations for a target column.
4) Annotations for columns outside the target columns are ignored.
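The submission format and the notes above can be illustrated with a short Python sketch. The function names `submission_row` and `write_submission` are hypothetical, and the choice to quote every field simply mirrors the example line; the challenge text does not say whether quoting is required.

```python
import csv
from pathlib import Path


def submission_row(table_file, column_index, class_iris):
    """Build one submission line: table ID, column ID, space-separated class IRIs."""
    table_id = Path(table_file).stem  # note 1: drop the .csv extension
    return [table_id, str(column_index), " ".join(class_iris)]


def write_submission(annotations, out):
    """Write annotations to the file-like object `out`, without a header row.

    `annotations` maps (table_file, column_index) -> list of class IRIs.
    Using a dict keyed by column means each target column appears at most
    once (note 3). Column indexes start at 0 (note 2).
    """
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for (table_file, column_index), class_iris in annotations.items():
        writer.writerow(submission_row(table_file, column_index, class_iris))
```

Usage, assuming a single Round #1 style annotation per column:

```python
with open("submission.csv", "w", newline="", encoding="utf-8") as f:
    write_submission(
        {("9206866_1_8114610355671172497.csv", 0): ["http://dbpedia.org/ontology/Country"]},
        f,
    )
```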
Datasets
Table set for Round #1: Tables, Target Columns
Data Description: One table is stored in one CSV file. Each line corresponds to a table row. The first row may either be the table header or content. The target columns for annotation are saved in a CSV file.
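A minimal sketch of reading this data in Python follows. It assumes that each line of the target columns file holds a table ID followed by a column ID; that layout is an assumption, not something stated here, so check it against the downloaded file. The function names `read_targets` and `column_cells` are hypothetical.

```python
import csv


def read_targets(f):
    """Read (table_id, column_id) pairs from an open target-columns CSV file.

    Assumes two fields per line: table ID, then 0-based column ID.
    """
    return [(row[0], int(row[1])) for row in csv.reader(f) if row]


def column_cells(f, column_index):
    """Return the cells of one column from an open table CSV file.

    The first row is returned as well, because it may be either a header
    or content; the caller has to decide how to treat it.
    """
    return [row[column_index] for row in csv.reader(f) if len(row) > column_index]
```

Because the first row is ambiguous, a system would likely need some heuristic of its own to decide whether to include it when annotating the column.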
Evaluation Criteria [Round #1]
Precision, Recall and F1 Score will be calculated for ranking:
Precision = (Perfect Annotations #) / (Submitted Annotations #)
Recall = (Perfect Annotations #) / (Ground Truth Annotations #)
F1 Score = (2 * Precision * Recall) / (Precision + Recall)
Note:
1) # denotes the number.
2) Each target column has exactly one ground truth annotation, namely the column's perfect annotation, so # ground truth annotations = # target columns.
3) F1 Score is used as the primary score; Precision is used as the secondary score.
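The Round #1 metrics can be reproduced locally with a sketch like the following. It is not the official evaluator: the function name `score` and the dictionary layout are assumptions, and it follows the task description in ignoring annotations outside the target columns and comparing class IRIs without regard to case.

```python
def score(submitted, ground_truth):
    """Compute (precision, recall, f1) for single-class annotations.

    Both arguments map (table_id, column_id) -> class IRI.
    """
    # Annotations for columns outside the target columns are ignored.
    in_target = {k: v for k, v in submitted.items() if k in ground_truth}
    perfect = sum(
        1 for k, v in in_target.items() if v.lower() == ground_truth[k].lower()
    )
    precision = perfect / len(in_target) if in_target else 0.0
    recall = perfect / len(ground_truth) if ground_truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

For example, with 4 target columns and 3 submitted in-target annotations of which 2 are perfect, Precision = 2/3, Recall = 2/4 and F1 Score = 4/7.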
Prizes
To appear. :=)
Submission
1. One participant is allowed to make at most 5 submissions per day in Round #1.
Rules
- Selected systems with the best results will be invited to present their results during the ISWC conference and the Ontology Matching workshop.
- The prize winners will be announced during the ISWC conference (October 24 - 28, 2021). We will take into account all evaluation rounds, especially the ones running until the conference dates.
- Participants are encouraged to submit a system paper describing their tool and the obtained results. Papers will be published online as a volume of CEUR-WS and indexed on DBLP. By submitting a paper, the authors accept the CEUR-WS and DBLP publishing rules.
- Please see additional information at our official website.