
Wikidata Wednesdays

Creators: Tom Elliott
Copyright © The Contributors. Sharing and remixing permitted under terms of the Creative Commons Attribution 3.0 License (cc-by).
Last modified Oct 03, 2024 08:51 AM
Alignments between Wikidata and Pleiades (and other gazetteers) have been growing for some time, but there's still plenty of work to do.

Over the years, the Wikidata community has added support for linking Wikidata items to the identifiers used in other authoritative datasets. A number of gazetteers, including Pleiades, are supported in this way. For example, consider the Wikidata item for Apollonia in Mygdonia, where we find identifiers not only for the corresponding Pleiades place resource, but also for the corresponding records in the Digital Atlas of the Roman Empire, ToposText, Trismegistos, Vici.org, and more.

The practice of cross-linking among domain-specific gazetteers, and also from those gazetteers to Wikidata, has been growing too. The benefits are multiple. Not only can a gazetteer like Pleiades provide its users with ready access to additional information (and alternative approaches to the structure of information) found in other gazetteers and data sources, but we can also make our own datasets better prepared to connect to information that uses identifiers drawn from third-party datasets. For example, the World Historical Gazetteer provides for the ingest of user-created datasets and facilitates quick and easy alignment of the place information in an uploaded dataset with other content already in the WHG if the uploaded dataset contains Wikidata identifiers. If Pleiades provides a Wikidata ID for each of its places, then Pleiades resources that are copied into a researcher's or a teacher's custom dataset and then uploaded into WHG will automatically link up in this way. Similar joins of data drawn from disparate sources are enabled by ubiquitous cross-linking in a variety of contexts and tools.

In 2023, I started running queries of Wikidata periodically in order to keep tabs on how many Pleiades identifiers have been incorporated into Wikidata items. I've settled into a pattern of trying to do this once a week, usually on Wednesdays. The Wikidata community seems to be adding Pleiades identifiers at a rate of 20-30 each week. The latest run of the query netted 10,689 Pleiades identifiers (i.e., roughly a quarter of Pleiades place resources). If you want to try this yourself, grab the SPARQL query from our GitHub repository and paste it into the Wikidata Query Service web page. Or you can check out the results in the form of a CSV file, also in our GitHub repository.
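For readers curious about what such a count involves, here is a minimal Python sketch. It is not the query or code in our GitHub repository. It assumes a simple SPARQL query over Wikidata's Pleiades ID property (P1584) whose result variables are named `item` and `pleiades_id`, and it counts the distinct Pleiades identifiers in a CSV file exported from the Wikidata Query Service with those column names. The sample rows are hypothetical placeholders, not real alignments.

```python
import csv
import io

# Assumed minimal query (not necessarily the one in the Pleiades repository):
# every Wikidata item that carries a Pleiades ID (property P1584).
PLEIADES_QUERY = """
SELECT ?item ?pleiades_id WHERE {
  ?item wdt:P1584 ?pleiades_id .
}
"""


def count_pleiades_ids(csv_text: str) -> int:
    """Count distinct Pleiades IDs in a CSV export of the query results.

    Assumes the CSV header uses the query's variable names: item, pleiades_id.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    ids = set()
    for row in reader:
        value = (row.get("pleiades_id") or "").strip()
        if value:
            ids.add(value)
    return len(ids)


# Hypothetical sample rows; two Wikidata items share the same Pleiades ID.
sample = """item,pleiades_id
http://www.wikidata.org/entity/Q100,1000
http://www.wikidata.org/entity/Q200,2000
http://www.wikidata.org/entity/Q300,2000
"""

print(count_pleiades_ids(sample))  # 2
```

Counting distinct identifiers rather than rows matters because more than one Wikidata item can carry the same Pleiades ID, as the last two sample rows illustrate.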

To see how many of these links from Wikidata to Pleiades are reciprocated (where we also link to Wikidata in the form of a reference on a place resource), I wrote a Python script yesterday to look through the Wikidata query results and compare them to the references in our own JSON exports. It tells me that there are currently 3,255 mutual (bidirectional) links across Wikidata and Pleiades. This means that some contributor to Wikidata has checked the corresponding Pleiades resource and, finding it to be appropriate (i.e., for the same place), has added it to the Wikidata item. It also means that Pleiades contributors have done the same thing the other way round, and the resulting references have been checked and published by the Pleiades editors.
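The comparison described above can be sketched in Python along the following lines. This is a hypothetical illustration, not the script in our repository. It assumes the Wikidata results arrive as rows with `item` and `pleiades_id` columns (as in a CSV export of a Wikidata Query Service query over the Pleiades ID property, P1584), and that each place in the Pleiades JSON export has an `id` and a list of `references` whose `accessURI` points at the Wikidata item; those field names are assumptions about the export format. A link counts as mutual when the same (Pleiades ID, Wikidata QID) pair appears on both sides. The sample data is made up.

```python
import re

# Finds a Wikidata QID in URIs such as https://www.wikidata.org/wiki/Q12345
# or http://www.wikidata.org/entity/Q12345.
QID_RE = re.compile(r"wikidata\.org/(?:wiki|entity)/(Q\d+)")


def extract_qid(uri):
    """Return the QID in a Wikidata URI, or None if the URI is not one."""
    match = QID_RE.search(uri or "")
    return match.group(1) if match else None


def wikidata_pairs(rows):
    """(pleiades_id, qid) pairs from Wikidata query rows (dicts)."""
    pairs = set()
    for row in rows:
        qid = extract_qid(row["item"])
        if qid:
            pairs.add((str(row["pleiades_id"]).strip(), qid))
    return pairs


def pleiades_pairs(places):
    """(pleiades_id, qid) pairs from Pleiades place dicts.

    Assumes each place has an "id" and a "references" list whose entries
    carry an "accessURI" field (an assumption about the export format).
    """
    pairs = set()
    for place in places:
        for ref in place.get("references", []):
            qid = extract_qid(ref.get("accessURI"))
            if qid:
                pairs.add((str(place["id"]), qid))
    return pairs


def compare(rows, places):
    """Split links into mutual, Wikidata-only and Pleiades-only sets."""
    wd = wikidata_pairs(rows)
    pl = pleiades_pairs(places)
    return {
        "mutual": wd & pl,
        "wikidata_only": wd - pl,
        "pleiades_only": pl - wd,
    }


# Hypothetical sample data, not real alignments.
rows = [
    {"item": "http://www.wikidata.org/entity/Q100", "pleiades_id": "1"},
    {"item": "http://www.wikidata.org/entity/Q200", "pleiades_id": "2"},
]
places = [
    {"id": "1", "references": [{"accessURI": "https://www.wikidata.org/wiki/Q100"}]},
    {"id": "3", "references": [{"accessURI": "https://www.wikidata.org/wiki/Q300"}]},
]
result = compare(rows, places)
print(len(result["mutual"]), len(result["wikidata_only"]), len(result["pleiades_only"]))  # 1 1 1
```

Working with sets of pairs keeps the three categories disjoint, and the Wikidata-only and Pleiades-only sets can serve directly as to-do lists for contributors on either side.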

The script also tells me that there are currently:

Closing these gaps is work that members of the Pleiades community could usefully take on. Accordingly, I intend to keep running the Wikidata query and the Python script on a weekly basis and posting the results to the GitHub repository where community members can view them. The links above to files in the repository are constructed so that they will always take you to the latest version. Whenever I update, I'll also publish a notice like this to the Pleiades Gazetteer Mastodon Bot account, which you can visit in a web browser, subscribe to in a news/feed reader via its RSS feed, or follow in the Fediverse (@pleiades@botsin.space) using an ActivityPub-enabled application.