WDQS Scaling update
Hello all!

We’ve been moving forward on the WDQS Graph Split [1], and it is time for
an update!

We have new documentation to help the migration to the split graph:
* Federation limits [2]: Explains the limitations of SPARQL federation as
used on the split graph. This should help you understand what is possible
and what isn’t when you need to federate the main WDQS graph with the
scholarly subgraph.
* Federated queries examples [3]: Explains how to rewrite queries to use
SPARQL federation over the split graph. We took a number of real-life
queries and rewrote them to use federation. Rewriting is not always
trivial, but every example we tried can be made to work over the split
graph.
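
As a rough illustration of what such a rewrite can look like, here is a
minimal Python sketch that builds a federated query string. It is not taken
from the documents above. The endpoint URL is a hypothetical placeholder,
the function name is made up for this example, and the query shape (author
label from the main graph, articles from the scholarly subgraph) is only
one assumed pattern. Check "Federation limits" [2] for the actual service
address and constraints.

```python
import re

# Hypothetical placeholder URL, not a real service. Replace it with the
# scholarly subgraph endpoint given in "Federation limits" [2].
SCHOLARLY_ENDPOINT = "https://scholarly.example.org/sparql"


def build_federated_query(author_qid, service_url=SCHOLARLY_ENDPOINT, limit=10):
    """Build a SPARQL query string intended to be run on the main graph."""
    # Accept only plain item IDs so nothing unexpected ends up in the query.
    if not re.fullmatch(r"Q[1-9][0-9]*", author_qid):
        raise ValueError(f"not a Wikidata item ID: {author_qid!r}")
    return f"""\
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?article ?authorLabel WHERE {{
  # Answered by the main graph: the author's English label.
  wd:{author_qid} rdfs:label ?authorLabel .
  FILTER(LANG(?authorLabel) = "en")
  # Answered by the scholarly subgraph: articles (P31) by that author (P50).
  SERVICE <{service_url}> {{
    ?article wdt:P31 wd:Q13442814 ;
             wdt:P50 wd:{author_qid} .
  }}
}}
LIMIT {int(limit)}
"""


if __name__ == "__main__":
    print(build_federated_query("Q42"))
```

The only structural change compared to a query against the full graph is
the SERVICE block: patterns that touch scholarly articles move inside it,
and everything else stays outside.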

We have been reaching out to people who will be impacted by the graph
split. In particular, we have been talking with community members close to
the Scholia and Wikicite projects. Those conversations showed us that our
initial split proposal (moving all instances of scholarly article to a
separate graph: ?entity wdt:P31 wd:Q13442814) is not sufficient. We have
prepared a second and final proposal that refines this split to make it
easier to use. See "WDQS Split Refinement" [4] for details. We are open to
feedback until May 15th, 2024. Please send it to the related talk page [5].
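
To make the initial rule concrete, here is a minimal Python sketch of it.
It only covers the first proposal (an entity goes to the separate graph if
one of its P31 values is Q13442814). The refined rules are described in [4]
and are not reproduced here. The function name and the "scholarly" / "main"
labels are made up for this example and are not how the update pipeline is
actually implemented.

```python
# Item ID for "scholarly article", as used in the initial split proposal.
SCHOLARLY_ARTICLE = "Q13442814"


def target_graph(instance_of):
    """Return the graph an entity would land in under the initial proposal.

    `instance_of` is any iterable of item IDs taken from the entity's
    P31 (instance of) statements.
    """
    # Illustrative labels only: "scholarly" for the separate graph,
    # "main" for everything else.
    return "scholarly" if SCHOLARLY_ARTICLE in set(instance_of) else "main"
```

For example, target_graph(["Q13442814"]) gives "scholarly", while
target_graph(["Q5"]) gives "main".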

While we refine this split, we are starting work on the missing pieces
needed to make the graph split available. This includes modifying the
update pipeline to support the split and improving the automation of the
data loading process. We are also working on a migration plan, which we
will communicate as soon as it is ready. Our current assumption is that we
will leave about 6 months for the migration once the split services are
available before shutting down the full graph endpoint.

We need your help more than ever!
If you have use cases that need access to scholarly articles, please read
"Federation Limits" [2] and "Federated Queries Examples" [3], rewrite and
test your queries, and add your working examples to "Federated Queries
Examples" [3].
Send your general feedback to the project page [1].

On a side note, WDQS isn’t the only SPARQL endpoint exposing the Wikidata
graph. Have a look at "Alternative endpoints" [6], which lists a number of
alternatives not hosted by WMF that might be helpful during the
transition.

Thanks!

Guillaume

[1]
https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/WDQS_graph_split
[2]
https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/WDQS_graph_split/Federation_Limits
[3]
https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/WDQS_graph_split/Federated_Queries_Examples
[4]
https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/WDQS_graph_split/WDQS_Split_Refinement
[5]
https://www.wikidata.org/w/index.php?title=Wikidata_talk:SPARQL_query_service/WDQS_graph_split/WDQS_Split_Refinement&action=edit
[6]
https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/Alternative_endpoints

--
*Guillaume Lederrey* (he/him)
Engineering Manager
Wikimedia Foundation <https://wikimediafoundation.org/>