Ticket #937 (new Enhancement)
wikidata: continuing the experiments and rewriting the DM4 relationship with Wikidata
Reported by: | Malte | Owned by: | |
---|---|---|---|
Priority: | Major | Milestone: | |
Component: | 3rd Party Plugins | Version: | 4.7 |
Keywords: | | Cc: | jri, dgf, JuergeN |
Complexity: | 3 | Area: | GUI / Usability |
Module: | | | |
Description
Our experiment has shown that DM4 is not ready for end users when it has to process data volumes as large as those returned by requests to the Wikidata API. See http://wikidata-topicmaps.wmflabs.org/
As long as this is not solved, the approach of building an HTTP-based on-demand connection to Wikidata should not be developed further, except for integrating special cases, domains, or vocabulary present in Wikidata into DM4.
The other approach, integrating Wikidata by importing all of it up front, has similar issues: the import simply takes too much time and storage, and the dm4-webclient is not ready to deal with such large amounts of data.
A different, third approach needs to be developed.
Change History
comment:2 Changed 8 years ago by Malte
Hi dear Juergen, as discussed, all my current approaches use (very limited) "proxy objects" for Wikidata entities and properties; they are basically just references to them.
What I see is:
Performance-wise, 1) the current webclient is not usable with that many (or that large) proxy topics/objects, and 2) the create/update operations for the thousands of topics that just one Wikidata proxy object is made of consume so much time that standard Wikidata topicmaps are not a joy to use.
So, a way forward would be to provide a single proxy object for the whole of Wikidata, which would offer a basic UI for their SPARQL endpoint [1].
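For illustration, here is a minimal, hedged sketch of querying that endpoint over plain HTTP in Java. The endpoint URL is the public Wikidata Query Service; the query, the class name, and any DM4-side wiring are made up for this example and are not part of existing dm4-wikidata code.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;

// Hypothetical sketch: fetch a small result set from Wikidata's public SPARQL
// endpoint on demand, instead of creating thousands of local proxy topics.
public class WikidataSparqlSketch {

    public static void main(String[] args) throws Exception {
        // Example query: the first 10 instances of "cat" (Q146); purely illustrative.
        String query = "SELECT ?item ?itemLabel WHERE { "
                + "?item wdt:P31 wd:Q146 . "
                + "SERVICE wikibase:label { bd:serviceParam wikibase:language \"en\" . } "
                + "} LIMIT 10";

        URL url = new URL("https://query.wikidata.org/sparql?query="
                + URLEncoder.encode(query, "UTF-8"));
        HttpURLConnection con = (HttpURLConnection) url.openConnection();
        con.setRequestProperty("Accept", "application/sparql-results+json");
        con.setRequestProperty("User-Agent", "dm4-wikidata-sketch/0.1");

        // Print the raw JSON result; a real proxy object would parse this
        // and render it in the webclient only when the user asks for it.
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(con.getInputStream(), "UTF-8"))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
```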
Another point might be to write an RDF description of the DeepaMehta 4 Standard Types based on the resources and items developed and described in Wikidata. That way a service could, theoretically, seamlessly integrate and translate entries from both.
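As a rough, hypothetical sketch of such an RDF description, the snippet below uses Apache Jena to relate one DM4 Standard Type to a Wikidata item. The DM4 type namespace and the concrete mapping (dm4.contacts.person to Q5, "human") are assumptions for illustration only; the Wikidata entity URI scheme is the real one.

```java
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Resource;
import org.apache.jena.vocabulary.OWL;
import org.apache.jena.vocabulary.RDFS;

// Hypothetical sketch: describe a DM4 Standard Type in RDF and relate it to
// the corresponding Wikidata item, so a service could translate between both.
public class Dm4TypesRdfSketch {

    public static void main(String[] args) {
        Model model = ModelFactory.createDefaultModel();
        model.setNsPrefix("dm4", "https://deepamehta.de/types/");  // assumed namespace
        model.setNsPrefix("wd", "http://www.wikidata.org/entity/");

        // dm4.contacts.person <-> wd:Q5 ("human"); this mapping is illustrative only.
        Resource person = model.createResource("https://deepamehta.de/types/dm4.contacts.person");
        person.addProperty(RDFS.label, "Person");
        person.addProperty(OWL.equivalentClass,
                model.createResource("http://www.wikidata.org/entity/Q5"));

        model.write(System.out, "TURTLE");
    }
}
```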
Dear Malte, I have been thinking about this for quite some time, too, not just in the context of Wikidata but also of other external data sources with "big data". One way to deal with this, in my opinion, could be "proxy objects": cached references to the original datasets that are paged and updated on demand only. Happy to discuss this in more detail.
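A minimal sketch of what such a cached proxy object could look like; the class, the page-fetching callback, and the paging scheme are all hypothetical and not part of the DM4 core API.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

// Hypothetical sketch of a "proxy object": it holds only a reference to the
// external dataset and fetches/caches pages of the original data on demand.
public class DatasetProxy {

    private final String datasetUri;  // reference to the original dataset
    private final BiFunction<String, Integer, List<String>> fetchPage;  // e.g. an HTTP call
    private final Map<Integer, List<String>> pageCache = new HashMap<>();

    public DatasetProxy(String datasetUri,
                        BiFunction<String, Integer, List<String>> fetchPage) {
        this.datasetUri = datasetUri;
        this.fetchPage = fetchPage;
    }

    // Return one page of entries, hitting the remote source only on a cache miss.
    public List<String> getPage(int pageNumber) {
        return pageCache.computeIfAbsent(pageNumber,
                p -> fetchPage.apply(datasetUri, p));
    }

    // "Updated on demand only": drop a cached page so the next access refetches it.
    public void invalidate(int pageNumber) {
        pageCache.remove(pageNumber);
    }
}
```

The point of the design is that the webclient would only ever hold the currently displayed page of entries, rather than thousands of pre-created topics.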