Ticket #937 (new Enhancement)

Opened 4 years ago

Last modified 4 years ago

wikidata: continuing the experiments and rewriting the DM4 relationship with Wikidata

Reported by: Malte
Owned by:
Priority: Major
Milestone:
Component: 3rd Party Plugins
Version: 4.7
Keywords:
Cc: jri, dgf, JuergeN
Complexity: 3
Area: GUI / Usability


Our experiment has shown that DM4 is not end-user ready for processing data volumes as large as those returned by requests to the Wikidata API. See http://wikidata-topicmaps.wmflabs.org/

As long as this is not solved, the approach of building an on-demand HTTP connection to Wikidata should not be developed further, except for integrating special cases, single domains, or vocabulary present in Wikidata into DM4.

Furthermore, the other approach, importing all of Wikidata in advance, has similar issues: the import takes too much time and storage, and the dm4-webclient is not ready to deal with such large amounts of data.

A different, third approach needs to be developed.

Change History

comment:1 Changed 4 years ago by JuergeN

Dear Malte, I have been thinking about this for quite some time, too, and not just in the context of Wikidata but of all other external "big data" sources. One way to deal with this, in my opinion, could be "proxy objects": cached references to the original datasets that are paged and updated on demand only. Happy to discuss this in more detail.
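To make the "proxy object" idea concrete, here is a minimal, hypothetical sketch: a lightweight reference to a remote Wikidata entity whose full dataset is fetched lazily, cached, and re-fetched only on explicit demand. The names (`WikidataProxy`, `fetcher`) are illustrative and not part of any DM4 API, and paging is left out for brevity:

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class WikidataProxy:
    """A cached reference to one remote entity, e.g. 'Q42'."""
    entity_id: str
    fetcher: Callable[[str], dict]          # pluggable HTTP/API fetcher
    _cache: Optional[dict] = field(default=None, repr=False)

    def data(self) -> dict:
        """Return the cached dataset, fetching it on first access only."""
        if self._cache is None:
            self._cache = self.fetcher(self.entity_id)
        return self._cache

    def refresh(self) -> dict:
        """Drop the cache and re-fetch (update on demand)."""
        self._cache = None
        return self.data()

# Usage with a stub fetcher standing in for the Wikidata API:
calls = []
def stub_fetch(entity_id):
    calls.append(entity_id)
    return {"id": entity_id, "label": "Douglas Adams"}

proxy = WikidataProxy("Q42", stub_fetch)
proxy.data()        # first access triggers the (stubbed) fetch
proxy.data()        # second access is served from the cache
assert calls == ["Q42"]
```

The point of the design is that DM4 would only ever hold the small proxy, not the thousands of topics a full entity expands into; the expensive fetch happens once and only when the user actually reveals the entity.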

comment:2 Changed 4 years ago by Malte

Hi dear Juergen, as discussed, all my current approaches use (very limited) "proxy objects" for Wikidata entities and properties; they are basically just references to those.

What I see, performance-wise, is:
1) the current webclient is not usable with that many, or that large, proxy topics/objects, and
2) the create/update operations for the thousands of topics that just one Wikidata proxy object is made of consume so much time that standard Wikidata topicmaps are not a joy to use.

So, a way forward would be to provide a single proxy object for the whole of Wikidata, which would offer a basic UI for their SPARQL endpoint [1].

Another point might be to write an RDF description of the DeepaMehta 4 Standard Types based on the resources and items developed and described in Wikidata. That way a service could theoretically integrate and translate entries between both seamlessly.
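As a hypothetical sketch of that RDF idea, one could state that a DM4 Standard Type corresponds to a Wikidata item. The DM4 namespace URI, the "person" type name, and the choice of `owl:equivalentClass` are all assumptions for illustration, not anything DM4 or Wikidata actually publish:

```python
# Assumed namespaces; only the Wikidata entity namespace is real.
DM4 = "https://example.org/dm4/types/"   # placeholder, not an official DM4 namespace
WD = "http://www.wikidata.org/entity/"   # Wikidata entity namespace
OWL_EQ = "http://www.w3.org/2002/07/owl#equivalentClass"

def triple(s: str, p: str, o: str) -> str:
    """Serialize one statement in N-Triples syntax."""
    return f"<{s}> <{p}> <{o}> ."

# Map a hypothetical DM4 "person" type to wd:Q5 ("human"):
mapping = triple(DM4 + "person", OWL_EQ, WD + "Q5")
```

A small file of such statements would be enough for a translation service to rewrite entries from one vocabulary into the other in either direction.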

[1] http://query.wikidata.org
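A minimal sketch of how such a single proxy could talk to the public SPARQL endpoint behind [1] (https://query.wikidata.org/sparql): build a GET request URL asking for JSON results. The example query (instances of "house cat", wd:Q146) is illustrative; no request is actually sent here:

```python
from urllib.parse import urlencode

ENDPOINT = "https://query.wikidata.org/sparql"

def sparql_request_url(query: str) -> str:
    """Build a GET request URL for the endpoint, asking for JSON results."""
    return ENDPOINT + "?" + urlencode({"query": query, "format": "json"})

query = """
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q146 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 10
"""

url = sparql_request_url(query)
# The URL could then be fetched with urllib.request.urlopen(url) and the
# JSON body decoded into result bindings for display in the proxy's UI.
```

Keeping the query server-side like this is what makes the single-proxy approach scale: the webclient only ever renders the (limited) result set, never the underlying data volume.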
