Ticket #233 (new Feature Request)

Opened 13 years ago

Last modified 11 years ago

Import Tool

Reported by: jri
Owned by:
Priority: Major
Milestone:
Component: DeepaMehta Standard Distribution
Version: 4.0.9
Keywords:
Cc: x28de, dgf, JuergeN, Malte
Complexity: 8
Area:
Module:

Description (last modified by jri)

The aim is to create a batch of topics programmatically from an input text.
In general, the functionality is similar to DM3's Bulk Creation tool.

Only 2 formats for the input text shall be provided for the moment:

  • Comma separated plain text (CSV)
  • Tab separated plain text (TSV)

The result is a set of topics, all of the same type (called the *target topic type*).

Principal usage:

  1. The user initiates the bulk creation tool. A settings panel appears.
  2. The user specifies the input text and its format (CSV or TSV).
  3. The user selects the target topic type.
  4. Column mapping (in case of a composite target type): The user assigns each column (of the input text) to a child type of the target type.
  5. The user starts the creation process.
  6. Bulk creation: for each row of input text the bulk creation tool creates a topic.
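
The steps above can be sketched roughly as follows. This is only an illustration in Python, not DeepaMehta code: the function names `parse_rows` and `build_topics`, the dict-based topic representation, and the example type names ("Person", "Name", "Email") are all assumptions made for the example.

```python
import csv
import io

def parse_rows(input_text, fmt):
    """Split the pasted input text into rows of column values.

    fmt is "CSV" or "TSV", matching the two formats named in this ticket.
    """
    delimiter = "," if fmt == "CSV" else "\t"
    reader = csv.reader(io.StringIO(input_text), delimiter=delimiter)
    return [row for row in reader if row]  # skip blank lines

def build_topics(rows, target_type, column_mapping):
    """Create one topic (here: a plain dict) per input row.

    column_mapping[i] names the child type that column i is assigned to
    (hypothetical representation of step 4).
    """
    topics = []
    for row in rows:
        children = dict(zip(column_mapping, row))
        topics.append({"type": target_type, "children": children})
    return topics

# Example: two rows of CSV mapped onto a composite "Person" type.
text = "Alice,alice@example.org\nBob,bob@example.org\n"
topics = build_topics(parse_rows(text, "CSV"), "Person", ["Name", "Email"])
```

Because the rows are processed in input order, the resulting list keeps the row order, which the display requirements below rely on.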

Display:

  • The created topics appear in a grid-layout on the canvas
  • Order preservation: The topic order in the grid-layout (left-to-right, top-to-bottom) reflects the row order of the input text.
  • The grid-topics must not overlap topics that already exist on the canvas.
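
A minimal sketch of these three display rules, in Python. It does not use the canvas's actual grid-layout or free-space-finder; the function names, the cell sizes, and the "shift the grid down until it fits" strategy are assumptions chosen to keep the example short.

```python
def grid_positions(count, columns, cell_w, cell_h, origin=(0, 0)):
    """Positions in row order: left-to-right, then top-to-bottom."""
    ox, oy = origin
    return [(ox + (i % columns) * cell_w, oy + (i // columns) * cell_h)
            for i in range(count)]

def overlaps(pos, occupied, cell_w, cell_h):
    """True if a cell at pos would overlap any already occupied position."""
    x, y = pos
    return any(abs(x - ex) < cell_w and abs(y - ey) < cell_h
               for ex, ey in occupied)

def place_grid(count, columns, existing, cell_w=120, cell_h=40):
    """Shift the whole grid down until no cell overlaps an existing topic.

    existing is a list of (x, y) positions of topics already on the canvas.
    """
    oy = 0
    while True:
        positions = grid_positions(count, columns, cell_w, cell_h, (0, oy))
        if not any(overlaps(p, existing, cell_w, cell_h) for p in positions):
            return positions
        oy += cell_h
```

Shifting the grid as a whole, rather than moving individual cells, keeps the order preservation rule intact.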

Special features:

  • Verification display: *before* the user starts the actual creation process the tool displays what topics *would* be created (e.g. as a table) according to the current user settings. This allows the user to fix the input text (or the format setting) or the column mapping.
  • Undo: if a bulk creation result proves to be malformed the user can undo the creation process. That is, the tool deletes all topics created during that bulk process. The user is free to retry with adjusted settings.

Perhaps it is sufficient to implement only one of these two special features.
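
Both special features could be sketched as below. This is a hypothetical illustration in Python: the `BulkRun` class, the dict used as topic store, and the integer IDs are assumptions, not part of DeepaMehta.

```python
class BulkRun:
    """One bulk creation run that remembers which topics it created."""

    def __init__(self, store):
        self.store = store        # hypothetical topic store: id -> topic dict
        self.created_ids = []     # IDs created by this run, for undo

    def preview(self, rows, column_mapping):
        """Verification display: what *would* be created, without storing it."""
        return [dict(zip(column_mapping, row)) for row in rows]

    def create(self, rows, column_mapping):
        """Actual creation: store each previewed topic and record its ID."""
        for topic in self.preview(rows, column_mapping):
            topic_id = max(self.store, default=0) + 1
            self.store[topic_id] = topic
            self.created_ids.append(topic_id)

    def undo(self):
        """Delete all topics created during this run, and nothing else."""
        for topic_id in self.created_ids:
            self.store.pop(topic_id, None)
        self.created_ids.clear()
```

Since `create` builds on `preview`, the table shown to the user and the topics actually created cannot drift apart.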

Realisation suggestions:

  • The user could specify the input text by pasting it to a text area (part of the settings panel). This would reduce development effort as no file operations are involved.
  • The bulk creation process itself could be represented as a topic (of type "Bulk Creation") on the canvas. Its settings panel would appear in DM's detail panel (right side). This would conform to DM's interface concept that everything visible in the detail panel has a representation on the canvas. The user would be able to perform regular DM actions and return to the bulk creation settings afterwards.

Implementation notes:

  • The canvas already provides a grid-layout option (as well as a free-space-finder).

Change History

comment:1 Changed 13 years ago by jri

  • Description modified

comment:2 Changed 13 years ago by jri

  • Cc x28de, dgf, JuergeN, Malte added
  • Complexity changed from 3 to 8
  • Type changed from Defect to Feature Request
  • Description modified
  • Milestone set to Release 4.1

comment:3 Changed 13 years ago by jri

  • Description modified

comment:4 in reply to: ↑ description Changed 13 years ago by x28de

Replying to jri:

Aim is to create a bulk of topics programmatically based on an input text.

Initial aim is to import simple text snippets from semistructured pages that exist outside, into a DeepaMehta map.

In the meantime, the desired functionality can be viewed at this proof of concept: http://www.rzuser.uni-heidelberg.de/~x28/temporary/

(Note: The term “import” might sound confusing as long as DeepaMehta’s restore of backups, or loading from foreign DeepaMehta exchange files, is also called import. However, it is correct and universally common for exchanges with different applications that use different formats. If some other application could read and process DeepaMehta’s zipped ISO13250 files, or if, vice versa, DeepaMehta could load those generated by another application, this would not be called import/export, which always suggests the crossing of major barriers. However, “creation” suggests a major discontinuity of the user’s intellectual work, which is not created in DM but was previously grown outside and will probably be reused (remixed, repurposed) outside again.)

In general, the functionality is similar to DM3's Bulk Creation tool.

In general, the functionality is similar to DM3's “Special > Topic Importer” tool, except that it should again be as transparent as the import wizard in DM2, i.e., it should not corrupt text when uninterpretable HTML formatting is present.

Only 2 formats for the input text shall be provided for the moment:

  • Comma separated plain text (CSV)
  • Since Comma separated plain text (a special case of Character separated text, CSV) is rarely useful, support for this format could be dropped and replaced by support for rows with only one input column but with an autogenerated second column. Typically, simple text paragraphs would be numbered, with the numbers stored in the label/title column while the full text (including LFs, but without CRLFs) would be stored in the “description”. The autonumbering could be switched off, in which case the only column would give the label/title.
  • Tab separated plain text (TSV)
  • Tab separated text (not “plain” text which would exclude reformatted XML content such as RSS titles and “content:encoded” nodes with #cdata-section, which worked in DM2 Express as soon as it was transformed into one row per item node.)

Result is a bunch of topics, all of the same type (called the *target topic type*).

Principal usage:

  1. The user initiates the bulk creation tool. A settings panel appears.
  1. The user initiates the import tool. A settings wizard appears.
  1. The user specifies the input text and its format (CSV or TSV).
  1. The user specifies the input text and whether number labels should be added
  1. Column mapping (in case of a composite target type): The user assigns each column (of the input text) to a child type of the target type.
  1. Column mapping is simply as follows: the first input column gives the label (“title”) and the second column gives the “description”. If only one column is present, it gives the “description” when autogenerated labels are requested, otherwise the label.
  1. The user starts the creation process.
  2. Bulk creation: for each row of input text the bulk creation tool creates a topic.
  1. Import: for each row of input text the import tool creates a topic.
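
The simplified mapping proposed in this comment can be sketched as follows (Python, for illustration only). The names `map_row` and `import_rows` are hypothetical, and reading “including LFs, but without CRLFs” as “rows end at CRLF, a bare LF stays inside the text” is an assumption of this sketch.

```python
def map_row(columns, index, autonumber=True):
    """Map one input row to a topic with a title and a description."""
    if len(columns) >= 2:
        # Two columns: first gives the title, second the description.
        return {"title": columns[0], "description": columns[1]}
    if autonumber:
        # Single column with autonumbering: the row number becomes the title.
        return {"title": str(index), "description": columns[0]}
    # Single column, autonumbering opted out: the column gives the title.
    return {"title": columns[0], "description": ""}

def import_rows(text, autonumber=True):
    """Create one topic per row of tab separated input text.

    Assumption: rows are delimited by CRLF, so a bare LF remains part of
    the row's text.
    """
    rows = [line.split("\t") for line in text.split("\r\n") if line]
    return [map_row(cols, i, autonumber)
            for i, cols in enumerate(rows, start=1)]
```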

Display:

  • The created topics appear in a grid-layout on the canvas
  • Order preservation: The topic order in the grid-layout (left-to-right, top-to-bottom) reflects the row order of the input text.
  • The grid-topics must not overlap topics which exists already on the canvas.
  • The center of grid-topics must not overlap topics that already exist on the canvas.

Special features:

(deleted)

Perhaps it is sufficient to realize only 1 of these 2 special features.

Perhaps it is sufficient to realize none of the fixing/redo special features and wait for the restoration of multiple selection functionality, and ideally for multiple deletion (where “bulk deletion” would be an appropriate term), or a generic “undo”.

Some kind of dialog during the input specification would ideally involve a “split” and a “join” control to be applied at the paragraph/ row delimiters, and ideally some heuristics to suggest a title/ label from any identifiable input enumeration or from the first words or characters.
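
One possible form of such a label heuristic, sketched in Python. The regular expression, the function name `suggest_label`, and the five-word cut-off are assumptions for illustration; a real heuristic would likely need tuning against actual input.

```python
import re

# Leading enumeration such as "3.", "3)", "(3)", "a)" or "b."
ENUMERATION = re.compile(r"^\s*(\(?\d+[.)]|[a-zA-Z][.)])\s+")

def suggest_label(paragraph, max_words=5):
    """Suggest a title/label for one input paragraph.

    Prefers an identifiable enumeration at the start of the paragraph,
    otherwise falls back to the first few words.
    """
    match = ENUMERATION.match(paragraph)
    if match:
        return match.group(1)
    words = paragraph.split()
    label = " ".join(words[:max_words])
    return label + ("..." if len(words) > max_words else "")
```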

Realisation suggestions:

  • The user could specify the input text by pasting it to a text area (part of the settings panel). This would reduce development effort as no file operations are involved.

This might be ideal for web snippets and might be ok for files at the desktop, but beware that it did not work in DM3 as soon as HTML tags were present (which need not be honored but must not corrupt content).

  • If there is a master switch to hide all non-payload, the import feature might be expanded for more types, e.g. more XML subnodes of “item” than just “title” and “description” or “content:encoded” RSS nodes. In this case, the term “bulk creation feature” might be justified for usage “under the hood” Then

The bulk creation process itself could be represented as a topic (of type "Bulk Creation") on the canvas. Its settings panel would appear in DM's detail panel (right side). This would be conform to DM's interface concept that everything visible in the detail panel has a representation on the canvas. The user would be able to perform regular DM actions and return to the bulk creation settings afterwards.

  • A good foundation for significantly extending the scope of this feature would be to pick the title and description/content:encoded directly from RSS items, which could then easily be expanded to include important nodes like dc:author, or other semantic web data.

Implementation notes:

  • The canvas already provides a grid-layout option (as well as a free-space-finder).

But the free-space-finder had problems (details later) and caused annoying scrolling down.

Last edited 11 years ago by x28de

comment:5 Changed 13 years ago by x28de

  • Summary changed from Bulk Creation Tool to Import Tool

comment:6 Changed 12 years ago by jri

  • Milestone changed from Release 4.1 to Release 4.2

comment:7 Changed 11 years ago by jri

  • Milestone Release 4.2 deleted