Projects/Open Shakespeare/AnnotationSystem
Problems to address/features we want:
- co-ment is very slow with large documents. It seems to add extra markup to every word and every piece of whitespace in the document, client-side, after load.
- want to scale to arbitrarily large documents.
- want to be able to display annotations for one version of document (e.g. Folio) on a different version (e.g. 2nd Quarto).
- want to be able to display differences between versions (plug into, say, docdiff).
A possible architecture:
- Texts stored in appropriate format (XML?) in version control repository.
- Annotations stored as {document id, {start address, end address}, text, [other annotation data], ...}
- Addresses: character offsets in XML? XPath + character offset? CDATA-only character offset (allows for clean transitions to/from text/XML)? My preference would probably be for the last of these -- NickStenning.
- Any (hopefully few) changes to document can be detected through VCS, and database updated: addresses modified to make sure annotations remain in the right place. Could easily(?) be made into a nice abstraction: given a directional diff between two versions of the text and a list of {start addr, end addr}, return a list of the new {start addr, end addr}. (Dropping annotations that correspond to removed text, etc.)
- Copy annotations to alternate version? Annotations with multiple locations (i.e. 'Annotation' has-many 'Documents' through 'Locations')?
- A possible algorithm for doing address translation:
  - Take a smallish chunk of the text centered on the relevant address (call this "chunk").
  - Find the address in the new document corresponding to the same proportional position, i.e. if chunk appears 37% of the way through the first document, assume it will appear near there in the new document (call this "pivot").
  - Fuzzy full-text search the new document for chunk, weighting matches nearer pivot higher than those further away.
  - We'll end up with a few possible alignments, hopefully one being far more likely than the others. If so, we're sorted. If not, then at the very least we have an easy machine-assisted means of identifying the correct alignment.
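The chunk/pivot steps above could be sketched roughly as follows. This is only an illustration, not a worked-out design: the function name `locate`, the parameters `radius` and `distance_penalty`, and the scoring formula are all assumptions, and Python's standard `difflib.SequenceMatcher` stands in for whatever fuzzy full-text search would really be used. It slides a window over the whole new text, so it would be far too slow for large documents as written.

```python
from difflib import SequenceMatcher


def locate(old_text, new_text, address, radius=20, distance_penalty=0.5, top=3):
    """Return up to `top` candidate (score, new_address) pairs, best first."""
    # chunk: a smallish window of the old text centred on the address.
    start = max(0, address - radius)
    chunk = old_text[start:address + radius]
    offset_in_chunk = address - start

    # pivot: the same proportional position in the new text.
    pivot = round(address / max(1, len(old_text)) * len(new_text))

    candidates = []
    last = max(0, len(new_text) - len(chunk))
    for pos in range(last + 1):
        window = new_text[pos:pos + len(chunk)]
        similarity = SequenceMatcher(None, chunk, window, autojunk=False).ratio()
        # Weight matches near the pivot higher than those further away
        # (hypothetical linear penalty on the normalised distance).
        distance = abs(pos + offset_in_chunk - pivot) / max(1, len(new_text))
        score = similarity * (1.0 - distance_penalty * distance)
        candidates.append((score, pos + offset_in_chunk))

    candidates.sort(reverse=True)
    return candidates[:top]
```

Calling `locate(old, new, address)` returns a short ranked list of possible alignments. If the top score is well clear of the rest, it can be accepted automatically; otherwise the list can be shown to a person to pick the right one.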
I'm aware that this is all very vague. Any feedback or attempts to make my writing a bit clearer will be most appreciated. -- NickStenning.
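The "given a diff and a list of {start addr, end addr}, return the new addresses" abstraction from the architecture list might look something like this. Again this is a sketch under assumptions: `remap_spans` is a made-up name, addresses are taken to be plain character offsets with an exclusive end, and the diff is computed in memory with `difflib` rather than taken from the VCS. It is deliberately conservative: an annotation is kept only if its whole span lies inside an unchanged block, and is dropped (returned as `None`) otherwise.

```python
from difflib import SequenceMatcher


def remap_spans(old_text, new_text, spans):
    """Map (start, end) character spans from old_text onto new_text.

    Returns a list parallel to `spans`; an entry is None when the
    annotated text was changed or removed.
    """
    matcher = SequenceMatcher(None, old_text, new_text, autojunk=False)
    opcodes = matcher.get_opcodes()

    result = []
    for start, end in spans:
        mapped = None
        for tag, i1, i2, j1, j2 in opcodes:
            # Keep the span only if it sits wholly inside an unchanged block;
            # its new address is then the old one shifted by the block offset.
            if tag == "equal" and i1 <= start and end <= i2:
                shift = j1 - i1
                mapped = (start + shift, end + shift)
                break
        result.append(mapped)
    return result
```

A less strict version could try to keep annotations that straddle a small edit, perhaps by falling back to the fuzzy search described above.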
Useful links:
- http://clientside.svn.calyptus.se/JavaScript/Native/Selection.js
- http://www.quirksmode.org/dom/range_intro.html
- http://www.webreference.com/js/column12/crossbrowser.html
Reviews of available technology:
- http://wiki.laptop.org/go/Annotation - not much that meets our needs
- http://blog.okfn.org/2007/01/24/thinking-about-annotation/ - summary of what we found back then