Doc⚡split
december 2010 by jonty
"Docsplit is a command-line utility and Ruby library for splitting apart documents into their component parts: searchable UTF-8 plain text via OCR if necessary, page images or thumbnails in any format, PDFs, single pages, and document metadata (title, author, number of pages...)"
ruby
pdf
document
parsing
ocr
documents
data
processing
split
from delicious
Pandoc
march 2010 by jonty
Pandoc is a tool for converting from one markup format to another. It can read markdown and (subsets of) reStructuredText, HTML, and LaTeX, and it can write markdown, reStructuredText, HTML, LaTeX, ConTeXt, PDF, RTF, DocBook XML, OpenDocument XML, ODT, GNU Texinfo, MediaWiki markup, groff man pages, and S5 HTML slide shows.
convert
document
documentation
html
markdown
pdf
text
restructuredtext
writing
latex
conversion
docbook
haskell
markup
pandoc
publishing
tex
converter
WaveNZ Development: Introduction to Operational Transformation
august 2009 by jonty
OT is a technology for supporting concurrent editing of a single shared document by a number of parties. OT provides a way of transmitting the edits that one party makes to the document to all other parties, but more importantly it provides a way to resolve the conflicts that arise when edits are made concurrently.
algorithms
google
text
realtime
distributedsystems
distributed
editing
editor
document
concurrent
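As a rough illustration of the idea in the description above, here is a minimal sketch of operational transformation for two concurrent text insertions. It is not taken from the linked article: the `Insert` type, the `site` tie-breaker, and the `transform` rule are hypothetical names and a deliberately simplified scheme (insert-only, two parties) chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    pos: int   # index in the document where the text goes
    text: str  # text to insert
    site: int  # hypothetical party id, used only to break position ties


def apply(doc: str, op: Insert) -> str:
    """Apply an insert operation to a document string."""
    return doc[:op.pos] + op.text + doc[op.pos:]


def transform(op: Insert, other: Insert) -> Insert:
    """Rewrite `op` so it can be applied after `other` has been applied.

    If `other` inserted text before `op`'s position (or at the same
    position with a lower site id), shift `op` right by that text's length.
    """
    if other.pos < op.pos or (other.pos == op.pos and other.site < op.site):
        return Insert(op.pos + len(other.text), op.text, op.site)
    return op


base = "shared document"
a = Insert(0, "A ", site=1)      # party 1 edits the start
b = Insert(6, " text", site=2)   # party 2 edits the middle, concurrently

# Each party applies its own edit first, then the transformed remote edit.
site_a = apply(apply(base, a), transform(b, a))
site_b = apply(apply(base, b), transform(a, b))

assert site_a == site_b  # both parties converge on the same document
print(site_a)
```

Both parties end up with the same text even though they applied the edits in a different order, which is the convergence property the description refers to. A real system would also need deletions, more than two parties, and a server or history to order operations.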