threedaymonk + text   9

Goose
Instapaper/Readability-style article content extraction library.
java  text  html  webdev 
may 2011 by threedaymonk
Extractomatic
Web API interface to Boilerpipe.
text  web 
march 2010 by threedaymonk
Boilerpipe
‘Boilerplate Removal and Fulltext Extraction from HTML pages’
java  library  text  html  web 
march 2010 by threedaymonk
TxtSushi
‘TxtSushi is a collection of command line utilities for processing comma-separated and tab-delimited files.’ Can do SQL-style SELECT queries joining multiple files. Written in Haskell.
text  sql  csv  haskell 
may 2009 by threedaymonk
Yoshikoder
‘[A] cross-platform multilingual content analysis program.’
text  linguistics  analysis 
october 2008 by threedaymonk
The Xapian Project
‘Xapian is an Open Source Search Engine Library, released under the GPL.’
search  c++  library  text  open-source 
july 2008 by threedaymonk
crossmark-spec
An extensible markdown-based format.
webdev  text  markup 
april 2008 by threedaymonk
