jpfinley + text   6

Overview: Extracting article text from HTML documents | My tech blog.
In the world of web scraping, text mining and article-reading utilities (such as the Readability bookmarklet), there is an ever-growing demand for tools that can distinguish the parts of an HTML document that represent an article from other common website building blocks like menus, headers, footers and ads.

In the following chapters I’ll try to review some article text extraction methods that are applicable to today’s websites. They mostly rely on machine learning, statistics and a wide range of heuristics.
html  scrape  scraping  extraction  text  scripting 
march 2011 by jpfinley
Why you should know just a little Awk
In grad school, I once saw a prof I was working with grab a text file and in seconds manipulate it into little pieces so deftly it blew my mind. I immediately decided it was time for me to learn awk, which he had so clearly mastered.
awk  unix  programming  text 
september 2010 by jpfinley
ASCIImeo [WebApp]
ASCIImeo (asciimeo.com) is a project by Peter Nitsch that renders Vimeo videos in three different textmodes.

Here are some of Peter’s favorite videos in ASCII:
Metamorphosis by Glenn Marshall
MOTOR / AMBIENT REEL by KU-SCHNEIDER
Magnetic Ink by flight404
Still Run by DANTE NOU
Look At Me by Patrick Lawler

Read more, including technical info, on Peter’s blog. The video below is of Glenn Marshall’s generative app, also available for the iPhone.

(via today and tomorrow incl images = ASCIImeo version of Scintillation)

ASCII Animation (Scripts)
WriteRoom Now with WebApp Sync [iPhone, WebApp]
Chrome Experiments [News]
12seconds [WebApp]
A Tool to Deceive and Slaughter [Objects]
AsciiMe [iPhone]
WebApp  ascii  peternitsch  render  text  video  vimeo  from google
january 2010 by jpfinley
UTI: The Universal Text Imitator
I decided to take on the task of making a general-purpose text generator that will imitate the style and content of any arbitrary document.
perl  language  fun  text  speech  word 
august 2007 by jpfinley
WriteRoom | Hog Bay Software
WriteRoom is a full screen, distraction free, writing environment.
software  mac  text  editor  osx  apps 
september 2006 by jpfinley
