Overview: Extracting article text from HTML documents | My tech blog.
march 2011 by jpfinley
In web scraping, text mining and article-reading utilities (such as the Readability bookmarklet), there is an ever-growing demand for tools that can distinguish the parts of an HTML document that make up an article from other common website building blocks such as menus, headers, footers and ads.
In the following chapters I’ll try to review some article text extraction methods that are applicable to today’s websites. They mostly rely on machine learning, statistics and a wide range of heuristics.
html
scrape
scraping
extraction
text
scripting
Why you should know just a little Awk
september 2010 by jpfinley
In grad school, I once saw a prof I was working with grab a text file and in seconds manipulate it into little pieces so deftly it blew my mind. I immediately decided it was time for me to learn awk, which he had so clearly mastered.
awk
unix
programming
text
ASCIImeo [WebApp]
january 2010 by jpfinley
ASCIImeo (asciimeo.com) is a project by Peter Nitsch that renders Vimeo videos in three different textmodes.
Here are some of Peter’s favorite videos in ASCII:
Metamorphosis by Glenn Marshall
MOTOR / AMBIENT REEL by KU-SCHNEIDER
Magnetic Ink by flight404
Still Run by DANTE NOU
Look At Me by Patrick Lawler
Read more, including technical info, on Peter’s blog. The video below is of Glenn Marshall’s generative app, which is also available for the iPhone.
(via today and tomorrow, including images; the ASCIImeo version of Scintillation)
WebApp
ascii
peternitsch
render
text
video
vimeo
from google
UTI: The Universal Text Imitator
august 2007 by jpfinley
I decided to take on the task of making a general-purpose text generator that imitates the style and content of any arbitrary document.
perl
language
fun
text
speech
word
WriteRoom | Hog Bay Software
september 2006 by jpfinley
WriteRoom is a full-screen, distraction-free writing environment.
software
mac
text
editor
osx
apps