heyitsnoah + scraping 4
apexdodge/NASCAR-Screen-Scraper
january 2012 by heyitsnoah
"NASCAR.com, to my knowledge, does not provide an API for acquiring driver stats. Here is a screen scraper for NASCAR.com to acquire all the relevant stats and races."
nascar
scraping
code
python
List of resources: Article text extraction from HTML documents
march 2011 by heyitsnoah
"Following up to my overview of article text extractors, I’ll try to compile a list of research papers, articles, web APIs, libraries and other software that I encountered during my research."
scraping
boilerpipe
march 2011 by heyitsnoah
"The boilerpipe library provides algorithms to detect and remove the surplus 'clutter' (boilerplate, templates) around the main textual content of a web page."
code
opensource
scraping
Overview: Extracting article text from HTML documents
march 2011 by heyitsnoah
"In the world of web scraping, text mining and article reading utilities (readability bookmarklet) there is an ever growing demand for utilities that are capable of distinguishing parts of a HTML document which represent an article apart from other common website building blocks like menus, headers, footers, ads etc."
scraping