heyitsnoah + scraping   4

apexdodge/NASCAR-Screen-Scraper
"NASCAR.com, to my knowledge, does not provide an API for acquiring driver stats. Here is a screen scraper for NASCAR.com to acquire all the relevant stats and races."
nascar  scraping  code  python 
january 2012 by heyitsnoah
List of resources: Article text extraction from HTML documents
"Following up to my overview of article text extractors, I’ll try to compile a list of research papers, articles, web APIs, libraries and other software that I encountered during my research."
scraping 
march 2011 by heyitsnoah
boilerpipe
"The boilerpipe library provides algorithms to detect and remove the surplus 'clutter' (boilerplate, templates) around the main textual content of a web page."
code  opensource  scraping 
march 2011 by heyitsnoah
Overview: Extracting article text from HTML documents
"In the world of web scraping, text mining and article reading utilities (readability bookmarklet) there is an ever growing demand for utilities that are capable of distinguishing parts of a HTML document which represent an article apart from other common website building blocks like menus, headers, footers, ads etc."
scraping 
march 2011 by heyitsnoah
