rybesh + scraping   10

JSDOM Memory leaks — Luke Berndt
JSDOM is a great little module for Node.js that lets you parse a DOM on the server. The only problem is that it has a memory leak. That is not a big deal if you only instantiate it a couple of times, but it gets trickier if you are screen scraping and need to call it thousands of times. Luckily, I found a workaround: instead of creating a new window every time you want to parse some code, keep the same window around and switch what it is displaying.
nodejs  jsdom  scraping 
7 weeks ago by rybesh
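The workaround described in this entry can be sketched roughly as follows. This is a minimal illustration and not the linked post's code: `createReusableParser`, `createWindow`, and `extract` are hypothetical names, and the window factory is passed in by the caller so the sketch makes no claim about jsdom's own API.

```javascript
// Sketch: reuse one DOM window across many parses instead of creating
// a new window per document. `createWindow` is a hypothetical factory
// supplied by the caller; nothing here is jsdom's own API.
function createReusableParser(createWindow) {
  let window = null; // created lazily on first use, then kept

  return function parse(html, extract) {
    if (window === null) {
      window = createWindow();
    }
    // Swap what the window is displaying rather than building a new one.
    window.document.body.innerHTML = html;
    try {
      // `extract` is a caller-supplied callback that reads from the document.
      return extract(window.document);
    } finally {
      // Clear the old markup so it is not held until the next call.
      window.document.body.innerHTML = "";
    }
  };
}
```

With a recent jsdom release, the factory could plausibly be `() => new JSDOM("").window`, and `extract` could run selectors against the document it receives. Because the markup is assigned to `body.innerHTML`, this sketch suits body fragments better than full pages with their own `<head>`.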
any23 - Anything to Triples - Google Project Hosting
Anything To Triples (Any23) is a library, a Web service and a set of command line tools for extracting structured data in RDF format from a variety of Web documents.
rdf  semweb  tools  scraping 
february 2012 by rybesh
PhantomJS: Headless WebKit with JavaScript API
PhantomJS is a headless WebKit with a JavaScript API. It has fast and native support for various web standards: DOM handling, CSS selectors, JSON, Canvas, and SVG.

PhantomJS is an optimal solution for fast headless testing, site scraping, page capture, SVG rendering, network monitoring, and many other use cases.
javascript  scraping  testing 
february 2012 by rybesh
tmpvar/jsdom - GitHub
A JavaScript implementation of the W3C DOM.
dom  javascript  nodejs  jquery  scraping 
january 2012 by rybesh
mape/node-scraper - GitHub
A little module that makes scraping websites a little easier. Uses node.js and jQuery.
jquery  nodejs  scraping 
april 2011 by rybesh
Scraping Made Easy with jQuery and SelectorGadget - David Trejo's Thoughts
A list of scraping tools and resources which will make your life MUCH easier the next time you need some information from a crufty old website.
nodejs  jquery  scraping  howto 
april 2011 by rybesh
List of resources: Article text extraction from HTML documents | My tech blog.
A list of research papers, articles, web APIs, libraries and other software for article text extraction.
datamining  extraction  html  scraping 
march 2011 by rybesh
Overview: Extracting article text from HTML documents | My tech blog.
In the world of web scraping, text mining, and article-reading utilities (such as the Readability bookmarklet), there is an ever-growing demand for utilities capable of distinguishing the parts of an HTML document that represent an article from other common website building blocks such as menus, headers, footers, and ads.
datamining  extraction  html  scraping 
march 2011 by rybesh
jsdom + jQuery in 5 lines with node.js - blog.nodejitsu.com - scaling node.js applications one callback at a time.
By working with server-side JavaScript (in this case node.js), developers can use widely accepted and battle-hardened libraries such as jQuery on the server thanks to jsdom, a server-side implementation of the DOM APIs.
nodejs  scraping  jquery 
february 2011 by rybesh
ScraperWiki
Anyone can write a screen scraper using the online editor, and the code and data are shared with the world.
datamining  opendata  scraping 
july 2010 by rybesh
