JSDOM Memory leaks — Luke Berndt
7 weeks ago by rybesh
JSDOM is a great little module for Node.js that lets you parse a DOM on the server. The only problem is that it leaks memory. That is not a big deal if you only instantiate it a couple of times, but it gets trickier if you are screen scraping and need to call it thousands of times. Luckily, I found a workaround: instead of creating a new window every time you want to parse some markup, keep the same window around and switch what it is displaying.
nodejs
jsdom
scraping
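The workaround described in this bookmark (keep one window and swap its content) could be sketched roughly as follows. This is a minimal illustration, not the linked post's code: `makeReusableParser`, `createWindow`, and `extract` are hypothetical names, and the caller supplies the window factory so the sketch does not depend on any particular jsdom version.

```javascript
// Sketch only: `createWindow` is a hypothetical, caller-supplied factory.
// It must return an object with a `document` whose body accepts `innerHTML`.
function makeReusableParser(createWindow) {
  let window = null; // created lazily on first use, then reused for every call

  return function parse(html, extract) {
    if (window === null) {
      window = createWindow();
    }
    // Swap what the existing window is displaying instead of building a new one.
    window.document.body.innerHTML = html;
    try {
      // `extract` pulls whatever the caller needs out of the document.
      return extract(window.document);
    } finally {
      // Clear the markup so parsed nodes do not linger between calls.
      window.document.body.innerHTML = '';
    }
  };
}
```

With a current jsdom release, the factory might be something like `() => new (require('jsdom').JSDOM)('<!DOCTYPE html><body></body>').window`; that is an assumption about today's API, which differs from the one available when the post was written. Assigning to `body.innerHTML` parses the input as a fragment, so anything in a page's `<head>` is not preserved.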
any23 - Anything to Triples - Google Project Hosting
february 2012 by rybesh
Anything To Triples (Any23) is a library, a Web service and a set of command line tools for extracting structured data in RDF format from a variety of Web documents.
rdf
semweb
tools
scraping
PhantomJS: Headless WebKit with JavaScript API
february 2012 by rybesh
PhantomJS is a headless WebKit with JavaScript API. It has fast and native support for various web standards: DOM handling, CSS selector, JSON, Canvas, and SVG.
PhantomJS is an optimal solution for fast headless testing, site scraping, page capture, SVG rendering, network monitoring, and many other use cases.
javascript
scraping
testing
tmpvar/jsdom - GitHub
january 2012 by rybesh
A JavaScript implementation of the W3C DOM.
dom
javascript
nodejs
jquery
scraping
mape/node-scraper - GitHub
april 2011 by rybesh
A little module that makes scraping websites a little easier. Uses node.js and jQuery.
jquery
nodejs
scraping
Scraping Made Easy with jQuery and SelectorGadget - David Trejo's Thoughts
april 2011 by rybesh
A list of scraping tools and resources which will make your life MUCH easier the next time you need some information from a crufty old website.
nodejs
jquery
scraping
howto
List of resources: Article text extraction from HTML documents | My tech blog.
march 2011 by rybesh
A list of research papers, articles, web APIs, libraries and other software for article text extraction.
datamining
extraction
html
scraping
Overview: Extracting article text from HTML documents | My tech blog.
march 2011 by rybesh
In the world of web scraping, text mining, and article-reading utilities (such as the Readability bookmarklet), there is an ever-growing demand for tools that can distinguish the parts of an HTML document that represent an article from other common website building blocks like menus, headers, footers, and ads.
datamining
extraction
html
scraping
jsdom + jQuery in 5 lines with node.js - blog.nodejitsu.com - scaling node.js applications one callback at a time.
february 2011 by rybesh
By working with server-side JavaScript (in this case node.js), developers can use widely accepted and battle-hardened libraries such as jQuery on the server thanks to jsdom, a server-side implementation of the DOM APIs.
nodejs
scraping
jquery
ScraperWiki
july 2010 by rybesh
Anyone can write a screen scraper using the online editor, and the code and data are shared with the world.
datamining
opendata
scraping