About Nutch
january 2011 by amy
Nutch is open source web-search software. It builds on Lucene and Solr, adding web-specifics, such as a crawler, a link-graph database, parsers for HTML and other document formats, etc.
Nutch can run on a single machine, but gains a lot of its strength from running in a Hadoop cluster.
search
solr
lucene
hadoop
open_source
[no title]
september 2010 by amy
"Search in files. Quickly.
VisualAck is like grep (or ack), except faster and with UI. For Mac."
osx
software
search
utilities
Searchtastic.com - search Twitter history and export tweets to Excel
may 2010 by amy
Yes, @Bill_Romanos, I do like this tool.
twitter
search
from twitter
TinEye Reverse Image Search
february 2010 by amy
TinEye is a reverse image search engine. You can submit an image to TinEye to find out where it came from, how it is being used, if modified versions of the image exist, or to find higher resolution versions. TinEye is the first image search engine on the web to use image identification technology rather than keywords, metadata or watermarks.
images
search
photography
ElasticSearch - ElasticSearch Overview
february 2010 by amy
Elastic Search: open source, schema-free, document-based, JSON, REST
apache
search
gaelucene - Project Hosting on Google Code
january 2010 by amy
GAELucene is a Lucene component that helps you run search applications on Google App Engine.
gae
md
google
search
:: Blacklight
december 2009 by amy
Blacklight is a free and open source ruby-on-rails based discovery interface (a.k.a. “next-generation catalog”) especially optimized for heterogeneous collections. You can use it as a library catalog, as a front end for a digital repository, or as a single-search interface to aggregate digital content that would otherwise be siloed. (uses solr)
search
discovery
rails
ruby
from delicious
StupidFilter :: Main / About
november 2009 by amy
...It's time to fight back.
The solution we're creating is simple: an open-source filter software that can detect rampant stupidity in written English. This will be accomplished with weighted Bayesian or similar analysis and some rules-based processing, similar to spam detection engines. The primary challenge inherent in our task is that stupidity is not a binary distinction, but rather a matter of degree. To this end, we're collecting a ranked corpus of stupid text, gleaned from user comments on public websites and ranked on a five-point scale.
internet
language
search
statistics
amusements
Google Social Search Launches, Gives Results From Your Trusted
october 2009 by amy
for a good explanation of google's social search see post @sengineland http://bit.ly/EFBBX
twitter_fav
@emilychang
google
search
social_media
Welcome to Solr
october 2009 by amy
Solr is an open source enterprise search server based on the Lucene Java search library, with XML/HTTP and JSON APIs, hit highlighting, faceted search, caching, replication, a web administration interface and many more features. It runs in a Java servlet container such as Tomcat.
search
open_source
apache
java
from delicious
Able Grape, the wine information search engine
september 2009 by amy
Search 21 million pages of trustworthy wine information. (uses Hadoop)
wine
search
hadoop
mapreduce
from delicious
Finding, Locating, Discovering | The Noisy Channel
september 2009 by amy
The difference between finding, locating and discovering: http://bit.ly/z7hVC (via my @GigaOMPro colleague @edgubbins) (via @celrae)
twitter_fav
@om
search
discovery
social_media
google
Hospital Search
august 2009 by amy
Search over 2,800 U.S. Hospital Web Sites:
(click on “more results” for the regular Google window)
search
health
medicine
Whoosh
april 2009 by amy
Whoosh is a fast, featureful full-text indexing and searching library implemented in pure Python.
python
search
Lucene / Solr
february 2009 by amy
search and indexing technology
search
java
open_source
apache
framework
library
Official Google Webmaster Central Blog: A spider's view of Web 2.0
october 2008 by amy
on spidering ajax-y sites
ajax
javascript
search