Vaguery + search-engines 37
[1008.1191] Improved Fast Similarity Search in Dictionaries
august 2010 by Vaguery
"We engineer an algorithm to solve the approximate dictionary matching problem. Given a list of words $\mathcal{W}$, maximum distance $d$ fixed at preprocessing time and a query word $q$, we would like to retrieve all words from $\mathcal{W}$ that can be transformed into $q$ with $d$ or less edit operations. We present data structures that support fault tolerant queries by generating an index. On top of that, we present a generalization of the method that eases memory consumption and preprocessing time significantly. At the same time, running times of queries are virtually unaffected. We are able to match in lists of hundreds of thousands of words and beyond within microseconds for reasonable distances."
nudge-targets
strings
search-engines
clustering
algorithms
heuristics
[1005.4803] Hirsch index as a network centrality measure
july 2010 by Vaguery
"…The h index is compared with the Degree centrality (a local measure), the Betweenness and Eigenvector centralities (two non-local measures) in the case of a biological network (Yeast interaction protein-protein network) and a linguistic network (Moby Thesaurus II). In both networks, the Hirsch index has poor correlation with Betweenness centrality but correlates well with Eigenvector centrality, specially for the more important nodes that are relevant for ranking purposes, say in Search Engine Optimization. In the thesaurus network, the h index seems even to outperform the Eigenvector centrality measure as evaluated by simple linguistic criteria."
network-theory
linguistics
search-engines
algorithms
nudge-targets
classification
machine-learning
[1007.3799] Adapting to the Shifting Intent of Search Queries
july 2010 by Vaguery
"Search engines today present results that are often oblivious to abrupt shifts in intent. For example, the query 'independence day' usually refers to a US holiday, but the intent of this query abruptly changed during the release of a major film by that name. … This paper shows that the signals a search engine receives can be used to both determine that a shift in intent has happened, as well as find a result that is now more relevant. We present a meta-algorithm that marries a classifier with a bandit algorithm to achieve regret that depends logarithmically on the number of query impressions, under certain assumptions. We provide strong evidence that this regret is close to the best achievable. Finally, via a series of experiments, we demonstrate that our algorithm outperforms prior approaches, particularly as the amount of intent-shifting traffic increases."
search-engines
search-algorithms
machine-learning
social-dynamics
algorithms
nudge-targets
intelligence-gathering
data-analysis
[1006.4270] Two-dimensional ranking of Wikipedia articles
june 2010 by Vaguery
"The Library of Babel, described by Jorge Luis Borges, stores an enormous amount of information. The Library exists ab aeterno. Wikipedia, a free online encyclopaedia, becomes a modern analogue of such a Library. Information retrieval and ranking of Wikipedia articles become the challenge of modern society. We analyze the properties of two-dimensional ranking of all Wikipedia English articles and show that it gives their reliable classification with rich and nontrivial features. Detailed studies are done for countries, universities, personalities, physicists, chess players, Dow-Jones companies and other categories."
wikipedia
search-engines
multiobjective-optimization
network-theory
network-culture
[1005.5516] On the Fly Query Entity Decomposition Using Snippets
june 2010 by Vaguery
"One of the most important issues in Information Retrieval is inferring the intents underlying users' queries. Thus, any tool to enrich or to better contextualized queries can proof extremely valuable. Entity extraction, provided it is done fast, can be one of such tools. Such techniques usually rely on a prior training phase involving large datasets. That training is costly, specially in environments which are increasingly moving towards real time scenarios where latency to retrieve fresh informacion should be minimal. In this paper an 'on-the-fly' query decomposition method is proposed. It uses snippets which are mined by means of a naïve statistical algorithm. An initial evaluation of such a method is provided, in addition to a discussion on its applicability to different scenarios."
search-engines
natural-language-processing
algorithms
nudge-targets
text-mining
Countdown to web sentience: Oddhead Blog: Prediction Markets, Gambling, Electronic Commerce, Artificial Intelligence: David Pennock: Yahoo! Research
march 2010 by Vaguery
"I was recently explaining all this to a colleague. To make my point, we Googled that question. Low and behold, there it was: asked and answered — verbatim — on Yahoo! Answers. How many legs does a fish have? Zero. Apparently Yahoo! Answers also knows the number of legs of a crayfish, rabbit, dog, starfish, mosquito, caterpillar, crab, mealworm, and “about 133,000” more."
web
search-engines
artificial-intelligence
digitization
susan-blackmore-comes-to-mind
Collecta Releases its Real Time API – issues challenge! « AltSearchEngines
september 2009 by Vaguery
"In conjunction with the API release, Collecta is launching a developer’s challenge with ChallengePost.com.
Dubbed “The AppMaster Challenge,” the contest will help drive the development of creative and powerful applications. From now through October 8th, developers can submit their Collecta-powered plug-in, webapp or application and the Collecta team will select the one that best exemplifies what real-time results can do. The winner will be announced on October 15th, and will receive both a featured spot as AppMaster Champion and a new 15″ MacBook Pro. There will be weekly prizes as well, and developers are encouraged to submit early and often."
search-engines
data
data-analysis
data-aggregation
competition
programming
The Xapian Project
august 2009 by Vaguery
"Xapian is a highly adaptable toolkit which allows developers to easily add advanced indexing and search facilities to their own applications. It supports the Probabilistic Information Retrieval model and also supports a rich set of boolean query operators.
If you're after a packaged search engine for your website, you should take a look at Omega: an application we supply built upon Xapian. Unlike most other website search solutions, Xapian's versatility allows you to extend Omega to meet your needs as they grow."
search-engines
library
open-source
NKill Blog: NKill in PC World
june 2009 by Vaguery
"One of NKill's objectives is to catalog every referenced public machine or network. Starting with all .com, .net, .org domains, www.DOMAIN, mail exchange records, nameservers, etc. and grab the version banners of the software they are running.
Nkill will be really useful for profiling a target during a security assessment because IP4 transforms are hard to perform without a database. Given an IP4 address, shitty sites like domaintools will tell you which virtual hosts are sharing the same address, that's it and they will charge you a fee for that information. They won't tell you which organisations (domains) are trusting this IP address for their mail, nameservers, etc.
With NKill, when a new vulnerability is discovered (e.g. IIS, postfix, apache, php...) we can instantly known which domains are vulnerable; you can pull that information for a whole country and we can also monitor how long it takes for people to react and patch their boxes."
security
search-engines
database
networks
social-networks
system-administration
malware
transparency
Beg the Internet...
may 2009 by Vaguery
"Google is beginning to fail to scale: there are now so many things on the internet and my memory for unique key words is so foggy that I can no longer find things I know exist."
anecdote
findability
Google
disintermediation-targets
search-engines
it's-people
OAIster | About
april 2009 by Vaguery
"OAIster is a union catalog of digital resources. We provide access to these digital resources by "harvesting" their descriptive metadata (records) using OAI-PMH (the Open Archives Initiative Protocol for Metadata Harvesting). The Open Archives Initiative is not the same thing as the Open Access movement."
open-archives
archive
union-catalog
digitization
open-access
reference
search-engines
collections
AltSearchEngines » Blog Archive » How to Search for Influencers with Datanetis
december 2008 by Vaguery
Be braced:
"For someone that has been working building software for the marketing automation industry over 8 years now and is familiar with multiple solutions for finding the right prospect out of many, it was an eye opener. I’m evidencing the progression from mass email campaigns through marketing to target individuals with a matching/relevant offers (data mining, behavioral pattern, collaborate filtering, recommendation engines) to finding customers that can market for you - agents."
social-networks
marketing
influence
advertising
data-mining
networks
search-engines
I, Cringely . The Pulpit . War of the Worlds | PBS
may 2008 by Vaguery
"Because that's not the way we do it, that's why."
via:hrheingold
education
search-engines
pedagogy
futurism
teaching
public-policy
institutional-design
academia
cultural-norms
SEO Rapping - a thaumaturgical compendium
april 2008 by Vaguery
"...but make sure you use good color combinations..."
search-engines
rap
SEO
web-design
open...: Paying the Price for Google
january 2008 by Vaguery
"The web has become over-fitted to Google like a strain of wheat becomes over-designed to a specific ecology."
Google
search-engines
SEO
monoculture
opportunity
risk
convergence
innovation
extinction
open...: What's a Paglo?
november 2007 by Vaguery
"We are maniacally focused on delivering the most value, for the most users, as quickly as possible."
search-engines
Paglo
business-model
business-plan
openness
GNU
open-source
reputational-revenue
Kung Fu Grippe
october 2007 by Vaguery
"Most SEOs are making headphones out of coconuts, hoping it brings traffic, and then wondering why the gods are so angry at them. They never get that the headphones probably aren't hooked up to anything but their make-believe radio."
via:nelson
SEO
search-engines
optimization
Google
cargo-cult
analogy
Merlin-Mann
web-analytics
analytics
marketing
Science Direct-ly into Google
july 2007 by Vaguery
"Both information seekers and publishers bear the responsibility of remembering that the Lens of Google through which we increasingly seek the world is only one lens, albeit one with further and further vision."
elsevier
Google
ScienceDirect
publishing
search-engines
googlescholar
academia
web2.0
XPoogle - an Agile search tool
june 2007 by Vaguery
Collaborative special-topic Google search subsetting engine
Google
extreme-programming
XP
search-engines
hacking
social-networks
archive
idea
web2.0
MeLCat, the Michigan eLibrary Catalog and Resource Sharing System
april 2007 by Vaguery
The new "interlibrary loan" search/catalog aggregator system for Michigan libraries.
via:vielmetti
Michigan
local
Ann-Arbor
library2.0
archive
catalog
books
search-engines
O'Reilly Radar > Worldcat Identities
february 2007 by Vaguery
More info on the WorldCat Identities beta
Worldcat-identities
search-engines
networks
authors
research
tools
WorldCat Identities
february 2007 by Vaguery
New (beta) WorldCat search interface: type an author name, and see a list of hits ranked by the number of holdings in all cataloged libraries.
worldcat
book-search
library
library2.0
web2.0
search-engines
authors
worldcat-identities