Vaguery + search-engines   37

[1008.1191] Improved Fast Similarity Search in Dictionaries
"We engineer an algorithm to solve the approximate dictionary matching problem. Given a list of words $\mathcal{W}$, maximum distance $d$ fixed at preprocessing time and a query word $q$, we would like to retrieve all words from $\mathcal{W}$ that can be transformed into $q$ with $d$ or less edit operations. We present data structures that support fault tolerant queries by generating an index. On top of that, we present a generalization of the method that eases memory consumption and preprocessing time significantly. At the same time, running times of queries are virtually unaffected. We are able to match in lists of hundreds of thousands of words and beyond within microseconds for reasonable distances."
nudge-targets  strings  search-engines  clustering  algorithms  heuristics 
august 2010 by Vaguery
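The problem stated in the abstract above (return every word in a list within edit distance d of a query) can be sketched with a brute-force baseline. This is only an illustrative linear scan, not the paper's indexed data structure; the function names `edit_distance` and `approximate_matches` are hypothetical, and Levenshtein distance is assumed as the edit measure.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the classic two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def approximate_matches(words, query, d):
    """Return every word within edit distance d of query (naive linear scan)."""
    return [w for w in words if edit_distance(w, query) <= d]


if __name__ == "__main__":
    word_list = ["search", "starch", "perch", "engine"]
    print(approximate_matches(word_list, "serch", 1))
```

Each query here costs one distance computation per word, which is the cost an index built at preprocessing time is meant to avoid.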
[1005.4803] Hirsch index as a network centrality measure
"…The h index is compared with the Degree centrality (a local measure), the Betweenness and Eigenvector centralities (two non-local measures) in the case of a biological network (Yeast interaction protein-protein network) and a linguistic network (Moby Thesaurus II). In both networks, the Hirsch index has poor correlation with Betweenness centrality but correlates well with Eigenvector centrality, especially for the more important nodes that are relevant for ranking purposes, say in Search Engine Optimization. In the thesaurus network, the h index seems even to outperform the Eigenvector centrality measure as evaluated by simple linguistic criteria."
network-theory  linguistics  search-engines  algorithms  nudge-targets  classification  machine-learning 
july 2010 by Vaguery
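As a rough illustration of an h index used as a node centrality, the sketch below assumes the common definition: a node has index h if h is the largest number such that at least h of its neighbours each have degree at least h. The quoted abstract does not spell the definition out, so treat that as an assumption; the function name `node_h_index` and the toy graph are hypothetical.

```python
def node_h_index(graph, node):
    """h index of a node: the largest h such that at least h neighbours
    have degree >= h. `graph` maps each node to a set of neighbours."""
    degrees = sorted((len(graph[n]) for n in graph[node]), reverse=True)
    h = 0
    for rank, degree in enumerate(degrees, start=1):
        if degree >= rank:
            h = rank
        else:
            break
    return h


if __name__ == "__main__":
    # Toy undirected graph stored as adjacency sets (illustrative only).
    toy = {
        "a": {"b", "c", "d"},
        "b": {"a", "c"},
        "c": {"a", "b"},
        "d": {"a"},
    }
    for name in sorted(toy):
        print(name, node_h_index(toy, name))
```

Under this definition the measure looks one step beyond a node's own degree, which may be why it sits between a purely local measure and the non-local ones mentioned in the abstract.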
[1007.3799] Adapting to the Shifting Intent of Search Queries
"Search engines today present results that are often oblivious to abrupt shifts in intent. For example, the query `independence day' usually refers to a US holiday, but the intent of this query abruptly changed during the release of a major film by that name. … This paper shows that the signals a search engine receives can be used to both determine that a shift in intent has happened, as well as find a result that is now more relevant. We present a meta-algorithm that marries a classifier with a bandit algorithm to achieve regret that depends logarithmically on the number of query impressions, under certain assumptions. We provide strong evidence that this regret is close to the best achievable. Finally, via a series of experiments, we demonstrate that our algorithm outperforms prior approaches, particularly as the amount of intent-shifting traffic increases."
search-engines  search-algorithms  machine-learning  social-dynamics  algorithms  nudge-targets  intelligence-gathering  data-analysis 
july 2010 by Vaguery
[1006.4270] Two-dimensional ranking of Wikipedia articles
"The Library of Babel, described by Jorge Luis Borges, stores an enormous amount of information. The Library exists ab aeterno. Wikipedia, a free online encyclopaedia, becomes a modern analogue of such a Library. Information retrieval and ranking of Wikipedia articles become the challenge of modern society. We analyze the properties of two-dimensional ranking of all Wikipedia English articles and show that it gives their reliable classification with rich and nontrivial features. Detailed studies are done for countries, universities, personalities, physicists, chess players, Dow-Jones companies and other categories."
wikipedia  search-engines  multiobjective-optimization  network-theory  network-culture 
june 2010 by Vaguery
[1005.5516] On the Fly Query Entity Decomposition Using Snippets
"One of the most important issues in Information Retrieval is inferring the intents underlying users' queries. Thus, any tool to enrich or to better contextualize queries can prove extremely valuable. Entity extraction, provided it is done fast, can be one such tool. Such techniques usually rely on a prior training phase involving large datasets. That training is costly, especially in environments which are increasingly moving towards real-time scenarios where latency to retrieve fresh information should be minimal. In this paper an `on-the-fly' query decomposition method is proposed. It uses snippets which are mined by means of a naïve statistical algorithm. An initial evaluation of such a method is provided, in addition to a discussion on its applicability to different scenarios."
search-engines  natural-language-processing  algorithms  nudge-targets  text-mining 
june 2010 by Vaguery
Countdown to web sentience: Oddhead Blog: Prediction Markets, Gambling, Electronic Commerce, Artificial Intelligence: David Pennock: Yahoo! Research
"I was recently explaining all this to a colleague. To make my point, we Googled that question. Lo and behold, there it was: asked and answered — verbatim — on Yahoo! Answers. How many legs does a fish have? Zero. Apparently Yahoo! Answers also knows the number of legs of a crayfish, rabbit, dog, starfish, mosquito, caterpillar, crab, mealworm, and “about 133,000” more."
web  search-engines  artificial-intelligence  digitization  susan-blackmore-comes-to-mind 
march 2010 by Vaguery
Collecta Releases its Real Time API – issues challenge! « AltSearchEngines
"In conjunction with the API release, Collecta is launching a developer’s challenge with ChallengePost.com.

Dubbed “The AppMaster Challenge,” the contest will help drive the development of creative and powerful applications. From now through October 8th, developers can submit their Collecta-powered plug-in, webapp or application and the Collecta team will select the one that best exemplifies what real-time results can do. The winner will be announced on October 15th, and will receive both a featured spot as AppMaster Champion and a new 15″ MacBook Pro. There will be weekly prizes as well, and developers are encouraged to submit early and often."
search-engines  data  data-analysis  data-aggregation  competition  programming 
september 2009 by Vaguery
The Xapian Project |
"Xapian is a highly adaptable toolkit which allows developers to easily add advanced indexing and search facilities to their own applications. It supports the Probabilistic Information Retrieval model and also supports a rich set of boolean query operators.

If you're after a packaged search engine for your website, you should take a look at Omega: an application we supply built upon Xapian. Unlike most other website search solutions, Xapian's versatility allows you to extend Omega to meet your needs as they grow."
search-engines  library  open-source 
august 2009 by Vaguery
NKill Blog: NKill in PC World
"One of NKill's objectives is to catalog every referenced public machine or network. Starting with all .com, .net, .org domains, www.DOMAIN, mail exchange records, nameservers, etc. and grab the version banners of the software they are running.

NKill will be really useful for profiling a target during a security assessment because IPv4 transforms are hard to perform without a database. Given an IPv4 address, shitty sites like domaintools will tell you which virtual hosts are sharing the same address, that's it, and they will charge you a fee for that information. They won't tell you which organisations (domains) are trusting this IP address for their mail, nameservers, etc.

With NKill, when a new vulnerability is discovered (e.g. IIS, postfix, apache, php...) we can instantly know which domains are vulnerable; you can pull that information for a whole country and we can also monitor how long it takes for people to react and patch their boxes."
security  search-engines  database  networks  social-networks  system-administration  malware  transparency 
june 2009 by Vaguery
Beg the Internet...
"Google is beginning to fail to scale: there are now so many things on the internet and my memory for unique key words is so foggy that I can no longer find things I know exist."
anecdote  findability  Google  disintermediation-targets  search-engines  it's-people 
may 2009 by Vaguery
OAIster | About
"OAIster is a union catalog of digital resources. We provide access to these digital resources by "harvesting" their descriptive metadata (records) using OAI-PMH (the Open Archives Initiative Protocol for Metadata Harvesting). The Open Archives Initiative is not the same thing as the Open Access movement."
open-archives  archive  union-catalog  digitization  open-access  reference  search-engines  collections 
april 2009 by Vaguery
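For readers unfamiliar with OAI-PMH harvesting as described above, a harvester issues plain HTTP requests whose query string names a protocol verb such as `ListRecords`. The sketch below only builds such a request URL; the base URL `https://example.org/oai` and the helper name `list_records_url` are placeholders, not OAIster's actual endpoint or code.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build an OAI-PMH ListRecords request URL.

    In the protocol, resumptionToken is an exclusive argument, so when
    paging through a large result it replaces metadataPrefix.
    """
    if resumption_token is not None:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical repository endpoint, for illustration only.
    print(list_records_url("https://example.org/oai"))
```

The response to such a request is an XML document of metadata records, which is what a union catalog would then index.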
AltSearchEngines » Blog Archive » How to Search for Influencers with Datanetis
Be braced:

"For someone that has been working building software for the marketing automation industry over 8 years now and is familiar with multiple solutions for finding the right prospect out of many, it was an eye opener. I’m evidencing the progression from mass email campaigns through marketing to target individuals with a matching/relevant offers (data mining, behavioral pattern, collaborate filtering, recommendation engines) to finding customers that can market for you - agents."
social-networks  marketing  influence  advertising  data-mining  networks  search-engines 
december 2008 by Vaguery
SEO Rapping - a thaumaturgical compendium
"...but make sure you use good color combinations..."
search-engines  rap  SEO  web-design 
april 2008 by Vaguery
open...: Paying the Price for Google
"The web has become over-fitted to Google like a strain of wheat becomes over-designed to a specific ecology."
Google  search-engines  SEO  monoculture  opportunity  risk  convergence  innovation  extinction 
january 2008 by Vaguery
open...: What's a Paglo?
"We are maniacally focused on delivering the most value, for the most users, as quickly as possible."
search-engines  Paglo  business-model  business-plan  openness  GNU  open-source  reputational-revenue 
november 2007 by Vaguery
Kung Fu Grippe
"Most SEOs are making headphones out of coconuts, hoping it brings traffic, and then wondering why the gods are so angry at them. They never get that the headphones probably aren't hooked up to anything but their make-believe radio."
via:nelson  SEO  search-engines  optimization  Google  cargo-cult  analogy  Merlin-Mann  web-analytics  analytics  marketing 
october 2007 by Vaguery
Science Direct-ly into Google
"Both information seekers and publishers bear the responsibility of remembering that the Lens of Google through which we increasingly seek the world is only one lens, albeit one with further and further vision."
elsevier  Google  ScienceDirect  publishing  search-engines  googlescholar  academia  web2.0 
july 2007 by Vaguery
XPoogle - an Agile search tool
Collaborative special-topic Google search subsetting engine
Google  extreme-programming  XP  search-engines  hacking  social-networks  archive  idea  web2.0 
june 2007 by Vaguery
MeLCat, the Michigan eLibrary Catalog and Resource Sharing System
The new "interlibrary loan" search/catalog aggregator system for Michigan libraries.
via:vielmetti  Michigan  local  Ann-Arbor  library2.0  archive  catalog  books  search-engines 
april 2007 by Vaguery
WorldCat Identities
New (beta) WorldCat search interface: type an author name, and see a list of hits ranked by the number of holdings in all cataloged libraries.
worldcat  book-search  library  library2.0  web2.0  search-engines  authors  worldcat-identities 
february 2007 by Vaguery

