rybesh + search   218

The Metadata is the Interface: Better Description for Better Discovery of Archives and Special Collections, Synthesized from User Studies
This essay, part of a series of OCLC Research projects to mobilize unique materials, synthesizes evidence of what descriptive information people say they need for research.
userresearch  metadata  interface  search  specialcollections  archives 
15 days ago by rybesh
JSTOR: The Journal of Modern History, Vol. 84, No. 1 (March 2012), pp. 116-144
by using multiple databases and keyword variants, the historian may gain confidence in a particular chronological intervention. Large databases, the result of scanned microfilm collections or mass digitization initiatives across multiple libraries, provide enough texts to bridge generation and genre, incorporating authors from a variety of backgrounds. Sheer number of texts is important here: ECCO indexes 200,000 works from eighteenth- and nineteenth-century Britain with 33 million pages of text; Google Books Search has 42 million books from all periods. If the historian’s goal is to show a shift in common word usage, the size of a database is more important than its genre specificity; in the case examined in the present article, for instance, Google Book Search and ECCO were superior to the available poetry databases. Iterative visitation of multiple databases provided another potential source of richness for extracting meaning from these tools.
textanalysis  search  digitalhumanities 
21 days ago by rybesh
Personal Assistants for Everyone - Fancy Hands
Fancy Hands is a team of personal assistants ready to work for you right now. You should focus on what's important, let us focus on the rest.
search  research  IR 
february 2012 by rybesh
mattweber/elasticsearch-mocksolrplugin - GitHub
This plugin will allow you to use tools that were built to interact with Solr with ElasticSearch.
solr  search  tools 
december 2011 by rybesh
The Effects of Racial Animus on Voting: Evidence Using Google Search Data
Traditional surveys struggle to capture socially unacceptable attitudes such as racial animus. This paper uses Google searches including racially charged language as a proxy for a local area’s racial animus. I use the Google-search proxy, available for roughly 200 media markets in the United States, to reassess the impact of racial attitudes on voting for a black candidate in the United States. I compare an area’s racially charged search volume to its votes for Barack Obama, the 2008 black Democratic presidential candidate, controlling for its votes for John Kerry, the 2004 white Democratic presidential candidate. Other studies using a similar empirical specification and standard state-level survey measures of racial attitudes yield little evidence that racial animus had a major impact in recent U.S. elections. Using the Google-search proxy, I find significant and robust effects in the 2008 presidential election. The estimates imply that racial animus in the United States cost Obama three to five percentage points in the national popular vote in the 2008 election.
statistics  socialscience  methods  search 
november 2011 by rybesh
DocumentCloud's VisualSearch.js
VisualSearch.js enhances ordinary search boxes with the ability to autocomplete faceted search queries. Specify the facets for completion, along with the completable values for any facet. You can retrieve the search query as a structured object, so you don't have to parse the query string yourself.
faceted  search  javascript 
november 2011 by rybesh
Case Study: Contextual Search for Volkswagen and the Automotive Industry
In summary the key benefits of using Semantic Web technology for Volkswagen were as follows:

A standardised interface to data and content, accessible to developers with different skillsets, using different technologies within and without the organisation.
Separation of concerns between information and application, both logically and physically.
Increases value, reusability and accessibility of data.
Very powerful federation features.
Adoption and use didn't necessitate process or change management. It could be leveraged at any stage within the product lifecycle painlessly and gracefully, both internally and externally.
semweb  linkeddata  search  inls520  metadata 
october 2011 by rybesh
Sapping Attention: Bookworm and library search
4) Organize the library according to your personal principles, and browse it from arbitrary points.

This is where we need to go. Bookworm presents one set of ways for reordering the library based on the principle that language is constrained by the fields of its utterance--geographical (publication place), disciplinary (LC classification), temporal (publication year), even autobiographical (author age). The line chart that a search creates is a representation of overall trends; but it is also, taken point by point, an enormous collection of books. If you search for a term by author age and publication place, Bookworm is reordering the collection of the Open Library (a lot of it, anyway) into chunks divided by author age and place, showing you information about each one of those chunks, and inviting you to dive into a particular one to find the books matching your term.
search  organization  inls520 
september 2011 by rybesh
Grep the Web
Submit a series of strings or patterns and we will show you the urls on which they appear (in rank order).
search  tools 
september 2011 by rybesh
elasticsearch - tutorials - CouchDB Integration
This tutorial explains the process of setting up ElasticSearch to automatically index data in CouchDB and make it searchable.
couchdb  elasticsearch  search 
august 2011 by rybesh
Nipster!
npm registry search using github stats for ranking.
nodejs  npm  search 
august 2011 by rybesh
elasticsearch - guide - Attachment Type
The attachment type allows indexing different “attachment” type fields (encoded as base64), for example, Microsoft Office formats, open document formats, ePub, HTML, and so on (full list can be found here).
elasticsearch  search  reference  pdf 
august 2011 by rybesh
elasticsearch - guide - Search API - Facets
Facets provide aggregated data based on a search query. In the simple case, a facet can return facet counts for various facet values for a specific field. ElasticSearch supports more advanced facet implementations, such as statistical or date histogram facets.
faceted  search  api  howto 
july 2011 by rybesh
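A minimal sketch of the "simple case" described in the facets entry above: counting how many matching documents carry each value of a field. This is plain Python for illustration, not ElasticSearch's API; the function name `term_facet` and the sample documents are assumptions made up for the example.

```python
from collections import Counter

def term_facet(hits, field):
    """Count how many documents in `hits` carry each value of `field`."""
    counts = Counter()
    for doc in hits:
        value = doc.get(field)
        # A field may hold a single value or a list of values (e.g. tags).
        values = value if isinstance(value, list) else [value]
        for v in values:
            if v is not None:
                counts[v] += 1
    return counts

# Hypothetical documents standing in for search hits.
docs = [
    {"title": "Attachment Type", "tags": ["elasticsearch", "pdf"]},
    {"title": "Search API - Facets", "tags": ["elasticsearch", "faceted"]},
    {"title": "Apache Tika", "tags": ["lucene", "pdf"]},
]

facet = term_facet(docs, "tags")
for value, count in sorted(facet.items()):
    print(value, count)
```

A real search engine computes these counts inside the index rather than by scanning hits, but the result has the same shape: a mapping from field value to document count.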
elasticsearch - tutorials - Attachment Type in Action
This tutorial will walk you through basic attachment type setup and use in search including highlighting. (How to use elasticsearch to index PDFs and other file types.)
indexing  search  howto  pdf 
july 2011 by rybesh
Apache Tika - Apache Tika
The Apache Tika™ toolkit detects and extracts metadata and structured text content from various documents using existing parser libraries.
lucene  metadata  search  pdf 
july 2011 by rybesh
Price-is-Right Binary Search (for Suffix Arrays of Documents) « LingPipe Blog
Suffix arrays are useful if you’re looking for anything from plagiarized passages in a pile of writing assignments, cut-and-paste code blocks in a large project, or just commonly repeated phrases on Twitter.
search  textanalysis  textmining 
june 2011 by rybesh
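To illustrate why suffix arrays help with repeated phrases, here is a small sketch that builds a token-level suffix array and locates every occurrence of a phrase with ordinary binary search. It is not the "Price-is-Right" variant from the post; the function names and the naive sort-based construction are assumptions chosen for brevity.

```python
def build_suffix_array(tokens):
    """Return suffix start positions, sorted by the suffix they begin.

    Naive construction (sorts full suffixes); fine for small inputs only.
    """
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])

def find_phrase(tokens, suffix_array, phrase):
    """Return the sorted start positions at which `phrase` occurs."""
    n = len(phrase)

    def prefix(rank):
        start = suffix_array[rank]
        return tokens[start:start + n]

    # Leftmost suffix whose first n tokens are >= phrase.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < phrase:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # Leftmost suffix whose first n tokens are > phrase.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= phrase:
            lo = mid + 1
        else:
            hi = mid

    return sorted(suffix_array[first:lo])

tokens = "to be or not to be".split()
suffix_array = build_suffix_array(tokens)
print(find_phrase(tokens, suffix_array, ["to", "be"]))
```

Because all suffixes that start with the same phrase sit next to each other in sorted order, a repeated passage shows up as a contiguous run in the array.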
Python Package Index : python-Levenshtein 0.10.2
Python extension computing string distances and similarities.
python  textanalysis  search 
may 2011 by rybesh
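For readers unfamiliar with the distance the package above is named after, here is a plain-Python sketch of the classic Levenshtein edit distance (insertions, deletions, substitutions, each costing 1). It is an illustration of the measure only, not the package's own implementation, and the function name `levenshtein` is an assumption.

```python
def levenshtein(a, b):
    """Return the minimum number of single-character edits turning a into b."""
    # Keep the shorter string in the inner loop to use less memory.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]

print(levenshtein("kitten", "sitting"))
```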
Reverted Indexing
Traditional interactive information retrieval systems function by creating inverted lists, or term indexes. For every term in the vocabulary, a list is created that contains the documents in which that term occurs and its frequency within each document. Retrieval algorithms then use these term frequencies alongside other collection statistics to identify matching documents for a query.

Term-based search, however, is just one example of interactive information seeking. Other examples include offering suggestions of documents similar to ones already found, or identifying effective query expansion terms that the user might wish to use. More generally, these fall into several categories: query term suggestion, relevance feedback, and pseudo-relevance feedback.

We can combine the inverted index with the notion of retrievability to create an efficient query expansion algorithm that is useful for a number of applications, such as query expansion and relevance (and pseudo-relevance) feedback. We call this kind of index a reverted index because rather than mapping terms onto documents, it maps document ids onto queries that retrieved the associated documents.
IR  tools  search  lucene 
march 2011 by rybesh
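The contrast between an inverted and a reverted index in the entry above can be sketched in a few lines. This is a toy illustration under stated assumptions: the basis queries are single terms, the retrieval score is raw term frequency, and the names `build_inverted_index`, `build_reverted_index`, and `suggest_terms` are hypothetical rather than taken from the paper.

```python
from collections import Counter, defaultdict

def build_inverted_index(docs):
    """Map each term to {doc_id: term frequency}."""
    inverted = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(text.lower().split()).items():
            inverted[term][doc_id] = tf
    return inverted

def build_reverted_index(inverted, basis_queries):
    """Map each doc_id to {query: score} for the basis queries retrieving it.

    Assumption: each basis query is one term, scored by term frequency.
    """
    reverted = defaultdict(dict)
    for query in basis_queries:
        for doc_id, score in inverted.get(query, {}).items():
            reverted[doc_id][query] = score
    return reverted

def suggest_terms(reverted, doc_ids, k=3):
    """Suggest expansion terms: the basis queries that best retrieve doc_ids."""
    scores = Counter()
    for doc_id in doc_ids:
        scores.update(reverted.get(doc_id, {}))
    return [query for query, _ in scores.most_common(k)]

docs = {
    "d1": "lucene index search",
    "d2": "lucene search search",
    "d3": "wavelet image search",
}
inverted = build_inverted_index(docs)
reverted = build_reverted_index(inverted, list(inverted))
print(suggest_terms(reverted, ["d1", "d2"], k=1))
```

Given documents the user has already found, looking them up in the reverted index yields the queries that retrieve them, which is what makes it usable for query expansion and relevance feedback.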
Living Knowledge : Home
Knowledge and its articulations are strongly influenced by diversity in, e.g., cultural backgrounds, schools of thought, geographical contexts. Judgements, assessments and opinions, which play a crucial role in many areas of democratic societies, including politics and economics, reflect this diversity in perspective and goals. For the information on the Web (including, e.g., news and blogs) diversity - implied by the ever increasing multitude of information providers - is the reason for diverging viewpoints and conflicts. Time and evolution add a further dimension making diversity an intrinsic and unavoidable property of knowledge.
news  search  research  time  knowledge  europe 
march 2011 by rybesh
Time Explorer
Welcome to the Time Explorer, an application designed for analyzing how news changes over time. Time Explorer extends upon current time-based systems in many important ways. First, Time Explorer is designed to help users discover how entities such as people and locations associated with a query change over time. Second, by searching on time expressions extracted automatically from text, the application allows the user to explore not only how topics evolved in the past, but also how they will continue to evolve in the future.
time  history  news  search  interface 
march 2011 by rybesh
elasticsearch - Open Source, Distributed, RESTful, Search Engine
It is an Open Source (Apache 2), Distributed, RESTful, Search Engine built on top of Lucene.
search  ir  tools  rest  java  json 
february 2011 by rybesh
FactForge.net
FactForge represents a reason-able view to the web of data. It aims to allow users to find resources and facts based on the semantics of the data, like web search engines index WWW pages and facilitate their usage.
semweb  facts  search 
october 2010 by rybesh
Library Juice » A Google trick for staying ahead of AI
Increasing use of AI means smarter-than-average searchers constantly need to learn tricks in order to counteract the AI that assumes a user base of average consumers.
search  interface  Information_Ethics  Technology 
september 2010 by rybesh
Training Examples Q&A - machine learning, natural language processing, artificial intelligence, text analysis, information retrieval, search, data mining, statistical modeling, and data visualization
Where data geeks ask and answer questions on machine learning, natural language processing, artificial intelligence, text analysis, information retrieval, search, data mining, statistical modeling, and data visualization!
ai  machinelearning  nlp  textanalysis  ir  datamining  search  statistics  infoviz  reference 
june 2010 by rybesh
TuQS
Turnguard's QuadStore is the first draft of an own implementation of a QuadStore with main focus on data-retrieval speed. Implements full-text search.
triplestore  search  database  semweb  tools 
march 2010 by rybesh
Digital Search II: A User Perspective on Database Design « Easily Distracted
"Rather than moving towards amalgamation and interoperability across databases, you really get the sense that everybody’s been busy grabbing at whatever piles of text they can lay their hands on, building the biggest little mudhill they can manage to put up, and then building walls around it. There are interstitial services that help a user 'jump' from one little fragmented collection to another and portals that aim to be a 'top level' to return to, sure, but we should be doing better by now."
search  database  interface  scholarship  library  context  contextfinder  usability 
december 2009 by rybesh
Anatomy of a Search « Easily Distracted
"Over Thanksgiving weekend, I had a great search experience that I think is worth laying out here, because it captures three of the key dimensions of digital search."
search  strategy  history  scholarship 
december 2009 by rybesh
RDF Aggregates and Full Text Search
Perform full text searches, filtered by types that are inferred.
rdf  database  search  semweb  tools  howto 
may 2009 by rybesh
Haystack - Search for Django
Haystack provides modular search for Django. It features a unified, familiar API that allows you to plug in different search backends (such as Solr, Whoosh, etc.) without having to modify your code.
django  python  search  framework  searchengine  tools 
april 2009 by rybesh
Information Retrieval Gupf » Retrievability
Popularity bias (“the inherent democratic nature of the web”) might actually prevent more information from ever being seen, because it never appears at the top of anyone’s query list.
IR  critique  search  analysis 
april 2009 by rybesh
imgSeekCmd User Guide
Content-based image search. The searching algorithm makes use of multiresolution wavelet decomposition of the query and database images.
image  search  tools  contentanalysis  python 
april 2009 by rybesh
Cuil - Cuil Blog: Launching Timelines
We make it easy to explore the events in the timeline, just move your mouse over an event and a pop-up will appear with a longer description and a link to search for related pages. Beyond people, timelines can be a useful tool for displaying information about a period in history, such as the Great Depression. Or a famous sports arena, like Madison Square Garden. Or, say, the highest bridge in the world, the Millau Viaduct.
events  timeline  infoviz  search  interface 
april 2009 by rybesh
Code: Flickr Developer Blog » Building Fast Client-side Searches
Fetching the data using a dynamically generated script tag... the difference in performance was shocking.
search  javascript  performance  webservices  ajax  ui  json 
march 2009 by rybesh
django-springsteen and Distributed Search @ Irrational Exuberance
Provides a trivial wrapper for Yahoo! BOSS, but goes further and provides a simple framework for building distributed search networks.
search  python  django  yahoo  distributed 
february 2009 by rybesh
SRU/CQL Standardization in OASIS
The premise behind dynamic bindings is that any search engine, even one that existed prior to development of the standard, need only to provide a dynamic binding - a self-description. It need make no other changes in order to be accessible. A client will be able to access any search engine that provides a description, if only it implements the capability to read and interpret the description and use it to formulate a request (including a query) and interpret the response.
metadata  search  standards  webservices  IR 
january 2009 by rybesh
Search Web Services - The OASIS SWS Technical Committee Work: The Abstract Protocol Definition, OpenSearch Binding, and SRU/CQL 2.0
The OASIS Search Web Services Technical Committee is developing search and retrieval web services, integrating various approaches under a unifying model, an Abstract Protocol Definition.
metadata  search  standards  webservices  IR 
january 2009 by rybesh
[whatwg] Trying to work out the problems solved by RDFa
It would seem important that the Web easily enable small-time users of data to efficiently communicate with one another, without the need to have one of the giants as an intermediary.
opinion  semweb  rdfa  metadata  architecture  search  web  webinfo 
january 2009 by rybesh
SourceForge.net: SeerSuite
SeerSuite is an application toolkit for digital libraries and search engines; i.e., CiteSeerX.
tools  opensource  extraction  bibliography  java  search 
december 2008 by rybesh
The Laboratorium: Principles and Recommendations for the Google Book Search Settlement
I hope that these recommendations will prove equally appealing to those who think that Google can do no evil and those who think it does only evil. Perhaps they will prove equally frustrating. The settlement is good as it stands, but it could stand to be better.
google  books  search  law  policy 
november 2008 by rybesh
Services
Open GUID consists of the following services to manage web identity: Finding existing unique web identifiers. Establishing new unique web identifiers. Associating legacy identifiers with an open one. Registering identical classes and instances in web ontologies.
semweb  identity  registry  search  tools  webservices 
september 2008 by rybesh
FAQ/FindSimilar – Xapian – Trac
How can I implement a "find documents like this one" feature?
search  reference  howto 
july 2008 by rybesh
Multicolr Search Lab - Idée Inc.
We extracted the colours from 3 million “interesting” Flickr images. Using our visual similarity technology you can navigate the collection by colour.
color  image  search  webservices  visualweb 
july 2008 by rybesh
Wikipedia + Lucene's MoreLikeThis = useful bits about the bits?
Proof-of-concept based on vacuuming every Wikipedia article into the Lucene open source search engine to build a text categorisation tool prototype.
wikipedia  search  categorization  ideas 
june 2008 by rybesh
ClioPatria semantic search web-server
It joins the SWI-Prolog RDF and HTTP infrastructure with a SeRQL/SPARQL query engine, interfacing to the Yahoo! User Interface Library (YUI) and libraries that support semantic search.
semweb  tools  search  interface  api  prolog  opensource 
may 2008 by rybesh
E-Culture MultimediaN - cultural heritage search
This cultural search engine will give you access to artworks from several museum collections.
culture  museum  multimedia  semweb  search  research  CWI 
april 2008 by rybesh
A Semantic Multimedia Web: Create, Annotate, Present and Share your Media
We consider the use of Semantic Web technologies for improving the multimedia user experience on the Web.
multimedia  semweb  annotation  metadata  search  editing  research  CWI 
april 2008 by rybesh
Outgoing: OpenURL: The Ministry of Silly Names
A request to any web server in existence can be modeled in the simplest of terms: what, who, where, why, when, and how.
library  hypermedia  web  standards  search 
april 2008 by rybesh
page-store.com
Page-store positions itself as a web wholesaler, supplying page and link information to vertical search engine companies on a per-use basis.
web  search  business 
april 2008 by rybesh
Whimsley: Mr. Google's Guidebook
Mr. Google is lying! His Guidebook no longer reflects the paths set out by travellers as they navigate their lives. It is no longer an outside observer of people's wanderings.
search  humor  fiction  mystery 
march 2008 by rybesh
RubyForge: Conveyor: Project Info
Conveyor can be used like an application-agnostic version of MySQL binlogs, which can be replayed to write data into multiple, diverse data stores.
web  data  architecture  code  ruby  database  search 
february 2008 by rybesh
Apache UIMA - Apache UIMA
The Unstructured Information Management Architecture (UIMA) is an architecture and software framework for creating, discovering, composing and deploying a broad range of multi-modal analysis capabilities and integrating them with search technologies.
extraction  recognition  architecture  tools  java  datamining  search 
february 2008 by rybesh
Ask About Ireland: Media Bank
More than 7,000 items relating to Irish culture and heritage.
ireland  media  image  database  culture  neh2007  search 
february 2008 by rybesh
Dawid Weiss
Text clustering, information retrieval, web mining, text processing, NLP.
people  academia  poland  search  datamining  nlp  machinelearning 
november 2007 by rybesh
Conditions for the Digital Library of Alexandria
To the extent it or other search engines limit access to parts of their index, their public-spirited defenses of their archiving and indexing projects are suspect.
books  digitization  infrastructure  copyright  law  fairuse  archives  search  policy  ideas 
november 2007 by rybesh
OASIS Specification Template: Search Web Services
The Search web service is a means of opening a database to external enquiry in a standardized manner that facilitates discovery of query and response possibilities and makes it possible for heterogeneous databases to be queried simultaneously with the sam
search  webservices  standards 
november 2007 by rybesh
Google Custom Search Engine - Site search and more
With Custom Search Engine, you can harness the power of Google to create a search engine tailored to your needs.
custom  search  tools 
september 2007 by rybesh
Goddard DAAC Air Pollution Event Search Tool
Querying by location, time and/or pollutant concentration is useful for researchers and others familiar with air quality data. But for people not familiar with what PM2.5 is or the significance of 65 ug/m3, searching by event type is useful.
events  search  interface 
july 2007 by rybesh
First International Workshop on Cultural Heritage on the Semantic Web
The objective of the workshop is to bring together researchers from the Semantic Web field and cultural heritage professionals to discuss the digitalization, annotation, archiving, and retrieval of our cultural heritage in all its forms.
semweb  museum  archives  annotation  search  conference  2007  korea 
july 2007 by rybesh
Google Changing Its Tune
“I don’t think we’re ideologically bound to only computers, only algorithms,” Mr. Cutts said. In fact, he said, Google has combed through its own Web pages to remove all references to “automatic ranking.”
search  ideology  google 
june 2007 by rybesh
The VRE project page
Liverpool project to integrate the Fab4 (Multivalent) browser, the Cheshire XML search engine, and the Kepler (Ptolemy) workflow engine.
web  annotation  xml  search  digital  library  architecture 
june 2007 by rybesh
The Times Morgue Packs Up and Ships Out
The clips convey information that the searcher may not have known to look for—often simply through the layout and typeface, which an engine such as Nexis doesn’t preserve.
newspaper  archives  visualmedia  design  semantics  interface  search  digital 
may 2007 by rybesh
clipartbrowser - Google Code
A browser for locally stored clipart that allows images to be searched for and imported from the open clipart library at http://www.openclipart.org.
python  clipart  search  code  tools 
march 2007 by rybesh
Inside CDL: eXtensible Text Framework (XTF)
The CDL eXtensible Text Framework (XTF) is a flexible indexing and query tool that supports searching across collections of heterogeneous data and presents results in a highly configurable manner.
library  search  code  tools  opensource 
march 2007 by rybesh
Google CEO: Media divided over online video
Traditional media argue their content has a certain intrinsic value, while Google says "prove it," he said. "That's often a difficult conversation."
media  business  search  economics  video 
march 2007 by rybesh
Google and the books
Can we say it was a mistake? For it was a mistake.
library  archives  books  search  metadata  manifesto 
march 2007 by rybesh
shimenawa - Thoughts and presentations
As my host Michael Buckland observed, there is clarity in the counsel of our fundamentals: making information available, ensuring open access, assisting others in discovery, creating user-empowering tools and services.
library  information  opensource  search  organization  tools  webservices 
march 2007 by rybesh
NATIVE INSTRUMENTS: Traktor 3
Boasts direct integrated access to the Beatport Online Music Store, allowing you to browse their extensive catalogue, pre-listen and buy hot new tracks and download them directly into your library.
editing  tools  archives  search  library  music  remix 
february 2007 by rybesh
Kevin Kelly -- The Technium
The goal of lifelogging: to record and archive all information in one’s life.
biography  memory  archives  search  surveillance 
february 2007 by rybesh
CopySpace at iStockphoto.com
iStockphoto.com has a search engine that can sort images based on where you could place text or a logo.
design  search  image  analysis  advertising 
february 2007 by rybesh
Annotate the web, then rewire it « Jon Udell
The dominant way in which most people will “program” the web is by writing metadata, not code, and we’ll need an interface as friendly and powerful as Pipes to help them do that.
social  metadata  video  annotation  election  interface  database  search  semweb  tools  webservices 
february 2007 by rybesh
Library of Congress Authorities (Search for Name, Subject, Title and Name/Title)
Using Library of Congress Authorities, you can browse and view authority headings for Subject, Name, Title and Name/Title combinations; and download authority records in MARC format for use in a local library system.
archives  bibliography  books  catalogs  classification  government  library  metadata  reference  search 
february 2007 by rybesh
luke
Luke is a handy development and diagnostic tool, which accesses already existing Lucene indexes and allows you to display and modify their contents in several ways.
search  tools  java 
february 2007 by rybesh
plush
Plush is a PyLUcene SHell for playing with Lucene indexes interactively.
search  tools  python  java 
february 2007 by rybesh
NRC - Context, Content and Community
We believe that dramatic breakthroughs can be gained through more holistic systems that consider the contexts and communities in which content is created and consumed.
community  content  mobile  locative  ubicomp  search  advertising 
january 2007 by rybesh