mcroydon + search   156

Changing Bits: Lucene's FuzzyQuery is 100 times faster in 4.0
"There are many exciting improvements in Lucene's eventual 4.0 (trunk) release, but the awesome speedup to FuzzyQuery really stands out, not only from its incredible gains but also because of the amazing behind-the-scenes story of how it all came to be."
lucene  solr  apache  java  search 
april 2011 by mcroydon
A fast, fuzzy, full-text index using Redis | PlayNice.ly
I really love Redis' primitive types and operations on them.
lucene  rails  redis  search  semanticweb  python 
may 2010 by mcroydon
Lucid Imagination » State of Spatial Support in Apache Solr
Native spatial search in Solr is getting *really close*.
geo  lucene  search  solr  spatial 
march 2010 by mcroydon
Deduplication - Solr Wiki
Lets Solr spot duplicates in exact-match or near-match scenarios.
hash  information_retrieval  lucene  java  programming  search  solr  deduplication 
march 2010 by mcroydon
Lucid Imagination » The Seven Deadly Sins of Solr
An amusing take on the many things that you might be doing wrong.
search  performance  apache  tips  lucene  solr  ir 
january 2010 by mcroydon
toastdriven's queued_search at master - GitHub
Daniel knocks out a reusable app that makes it easy to have Haystack queue updates and deletes so that you get the benefits of near-realtime updates (depending on how you cron up the consumer) without the delays on object save. If you're using Haystack in production, you probably want to be using this.
haystack  search  queues  python  django  solr 
january 2010 by mcroydon
zoie - Project Hosting on Google Code
Built on top of Apache Lucene and focuses on solving several problems around real-time search and indexing performance.
programming  development  google  library  opensource  free  java  search  architecture  tech  apache  db  lucene  solr  project  indexing  realtime  index  fulltext  mq  linkedin  zoie 
november 2009 by mcroydon
Training to Climb an Everest of Digital Data
Big data is big and almost always requires a completely different mindset than the one that is taught in computer science programs.
data  database  processing  google  news  toread  ibm  energy  datasets  mining  search  research  science  internet  algorithms  storage  scaling  education  hadoop  analysis  computer-science  datacuration 
october 2009 by mcroydon
[#SOLR-773] Incorporate Local Lucene/Solr - ASF JIRA
It looks like LocalSolr is tentatively slated to land in Solr 1.5.
gis  search  geo  map  lucene  geography  solr 
august 2009 by mcroydon
A Comparison of Open Source Search Engines
A look at Lucene, Zettair, Sphinx, SQLite, and Xapian, with some benchmarks.
database  opensource  search  comparison  lucene  engine  oss  benchmark  searchengine  searchengines 
july 2009 by mcroydon
Bill Katz
MIT-licensed full-text search on Google App Engine.
web  django  development  library  engine  search  indexing  resource  appengine  model  whoosh  gae  foss 
july 2009 by mcroydon
Welcome! - gae-search / full-text search on App Engine
Interesting full-text-search engine built on top of Google App Engine as a commercial venture.
python  web  django  google  search  text  appengine  gae 
july 2009 by mcroydon
Read It: Search User Interfaces
"This book presents the state of the art of search interface design, based on both academic research and deployment in commercial systems." Full text available online
search  interface  ui  design 
june 2009 by mcroydon
NUCULAR fielded text searchable indexing: Documentation
Another lightweight Python full-text search engine with a silly if not unfortunate name.
python  software  django  development  database  tools  library  api  xml  opensource  search  text  application  lucene  db  concurrency  indexing  solr  oss  ir  searchengine  fulltext  whoosh  nucular 
june 2009 by mcroydon
mcroydon's django-tumbleweed at master - GitHub
Tumbleweed is essentially a framework for writing your own tumblelog using data denormalized in Haystack. It leans heavily on Haystack and the underlying search backend and is currently only recommended to be used with Solr.
django  tumblelog  tumble  haystack  solr  search 
june 2009 by mcroydon
katta - distributed lucene
"Katta serves large, replicated, Lucene indexes as shards to serve high loads and very large data sets."
software  development  java  search  scalability  performance  scaling  distributed  apache  hadoop  clustering  cloud  grid  lucene  tool  indexing  project  ir  searchengine  information-retrieval  dist  package  hdfs  katta 
june 2009 by mcroydon
Cloudera's Distribution for Hadoop | Cloudera
Includes lots of feature tickets that are pretty stable but not yet in a Hadoop release. It reminds me a lot of Debian unstable or Ubuntu a month or so before release. Good stuff indeed. Includes RPM and APT package management options.
software  data  linux  google  search  aws  distributed  computing  ec2  hadoop  cloud  mapreduce  cloudcomputing  distribution  clusters  packaging  cloudera  cloud-computing  rpm  apt 
june 2009 by mcroydon
Apache Mahout - Taste Documentation
Collaborative filtering as part of the Mahout project. Also includes a web services interface for use from non-Java code.
software  java  search  algorithm  cluster  apache  mapreduce  machinelearning  engine  filtering  recommendation  webservice  recommendations  mahout 
april 2009 by mcroydon
Scaling Lucene and Solr | Lucid Imagination
Wow, I thought I had seen all of the goodies on the Lucid Imagination site. Nice find, Joseph.
programming  development  linux  java  search  sysadmin  scalability  scaling  article  performance  optimization  lucene  solr  oss  ir  information-retrieval  tomcat  tuning 
march 2009 by mcroydon
KMWorld.com: Designing for faceted search
Some thoughts when designing for faceted search.
design  search 
march 2009 by mcroydon
duetopia - Google Code
Interesting, though I don't know exactly what it all means.
geo  gis  search  metadata  geodjango 
february 2009 by mcroydon

