Coverage of ApacheCon North America 2011, 6th-11th November 2011 | Lanyrd
november 2011 by donturn
audio & presentations from ApacheCon North America 2011
apache
opensource
lucene
solr
hadoop
Apache OpenNLP - Welcome to Apache OpenNLP
november 2011 by donturn
OpenNLP is an organizational center for open source projects related to natural language processing. Its primary role is to encourage and facilitate the collaboration of researchers and developers on such projects.
OpenNLP also hosts a variety of Java-based NLP tools that perform sentence detection, tokenization, POS tagging, chunking and parsing, named-entity detection, and coreference resolution using the OpenNLP Maxent machine learning package.
apache
java
nlp
opensource
Apache Tika - Apache Tika
november 2011 by donturn
The Apache Tika™ toolkit detects and extracts metadata and structured text content from various documents using existing parser libraries.
apache
java
lucene
metadata
parser
Apache UIMA - Apache UIMA
november 2011 by donturn
Unstructured Information Management applications are software systems that analyze large volumes of unstructured information in order to discover knowledge that is relevant to an end user. An example UIM application might ingest plain text and identify entities, such as persons, places, organizations; or relations, such as works-for or located-at.
UIMA enables applications to be decomposed into components, for example "language identification" => "language specific segmentation" => "sentence boundary detection" => "entity detection (person/place names etc.)". Each component implements interfaces defined by the framework and provides self-describing metadata via XML descriptor files. The framework manages these components and the data flow between them. Components are written in Java or C++; the data that flows between components is designed for efficient mapping between these languages.
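The component decomposition described above can be pictured with a small sketch. This is not the UIMA API: the `Document`, `Component`, and `run_pipeline` names are hypothetical, the heuristics are toy regular expressions, and the "language specific segmentation" stage is left out for brevity. It only illustrates the idea of components sharing one interface while a framework drives the data flow between them.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical shared data structure passed between components."""
    text: str
    annotations: dict = field(default_factory=dict)


class Component:
    """Hypothetical interface every pipeline stage implements."""

    def process(self, doc: Document) -> None:
        raise NotImplementedError


class LanguageIdentifier(Component):
    def process(self, doc: Document) -> None:
        # Toy heuristic: common English function words imply English.
        is_english = re.search(r"\b(the|and|is)\b", doc.text, re.IGNORECASE)
        doc.annotations["language"] = "en" if is_english else "unknown"


class SentenceDetector(Component):
    def process(self, doc: Document) -> None:
        # Split after sentence-final punctuation followed by whitespace.
        parts = re.split(r"(?<=[.!?])\s+", doc.text)
        doc.annotations["sentences"] = [p.strip() for p in parts if p.strip()]


class EntityDetector(Component):
    def process(self, doc: Document) -> None:
        # Toy heuristic: two or more consecutive capitalized words.
        pattern = r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)+\b"
        doc.annotations["entities"] = re.findall(pattern, doc.text)


def run_pipeline(components, doc: Document) -> Document:
    """Stand-in for the framework: runs each component in order."""
    for component in components:
        component.process(doc)
    return doc
```

Calling `run_pipeline([LanguageIdentifier(), SentenceDetector(), EntityDetector()], Document("..."))` fills `doc.annotations` stage by stage; each stage can be swapped or reordered without touching the others, which is the point of the decomposition.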
apache
framework
java
nlp
opensource
Apache Rave
november 2011 by donturn
Apache Rave is a new web and social mashup engine. It will provide an out-of-the-box, extensible, lightweight Java platform to host, serve and aggregate (Open)Social Gadgets and services through a highly customizable, Web 2.0 friendly front-end. Rave is targeted as an engine for internet and intranet portals and as a building block that provides context-aware personalization and collaboration features for multi-site/multi-channel (mobile) oriented, content-driven websites and (social) network oriented services and platforms. For the OpenSocial container and services, the (Java) Apache Shindig will be integrated. At a later stage, further generalization is envisioned to also transparently support W3C Widgets using Apache Wookie.
apache
opensource
mashup
Welcome to Apache Pig!
september 2011 by donturn
Apache Pig is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs. The salient property of Pig programs is that their structure is amenable to substantial parallelization, which in turn enables them to handle very large data sets.
apache
data
hadoop
mapreduce
opensource
sql
database
datascience
Apache Flume
september 2011 by donturn
Apache Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. Its main goal is to deliver data from applications to Apache Hadoop's HDFS. It has a simple and flexible architecture based on streaming data flows. It is robust and fault tolerant with tunable reliability mechanisms and many failover and recovery mechanisms. The system is centrally managed and allows for intelligent dynamic management. It uses a simple extensible data model that allows for online analytic applications.
opensource
apache
logs
datascience
kdd
data
Apache Solr 3.1 Cookbook Book & eBook | Packt Publishing Technical & IT Book and eBook Store
september 2011 by donturn
Apache Solr 3.1 Cookbook looks great
search
apache
opensource
lucene
solr