donturn + opensource   155

Apache OpenNLP - Welcome to Apache OpenNLP
OpenNLP is an organizational center for open source projects related to natural language processing. Its primary role is to encourage and facilitate the collaboration of researchers and developers on such projects.

OpenNLP also hosts a variety of Java-based NLP tools which perform sentence detection, tokenization, POS tagging, chunking and parsing, named-entity detection, and coreference using the OpenNLP Maxent machine learning package.
apache  java  nlp  opensource 
november 2011 by donturn
Apache UIMA - Apache UIMA
Unstructured Information Management applications are software systems that analyze large volumes of unstructured information in order to discover knowledge that is relevant to an end user. An example UIM application might ingest plain text and identify entities, such as persons, places, organizations; or relations, such as works-for or located-at.


UIMA enables applications to be decomposed into components, for example "language identification" => "language specific segmentation" => "sentence boundary detection" => "entity detection (person/place names etc.)". Each component implements interfaces defined by the framework and provides self-describing metadata via XML descriptor files. The framework manages these components and the data flow between them. Components are written in Java or C++; the data that flows between components is designed for efficient mapping between these languages.
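
The decomposition described above can be illustrated with a minimal Python sketch. This is not UIMA's API (UIMA components are written in Java or C++ and described by XML descriptors); the names `run_pipeline`, `detect_sentences`, and `detect_entities` are hypothetical, and the "entity detection" is a toy rule chosen only to show how components pass shared data along a chain.

```python
from typing import Callable, Dict, List

# Hypothetical shared structure that every component reads and annotates.
Analysis = Dict[str, object]
Component = Callable[[Analysis], None]


def detect_sentences(doc: Analysis) -> None:
    """Toy sentence boundary detection: split on periods."""
    text = str(doc["text"])
    doc["sentences"] = [s.strip() for s in text.split(".") if s.strip()]


def detect_entities(doc: Analysis) -> None:
    """Toy entity detection: capitalized words that do not start a sentence."""
    entities: List[str] = []
    for sentence in doc["sentences"]:  # produced by the previous component
        words = sentence.split()
        entities.extend(w for w in words[1:] if w[0].isupper())
    doc["entities"] = entities


def run_pipeline(text: str, components: List[Component]) -> Analysis:
    """Play the role of the framework: own the data and call each component in order."""
    doc: Analysis = {"text": text}
    for component in components:
        component(doc)
    return doc


result = run_pipeline(
    "The firm hired Alice. She moved to Paris.",
    [detect_sentences, detect_entities],
)
print(result["entities"])
```

Each step only depends on the annotations left by earlier steps, which is what lets a framework reorder or replace components without changing the others.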
apache  framework  java  nlp  opensource 
november 2011 by donturn
Welcome to Chukwa!
Chukwa is an open source data collection system for monitoring large distributed systems. Chukwa is built on top of the Hadoop Distributed File System (HDFS) and Map/Reduce framework and inherits Hadoop’s scalability and robustness. Chukwa also includes a flexible and powerful toolkit for displaying, monitoring and analyzing results to make the best use of the collected data.
hadoop  monitor  logs  opensource 
november 2011 by donturn
Welcome to Hive!
Hive is a data warehouse system for Hadoop that facilitates easy data summarization, ad-hoc queries, and the analysis of large datasets stored in Hadoop compatible file systems. Hive provides a mechanism to project structure onto this data and query the data using a SQL-like language called HiveQL. At the same time this language also allows traditional map/reduce programmers to plug in their custom mappers and reducers when it is inconvenient or inefficient to express this logic in HiveQL.
data  database  hadoop  opensource  nosql 
november 2011 by donturn
Apache Rave
Apache Rave is a new web and social mashup engine. It will provide an out-of-the-box as well as an extensible lightweight Java platform to host, serve and aggregate (Open)Social Gadgets and services through a highly customizable and Web 2.0 friendly front-end. Rave is targeted as an engine for internet and intranet portals and as a building block to provide context-aware personalization and collaboration features for multi-site/multi-channel (mobile) oriented and content-driven websites and (social) network oriented services and platforms. For the OpenSocial container and services the (Java) Apache Shindig will be integrated. At a later stage further generalization is envisioned to also transparently support W3C Widgets using Apache Wookie.
apache  opensource  mashup 
november 2011 by donturn
Welcome to Apache Pig!
Apache Pig is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs. The salient property of Pig programs is that their structure is amenable to substantial parallelization, which in turn enables them to handle very large data sets.
apache  data  hadoop  mapreduce  opensource  sql  database  datascience 
september 2011 by donturn
Scribe - GitHub
Scribe is a server for aggregating streaming log data. It is designed to scale to a very large number of nodes and be robust to network and node failures. There is a scribe server running on every node in the system, configured to aggregate messages and send them to a central scribe server (or servers) in larger groups. If the central scribe server isn’t available, the local scribe server writes the messages to a file on local disk and sends them when the central server recovers. The central scribe server(s) can write the messages to the files that are their final destination, typically on an NFS filer or a distributed filesystem, or send them to another layer of scribe servers.

Scribe is unique in that clients log entries consisting of two strings, a category and a message. The category is a high level description of the intended destination of the message and can have a specific configuration in the scribe server, which allows data stores to be moved by changing the scribe configuration instead of client code. The server also allows for configurations based on category prefix, and a default configuration that can insert the category name in the file path. Flexibility and extensibility are provided through the “store” abstraction. Stores are loaded dynamically based on a configuration file, and can be changed at runtime without stopping the server. Stores are implemented as a class hierarchy, and stores can contain other stores. This allows a user to chain features together in different orders and combinations by changing only the configuration.

Scribe is implemented as a Thrift service using the non-blocking C++ server. The installation at Facebook runs on thousands of machines and reliably delivers tens of billions of messages a day.
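
The two behaviours described in this entry (buffering while the central server is unavailable, and choosing a destination from the category) can be sketched in a few lines of Python. This is an illustration only, not Scribe's code or configuration format: `LocalRelay`, `resolve_store`, and the example paths are hypothetical, and the buffer here is an in-memory list, whereas Scribe is described as writing to a file on local disk.

```python
from typing import Callable, Dict, List, Tuple


class LocalRelay:
    """Hypothetical stand-in for a local server that forwards (category, message) pairs."""

    def __init__(self, send: Callable[[str, str], bool]) -> None:
        self.send = send  # returns False when the central server is unreachable
        self.buffer: List[Tuple[str, str]] = []  # assumption: in memory, not on disk

    def log(self, category: str, message: str) -> None:
        self.buffer.append((category, message))
        self.flush()

    def flush(self) -> None:
        # Deliver in arrival order; stop at the first failure and keep the rest.
        while self.buffer:
            category, message = self.buffer[0]
            if not self.send(category, message):
                return
            self.buffer.pop(0)


def resolve_store(
    category: str,
    exact: Dict[str, str],
    prefixes: List[Tuple[str, str]],
    default_template: str,
) -> str:
    """Pick a destination: exact category first, then prefix, then the default."""
    if category in exact:
        return exact[category]
    for prefix, path in prefixes:
        if category.startswith(prefix):
            return path
    # The default configuration inserts the category name into the path.
    return default_template.format(category=category)


print(resolve_store("ads_click", {"web": "/logs/web"}, [("ads_", "/logs/ads")], "/logs/default/{category}"))
```

Because the destination is resolved from configuration rather than by the client, moving a data store only requires changing the mapping passed to `resolve_store`, which mirrors the point made in the description.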
opensource  logs  datascience 
september 2011 by donturn
Apache Flume
Apache Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. Its main goal is to deliver data from applications to Apache Hadoop's HDFS. It has a simple and flexible architecture based on streaming data flows. It is robust and fault tolerant with tunable reliability mechanisms and many failover and recovery mechanisms. The system is centrally managed and allows for intelligent dynamic management. It uses a simple extensible data model that allows for online analytic applications.
opensource  apache  logs  datascience  kdd  data 
september 2011 by donturn
Orange - Data Mining Fruitful & Fun
the Orange app w/python & graphical dev interfaces for data mining & visualization looks nice #opensource
datamining  db  machinelearning  opensource  python  dev  viz  analysis  textmining 
february 2011 by donturn
OpenVBX: the Web-based, Open Source Phone System for Business
a lot like google voice, but you run it on your own server
opensource  api  phone  voice  audio 
june 2010 by donturn
Quick Media Converter
FLV TO AVI DIVX DVD MP4 MPEG MP3 H264 IPHONE IPOD MOV WMV XVID WII XBOX PS3 3GP 3G2 TS VOB Free Video Audio Converter Cocoon Software
apps  windows  microsoft  xp  opensource  video  music  audio  utilities  media  mp3  iphone  ipod  mp4  convert 
january 2009 by donturn