rybesh + pdf   32

elasticsearch - guide - Attachment Type
The attachment type allows indexing different “attachment” type fields (encoded as base64), for example, Microsoft Office formats, open document formats, ePub, HTML, and so on (full list can be found here).
elasticsearch  search  reference  pdf 
august 2011 by rybesh
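The base64 encoding mentioned in the attachment type description can be sketched in Python roughly as follows. This is a minimal sketch, not the elasticsearch guide's own example: the helper `build_attachment_doc` and the field name `file` are hypothetical, and sending the document to a server is left out.

```python
import base64
import json


def build_attachment_doc(raw_bytes, field="file"):
    """Return a JSON string holding the bytes base64-encoded under `field`.

    `field` is a hypothetical name chosen for illustration only.
    """
    encoded = base64.b64encode(raw_bytes).decode("ascii")
    return json.dumps({field: encoded})


# Example payload; in practice this would be the bytes of a PDF or office file.
doc = build_attachment_doc(b"%PDF-1.4 example bytes")
```

Decoding the field with `base64.b64decode` gives back the original bytes, which is what lets a binary file travel inside a JSON document.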
elasticsearch - tutorials - Attachment Type in Action
This tutorial will walk you through basic attachment type setup and use in search including highlighting. (How to use elasticsearch to index PDFs and other file types.)
indexing  search  howto  pdf 
july 2011 by rybesh
Apache Tika - Apache Tika
The Apache Tika™ toolkit detects and extracts metadata and structured text content from various documents using existing parser libraries.
lucene  metadata  search  pdf 
july 2011 by rybesh
PDFMiner
PDFMiner is a tool for extracting information from PDF documents. Unlike other PDF-related tools, it focuses entirely on getting and analyzing text data. PDFMiner allows one to obtain the exact location of text in a page, as well as other information such as fonts or lines. It includes a PDF converter that can transform PDF files into other text formats (such as HTML). It has an extensible PDF parser that can be used for other purposes than text analysis.
pdf  python  tools 
may 2011 by rybesh
HTML/CSS to PDF converter written in Python - HTML2PDF Converter
XHTML2PDF is a converter for HTML/XHTML and CSS to PDF and a Python package.
css  html  pdf  python  django 
april 2011 by rybesh
Pandoc - About pandoc
If you need to convert files from one markup format into another, pandoc is your swiss-army knife. Need to generate a man page from a markdown file? No problem. LaTeX to Docbook? Sure. HTML to MediaWiki? Yes, that too. Pandoc can read markdown and (subsets of) reStructuredText, textile, HTML, and LaTeX, and it can write plain text, markdown, reStructuredText, HTML, LaTeX, ConTeXt, PDF, RTF, DocBook XML, OpenDocument XML, ODT, GNU Texinfo, MediaWiki markup, textile, groff man pages, Emacs org-mode, EPUB ebooks, and S5 and Slidy HTML slide shows. PDF output (via LaTeX) is also supported with the included markdown2pdf wrapper script.
html  latex  markup  pdf  markdown 
april 2011 by rybesh
FlexPaper - the open source document viewer solution for pdf, doc, ..
FlexPaper displays documents in your favorite browser using Flash. Its way of reusing display containers makes it possible to view large documents and books.
pdf  flex  flash  tools  interface  web 
october 2010 by rybesh
ReportLab - Open Source Software
The ReportLab Open Source PDF library is a proven industry-strength PDF generating solution, that you can use for meeting your requirements and deadlines in enterprise reporting systems.
python  pdf  printing  code 
november 2008 by rybesh
iArchives - Leaders in Document Digitization
We at iArchives are pioneering the movement from paper to digital. Our powerful, patented technology can convert your data to high quality, searchable, readable PDF files quickly and easily.
archives  documents  image  pdf  OCR 
july 2007 by rybesh
cb2Bib: Overview
The cb2Bib is a tool for rapidly extracting unformatted, or unstandardized bibliographic references from email alerts, journal Web pages, and PDF files.
academia  tools  pdf 
july 2005 by rybesh
YesLogic
A simple declarative CSS style sheet enables you to create good-looking PDF files without having to deal with complex XSL transformations or proprietary solutions.
css  pdf  web  xml  tools 
january 2005 by rybesh
The pstotext program
pstotext is a program that works with Ghostscript (version 3.33 or later) to extract plain text from PostScript and PDF files.
opensource  pdf  tools 
november 2004 by rybesh
Multiple Perspective Interactive Video
In MPI video, a viewer could view an event from multiple perspectives, even based on the contents of the events.
3d  computervision  ideas  pdf  video 
october 2004 by rybesh
Classes vs. Prototypes - Some Philosophical and Historical Observations (ResearchIndex)
In this paper we take a rather unusual, non-technical approach and investigate object-oriented programming and the prototype-based programming field from a purely philosophical viewpoint.
code  oop  pdf  philosophy 
october 2004 by rybesh
Using Gimp to fill in PDFs
Some .pdf forms allow you to fill them in, but most don't. In the old days your choices were a pen or a typewriter--neither particularly appetizing. Now you can use Gimp to fill in the forms.
howto  pdf 
september 2004 by rybesh
An ethnographic study of music information seeking
Eliciting the native music information strategies employed by people searching for popular music.
acm  music  pdf  search  social 
september 2004 by rybesh
Content management for electronic music distribution
Advanced techniques are necessary to help users navigate in large music catalogs... there is still a long way to go... in particular concerning the nature of the metadata and similarity relations extracted.
acm  doi  identity  music  pdf  personalization  web 
september 2004 by rybesh
Interdisciplinary Communities and Research Issues in Music Information Retrieval
In order for MIR to succeed, researchers need to work with real user communities and develop research resources such as reference music collections.
music  pdf  search 
september 2004 by rybesh
A Naturalist Approach to Music File Name Analysis
An identification mechanism that exploits the information found in music audio filenames.
identity  metadata  music  pdf 
september 2004 by rybesh
Knowledge-Based Extraction of Named Entities
A knowledge-based approach to learning rules for named-entity extraction from unstructured Web text.
identity  nlp  pdf 
september 2004 by rybesh
Adaptive Name Matching in Information Integration
Our research explores approaches to the name-matching problem that improve accuracy by combining multiple string similarity methods that capture different notions of similarity to adapt to a specific domain.
identity  nlp  pdf 
september 2004 by rybesh
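The idea of combining several string similarity measures for name matching can be illustrated with a small Python sketch. This is not the paper's method: the two measures (character-level `difflib` ratio and token-set Jaccard) and the fixed weights are assumptions made for illustration, where an adaptive approach would presumably learn the weights for a given domain.

```python
from difflib import SequenceMatcher


def char_similarity(a, b):
    """Character-level similarity in [0, 1] from difflib."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def token_jaccard(a, b):
    """Jaccard overlap of whitespace-separated tokens, ignoring case and order."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def combined_similarity(a, b, w_char=0.5, w_token=0.5):
    """Weighted sum of the two measures; the weights are hypothetical."""
    return w_char * char_similarity(a, b) + w_token * token_jaccard(a, b)
```

The token measure captures reordered names such as "Smith John" versus "John Smith", while the character measure tolerates small spelling differences, so each covers a case the other handles poorly.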
Object Co-identification on the Semantic Web
The Semantic Web seeks to integrate data from many different sources. Since different sources often use different names for the same object, we need to map between these names.
identity  pdf  semweb 
september 2004 by rybesh
Semantic Negotiation: Coidentifying objects across data sources
Integrating and composing web services from different providers requires a solution for the problem of different providers using different names for the same object.
identity  metadata  pdf  search  semweb 
september 2004 by rybesh
Inferring Descriptions and Similarity for Music from Community Metadata.
Methods for unsupervised learning of text profiles for music from unstructured text obtained from the web.
metadata  music  nlp  pdf  personalization  social 
september 2004 by rybesh
Using cultural metadata for artist recommendations
The beauty of this approach lies in the possibility to access so-called cultural metadata that is the agglomeration of several independent--originally subjective--perspectives about music.
metadata  music  nlp  pdf  personalization 
september 2004 by rybesh
Retrieval effectiveness of an ontology-based model for information selection
A scalable disambiguation algorithm that prunes irrelevant concepts and allows relevant ones to associate with documents and participate in query generation.
acm  doi  kr  pdf  search 
september 2004 by rybesh
Personalization of user profiles for content-based music retrieval based on relevance feedback
A music retrieval method which retrieves songs based on the user's musical preferences. Since music preferences are expected to be highly ambiguous, relevance feedback methods are used to improve performance.
acm  doi  music  pdf  personalization  search 
september 2004 by rybesh
Representing internet streaming media metadata using MPEG-7 multimedia description schemes
Singingfish.com uses MPEG-7 description schemes to model the metadata characteristics of Internet streaming media.
acm  doi  metadata  multimedia  pdf  search  streaming 
september 2004 by rybesh
Computers and the Humanities: Special Issue on Digital Images
This special issue of Computers and the Humanities addresses the challenges and opportunities in designing, building, and using digital image collections.
academia  ideas  image  library  pdf  video 
september 2004 by rybesh
SWISH-Enhanced
A fast, powerful, flexible, free, and easy to use system for indexing collections of Web pages or other text files (including PDFs).
library  opensource  pdf  search  tools 
september 2004 by rybesh
Docco
A little personal document management system. It scans for a number of different document formats and creates a database containing which words are contained in which documents.
infoviz  java  library  opensource  pdf  search  tools 
september 2004 by rybesh
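A database recording which words are contained in which documents is essentially an inverted index. The Python sketch below shows the general idea only; it is not Docco's implementation, and the function name `build_index` and the sample documents are hypothetical.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map each lowercase word to the set of document names containing it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(name)
    return index


# Hypothetical sample documents, already converted to plain text.
docs = {"a.txt": "PDF tools for search", "b.txt": "Search the library"}
index = build_index(docs)
```

Looking up a word in `index` returns the names of the documents that contain it, so a query never has to rescan the document text.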
Multivalent
Free and open source Java software for scanned paper, PDF, HTML, UNIX manual pages, TeX DVI, and more.
java  library  metadata  opensource  pdf  tools 
september 2004 by rybesh
