elasticsearch - guide - Attachment Type
august 2011 by rybesh
The attachment type allows indexing different “attachment” type fields (encoded as base64), for example Microsoft Office formats, OpenDocument formats, ePub, HTML, and so on (full list can be found here).
elasticsearch
search
reference
pdf
august 2011 by rybesh
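The blurb above says attachment fields carry their content encoded as base64. A minimal sketch of that encoding step, using only the Python standard library, might look like the following. The field name `file`, the helper name `build_attachment_document`, and the sample bytes are assumptions made for illustration; they are not taken from the elasticsearch guide, and the actual mapping and indexing request are not shown.

```python
import base64
import json


def build_attachment_document(raw_bytes: bytes, field_name: str = "file") -> str:
    """Return a JSON document whose single field holds base64-encoded bytes.

    `field_name` is a hypothetical attachment field name, not one mandated
    by elasticsearch.
    """
    # base64.b64encode returns bytes; decode to ASCII text so it fits in JSON.
    encoded = base64.b64encode(raw_bytes).decode("ascii")
    return json.dumps({field_name: encoded})


# Placeholder content standing in for a real PDF or office file.
doc = build_attachment_document(b"%PDF-1.4 example bytes")
print(doc)
```

In practice the bytes would come from reading a file in binary mode, and the resulting JSON would be sent as the document body of an index request.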
PDFMiner
may 2011 by rybesh
PDFMiner is a tool for extracting information from PDF documents. Unlike other PDF-related tools, it focuses entirely on getting and analyzing text data. PDFMiner allows one to obtain the exact location of text in a page, as well as other information such as fonts or lines. It includes a PDF converter that can transform PDF files into other text formats (such as HTML). It has an extensible PDF parser that can be used for other purposes than text analysis.
pdf
python
tools
may 2011 by rybesh
HTML/CSS to PDF converter written in Python - HTML2PDF Converter
april 2011 by rybesh
XHTML2PDF is a converter for HTML/XHTML and CSS to PDF and a Python package.
css
html
pdf
python
django
april 2011 by rybesh
Pandoc - About pandoc
april 2011 by rybesh
If you need to convert files from one markup format into another, pandoc is your swiss-army knife. Need to generate a man page from a markdown file? No problem. LaTeX to Docbook? Sure. HTML to MediaWiki? Yes, that too. Pandoc can read markdown and (subsets of) reStructuredText, textile, HTML, and LaTeX, and it can write plain text, markdown, reStructuredText, HTML, LaTeX, ConTeXt, PDF, RTF, DocBook XML, OpenDocument XML, ODT, GNU Texinfo, MediaWiki markup, textile, groff man pages, Emacs org-mode, EPUB ebooks, and S5 and Slidy HTML slide shows. PDF output (via LaTeX) is also supported with the included markdown2pdf wrapper script.
html
latex
markup
pdf
markdown
april 2011 by rybesh
FlexPaper - the open source document viewer solution for pdf, doc, ..
october 2010 by rybesh
FlexPaper displays documents in your favorite browser using Flash. Its way of reusing display containers makes it possible to view large documents and books.
pdf
flex
flash
tools
interface
web
october 2010 by rybesh
ReportLab - Open Source Software
november 2008 by rybesh
The ReportLab Open Source PDF library is a proven industry-strength PDF generating solution, that you can use for meeting your requirements and deadlines in enterprise reporting systems.
python
pdf
printing
code
november 2008 by rybesh
The pstotext program
november 2004 by rybesh
pstotext is a program that works with Ghostscript (version 3.33 or later) to extract plain text from PostScript and PDF files.
opensource
pdf
tools
november 2004 by rybesh
Multiple Perspective Interactive Video
october 2004 by rybesh
In MPI video, a viewer could view an event from multiple perspectives, even based on the contents of the events.
3d
computervision
ideas
pdf
video
october 2004 by rybesh
Classes vs. Prototypes - Some Philosophical and Historical Observations (ResearchIndex)
october 2004 by rybesh
In this paper we take a rather unusual, non-technical approach and investigate object-oriented programming and the prototype-based programming field from a purely philosophical viewpoint.
code
oop
pdf
philosophy
october 2004 by rybesh
Using Gimp to fill in PDFs
september 2004 by rybesh
Some .pdf forms allow you to fill them in, but most don't. In the old days your choices were a pen or a typewriter--neither particularly appetizing. Now you can use Gimp to fill in the forms.
howto
pdf
september 2004 by rybesh
An ethnographic study of music information seeking
september 2004 by rybesh
Eliciting the native music information strategies employed by people searching for popular music.
acm
music
pdf
search
social
september 2004 by rybesh
Content management for electronic music distribution
september 2004 by rybesh
Advanced techniques are necessary to help users navigate in large music catalogs... there is still a long way to go... in particular concerning the nature of the metadata and similarity relations extracted.
acm
doi
identity
music
pdf
personalization
web
september 2004 by rybesh
Interdisciplinary Communities and Research Issues in Music Information Retrieval
september 2004 by rybesh
In order for MIR to succeed, researchers need to work with real user communities and develop research resources such as reference music collections.
music
pdf
search
september 2004 by rybesh
A Naturalist Approach to Music File Name Analysis
september 2004 by rybesh
An identification mechanism that exploits the information found in music audio filenames.
identity
metadata
music
pdf
september 2004 by rybesh
Knowledge-Based Extraction of Named Entities
september 2004 by rybesh
A knowledge-based approach to learning rules for named-entity extraction from unstructured Web text.
identity
nlp
pdf
september 2004 by rybesh
Adaptive Name Matching in Information Integration
september 2004 by rybesh
Our research explores approaches to the name-matching problem that improve accuracy by combining multiple string similarity methods, each capturing a different notion of similarity, to adapt to a specific domain.
identity
nlp
pdf
september 2004 by rybesh
Object Co-identification on the Semantic Web
september 2004 by rybesh
The Semantic Web seeks to integrate data from many different sources. Since different sources often use different names for the same object, we need to map between these names.
identity
pdf
semweb
september 2004 by rybesh
Semantic Negotiation: Coidentifying objects across data sources
september 2004 by rybesh
Integrating and composing web services from different providers requires a solution for the problem of different providers using different names for the same object.
identity
metadata
pdf
search
semweb
september 2004 by rybesh
Inferring Descriptions and Similarity for Music from Community Metadata.
september 2004 by rybesh
Methods for unsupervised learning of text profiles for music from unstructured text obtained from the web.
metadata
music
nlp
pdf
personalization
social
september 2004 by rybesh
Using cultural metadata for artist recommendations
september 2004 by rybesh
The beauty of this approach lies in the possibility to access so-called cultural metadata that is the agglomeration of several independent--originally subjective--perspectives about music.
metadata
music
nlp
pdf
personalization
september 2004 by rybesh
Retrieval effectiveness of an ontology-based model for information selection
september 2004 by rybesh
A scalable disambiguation algorithm that prunes irrelevant concepts and allows relevant ones to associate with documents and participate in query generation.
acm
doi
kr
pdf
search
september 2004 by rybesh
Personalization of user profiles for content-based music retrieval based on relevance feedback
september 2004 by rybesh
A music retrieval method which retrieves songs based on the user's musical preferences. Since music preferences are expected to be highly ambiguous, relevance feedback methods are used to improve performance.
acm
doi
music
pdf
personalization
search
september 2004 by rybesh
Representing internet streaming media metadata using MPEG-7 multimedia description schemes
september 2004 by rybesh
Singingfish.com uses MPEG-7 description schemes to model the metadata characteristics of Internet streaming media.
acm
doi
metadata
multimedia
pdf
search
streaming
september 2004 by rybesh
Computers and the Humanities: Special Issue on Digital Images
september 2004 by rybesh
This special issue of Computers and the Humanities addresses the challenges and opportunities in designing, building, and using digital image collections.
academia
ideas
image
library
pdf
video
september 2004 by rybesh
SWISH-Enhanced
september 2004 by rybesh
A fast, powerful, flexible, free, and easy to use system for indexing collections of Web pages or other text files (including PDFs).
library
opensource
pdf
search
tools
september 2004 by rybesh
Docco
september 2004 by rybesh
A little personal document management system. It scans for a number of different document formats and creates a database recording which words appear in which documents.
infoviz
java
library
opensource
pdf
search
tools
september 2004 by rybesh
Multivalent
september 2004 by rybesh
Free and open source Java software for scanned paper, PDF, HTML, UNIX manual pages, TeX DVI, and more.
java
library
metadata
opensource
pdf
tools
september 2004 by rybesh