Walking Randomly » Natural Scientists: their very big output files – and a tale of diffs
april 2011 by Vaguery
"A few years back, when a user at the University of Manchester asked for help with the ‘diff – files too big/ out of memory’ problem, I wrote a modern version that I called idiffh (for Ian’s diffh). My ground rules were:
Work on any text files on any operating system with a C compiler
Have no limits on, e.g., line lengths or file size
Never ‘give up’ if the going gets tough (i.e. when the files are very different)"
diff
text-mining
dataset
open-science
tools
from delicious
Buy Historical Market Data
july 2010 by Vaguery
"Select the historical market data products below
Here you can select the products you are interested in. Click on the product's name to find out more about it. Press the Continue button to place an order or to get a quote."
nudge-targets
trading
data
dataset
financial-engineering
The Berkeley Segmentation Dataset and Benchmark
june 2010 by Vaguery
"The goal of this work is to provide an empirical basis for research on image segmentation and boundary detection. To this end, we have collected 12,000 hand-labeled segmentations of 1,000 Corel dataset images from 30 human subjects. Half of the segmentations were obtained from presenting the subject with a color image; the other half from presenting a grayscale image. The public benchmark based on this data consists of all of the grayscale and color segmentations for 300 images. The images are divided into a training set of 200 images, and a test set of 100 images."
dataset
learning-from-data
training-set
machine-learning
image-segmentation
image-processing
nudge
[1006.3679] Segmentation of Natural Images by Texture and Boundary Compression
june 2010 by Vaguery
"We present a novel algorithm for segmentation of natural images that harnesses the principle of minimum description length (MDL). Our method is based on observations that a homogeneously textured region of a natural image can be well modeled by a Gaussian distribution and the region boundary can be effectively coded by an adaptive chain code. The optimal segmentation of an image is the one that gives the shortest coding length for encoding all textures and boundaries in the image, and is obtained via an agglomerative clustering process applied to a hierarchy of decreasing window sizes as multi-scale texture features. The optimal segmentation also provides an accurate estimate of the overall coding length and hence the true entropy of the image. We test our algorithm on the publicly available Berkeley Segmentation Dataset. It achieves state-of-the-art segmentation results compared to other existing methods."
algorithms
image-segmentation
numerical-methods
machine-learning
image-compression
nudge-targets
dataset
Stock, Futures and FOREX End of Day Data in MetaStock Data and ASCII Data formats
november 2009 by Vaguery
"Norgate Investor Services provides quality end-of-day data for stock markets in Australia (ASX), Asia (SGX) and USA (NASDAQ, NYSE, NYSE Amex, NYSE Arca, OTC-BB, PinkSheets). Extensive historical data is available. Hourly snapshot data is available for the ASX and SGX. Data is provided in a "MetaStock™ compatible" data format.
Stock data is organised into security types (equities, indices, warrants, options) and can be organised into custom folders which allow you to segregate such as index participation, sector, industry group, dividend-paying-shares. World Indices are provided free with any subscription."
data
dataset
financial-engineering
trading
investment
subscriptions
Nudge
I Now Have Delisted Stock Data! | System Trading with Woodshedder
november 2009 by Vaguery
"I got my data from Norgate Investor Services, (the same folks that provide my end-of-day feed). They only charge a one-time fee for the delisted data, while some of their competitors charge as much as 3x Norgate’s one time fee with the charge recurring annually!
Since adding the delisted database, I have not noted any great differences in the historical results of the systems I work with. I have stated a few times that it is my belief that short-term systems that hold stocks for a few days to a week are not likely to suffer greatly from survivorship bias. So far, this belief is proving to be true."
data
dataset
stocks
history
data-as-a-service
trading
investing
technical-analysis
learning-from-data
http://moya.bus.miami.edu/~tallys/cusplib/
november 2009 by Vaguery
"Consider the following optimization problem: we are given n jobs, a time horizon T, and one machine M with processing capacity Cap >= 2. Each job has a processing time (pj), release date (rj), due date (dj), machine utilization (cj), and weight (wj). We would like to schedule all the jobs on machine M while making sure that: (i) all jobs obey their execution window [rj,dj] (to a certain extent; see possible objectives), and (ii) we respect the machine capacity at all times (i.e., given a time 0 <= t <= T, the sum of cj over all jobs running at time t is always less than or equal to Cap). Possible objective functions are: minimize makespan, minimize total (weighted) tardiness, minimize total number of late jobs, minimize total (weighted) delay, etc."
operations-research
optimization
library
dataset
examples
problem-solving
Nudge
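The capacity constraint and one of the objectives quoted in the cusplib entry above can be sketched in a few lines of Python. This is only an illustrative sketch, not code from the library: the `Job` class, the function names, integer time steps, half-open running intervals [start, start + pj), and treating release dates as hard while due dates are soft are all assumptions made here for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    p: int  # processing time (pj)
    r: int  # release date (rj)
    d: int  # due date (dj)
    c: int  # machine utilization (cj)
    w: int  # weight (wj)


def respects_capacity(jobs, starts, cap, horizon):
    """Check a candidate schedule (hypothetical helper).

    Assumes integer time steps and that a job started at s occupies
    the steps s, s+1, ..., s+p-1. Returns False if a job starts before
    its release date, runs past the horizon, or if the summed
    utilization of running jobs exceeds cap at any step.
    """
    load = [0] * horizon
    for job, s in zip(jobs, starts):
        if s < job.r or s + job.p > horizon:
            return False
        for t in range(s, s + job.p):
            load[t] += job.c
    return all(x <= cap for x in load)


def weighted_tardiness(jobs, starts):
    """Total weighted tardiness: sum of wj * max(0, completion - dj)."""
    return sum(j.w * max(0, s + j.p - j.d) for j, s in zip(jobs, starts))


if __name__ == "__main__":
    jobs = [Job(p=2, r=0, d=2, c=1, w=1),
            Job(p=2, r=0, d=3, c=1, w=2),
            Job(p=1, r=1, d=2, c=1, w=1)]
    # Starting the third job at t=1 would put three units of load on a
    # machine with Cap = 2, so it is delayed to t=2 and finishes late.
    print(respects_capacity(jobs, [0, 0, 1], cap=2, horizon=3))
    print(respects_capacity(jobs, [0, 0, 2], cap=2, horizon=3))
    print(weighted_tardiness(jobs, [0, 0, 2]))
```

A checker like this only validates a given schedule; finding a good one is the optimization problem the excerpt describes.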
The Ann Arbor Chronicle » City and Residents to Make Tree Policy
july 2009 by Vaguery
"We asked the city of Ann Arbor for all the electronic deliverables from Davey. And we provide the following data with a caveat: On Monday evening, city staff stressed that they were still doing some quality control work on the initial data set – so the data provided to The Chronicle is a snapshot of the city’s trees as assessed by the Davey Resource Group. The city’s inventory will presumably be maintained as a frequently updated data set that changes as trees are pruned, removed, or planted."
local
Ann-Arbor
GIS
raw-data-now
trees
dataset
mapping
transparency
open-access
public-policy
Historical Statistics for Mineral Commodities in the United States, Data Series 2005-140
june 2009 by Vaguery
"The U.S. Geological Survey (USGS) provides information to the public and to policy-makers concerning the current use and flow of minerals and materials in the United States economy. The USGS collects, analyzes, and disseminates minerals information on most nonfuel mineral commodities.
This USGS digital database is an online compilation of historical U.S. statistics on mineral and material commodities. The database contains information on approximately 90 mineral commodities, including production, imports, exports, and stocks; reported and apparent consumption; and unit value (the real and nominal price in U.S. dollars of a metric ton of apparent consumption). For many of the commodities, data are reported as far back as 1900. Each commodity file includes a document that describes the units of measure, defines terms, and lists USGS contacts for additional information."
data
dataset
commodities
minerals
investment
trading
speculation
raw-data-now
USGS
history
economics
mining
production
Inside the iPhone field test mode - Blog - WirelessInfo.com - Cell Phone Reviews and Wireless Plan Ratings
october 2008 by Vaguery
"The iPhone field mode shows a lot of information. In fact, it is more comprehensive than many other phone field modes, allowing you to see the details of the individual cell towers and a lot of detail about the cell phone network. To access it, dial *3001#12345#*. If you are already in a call, just hit "add call", enter the number above and hit call; the phone will go into test mode, but keep your call connected. "
transparency
hack
iPgibw
cell-network
cell
Apple
iPhone
network
dataset