Vaguery + dataset   13

Walking Randomly » Natural Scientists: their very big output files – and a tale of diffs
"A few years back, when a user at the University of Manchester asked for help with the ‘diff – files too big/ out of memory’ problem, I wrote a modern version that I called idiffh (for Ian’s diffh). My ground rules were:
Work on any text files on any operating system with a C compiler
Have no limits on, e.g., line lengths or file size
Never ‘give up’ if the going gets tough (i.e. when the files are very different)"
diff  text-mining  dataset  open-science  tools  from delicious
april 2011 by Vaguery
Buy Historical Market Data
"Select the historical market data products below
Here you can select the products you are interested in. Click on the product's name to find out more about it. Press the Continue button to place an order or to get a quote."
nudge-targets  trading  data  dataset  financial-engineering 
july 2010 by Vaguery
The Berkeley Segmentation Dataset and Benchmark
"The goal of this work is to provide an empirical basis for research on image segmentation and boundary detection. To this end, we have collected 12,000 hand-labeled segmentations of 1,000 Corel dataset images from 30 human subjects. Half of the segmentations were obtained from presenting the subject with a color image; the other half from presenting a grayscale image. The public benchmark based on this data consists of all of the grayscale and color segmentations for 300 images. The images are divided into a training set of 200 images, and a test set of 100 images."
dataset  learning-from-data  training-set  machine-learning  image-segmentation  image-processing  nudge 
june 2010 by Vaguery
[1006.3679] Segmentation of Natural Images by Texture and Boundary Compression
"We present a novel algorithm for segmentation of natural images that harnesses the principle of minimum description length (MDL). Our method is based on observations that a homogeneously textured region of a natural image can be well modeled by a Gaussian distribution and the region boundary can be effectively coded by an adaptive chain code. The optimal segmentation of an image is the one that gives the shortest coding length for encoding all textures and boundaries in the image, and is obtained via an agglomerative clustering process applied to a hierarchy of decreasing window sizes as multi-scale texture features. The optimal segmentation also provides an accurate estimate of the overall coding length and hence the true entropy of the image. We test our algorithm on the publicly available Berkeley Segmentation Dataset. It achieves state-of-the-art segmentation results compared to other existing methods."
algorithms  image-segmentation  numerical-methods  machine-learning  image-compression  nudge-targets  dataset 
june 2010 by Vaguery
Stock, Futures and FOREX End of Day Data in MetaStock Data and ASCII Data formats
"Norgate Investor Services provides quality end-of-day data for stock markets in Australia (ASX), Asia (SGX) and USA (NASDAQ, NYSE, NYSE Amex, NYSE Arca, OTC-BB, PinkSheets). Extensive historical data is available. Hourly snapshot data is available for the ASX and SGX. Data is provided in a "MetaStock™ compatible" data format.

Stock data is organised into security types (equities, indices, warrants, options) and can be organised into custom folders which allow you to segregate such as index participation, sector, industry group, dividend-paying-shares. World Indices are provided free with any subscription."
data  dataset  financial-engineering  trading  investment  subscriptions  Nudge 
november 2009 by Vaguery
I Now Have Delisted Stock Data! | System Trading with Woodshedder
"I got my data from Norgate Investor Services, (the same folks that provide my end-of-day feed). They only charge a one-time fee for the delisted data, while some of their competitors charge as much as 3x Norgate’s one time fee with the charge recurring annually!
Since adding the delisted database, I have not noted any great differences in the historical results of the systems I work with. I have stated a few times that it is my belief that short-term systems that hold stocks for a few days to a week are not likely to suffer greatly from survivorship bias. So far, this belief is proving to be true."
data  dataset  stocks  history  data-as-a-service  trading  investing  technical-analysis  learning-from-data 
november 2009 by Vaguery
http://moya.bus.miami.edu/~tallys/cusplib/
"Consider the following optimization problem: we are given n jobs, a time horizon T, and one machine M with processing capacity Cap >= 2. Each job has a processing time (pj), release date (rj), due date (dj), machine utilization (cj), and weight (wj). We would like to schedule all the jobs on machine M while making sure that: (i) all jobs obey their execution window [rj,dj] (to a certain extent; see possible objectives), and (ii) we respect the machine capacity at all times (i.e., given a time 0 <= t <= T, the sum of cj over all jobs running at time t is always less than or equal to Cap). Possible objective functions are: minimize makespan, minimize total (weighted) tardiness, minimize total number of late jobs, minimize total (weighted) delay, etc."
operations-research  optimization  library  dataset  examples  problem-solving  Nudge 
november 2009 by Vaguery
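A minimal sketch of the capacity-constrained scheduling problem quoted in the cusplib entry above, showing how a candidate schedule could be checked against constraints (i) and (ii) and scored by makespan. The names `Job`, `is_feasible`, and `makespan` are hypothetical and not part of the library. The sketch assumes integer times, treats the execution window [rj, dj] as a hard constraint (the source allows it to be relaxed depending on the objective), and treats a job started at time s as occupying slots s, ..., s + pj - 1.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Job:
    p: int      # processing time (pj)
    r: int      # release date (rj)
    d: int      # due date (dj)
    c: int      # machine utilization while running (cj)
    w: int = 1  # weight (wj), unused by the makespan objective


def is_feasible(jobs: Sequence[Job], starts: Sequence[int],
                cap: int, horizon: int) -> bool:
    """Check hard windows and the machine capacity in every time slot."""
    usage = [0] * horizon  # total utilization per unit time slot
    for job, start in zip(jobs, starts):
        end = start + job.p
        # Assumption: the window [r, d] is enforced strictly here.
        if start < job.r or end > job.d or end > horizon:
            return False
        for t in range(start, end):
            usage[t] += job.c
            if usage[t] > cap:
                return False
    return True


def makespan(jobs: Sequence[Job], starts: Sequence[int]) -> int:
    """Completion time of the last job to finish."""
    return max(start + job.p for job, start in zip(jobs, starts))


if __name__ == "__main__":
    jobs = [Job(p=2, r=0, d=4, c=1), Job(p=2, r=0, d=4, c=2)]
    print(is_feasible(jobs, [0, 0], cap=2, horizon=4))  # overlapping jobs
    print(is_feasible(jobs, [0, 2], cap=2, horizon=4))  # sequential jobs
    print(makespan(jobs, [0, 2]))
```

The checker only validates a given schedule; finding a good one is the optimization problem itself, and the other objectives mentioned in the quote (tardiness, late jobs, delay) would need the window test relaxed.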
The Ann Arbor Chronicle » City and Residents to Make Tree Policy
"We asked the city of Ann Arbor for all the electronic deliverables from Davey. And we provide the following data with a caveat: On Monday evening, city staff stressed that they were still doing some quality control work on the initial data set – so the data provided to The Chronicle is a snapshot of the city’s trees as assessed by the Davey Resource Group. The city’s inventory will presumably be maintained as a frequently updated data set that changes as trees are pruned, removed, or planted."
local  Ann-Arbor  GIS  raw-data-now  trees  dataset  mapping  transparency  open-access  public-policy 
july 2009 by Vaguery
Historical Statistics for Mineral Commodities in the United States, Data Series 2005-140
"The U.S. Geological Survey (USGS) provides information to the public and to policy-makers concerning the current use and flow of minerals and materials in the United States economy. The USGS collects, analyzes, and disseminates minerals information on most nonfuel mineral commodities.

This USGS digital database is an online compilation of historical U.S. statistics on mineral and material commodities. The database contains information on approximately 90 mineral commodities, including production, imports, exports, and stocks; reported and apparent consumption; and unit value (the real and nominal price in U.S. dollars of a metric ton of apparent consumption). For many of the commodities, data are reported as far back as 1900. Each commodity file includes a document that describes the units of measure, defines terms, and lists USGS contacts for additional information."
data  dataset  commodities  minerals  investment  trading  speculation  raw-data-now  USGS  history  economics  mining  production 
june 2009 by Vaguery
Inside the iPhone field test mode - Blog - WirelessInfo.com - Cell Phone Reviews and Wireless Plan Ratings
"The iPhone field mode shows a lot of information. In fact, it is more comprehensive than many other phone field modes, allowing you to see the details of the individual cell towers and a lot of detail about the cell phone network. To access it, dial *3001#12345#*. If you are already in a call, just hit "add call", enter the number above and hit call; the phone will go into test mode, but keep your call connected."
transparency  hack  iPgibw  cell-network  cell  Apple  iPhone  network  dataset 
october 2008 by Vaguery
