Merge #1023
1023: Feature: Added Boston Housing Dataset r=CarloLucibello a=pranjaldatta [Boston Housing Dataset](https://archive.ics.uci.edu/ml/machine-learning-databases/housing/) is one of the most common datasets that are used by beginners. It is as popular as other datasets like Iris etc. Hence, it feels only natural that this dataset is a part of Flux. Added src/data/housing.jl: code for downloading and loading the dataset Edited src/data/Data.jl: To include and export housing.jl Edited test/data.jl: Added test for the dataset. *All tests in test/data.jl are passing* Co-authored-by: pranjaldatta <pranjaldatta99@gmail.com> Co-authored-by: Pranjal Datta <pranjaldatta99@gmail.com>
This commit is contained in:
commit
37af9fb15c
|
@ -42,4 +42,7 @@ using .Sentiment
|
||||||
include("iris.jl")
|
include("iris.jl")
|
||||||
export Iris
|
export Iris
|
||||||
|
|
||||||
|
include("housing.jl")
|
||||||
|
export Housing
|
||||||
|
|
||||||
end
|
end
|
||||||
|
|
|
@ -0,0 +1,136 @@
|
||||||
|
"""
|
||||||
|
1. Title: Boston Housing Data
|
||||||
|
|
||||||
|
2. Sources:
|
||||||
|
(a) Origin: This dataset was taken from the StatLib library which is
|
||||||
|
maintained at Carnegie Mellon University.
|
||||||
|
(b) Creator: Harrison, D. and Rubinfeld, D.L. 'Hedonic prices and the
|
||||||
|
demand for clean air', J. Environ. Economics & Management,
|
||||||
|
vol.5, 81-102, 1978.
|
||||||
|
(c) Date: July 7, 1993
|
||||||
|
|
||||||
|
3. Number of Instances: 506
|
||||||
|
|
||||||
|
4. Number of Attributes: 13 continuous attributes (including "class"
|
||||||
|
attribute "MEDV"), 1 binary-valued attribute.
|
||||||
|
|
||||||
|
5. Attribute Information:
|
||||||
|
|
||||||
|
1. CRIM per capita crime rate by town
|
||||||
|
2. ZN proportion of residential land zoned for lots over
|
||||||
|
25,000 sq.ft.
|
||||||
|
3. INDUS proportion of non-retail business acres per town
|
||||||
|
4. CHAS Charles River dummy variable (= 1 if tract bounds
|
||||||
|
river; 0 otherwise)
|
||||||
|
5. NOX nitric oxides concentration (parts per 10 million)
|
||||||
|
6. RM average number of rooms per dwelling
|
||||||
|
7. AGE proportion of owner-occupied units built prior to 1940
|
||||||
|
8. DIS weighted distances to five Boston employment centres
|
||||||
|
9. RAD index of accessibility to radial highways
|
||||||
|
10. TAX full-value property-tax rate per 10,000 dollars
|
||||||
|
11. PTRATIO pupil-teacher ratio by town
|
||||||
|
12. B 1000(Bk - 0.63)^2 where Bk is the proportion of blacks
|
||||||
|
by town
|
||||||
|
13. LSTAT % lower status of the population
|
||||||
|
14. MEDV Median value of owner-occupied homes in 1000's of dollars
|
||||||
|
|
||||||
|
Downloaded From: https://archive.ics.uci.edu/ml/machine-learning-databases/housing/housing.data
|
||||||
|
|
||||||
|
"""
|
||||||
|
module Housing
|
||||||
|
|
||||||
|
using DelimitedFiles
|
||||||
|
using ..Data: deps, download_and_verify
|
||||||
|
|
||||||
|
#Uncomment if package exists
|
||||||
|
#const cache_prefix = "https://cache.julialang.org/"
|
||||||
|
const cache_prefix = ""
|
||||||
|
|
||||||
|
function load()
|
||||||
|
isfile(deps("housing.data")) && return
|
||||||
|
|
||||||
|
@info "Downloading the Boston housing Dataset"
|
||||||
|
download_and_verify("$(cache_prefix)https://archive.ics.uci.edu/ml/machine-learning-databases/housing/housing.data",
|
||||||
|
deps("housing.data"),
|
||||||
|
"baadf72995725d76efe787b664e1f083388c79ba21ef9a7990d87f774184735a")
|
||||||
|
|
||||||
|
#@info "Download complete. Working on the files"
|
||||||
|
path = deps()
|
||||||
|
isfile(deps("housing.data")) && touch(joinpath(path, "tempfile.data"))
|
||||||
|
open(joinpath(path, "tempfile.data"), "a") do fout
|
||||||
|
open(deps("housing.data"), "r") do fin
|
||||||
|
for line in eachline(fin)
|
||||||
|
line = replace(lstrip(line), r" +" => s",")
|
||||||
|
println(fout, line)
|
||||||
|
end
|
||||||
|
end
|
||||||
|
end
|
||||||
|
mv(joinpath(path, "tempfile.data"), deps("housing.data"), force=true)
|
||||||
|
end
|
||||||
|
|
||||||
|
"""
|
||||||
|
Gets the targets for the Boston housing dataset, a 506 element array listing the targets for each example
|
||||||
|
|
||||||
|
```jldoctest
|
||||||
|
julia> using Flux
|
||||||
|
|
||||||
|
julia> target = Flux.Data.Housing.targets()
|
||||||
|
|
||||||
|
julia> summary(target)
|
||||||
|
506×1 Array{Float64,2}
|
||||||
|
|
||||||
|
julia> target[1]
|
||||||
|
24.0
|
||||||
|
|
||||||
|
"""
|
||||||
|
function targets()
|
||||||
|
load()
|
||||||
|
housing = readdlm(deps("housing.data"), ',')
|
||||||
|
reshape(Vector{Float64}(housing[1:end,end]), (506, 1))
|
||||||
|
end
|
||||||
|
|
||||||
|
|
||||||
|
"""
|
||||||
|
Gets the names of the features provided in the dataset
|
||||||
|
|
||||||
|
"""
|
||||||
|
function feature_names()
|
||||||
|
["crim","zn","indus","chas","nox","rm","age","dis","rad","tax","ptratio","b","lstat"]
|
||||||
|
end
|
||||||
|
|
||||||
|
|
||||||
|
"""
|
||||||
|
Gets the features of the Boston Housing Dataset. This is a 506x13 Matrix of Float64 datatypes.
|
||||||
|
The values are in the order ["crim","zn","indus","chas","nox","rm","age","dis","rad","tax","ptratio","b","lstat"].
|
||||||
|
It has 506 examples.
|
||||||
|
|
||||||
|
```jldoctest
|
||||||
|
julia> using Flux
|
||||||
|
|
||||||
|
julia> features = Flux.Data.Housing.features()
|
||||||
|
|
||||||
|
julia> summary(features)
|
||||||
|
506×13 Array{Float64,2}
|
||||||
|
|
||||||
|
julia> features[1, :]
|
||||||
|
13-element Array{Float64,1}:
|
||||||
|
0.00632
|
||||||
|
18.0
|
||||||
|
2.31
|
||||||
|
0.0
|
||||||
|
0.538
|
||||||
|
⋮
|
||||||
|
296.0
|
||||||
|
15.3
|
||||||
|
396.9
|
||||||
|
4.98
|
||||||
|
|
||||||
|
"""
|
||||||
|
function features()
|
||||||
|
load()
|
||||||
|
housing = readdlm(deps("housing.data"), ',')
|
||||||
|
Matrix{Float64}(housing[1:end, 1:13])
|
||||||
|
end
|
||||||
|
|
||||||
|
|
||||||
|
end
|
|
@ -16,7 +16,13 @@ using Test
|
||||||
@test Data.Sentiment.train() isa Vector{Data.Tree{Any}}
|
@test Data.Sentiment.train() isa Vector{Data.Tree{Any}}
|
||||||
|
|
||||||
@test Iris.features() isa Matrix
|
@test Iris.features() isa Matrix
|
||||||
@test size(Iris.features()) == (4,150)
|
@test size(Iris.features()) == (4,150)
|
||||||
|
|
||||||
@test Iris.labels() isa Vector{String}
|
@test Iris.labels() isa Vector{String}
|
||||||
@test size(Iris.labels()) == (150,)
|
@test size(Iris.labels()) == (150,)
|
||||||
|
|
||||||
|
@test Housing.features() isa Matrix
|
||||||
|
@test size(Housing.features()) == (506, 13)
|
||||||
|
|
||||||
|
@test Housing.targets() isa Array{Float64}
|
||||||
|
@test size(Housing.targets()) == (506, 1)
|
||||||
|
|
Loading…
Reference in New Issue