{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Week 4 Peer Review"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"2. In a code cell below, import the required packages: Distributions, DataFrames, and Random (install these packages via the REPL if required)."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"# Import the required packages\n",
"using Distributions, DataFrames, Random"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"# Seed the random number generator\n",
"Random.seed!(1234);"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"3. In a code cell below, create a DataFrame named df1 with 30 rows and 4 columns (variables). Call the first column ID; it should hold the values 1 through 30 (one per row). Use three rand() function calls to generate three more columns named var1, var2, and var3. The second column (var1) should consist of 30 values from a standard normal distribution (mean of 0 and standard deviation of 1). The third column (var2) should consist of 30 random values from a normal distribution with a mean of 10 and a standard deviation of 2. The last column (var3) should contain 30 random values chosen from the integers 5 through 15, inclusive."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
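{
"data": {
"text/html": [
"<p>30 rows × 4 columns</p>",
"<table>",
"<thead>",
"<tr><th></th><th>ID</th><th>var1</th><th>var2</th><th>var3</th></tr>",
"<tr><th></th><th>Int64</th><th>Float64</th><th>Float64</th><th>Int64</th></tr>",
"</thead>",
"<tbody>",
"<tr><th>1</th><td>1</td><td>0.867347</td><td>7.44066</td><td>14</td></tr>",
"<tr><th>2</th><td>2</td><td>-0.901744</td><td>11.9946</td><td>13</td></tr>",
"<tr><th>3</th><td>3</td><td>-0.494479</td><td>10.6048</td><td>12</td></tr>",
"<tr><th>4</th><td>4</td><td>-0.902914</td><td>9.92711</td><td>9</td></tr>",
"<tr><th>5</th><td>5</td><td>0.864401</td><td>10.2839</td><td>15</td></tr>",
"<tr><th>6</th><td>6</td><td>2.21188</td><td>11.0425</td><td>14</td></tr>",
"<tr><th>7</th><td>7</td><td>0.532813</td><td>11.7935</td><td>15</td></tr>",
"<tr><th>8</th><td>8</td><td>-0.271735</td><td>8.97294</td><td>9</td></tr>",
"<tr><th>9</th><td>9</td><td>0.502334</td><td>8.4704</td><td>9</td></tr>",
"<tr><th>10</th><td>10</td><td>-0.516984</td><td>6.91715</td><td>8</td></tr>",
"<tr><th>11</th><td>11</td><td>-0.560501</td><td>9.83968</td><td>15</td></tr>",
"<tr><th>12</th><td>12</td><td>-0.0192918</td><td>7.81756</td><td>14</td></tr>",
"<tr><th>13</th><td>13</td><td>0.128064</td><td>8.83897</td><td>11</td></tr>",
"<tr><th>14</th><td>14</td><td>1.85278</td><td>9.36913</td><td>10</td></tr>",
"<tr><th>15</th><td>15</td><td>-0.827763</td><td>7.2771</td><td>15</td></tr>",
"<tr><th>16</th><td>16</td><td>0.110096</td><td>9.77109</td><td>15</td></tr>",
"<tr><th>17</th><td>17</td><td>-0.251176</td><td>10.3317</td><td>6</td></tr>",
"<tr><th>18</th><td>18</td><td>0.369714</td><td>9.18312</td><td>5</td></tr>",
"<tr><th>19</th><td>19</td><td>0.0721164</td><td>7.98043</td><td>12</td></tr>",
"<tr><th>20</th><td>20</td><td>-1.50343</td><td>8.91239</td><td>13</td></tr>",
"<tr><th>21</th><td>21</td><td>1.56417</td><td>7.54655</td><td>14</td></tr>",
"<tr><th>22</th><td>22</td><td>-1.39674</td><td>8.91657</td><td>5</td></tr>",
"<tr><th>23</th><td>23</td><td>1.1055</td><td>8.62701</td><td>8</td></tr>",
"<tr><th>24</th><td>24</td><td>-1.10673</td><td>8.57414</td><td>9</td></tr>",
"<tr><th>25</th><td>25</td><td>-3.21136</td><td>9.34588</td><td>5</td></tr>",
"<tr><th>26</th><td>26</td><td>-0.0740145</td><td>11.0297</td><td>9</td></tr>",
"<tr><th>27</th><td>27</td><td>0.150976</td><td>14.8349</td><td>10</td></tr>",
"<tr><th>28</th><td>28</td><td>0.769278</td><td>9.38405</td><td>14</td></tr>",
"<tr><th>29</th><td>29</td><td>-0.310153</td><td>12.4906</td><td>15</td></tr>",
"<tr><th>30</th><td>30</td><td>-0.602707</td><td>9.9001</td><td>7</td></tr>",
"</tbody>",
"</table>"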
],
"text/latex": [
"\\begin{tabular}{r|cccc}\n",
"\t& ID & var1 & var2 & var3\\\\\n",
"\t\\hline\n",
"\t& Int64 & Float64 & Float64 & Int64\\\\\n",
"\t\\hline\n",
"\t1 & 1 & 0.867347 & 7.44066 & 14 \\\\\n",
"\t2 & 2 & -0.901744 & 11.9946 & 13 \\\\\n",
"\t3 & 3 & -0.494479 & 10.6048 & 12 \\\\\n",
"\t4 & 4 & -0.902914 & 9.92711 & 9 \\\\\n",
"\t5 & 5 & 0.864401 & 10.2839 & 15 \\\\\n",
"\t6 & 6 & 2.21188 & 11.0425 & 14 \\\\\n",
"\t7 & 7 & 0.532813 & 11.7935 & 15 \\\\\n",
"\t8 & 8 & -0.271735 & 8.97294 & 9 \\\\\n",
"\t9 & 9 & 0.502334 & 8.4704 & 9 \\\\\n",
"\t10 & 10 & -0.516984 & 6.91715 & 8 \\\\\n",
"\t11 & 11 & -0.560501 & 9.83968 & 15 \\\\\n",
"\t12 & 12 & -0.0192918 & 7.81756 & 14 \\\\\n",
"\t13 & 13 & 0.128064 & 8.83897 & 11 \\\\\n",
"\t14 & 14 & 1.85278 & 9.36913 & 10 \\\\\n",
"\t15 & 15 & -0.827763 & 7.2771 & 15 \\\\\n",
"\t16 & 16 & 0.110096 & 9.77109 & 15 \\\\\n",
"\t17 & 17 & -0.251176 & 10.3317 & 6 \\\\\n",
"\t18 & 18 & 0.369714 & 9.18312 & 5 \\\\\n",
"\t19 & 19 & 0.0721164 & 7.98043 & 12 \\\\\n",
"\t20 & 20 & -1.50343 & 8.91239 & 13 \\\\\n",
"\t21 & 21 & 1.56417 & 7.54655 & 14 \\\\\n",
"\t22 & 22 & -1.39674 & 8.91657 & 5 \\\\\n",
"\t23 & 23 & 1.1055 & 8.62701 & 8 \\\\\n",
"\t24 & 24 & -1.10673 & 8.57414 & 9 \\\\\n",
"\t25 & 25 & -3.21136 & 9.34588 & 5 \\\\\n",
"\t26 & 26 & -0.0740145 & 11.0297 & 9 \\\\\n",
"\t27 & 27 & 0.150976 & 14.8349 & 10 \\\\\n",
"\t28 & 28 & 0.769278 & 9.38405 & 14 \\\\\n",
"\t29 & 29 & -0.310153 & 12.4906 & 15 \\\\\n",
"\t30 & 30 & -0.602707 & 9.9001 & 7 \\\\\n",
"\\end{tabular}\n"
],
"text/plain": [
"30×4 DataFrame\n",
"│ Row │ ID │ var1 │ var2 │ var3 │\n",
"│ │ \u001b[90mInt64\u001b[39m │ \u001b[90mFloat64\u001b[39m │ \u001b[90mFloat64\u001b[39m │ \u001b[90mInt64\u001b[39m │\n",
"├─────┼───────┼────────────┼─────────┼───────┤\n",
"│ 1 │ 1 │ 0.867347 │ 7.44066 │ 14 │\n",
"│ 2 │ 2 │ -0.901744 │ 11.9946 │ 13 │\n",
"│ 3 │ 3 │ -0.494479 │ 10.6048 │ 12 │\n",
"│ 4 │ 4 │ -0.902914 │ 9.92711 │ 9 │\n",
"│ 5 │ 5 │ 0.864401 │ 10.2839 │ 15 │\n",
"│ 6 │ 6 │ 2.21188 │ 11.0425 │ 14 │\n",
"│ 7 │ 7 │ 0.532813 │ 11.7935 │ 15 │\n",
"│ 8 │ 8 │ -0.271735 │ 8.97294 │ 9 │\n",
"│ 9 │ 9 │ 0.502334 │ 8.4704 │ 9 │\n",
"│ 10 │ 10 │ -0.516984 │ 6.91715 │ 8 │\n",
"⋮\n",
"│ 20 │ 20 │ -1.50343 │ 8.91239 │ 13 │\n",
"│ 21 │ 21 │ 1.56417 │ 7.54655 │ 14 │\n",
"│ 22 │ 22 │ -1.39674 │ 8.91657 │ 5 │\n",
"│ 23 │ 23 │ 1.1055 │ 8.62701 │ 8 │\n",
"│ 24 │ 24 │ -1.10673 │ 8.57414 │ 9 │\n",
"│ 25 │ 25 │ -3.21136 │ 9.34588 │ 5 │\n",
"│ 26 │ 26 │ -0.0740145 │ 11.0297 │ 9 │\n",
"│ 27 │ 27 │ 0.150976 │ 14.8349 │ 10 │\n",
"│ 28 │ 28 │ 0.769278 │ 9.38405 │ 14 │\n",
"│ 29 │ 29 │ -0.310153 │ 12.4906 │ 15 │\n",
"│ 30 │ 30 │ -0.602707 │ 9.9001 │ 7 │"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Build the 30-row DataFrame; it is stored as df, which is the name the later cells use\n",
"df = DataFrame(ID = 1:30, var1 = rand(Normal(0, 1), 30), var2 = rand(Normal(10, 2), 30), var3 = rand(5:15, 30))"
]
},
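{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of why the generator is seeded before sampling: reseeding with the same value should reproduce the same draws. The names `a` and `b` are illustrative and not part of the assignment, and the actual numbers drawn depend on the Julia version.\n",
"\n",
"```julia\n",
"using Random\n",
"\n",
"Random.seed!(1234)\n",
"a = rand(5:15, 30)    # 30 integers drawn from 5 through 15\n",
"\n",
"Random.seed!(1234)\n",
"b = rand(5:15, 30)    # same seed, so the same sequence is expected\n",
"\n",
"println(a == b)                    # true\n",
"println(all(x -> x in 5:15, a))    # true\n",
"```\n",
"\n",
"Without the second call to `Random.seed!`, `b` would continue the random stream and would generally differ from `a`."
]
},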
{
"cell_type": "markdown",
"metadata": {},
"source": [
"4. In code cells below, write the code to calculate the mean and variance of each column in the DataFrame. For the first variable, for example, this could be done using the println function and referring to the column (variable) by its symbol notation. Try to shorten the code with a for-loop that iterates over the variable names (in symbol format)."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The mean of var1 is: -0.061674963752526096, the variance is: 1.1790054448274625\n",
"The mean of var2 is: 9.580613055613338, the variance is: 2.948790077536739\n",
"The mean of var3 is: 11.0, the variance is: 11.724137931034482\n"
]
}
],
"source": [
"# Loop over the column symbols; names(df) would also include the ID column\n",
"for s in [:var1, :var2, :var3]\n",
"    colname = String(s)\n",
"    meancol = mean(df[!, s])       # df[!, s] replaces the deprecated df[s]\n",
"    variancecol = var(df[!, s])    # sample variance of the column\n",
"    println(\"The mean of $colname is: $meancol, the variance is: $variancecol\")\n",
"end"
]
},
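{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a small worked check of what `var` reports, the sketch below compares it with a hand-computed sample variance (divisor n - 1) on a made-up vector `x`. The vector and the names `m` and `s2` are illustrative only and are not part of the assignment data.\n",
"\n",
"```julia\n",
"using Statistics\n",
"\n",
"x = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]    # hypothetical data\n",
"m = mean(x)                                     # 40 / 8 = 5.0\n",
"s2 = sum((x .- m) .^ 2) / (length(x) - 1)       # 32 / 7, the sample variance\n",
"\n",
"println(isapprox(s2, var(x)))                        # true\n",
"println(isapprox(var(x, corrected = false), 4.0))    # true: 32 / 8, divisor n\n",
"```\n",
"\n",
"The default `corrected = true` is why the variances printed by the loop use the n - 1 divisor."
]
},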
{
"cell_type": "markdown",
"metadata": {},
"source": [
"5. In a code cell below, create a new DataFrame named df2 from the last 20 rows of the original DataFrame, df1."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
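{
"data": {
"text/html": [
"<p>20 rows × 4 columns</p>",
"<table>",
"<thead>",
"<tr><th></th><th>ID</th><th>var1</th><th>var2</th><th>var3</th></tr>",
"<tr><th></th><th>Int64</th><th>Float64</th><th>Float64</th><th>Int64</th></tr>",
"</thead>",
"<tbody>",
"<tr><th>1</th><td>11</td><td>-0.560501</td><td>9.83968</td><td>15</td></tr>",
"<tr><th>2</th><td>12</td><td>-0.0192918</td><td>7.81756</td><td>14</td></tr>",
"<tr><th>3</th><td>13</td><td>0.128064</td><td>8.83897</td><td>11</td></tr>",
"<tr><th>4</th><td>14</td><td>1.85278</td><td>9.36913</td><td>10</td></tr>",
"<tr><th>5</th><td>15</td><td>-0.827763</td><td>7.2771</td><td>15</td></tr>",
"<tr><th>6</th><td>16</td><td>0.110096</td><td>9.77109</td><td>15</td></tr>",
"<tr><th>7</th><td>17</td><td>-0.251176</td><td>10.3317</td><td>6</td></tr>",
"<tr><th>8</th><td>18</td><td>0.369714</td><td>9.18312</td><td>5</td></tr>",
"<tr><th>9</th><td>19</td><td>0.0721164</td><td>7.98043</td><td>12</td></tr>",
"<tr><th>10</th><td>20</td><td>-1.50343</td><td>8.91239</td><td>13</td></tr>",
"<tr><th>11</th><td>21</td><td>1.56417</td><td>7.54655</td><td>14</td></tr>",
"<tr><th>12</th><td>22</td><td>-1.39674</td><td>8.91657</td><td>5</td></tr>",
"<tr><th>13</th><td>23</td><td>1.1055</td><td>8.62701</td><td>8</td></tr>",
"<tr><th>14</th><td>24</td><td>-1.10673</td><td>8.57414</td><td>9</td></tr>",
"<tr><th>15</th><td>25</td><td>-3.21136</td><td>9.34588</td><td>5</td></tr>",
"<tr><th>16</th><td>26</td><td>-0.0740145</td><td>11.0297</td><td>9</td></tr>",
"<tr><th>17</th><td>27</td><td>0.150976</td><td>14.8349</td><td>10</td></tr>",
"<tr><th>18</th><td>28</td><td>0.769278</td><td>9.38405</td><td>14</td></tr>",
"<tr><th>19</th><td>29</td><td>-0.310153</td><td>12.4906</td><td>15</td></tr>",
"<tr><th>20</th><td>30</td><td>-0.602707</td><td>9.9001</td><td>7</td></tr>",
"</tbody>",
"</table>"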
],
"text/latex": [
"\\begin{tabular}{r|cccc}\n",
"\t& ID & var1 & var2 & var3\\\\\n",
"\t\\hline\n",
"\t& Int64 & Float64 & Float64 & Int64\\\\\n",
"\t\\hline\n",
"\t1 & 11 & -0.560501 & 9.83968 & 15 \\\\\n",
"\t2 & 12 & -0.0192918 & 7.81756 & 14 \\\\\n",
"\t3 & 13 & 0.128064 & 8.83897 & 11 \\\\\n",
"\t4 & 14 & 1.85278 & 9.36913 & 10 \\\\\n",
"\t5 & 15 & -0.827763 & 7.2771 & 15 \\\\\n",
"\t6 & 16 & 0.110096 & 9.77109 & 15 \\\\\n",
"\t7 & 17 & -0.251176 & 10.3317 & 6 \\\\\n",
"\t8 & 18 & 0.369714 & 9.18312 & 5 \\\\\n",
"\t9 & 19 & 0.0721164 & 7.98043 & 12 \\\\\n",
"\t10 & 20 & -1.50343 & 8.91239 & 13 \\\\\n",
"\t11 & 21 & 1.56417 & 7.54655 & 14 \\\\\n",
"\t12 & 22 & -1.39674 & 8.91657 & 5 \\\\\n",
"\t13 & 23 & 1.1055 & 8.62701 & 8 \\\\\n",
"\t14 & 24 & -1.10673 & 8.57414 & 9 \\\\\n",
"\t15 & 25 & -3.21136 & 9.34588 & 5 \\\\\n",
"\t16 & 26 & -0.0740145 & 11.0297 & 9 \\\\\n",
"\t17 & 27 & 0.150976 & 14.8349 & 10 \\\\\n",
"\t18 & 28 & 0.769278 & 9.38405 & 14 \\\\\n",
"\t19 & 29 & -0.310153 & 12.4906 & 15 \\\\\n",
"\t20 & 30 & -0.602707 & 9.9001 & 7 \\\\\n",
"\\end{tabular}\n"
],
"text/plain": [
"20×4 DataFrame\n",
"│ Row │ ID │ var1 │ var2 │ var3 │\n",
"│ │ \u001b[90mInt64\u001b[39m │ \u001b[90mFloat64\u001b[39m │ \u001b[90mFloat64\u001b[39m │ \u001b[90mInt64\u001b[39m │\n",
"├─────┼───────┼────────────┼─────────┼───────┤\n",
"│ 1 │ 11 │ -0.560501 │ 9.83968 │ 15 │\n",
"│ 2 │ 12 │ -0.0192918 │ 7.81756 │ 14 │\n",
"│ 3 │ 13 │ 0.128064 │ 8.83897 │ 11 │\n",
"│ 4 │ 14 │ 1.85278 │ 9.36913 │ 10 │\n",
"│ 5 │ 15 │ -0.827763 │ 7.2771 │ 15 │\n",
"│ 6 │ 16 │ 0.110096 │ 9.77109 │ 15 │\n",
"│ 7 │ 17 │ -0.251176 │ 10.3317 │ 6 │\n",
"│ 8 │ 18 │ 0.369714 │ 9.18312 │ 5 │\n",
"│ 9 │ 19 │ 0.0721164 │ 7.98043 │ 12 │\n",
"│ 10 │ 20 │ -1.50343 │ 8.91239 │ 13 │\n",
"│ 11 │ 21 │ 1.56417 │ 7.54655 │ 14 │\n",
"│ 12 │ 22 │ -1.39674 │ 8.91657 │ 5 │\n",
"│ 13 │ 23 │ 1.1055 │ 8.62701 │ 8 │\n",
"│ 14 │ 24 │ -1.10673 │ 8.57414 │ 9 │\n",
"│ 15 │ 25 │ -3.21136 │ 9.34588 │ 5 │\n",
"│ 16 │ 26 │ -0.0740145 │ 11.0297 │ 9 │\n",
"│ 17 │ 27 │ 0.150976 │ 14.8349 │ 10 │\n",
"│ 18 │ 28 │ 0.769278 │ 9.38405 │ 14 │\n",
"│ 19 │ 29 │ -0.310153 │ 12.4906 │ 15 │\n",
"│ 20 │ 30 │ -0.602707 │ 9.9001 │ 7 │"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df2 = df[11:end,:]"
]
},
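{
"cell_type": "markdown",
"metadata": {},
"source": [
"A hedged sketch of what row slicing does, assuming a DataFrames.jl version in which `df[rows, :]` returns a copy (0.19 or later, to the best of my knowledge). The DataFrame `toy` and the name `tail_copy` are hypothetical and only serve the illustration.\n",
"\n",
"```julia\n",
"using DataFrames\n",
"\n",
"toy = DataFrame(ID = 1:5, x = [10, 20, 30, 40, 50])    # hypothetical toy data\n",
"tail_copy = toy[4:end, :]    # copies the last two rows into a new DataFrame\n",
"\n",
"tail_copy.x[1] = -1          # modify the copy only\n",
"\n",
"println(nrow(tail_copy))     # 2\n",
"println(toy.x[4])            # 40, the original is untouched\n",
"```\n",
"\n",
"Because the slice is a copy, adding a column to df2 later on should leave the original DataFrame unchanged."
]
},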
{
"cell_type": "markdown",
"metadata": {},
"source": [
"6. In a code cell below, show the results of computing simple descriptive statistics on this new DataFrame using the describe() function."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
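{
"data": {
"text/html": [
"<p>4 rows × 8 columns</p>",
"<table>",
"<thead>",
"<tr><th></th><th>variable</th><th>mean</th><th>min</th><th>median</th><th>max</th><th>nunique</th><th>nmissing</th><th>eltype</th></tr>",
"<tr><th></th><th>Symbol</th><th>Float64</th><th>Real</th><th>Float64</th><th>Real</th><th>Nothing</th><th>Nothing</th><th>DataType</th></tr>",
"</thead>",
"<tbody>",
"<tr><th>1</th><td>ID</td><td>20.5</td><td>11</td><td>20.5</td><td>30</td><td></td><td></td><td>Int64</td></tr>",
"<tr><th>2</th><td>var1</td><td>-0.187058</td><td>-3.21136</td><td>-0.0466532</td><td>1.85278</td><td></td><td></td><td>Float64</td></tr>",
"<tr><th>3</th><td>var2</td><td>9.49853</td><td>7.2771</td><td>9.2645</td><td>14.8349</td><td></td><td></td><td>Float64</td></tr>",
"<tr><th>4</th><td>var3</td><td>10.6</td><td>5</td><td>10.5</td><td>15</td><td></td><td></td><td>Int64</td></tr>",
"</tbody>",
"</table>"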
],
"text/latex": [
"\\begin{tabular}{r|cccccccc}\n",
"\t& variable & mean & min & median & max & nunique & nmissing & eltype\\\\\n",
"\t\\hline\n",
"\t& Symbol & Float64 & Real & Float64 & Real & Nothing & Nothing & DataType\\\\\n",
"\t\\hline\n",
"\t1 & ID & 20.5 & 11 & 20.5 & 30 & & & Int64 \\\\\n",
"\t2 & var1 & -0.187058 & -3.21136 & -0.0466532 & 1.85278 & & & Float64 \\\\\n",
"\t3 & var2 & 9.49853 & 7.2771 & 9.2645 & 14.8349 & & & Float64 \\\\\n",
"\t4 & var3 & 10.6 & 5 & 10.5 & 15 & & & Int64 \\\\\n",
"\\end{tabular}\n"
],
"text/plain": [
"4×8 DataFrame. Omitted printing of 2 columns\n",
"│ Row │ variable │ mean │ min │ median │ max │ nunique │\n",
"│ │ \u001b[90mSymbol\u001b[39m │ \u001b[90mFloat64\u001b[39m │ \u001b[90mReal\u001b[39m │ \u001b[90mFloat64\u001b[39m │ \u001b[90mReal\u001b[39m │ \u001b[90mNothing\u001b[39m │\n",
"├─────┼──────────┼───────────┼──────────┼────────────┼─────────┼─────────┤\n",
"│ 1 │ ID │ 20.5 │ 11 │ 20.5 │ 30 │ │\n",
"│ 2 │ var1 │ -0.187058 │ -3.21136 │ -0.0466532 │ 1.85278 │ │\n",
"│ 3 │ var2 │ 9.49853 │ 7.2771 │ 9.2645 │ 14.8349 │ │\n",
"│ 4 │ var3 │ 10.6 │ 5 │ 10.5 │ 15 │ │"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"describe(df2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"7. In a code cell below, add a column named cat1 to the df2 DataFrame consisting of a random selection of 20 values from the sample space GroupA and GroupB."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
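{
"data": {
"text/html": [
"<p>20 rows × 5 columns</p>",
"<table>",
"<thead>",
"<tr><th></th><th>ID</th><th>var1</th><th>var2</th><th>var3</th><th>Col1</th></tr>",
"<tr><th></th><th>Int64</th><th>Float64</th><th>Float64</th><th>Int64</th><th>String</th></tr>",
"</thead>",
"<tbody>",
"<tr><th>1</th><td>11</td><td>-0.560501</td><td>9.83968</td><td>15</td><td>GroupA</td></tr>",
"<tr><th>2</th><td>12</td><td>-0.0192918</td><td>7.81756</td><td>14</td><td>GroupB</td></tr>",
"<tr><th>3</th><td>13</td><td>0.128064</td><td>8.83897</td><td>11</td><td>GroupB</td></tr>",
"<tr><th>4</th><td>14</td><td>1.85278</td><td>9.36913</td><td>10</td><td>GroupB</td></tr>",
"<tr><th>5</th><td>15</td><td>-0.827763</td><td>7.2771</td><td>15</td><td>GroupB</td></tr>",
"<tr><th>6</th><td>16</td><td>0.110096</td><td>9.77109</td><td>15</td><td>GroupA</td></tr>",
"<tr><th>7</th><td>17</td><td>-0.251176</td><td>10.3317</td><td>6</td><td>GroupB</td></tr>",
"<tr><th>8</th><td>18</td><td>0.369714</td><td>9.18312</td><td>5</td><td>GroupA</td></tr>",
"<tr><th>9</th><td>19</td><td>0.0721164</td><td>7.98043</td><td>12</td><td>GroupB</td></tr>",
"<tr><th>10</th><td>20</td><td>-1.50343</td><td>8.91239</td><td>13</td><td>GroupA</td></tr>",
"<tr><th>11</th><td>21</td><td>1.56417</td><td>7.54655</td><td>14</td><td>GroupB</td></tr>",
"<tr><th>12</th><td>22</td><td>-1.39674</td><td>8.91657</td><td>5</td><td>GroupB</td></tr>",
"<tr><th>13</th><td>23</td><td>1.1055</td><td>8.62701</td><td>8</td><td>GroupA</td></tr>",
"<tr><th>14</th><td>24</td><td>-1.10673</td><td>8.57414</td><td>9</td><td>GroupA</td></tr>",
"<tr><th>15</th><td>25</td><td>-3.21136</td><td>9.34588</td><td>5</td><td>GroupA</td></tr>",
"<tr><th>16</th><td>26</td><td>-0.0740145</td><td>11.0297</td><td>9</td><td>GroupA</td></tr>",
"<tr><th>17</th><td>27</td><td>0.150976</td><td>14.8349</td><td>10</td><td>GroupA</td></tr>",
"<tr><th>18</th><td>28</td><td>0.769278</td><td>9.38405</td><td>14</td><td>GroupA</td></tr>",
"<tr><th>19</th><td>29</td><td>-0.310153</td><td>12.4906</td><td>15</td><td>GroupA</td></tr>",
"<tr><th>20</th><td>30</td><td>-0.602707</td><td>9.9001</td><td>7</td><td>GroupA</td></tr>",
"</tbody>",
"</table>"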
],
"text/latex": [
"\\begin{tabular}{r|ccccc}\n",
"\t& ID & var1 & var2 & var3 & Col1\\\\\n",
"\t\\hline\n",
"\t& Int64 & Float64 & Float64 & Int64 & String\\\\\n",
"\t\\hline\n",
"\t1 & 11 & -0.560501 & 9.83968 & 15 & GroupA \\\\\n",
"\t2 & 12 & -0.0192918 & 7.81756 & 14 & GroupB \\\\\n",
"\t3 & 13 & 0.128064 & 8.83897 & 11 & GroupB \\\\\n",
"\t4 & 14 & 1.85278 & 9.36913 & 10 & GroupB \\\\\n",
"\t5 & 15 & -0.827763 & 7.2771 & 15 & GroupB \\\\\n",
"\t6 & 16 & 0.110096 & 9.77109 & 15 & GroupA \\\\\n",
"\t7 & 17 & -0.251176 & 10.3317 & 6 & GroupB \\\\\n",
"\t8 & 18 & 0.369714 & 9.18312 & 5 & GroupA \\\\\n",
"\t9 & 19 & 0.0721164 & 7.98043 & 12 & GroupB \\\\\n",
"\t10 & 20 & -1.50343 & 8.91239 & 13 & GroupA \\\\\n",
"\t11 & 21 & 1.56417 & 7.54655 & 14 & GroupB \\\\\n",
"\t12 & 22 & -1.39674 & 8.91657 & 5 & GroupB \\\\\n",
"\t13 & 23 & 1.1055 & 8.62701 & 8 & GroupA \\\\\n",
"\t14 & 24 & -1.10673 & 8.57414 & 9 & GroupA \\\\\n",
"\t15 & 25 & -3.21136 & 9.34588 & 5 & GroupA \\\\\n",
"\t16 & 26 & -0.0740145 & 11.0297 & 9 & GroupA \\\\\n",
"\t17 & 27 & 0.150976 & 14.8349 & 10 & GroupA \\\\\n",
"\t18 & 28 & 0.769278 & 9.38405 & 14 & GroupA \\\\\n",
"\t19 & 29 & -0.310153 & 12.4906 & 15 & GroupA \\\\\n",
"\t20 & 30 & -0.602707 & 9.9001 & 7 & GroupA \\\\\n",
"\\end{tabular}\n"
],
"text/plain": [
"20×5 DataFrame\n",
"│ Row │ ID │ var1 │ var2 │ var3 │ Col1 │\n",
"│ │ \u001b[90mInt64\u001b[39m │ \u001b[90mFloat64\u001b[39m │ \u001b[90mFloat64\u001b[39m │ \u001b[90mInt64\u001b[39m │ \u001b[90mString\u001b[39m │\n",
"├─────┼───────┼────────────┼─────────┼───────┼────────┤\n",
"│ 1 │ 11 │ -0.560501 │ 9.83968 │ 15 │ GroupA │\n",
"│ 2 │ 12 │ -0.0192918 │ 7.81756 │ 14 │ GroupB │\n",
"│ 3 │ 13 │ 0.128064 │ 8.83897 │ 11 │ GroupB │\n",
"│ 4 │ 14 │ 1.85278 │ 9.36913 │ 10 │ GroupB │\n",
"│ 5 │ 15 │ -0.827763 │ 7.2771 │ 15 │ GroupB │\n",
"│ 6 │ 16 │ 0.110096 │ 9.77109 │ 15 │ GroupA │\n",
"│ 7 │ 17 │ -0.251176 │ 10.3317 │ 6 │ GroupB │\n",
"│ 8 │ 18 │ 0.369714 │ 9.18312 │ 5 │ GroupA │\n",
"│ 9 │ 19 │ 0.0721164 │ 7.98043 │ 12 │ GroupB │\n",
"│ 10 │ 20 │ -1.50343 │ 8.91239 │ 13 │ GroupA │\n",
"│ 11 │ 21 │ 1.56417 │ 7.54655 │ 14 │ GroupB │\n",
"│ 12 │ 22 │ -1.39674 │ 8.91657 │ 5 │ GroupB │\n",
"│ 13 │ 23 │ 1.1055 │ 8.62701 │ 8 │ GroupA │\n",
"│ 14 │ 24 │ -1.10673 │ 8.57414 │ 9 │ GroupA │\n",
"│ 15 │ 25 │ -3.21136 │ 9.34588 │ 5 │ GroupA │\n",
"│ 16 │ 26 │ -0.0740145 │ 11.0297 │ 9 │ GroupA │\n",
"│ 17 │ 27 │ 0.150976 │ 14.8349 │ 10 │ GroupA │\n",
"│ 18 │ 28 │ 0.769278 │ 9.38405 │ 14 │ GroupA │\n",
"│ 19 │ 29 │ -0.310153 │ 12.4906 │ 15 │ GroupA │\n",
"│ 20 │ 30 │ -0.602707 │ 9.9001 │ 7 │ GroupA │"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"insertcols!(df2,:Col1 => rand([\"GroupA\",\"GroupB\"],20))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"8. In a code cell below, create a DataFrame named df3 with columns named *id*, var4, and var5, such that id contains the values 11 through 30, var4 contains the values 21 through 40, and var5 contains the values 41 through 60."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
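{
"data": {
"text/html": [
"<p>20 rows × 3 columns</p>",
"<table>",
"<thead>",
"<tr><th></th><th>ID</th><th>var4</th><th>var5</th></tr>",
"<tr><th></th><th>Int64</th><th>Int64</th><th>Int64</th></tr>",
"</thead>",
"<tbody>",
"<tr><th>1</th><td>11</td><td>21</td><td>41</td></tr>",
"<tr><th>2</th><td>12</td><td>22</td><td>42</td></tr>",
"<tr><th>3</th><td>13</td><td>23</td><td>43</td></tr>",
"<tr><th>4</th><td>14</td><td>24</td><td>44</td></tr>",
"<tr><th>5</th><td>15</td><td>25</td><td>45</td></tr>",
"<tr><th>6</th><td>16</td><td>26</td><td>46</td></tr>",
"<tr><th>7</th><td>17</td><td>27</td><td>47</td></tr>",
"<tr><th>8</th><td>18</td><td>28</td><td>48</td></tr>",
"<tr><th>9</th><td>19</td><td>29</td><td>49</td></tr>",
"<tr><th>10</th><td>20</td><td>30</td><td>50</td></tr>",
"<tr><th>11</th><td>21</td><td>31</td><td>51</td></tr>",
"<tr><th>12</th><td>22</td><td>32</td><td>52</td></tr>",
"<tr><th>13</th><td>23</td><td>33</td><td>53</td></tr>",
"<tr><th>14</th><td>24</td><td>34</td><td>54</td></tr>",
"<tr><th>15</th><td>25</td><td>35</td><td>55</td></tr>",
"<tr><th>16</th><td>26</td><td>36</td><td>56</td></tr>",
"<tr><th>17</th><td>27</td><td>37</td><td>57</td></tr>",
"<tr><th>18</th><td>28</td><td>38</td><td>58</td></tr>",
"<tr><th>19</th><td>29</td><td>39</td><td>59</td></tr>",
"<tr><th>20</th><td>30</td><td>40</td><td>60</td></tr>",
"</tbody>",
"</table>"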
],
"text/latex": [
"\\begin{tabular}{r|ccc}\n",
"\t& ID & var4 & var5\\\\\n",
"\t\\hline\n",
"\t& Int64 & Int64 & Int64\\\\\n",
"\t\\hline\n",
"\t1 & 11 & 21 & 41 \\\\\n",
"\t2 & 12 & 22 & 42 \\\\\n",
"\t3 & 13 & 23 & 43 \\\\\n",
"\t4 & 14 & 24 & 44 \\\\\n",
"\t5 & 15 & 25 & 45 \\\\\n",
"\t6 & 16 & 26 & 46 \\\\\n",
"\t7 & 17 & 27 & 47 \\\\\n",
"\t8 & 18 & 28 & 48 \\\\\n",
"\t9 & 19 & 29 & 49 \\\\\n",
"\t10 & 20 & 30 & 50 \\\\\n",
"\t11 & 21 & 31 & 51 \\\\\n",
"\t12 & 22 & 32 & 52 \\\\\n",
"\t13 & 23 & 33 & 53 \\\\\n",
"\t14 & 24 & 34 & 54 \\\\\n",
"\t15 & 25 & 35 & 55 \\\\\n",
"\t16 & 26 & 36 & 56 \\\\\n",
"\t17 & 27 & 37 & 57 \\\\\n",
"\t18 & 28 & 38 & 58 \\\\\n",
"\t19 & 29 & 39 & 59 \\\\\n",
"\t20 & 30 & 40 & 60 \\\\\n",
"\\end{tabular}\n"
],
"text/plain": [
"20×3 DataFrame\n",
"│ Row │ ID │ var4 │ var5 │\n",
"│ │ \u001b[90mInt64\u001b[39m │ \u001b[90mInt64\u001b[39m │ \u001b[90mInt64\u001b[39m │\n",
"├─────┼───────┼───────┼───────┤\n",
"│ 1 │ 11 │ 21 │ 41 │\n",
"│ 2 │ 12 │ 22 │ 42 │\n",
"│ 3 │ 13 │ 23 │ 43 │\n",
"│ 4 │ 14 │ 24 │ 44 │\n",
"│ 5 │ 15 │ 25 │ 45 │\n",
"│ 6 │ 16 │ 26 │ 46 │\n",
"│ 7 │ 17 │ 27 │ 47 │\n",
"│ 8 │ 18 │ 28 │ 48 │\n",
"│ 9 │ 19 │ 29 │ 49 │\n",
"│ 10 │ 20 │ 30 │ 50 │\n",
"│ 11 │ 21 │ 31 │ 51 │\n",
"│ 12 │ 22 │ 32 │ 52 │\n",
"│ 13 │ 23 │ 33 │ 53 │\n",
"│ 14 │ 24 │ 34 │ 54 │\n",
"│ 15 │ 25 │ 35 │ 55 │\n",
"│ 16 │ 26 │ 36 │ 56 │\n",
"│ 17 │ 27 │ 37 │ 57 │\n",
"│ 18 │ 28 │ 38 │ 58 │\n",
"│ 19 │ 29 │ 39 │ 59 │\n",
"│ 20 │ 30 │ 40 │ 60 │"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df3 = DataFrame(ID = collect(11:30), var4 = collect(21:40), var5 = collect(41:60))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"9. In a code cell below, join the DataFrames df2 and df3 on the id column and save the result as a new DataFrame called df4."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
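{
"data": {
"text/html": [
"<p>20 rows × 7 columns</p>",
"<table>",
"<thead>",
"<tr><th></th><th>ID</th><th>var1</th><th>var2</th><th>var3</th><th>Col1</th><th>var4</th><th>var5</th></tr>",
"<tr><th></th><th>Int64</th><th>Float64</th><th>Float64</th><th>Int64</th><th>String</th><th>Int64</th><th>Int64</th></tr>",
"</thead>",
"<tbody>",
"<tr><th>1</th><td>11</td><td>-0.560501</td><td>9.83968</td><td>15</td><td>GroupA</td><td>21</td><td>41</td></tr>",
"<tr><th>2</th><td>12</td><td>-0.0192918</td><td>7.81756</td><td>14</td><td>GroupB</td><td>22</td><td>42</td></tr>",
"<tr><th>3</th><td>13</td><td>0.128064</td><td>8.83897</td><td>11</td><td>GroupB</td><td>23</td><td>43</td></tr>",
"<tr><th>4</th><td>14</td><td>1.85278</td><td>9.36913</td><td>10</td><td>GroupB</td><td>24</td><td>44</td></tr>",
"<tr><th>5</th><td>15</td><td>-0.827763</td><td>7.2771</td><td>15</td><td>GroupB</td><td>25</td><td>45</td></tr>",
"<tr><th>6</th><td>16</td><td>0.110096</td><td>9.77109</td><td>15</td><td>GroupA</td><td>26</td><td>46</td></tr>",
"<tr><th>7</th><td>17</td><td>-0.251176</td><td>10.3317</td><td>6</td><td>GroupB</td><td>27</td><td>47</td></tr>",
"<tr><th>8</th><td>18</td><td>0.369714</td><td>9.18312</td><td>5</td><td>GroupA</td><td>28</td><td>48</td></tr>",
"<tr><th>9</th><td>19</td><td>0.0721164</td><td>7.98043</td><td>12</td><td>GroupB</td><td>29</td><td>49</td></tr>",
"<tr><th>10</th><td>20</td><td>-1.50343</td><td>8.91239</td><td>13</td><td>GroupA</td><td>30</td><td>50</td></tr>",
"<tr><th>11</th><td>21</td><td>1.56417</td><td>7.54655</td><td>14</td><td>GroupB</td><td>31</td><td>51</td></tr>",
"<tr><th>12</th><td>22</td><td>-1.39674</td><td>8.91657</td><td>5</td><td>GroupB</td><td>32</td><td>52</td></tr>",
"<tr><th>13</th><td>23</td><td>1.1055</td><td>8.62701</td><td>8</td><td>GroupA</td><td>33</td><td>53</td></tr>",
"<tr><th>14</th><td>24</td><td>-1.10673</td><td>8.57414</td><td>9</td><td>GroupA</td><td>34</td><td>54</td></tr>",
"<tr><th>15</th><td>25</td><td>-3.21136</td><td>9.34588</td><td>5</td><td>GroupA</td><td>35</td><td>55</td></tr>",
"<tr><th>16</th><td>26</td><td>-0.0740145</td><td>11.0297</td><td>9</td><td>GroupA</td><td>36</td><td>56</td></tr>",
"<tr><th>17</th><td>27</td><td>0.150976</td><td>14.8349</td><td>10</td><td>GroupA</td><td>37</td><td>57</td></tr>",
"<tr><th>18</th><td>28</td><td>0.769278</td><td>9.38405</td><td>14</td><td>GroupA</td><td>38</td><td>58</td></tr>",
"<tr><th>19</th><td>29</td><td>-0.310153</td><td>12.4906</td><td>15</td><td>GroupA</td><td>39</td><td>59</td></tr>",
"<tr><th>20</th><td>30</td><td>-0.602707</td><td>9.9001</td><td>7</td><td>GroupA</td><td>40</td><td>60</td></tr>",
"</tbody>",
"</table>"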
],
"text/latex": [
"\\begin{tabular}{r|ccccccc}\n",
"\t& ID & var1 & var2 & var3 & Col1 & var4 & var5\\\\\n",
"\t\\hline\n",
"\t& Int64 & Float64 & Float64 & Int64 & String & Int64 & Int64\\\\\n",
"\t\\hline\n",
"\t1 & 11 & -0.560501 & 9.83968 & 15 & GroupA & 21 & 41 \\\\\n",
"\t2 & 12 & -0.0192918 & 7.81756 & 14 & GroupB & 22 & 42 \\\\\n",
"\t3 & 13 & 0.128064 & 8.83897 & 11 & GroupB & 23 & 43 \\\\\n",
"\t4 & 14 & 1.85278 & 9.36913 & 10 & GroupB & 24 & 44 \\\\\n",
"\t5 & 15 & -0.827763 & 7.2771 & 15 & GroupB & 25 & 45 \\\\\n",
"\t6 & 16 & 0.110096 & 9.77109 & 15 & GroupA & 26 & 46 \\\\\n",
"\t7 & 17 & -0.251176 & 10.3317 & 6 & GroupB & 27 & 47 \\\\\n",
"\t8 & 18 & 0.369714 & 9.18312 & 5 & GroupA & 28 & 48 \\\\\n",
"\t9 & 19 & 0.0721164 & 7.98043 & 12 & GroupB & 29 & 49 \\\\\n",
"\t10 & 20 & -1.50343 & 8.91239 & 13 & GroupA & 30 & 50 \\\\\n",
"\t11 & 21 & 1.56417 & 7.54655 & 14 & GroupB & 31 & 51 \\\\\n",
"\t12 & 22 & -1.39674 & 8.91657 & 5 & GroupB & 32 & 52 \\\\\n",
"\t13 & 23 & 1.1055 & 8.62701 & 8 & GroupA & 33 & 53 \\\\\n",
"\t14 & 24 & -1.10673 & 8.57414 & 9 & GroupA & 34 & 54 \\\\\n",
"\t15 & 25 & -3.21136 & 9.34588 & 5 & GroupA & 35 & 55 \\\\\n",
"\t16 & 26 & -0.0740145 & 11.0297 & 9 & GroupA & 36 & 56 \\\\\n",
"\t17 & 27 & 0.150976 & 14.8349 & 10 & GroupA & 37 & 57 \\\\\n",
"\t18 & 28 & 0.769278 & 9.38405 & 14 & GroupA & 38 & 58 \\\\\n",
"\t19 & 29 & -0.310153 & 12.4906 & 15 & GroupA & 39 & 59 \\\\\n",
"\t20 & 30 & -0.602707 & 9.9001 & 7 & GroupA & 40 & 60 \\\\\n",
"\\end{tabular}\n"
],
"text/plain": [
"20×7 DataFrame\n",
"│ Row │ ID │ var1 │ var2 │ var3 │ Col1 │ var4 │ var5 │\n",
"│ │ \u001b[90mInt64\u001b[39m │ \u001b[90mFloat64\u001b[39m │ \u001b[90mFloat64\u001b[39m │ \u001b[90mInt64\u001b[39m │ \u001b[90mString\u001b[39m │ \u001b[90mInt64\u001b[39m │ \u001b[90mInt64\u001b[39m │\n",
"├─────┼───────┼────────────┼─────────┼───────┼────────┼───────┼───────┤\n",
"│ 1 │ 11 │ -0.560501 │ 9.83968 │ 15 │ GroupA │ 21 │ 41 │\n",
"│ 2 │ 12 │ -0.0192918 │ 7.81756 │ 14 │ GroupB │ 22 │ 42 │\n",
"│ 3 │ 13 │ 0.128064 │ 8.83897 │ 11 │ GroupB │ 23 │ 43 │\n",
"│ 4 │ 14 │ 1.85278 │ 9.36913 │ 10 │ GroupB │ 24 │ 44 │\n",
"│ 5 │ 15 │ -0.827763 │ 7.2771 │ 15 │ GroupB │ 25 │ 45 │\n",
"│ 6 │ 16 │ 0.110096 │ 9.77109 │ 15 │ GroupA │ 26 │ 46 │\n",
"│ 7 │ 17 │ -0.251176 │ 10.3317 │ 6 │ GroupB │ 27 │ 47 │\n",
"│ 8 │ 18 │ 0.369714 │ 9.18312 │ 5 │ GroupA │ 28 │ 48 │\n",
"│ 9 │ 19 │ 0.0721164 │ 7.98043 │ 12 │ GroupB │ 29 │ 49 │\n",
"│ 10 │ 20 │ -1.50343 │ 8.91239 │ 13 │ GroupA │ 30 │ 50 │\n",
"│ 11 │ 21 │ 1.56417 │ 7.54655 │ 14 │ GroupB │ 31 │ 51 │\n",
"│ 12 │ 22 │ -1.39674 │ 8.91657 │ 5 │ GroupB │ 32 │ 52 │\n",
"│ 13 │ 23 │ 1.1055 │ 8.62701 │ 8 │ GroupA │ 33 │ 53 │\n",
"│ 14 │ 24 │ -1.10673 │ 8.57414 │ 9 │ GroupA │ 34 │ 54 │\n",
"│ 15 │ 25 │ -3.21136 │ 9.34588 │ 5 │ GroupA │ 35 │ 55 │\n",
"│ 16 │ 26 │ -0.0740145 │ 11.0297 │ 9 │ GroupA │ 36 │ 56 │\n",
"│ 17 │ 27 │ 0.150976 │ 14.8349 │ 10 │ GroupA │ 37 │ 57 │\n",
"│ 18 │ 28 │ 0.769278 │ 9.38405 │ 14 │ GroupA │ 38 │ 58 │\n",
"│ 19 │ 29 │ -0.310153 │ 12.4906 │ 15 │ GroupA │ 39 │ 59 │\n",
"│ 20 │ 30 │ -0.602707 │ 9.9001 │ 7 │ GroupA │ 40 │ 60 │"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df4 = innerjoin(df2,df3,on = :ID)"
]
},
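{
"cell_type": "markdown",
"metadata": {},
"source": [
"In the join above every ID appears in both DataFrames, so the choice of join type does not change the result. The sketch below, on two hypothetical DataFrames `left` and `right` with only partly overlapping IDs, illustrates how `innerjoin` and `leftjoin` might differ. It assumes a DataFrames.jl version that provides both functions.\n",
"\n",
"```julia\n",
"using DataFrames\n",
"\n",
"left = DataFrame(ID = [1, 2, 3], a = [10, 20, 30])        # hypothetical data\n",
"right = DataFrame(ID = [2, 3, 4], b = [200, 300, 400])    # hypothetical data\n",
"\n",
"inner = innerjoin(left, right, on = :ID)    # keeps only IDs present in both (2 and 3)\n",
"lj = leftjoin(left, right, on = :ID)        # keeps every ID of left; b is missing for ID 1\n",
"\n",
"println(nrow(inner))                 # 2\n",
"println(nrow(lj))                    # 3\n",
"println(count(ismissing, lj.b))      # 1\n",
"```\n",
"\n",
"Row order after a join is not something this sketch relies on, so the checks only count rows and missing values."
]
},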
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Julia 1.2.0",
"language": "julia",
"name": "julia-1.2"
},
"language_info": {
"file_extension": ".jl",
"mimetype": "application/julia",
"name": "julia",
"version": "1.2.0"
}
},
"nbformat": 4,
"nbformat_minor": 4
}