{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Week 4 Peer Review" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "2. In a code cell below, import the required packages: Distributions, DataFrames, and Random (install these packages via the REPL if required)." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# Import the required packages\n", "using Distributions, DataFrames, Random" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# Seed the random number generator\n", "Random.seed!(1234);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "3. In a code cell below, create a dataframe named df1, with 30 rows and 4 columns (variables). Call the first column ID. It should hold the values 1 through 30 (to make up 30 rows). Use three rand() function calls to generate three more columns named var1, var2, and var3. The second column (var1) should consist of 30 values from a standard normal distribution (mean of 0 and standard deviation of 1). The third column (var2) should consist of 30 random values from a normal distribution with a mean of 10 and a standard deviation of 2. The last column (var3) should contain 30 random values chosen from a range of integers between (and including) 5 and 15." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "

" ], "text/latex": [ "\\begin{tabular}{r|cccc}\n", "\t& ID & var1 & var2 & var3\\\\\n", "\t\\hline\n", "\t& Int64 & Float64 & Float64 & Int64\\\\\n", "\t\\hline\n", "\t1 & 1 & 0.867347 & 7.44066 & 14 \\\\\n", "\t2 & 2 & -0.901744 & 11.9946 & 13 \\\\\n", "\t3 & 3 & -0.494479 & 10.6048 & 12 \\\\\n", "\t4 & 4 & -0.902914 & 9.92711 & 9 \\\\\n", "\t5 & 5 & 0.864401 & 10.2839 & 15 \\\\\n", "\t6 & 6 & 2.21188 & 11.0425 & 14 \\\\\n", "\t7 & 7 & 0.532813 & 11.7935 & 15 \\\\\n", "\t8 & 8 & -0.271735 & 8.97294 & 9 \\\\\n", "\t9 & 9 & 0.502334 & 8.4704 & 9 \\\\\n", "\t10 & 10 & -0.516984 & 6.91715 & 8 \\\\\n", "\t11 & 11 & -0.560501 & 9.83968 & 15 \\\\\n", "\t12 & 12 & -0.0192918 & 7.81756 & 14 \\\\\n", "\t13 & 13 & 0.128064 & 8.83897 & 11 \\\\\n", "\t14 & 14 & 1.85278 & 9.36913 & 10 \\\\\n", "\t15 & 15 & -0.827763 & 7.2771 & 15 \\\\\n", "\t16 & 16 & 0.110096 & 9.77109 & 15 \\\\\n", "\t17 & 17 & -0.251176 & 10.3317 & 6 \\\\\n", "\t18 & 18 & 0.369714 & 9.18312 & 5 \\\\\n", "\t19 & 19 & 0.0721164 & 7.98043 & 12 \\\\\n", "\t20 & 20 & -1.50343 & 8.91239 & 13 \\\\\n", "\t21 & 21 & 1.56417 & 7.54655 & 14 \\\\\n", "\t22 & 22 & -1.39674 & 8.91657 & 5 \\\\\n", "\t23 & 23 & 1.1055 & 8.62701 & 8 \\\\\n", "\t24 & 24 & -1.10673 & 8.57414 & 9 \\\\\n", "\t25 & 25 & -3.21136 & 9.34588 & 5 \\\\\n", "\t26 & 26 & -0.0740145 & 11.0297 & 9 \\\\\n", "\t27 & 27 & 0.150976 & 14.8349 & 10 \\\\\n", "\t28 & 28 & 0.769278 & 9.38405 & 14 \\\\\n", "\t29 & 29 & -0.310153 & 12.4906 & 15 \\\\\n", "\t30 & 30 & -0.602707 & 9.9001 & 7 \\\\\n", "\\end{tabular}\n" ], "text/plain": [ "30×4 DataFrame\n", "│ Row │ ID │ var1 │ var2 │ var3 │\n", "│ │ \u001b[90mInt64\u001b[39m │ \u001b[90mFloat64\u001b[39m │ \u001b[90mFloat64\u001b[39m │ \u001b[90mInt64\u001b[39m │\n", "├─────┼───────┼────────────┼─────────┼───────┤\n", "│ 1 │ 1 │ 0.867347 │ 7.44066 │ 14 │\n", "│ 2 │ 2 │ -0.901744 │ 11.9946 │ 13 │\n", "│ 3 │ 3 │ -0.494479 │ 10.6048 │ 12 │\n", "│ 4 │ 4 │ -0.902914 │ 9.92711 │ 9 │\n", "│ 5 │ 5 │ 0.864401 │ 10.2839 │ 15 
│\n", "│ 6 │ 6 │ 2.21188 │ 11.0425 │ 14 │\n", "│ 7 │ 7 │ 0.532813 │ 11.7935 │ 15 │\n", "│ 8 │ 8 │ -0.271735 │ 8.97294 │ 9 │\n", "│ 9 │ 9 │ 0.502334 │ 8.4704 │ 9 │\n", "│ 10 │ 10 │ -0.516984 │ 6.91715 │ 8 │\n", "⋮\n", "│ 20 │ 20 │ -1.50343 │ 8.91239 │ 13 │\n", "│ 21 │ 21 │ 1.56417 │ 7.54655 │ 14 │\n", "│ 22 │ 22 │ -1.39674 │ 8.91657 │ 5 │\n", "│ 23 │ 23 │ 1.1055 │ 8.62701 │ 8 │\n", "│ 24 │ 24 │ -1.10673 │ 8.57414 │ 9 │\n", "│ 25 │ 25 │ -3.21136 │ 9.34588 │ 5 │\n", "│ 26 │ 26 │ -0.0740145 │ 11.0297 │ 9 │\n", "│ 27 │ 27 │ 0.150976 │ 14.8349 │ 10 │\n", "│ 28 │ 28 │ 0.769278 │ 9.38405 │ 14 │\n", "│ 29 │ 29 │ -0.310153 │ 12.4906 │ 15 │\n", "│ 30 │ 30 │ -0.602707 │ 9.9001 │ 7 │" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = DataFrame(ID = 1:30, var1 = rand(Normal(0,1),30), var2 = rand(Normal(10,2),30), var3 = rand(5:15,30))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "4. In code cells below, write the code to calculate the mean and variance of each column in the dataframe. For example, for the first variable this could be done using the println function and referring to each column (variable) by its symbol notation. Try to shorten the code with a for-loop, iterating over the variable names (in symbol format)." 
] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The mean of var1 is: -0.061674963752526096, the variance is: 1.1790054448274625\n", "The mean of var2 is: 9.580613055613338, the variance is: 2.948790077536739\n", "The mean of var3 is: 11.0, the variance is: 11.724137931034482\n" ] } ], "source": [ "for s in [:var1,:var2,:var3] #names(df)\n", " colname = String(s)\n", " meancol = mean(df[!, s])\n", " variancecol = var(df[!, s])\n", " println(\"The mean of $colname is: $meancol, the variance is: $variancecol\")\n", "end" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "5. In a code cell below, create a new DataFrame named df2 from the last 20 rows of the original DataFrame, df1." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "

" ], "text/latex": [ "\\begin{tabular}{r|cccc}\n", "\t& ID & var1 & var2 & var3\\\\\n", "\t\\hline\n", "\t& Int64 & Float64 & Float64 & Int64\\\\\n", "\t\\hline\n", "\t1 & 11 & -0.560501 & 9.83968 & 15 \\\\\n", "\t2 & 12 & -0.0192918 & 7.81756 & 14 \\\\\n", "\t3 & 13 & 0.128064 & 8.83897 & 11 \\\\\n", "\t4 & 14 & 1.85278 & 9.36913 & 10 \\\\\n", "\t5 & 15 & -0.827763 & 7.2771 & 15 \\\\\n", "\t6 & 16 & 0.110096 & 9.77109 & 15 \\\\\n", "\t7 & 17 & -0.251176 & 10.3317 & 6 \\\\\n", "\t8 & 18 & 0.369714 & 9.18312 & 5 \\\\\n", "\t9 & 19 & 0.0721164 & 7.98043 & 12 \\\\\n", "\t10 & 20 & -1.50343 & 8.91239 & 13 \\\\\n", "\t11 & 21 & 1.56417 & 7.54655 & 14 \\\\\n", "\t12 & 22 & -1.39674 & 8.91657 & 5 \\\\\n", "\t13 & 23 & 1.1055 & 8.62701 & 8 \\\\\n", "\t14 & 24 & -1.10673 & 8.57414 & 9 \\\\\n", "\t15 & 25 & -3.21136 & 9.34588 & 5 \\\\\n", "\t16 & 26 & -0.0740145 & 11.0297 & 9 \\\\\n", "\t17 & 27 & 0.150976 & 14.8349 & 10 \\\\\n", "\t18 & 28 & 0.769278 & 9.38405 & 14 \\\\\n", "\t19 & 29 & -0.310153 & 12.4906 & 15 \\\\\n", "\t20 & 30 & -0.602707 & 9.9001 & 7 \\\\\n", "\\end{tabular}\n" ], "text/plain": [ "20×4 DataFrame\n", "│ Row │ ID │ var1 │ var2 │ var3 │\n", "│ │ \u001b[90mInt64\u001b[39m │ \u001b[90mFloat64\u001b[39m │ \u001b[90mFloat64\u001b[39m │ \u001b[90mInt64\u001b[39m │\n", "├─────┼───────┼────────────┼─────────┼───────┤\n", "│ 1 │ 11 │ -0.560501 │ 9.83968 │ 15 │\n", "│ 2 │ 12 │ -0.0192918 │ 7.81756 │ 14 │\n", "│ 3 │ 13 │ 0.128064 │ 8.83897 │ 11 │\n", "│ 4 │ 14 │ 1.85278 │ 9.36913 │ 10 │\n", "│ 5 │ 15 │ -0.827763 │ 7.2771 │ 15 │\n", "│ 6 │ 16 │ 0.110096 │ 9.77109 │ 15 │\n", "│ 7 │ 17 │ -0.251176 │ 10.3317 │ 6 │\n", "│ 8 │ 18 │ 0.369714 │ 9.18312 │ 5 │\n", "│ 9 │ 19 │ 0.0721164 │ 7.98043 │ 12 │\n", "│ 10 │ 20 │ -1.50343 │ 8.91239 │ 13 │\n", "│ 11 │ 21 │ 1.56417 │ 7.54655 │ 14 │\n", "│ 12 │ 22 │ -1.39674 │ 8.91657 │ 5 │\n", "│ 13 │ 23 │ 1.1055 │ 8.62701 │ 8 │\n", "│ 14 │ 24 │ -1.10673 │ 8.57414 │ 9 │\n", "│ 15 │ 25 │ -3.21136 │ 9.34588 │ 5 │\n", "│ 16 │ 26 │ 
-0.0740145 │ 11.0297 │ 9 │\n", "│ 17 │ 27 │ 0.150976 │ 14.8349 │ 10 │\n", "│ 18 │ 28 │ 0.769278 │ 9.38405 │ 14 │\n", "│ 19 │ 29 │ -0.310153 │ 12.4906 │ 15 │\n", "│ 20 │ 30 │ -0.602707 │ 9.9001 │ 7 │" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df2 = df[11:end,:]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "6. In a code cell below, show the results of computing simple descriptive statistics on this new DataFrame using the describe() function." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "

" ], "text/latex": [ "\\begin{tabular}{r|cccccccc}\n", "\t& variable & mean & min & median & max & nunique & nmissing & eltype\\\\\n", "\t\\hline\n", "\t& Symbol & Float64 & Real & Float64 & Real & Nothing & Nothing & DataType\\\\\n", "\t\\hline\n", "\t1 & ID & 20.5 & 11 & 20.5 & 30 & & & Int64 \\\\\n", "\t2 & var1 & -0.187058 & -3.21136 & -0.0466532 & 1.85278 & & & Float64 \\\\\n", "\t3 & var2 & 9.49853 & 7.2771 & 9.2645 & 14.8349 & & & Float64 \\\\\n", "\t4 & var3 & 10.6 & 5 & 10.5 & 15 & & & Int64 \\\\\n", "\\end{tabular}\n" ], "text/plain": [ "4×8 DataFrame. Omitted printing of 2 columns\n", "│ Row │ variable │ mean │ min │ median │ max │ nunique │\n", "│ │ \u001b[90mSymbol\u001b[39m │ \u001b[90mFloat64\u001b[39m │ \u001b[90mReal\u001b[39m │ \u001b[90mFloat64\u001b[39m │ \u001b[90mReal\u001b[39m │ \u001b[90mNothing\u001b[39m │\n", "├─────┼──────────┼───────────┼──────────┼────────────┼─────────┼─────────┤\n", "│ 1 │ ID │ 20.5 │ 11 │ 20.5 │ 30 │ │\n", "│ 2 │ var1 │ -0.187058 │ -3.21136 │ -0.0466532 │ 1.85278 │ │\n", "│ 3 │ var2 │ 9.49853 │ 7.2771 │ 9.2645 │ 14.8349 │ │\n", "│ 4 │ var3 │ 10.6 │ 5 │ 10.5 │ 15 │ │" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "describe(df2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "7. In a code cell below, add a column named cat1 to the df2 DataFrame consisting of a random selection of 20 values from the sample space GroupA and GroupB." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "

" ], "text/latex": [ "\\begin{tabular}{r|ccccc}\n", "\t& ID & var1 & var2 & var3 & Col1\\\\\n", "\t\\hline\n", "\t& Int64 & Float64 & Float64 & Int64 & String\\\\\n", "\t\\hline\n", "\t1 & 11 & -0.560501 & 9.83968 & 15 & GroupA \\\\\n", "\t2 & 12 & -0.0192918 & 7.81756 & 14 & GroupB \\\\\n", "\t3 & 13 & 0.128064 & 8.83897 & 11 & GroupB \\\\\n", "\t4 & 14 & 1.85278 & 9.36913 & 10 & GroupB \\\\\n", "\t5 & 15 & -0.827763 & 7.2771 & 15 & GroupB \\\\\n", "\t6 & 16 & 0.110096 & 9.77109 & 15 & GroupA \\\\\n", "\t7 & 17 & -0.251176 & 10.3317 & 6 & GroupB \\\\\n", "\t8 & 18 & 0.369714 & 9.18312 & 5 & GroupA \\\\\n", "\t9 & 19 & 0.0721164 & 7.98043 & 12 & GroupB \\\\\n", "\t10 & 20 & -1.50343 & 8.91239 & 13 & GroupA \\\\\n", "\t11 & 21 & 1.56417 & 7.54655 & 14 & GroupB \\\\\n", "\t12 & 22 & -1.39674 & 8.91657 & 5 & GroupB \\\\\n", "\t13 & 23 & 1.1055 & 8.62701 & 8 & GroupA \\\\\n", "\t14 & 24 & -1.10673 & 8.57414 & 9 & GroupA \\\\\n", "\t15 & 25 & -3.21136 & 9.34588 & 5 & GroupA \\\\\n", "\t16 & 26 & -0.0740145 & 11.0297 & 9 & GroupA \\\\\n", "\t17 & 27 & 0.150976 & 14.8349 & 10 & GroupA \\\\\n", "\t18 & 28 & 0.769278 & 9.38405 & 14 & GroupA \\\\\n", "\t19 & 29 & -0.310153 & 12.4906 & 15 & GroupA \\\\\n", "\t20 & 30 & -0.602707 & 9.9001 & 7 & GroupA \\\\\n", "\\end{tabular}\n" ], "text/plain": [ "20×5 DataFrame\n", "│ Row │ ID │ var1 │ var2 │ var3 │ Col1 │\n", "│ │ \u001b[90mInt64\u001b[39m │ \u001b[90mFloat64\u001b[39m │ \u001b[90mFloat64\u001b[39m │ \u001b[90mInt64\u001b[39m │ \u001b[90mString\u001b[39m │\n", "├─────┼───────┼────────────┼─────────┼───────┼────────┤\n", "│ 1 │ 11 │ -0.560501 │ 9.83968 │ 15 │ GroupA │\n", "│ 2 │ 12 │ -0.0192918 │ 7.81756 │ 14 │ GroupB │\n", "│ 3 │ 13 │ 0.128064 │ 8.83897 │ 11 │ GroupB │\n", "│ 4 │ 14 │ 1.85278 │ 9.36913 │ 10 │ GroupB │\n", "│ 5 │ 15 │ -0.827763 │ 7.2771 │ 15 │ GroupB │\n", "│ 6 │ 16 │ 0.110096 │ 9.77109 │ 15 │ GroupA │\n", "│ 7 │ 17 │ -0.251176 │ 10.3317 │ 6 │ GroupB │\n", "│ 8 │ 18 │ 0.369714 │ 9.18312 │ 5 │ GroupA │\n", "│ 
9 │ 19 │ 0.0721164 │ 7.98043 │ 12 │ GroupB │\n", "│ 10 │ 20 │ -1.50343 │ 8.91239 │ 13 │ GroupA │\n", "│ 11 │ 21 │ 1.56417 │ 7.54655 │ 14 │ GroupB │\n", "│ 12 │ 22 │ -1.39674 │ 8.91657 │ 5 │ GroupB │\n", "│ 13 │ 23 │ 1.1055 │ 8.62701 │ 8 │ GroupA │\n", "│ 14 │ 24 │ -1.10673 │ 8.57414 │ 9 │ GroupA │\n", "│ 15 │ 25 │ -3.21136 │ 9.34588 │ 5 │ GroupA │\n", "│ 16 │ 26 │ -0.0740145 │ 11.0297 │ 9 │ GroupA │\n", "│ 17 │ 27 │ 0.150976 │ 14.8349 │ 10 │ GroupA │\n", "│ 18 │ 28 │ 0.769278 │ 9.38405 │ 14 │ GroupA │\n", "│ 19 │ 29 │ -0.310153 │ 12.4906 │ 15 │ GroupA │\n", "│ 20 │ 30 │ -0.602707 │ 9.9001 │ 7 │ GroupA │" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "insertcols!(df2,:Col1 => rand([\"GroupA\",\"GroupB\"],20))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "8. In a code cell below, create a DataFrame named df3 with columns named *id*, var4 and var5 such that id contains the values 11 through 30, var4 contains the values 21 through 40 and var5 contains the values 41 through 60." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "

" ], "text/latex": [ "\\begin{tabular}{r|ccc}\n", "\t& ID & var4 & var5\\\\\n", "\t\\hline\n", "\t& Int64 & Int64 & Int64\\\\\n", "\t\\hline\n", "\t1 & 11 & 21 & 41 \\\\\n", "\t2 & 12 & 22 & 42 \\\\\n", "\t3 & 13 & 23 & 43 \\\\\n", "\t4 & 14 & 24 & 44 \\\\\n", "\t5 & 15 & 25 & 45 \\\\\n", "\t6 & 16 & 26 & 46 \\\\\n", "\t7 & 17 & 27 & 47 \\\\\n", "\t8 & 18 & 28 & 48 \\\\\n", "\t9 & 19 & 29 & 49 \\\\\n", "\t10 & 20 & 30 & 50 \\\\\n", "\t11 & 21 & 31 & 51 \\\\\n", "\t12 & 22 & 32 & 52 \\\\\n", "\t13 & 23 & 33 & 53 \\\\\n", "\t14 & 24 & 34 & 54 \\\\\n", "\t15 & 25 & 35 & 55 \\\\\n", "\t16 & 26 & 36 & 56 \\\\\n", "\t17 & 27 & 37 & 57 \\\\\n", "\t18 & 28 & 38 & 58 \\\\\n", "\t19 & 29 & 39 & 59 \\\\\n", "\t20 & 30 & 40 & 60 \\\\\n", "\\end{tabular}\n" ], "text/plain": [ "20×3 DataFrame\n", "│ Row │ ID │ var4 │ var5 │\n", "│ │ \u001b[90mInt64\u001b[39m │ \u001b[90mInt64\u001b[39m │ \u001b[90mInt64\u001b[39m │\n", "├─────┼───────┼───────┼───────┤\n", "│ 1 │ 11 │ 21 │ 41 │\n", "│ 2 │ 12 │ 22 │ 42 │\n", "│ 3 │ 13 │ 23 │ 43 │\n", "│ 4 │ 14 │ 24 │ 44 │\n", "│ 5 │ 15 │ 25 │ 45 │\n", "│ 6 │ 16 │ 26 │ 46 │\n", "│ 7 │ 17 │ 27 │ 47 │\n", "│ 8 │ 18 │ 28 │ 48 │\n", "│ 9 │ 19 │ 29 │ 49 │\n", "│ 10 │ 20 │ 30 │ 50 │\n", "│ 11 │ 21 │ 31 │ 51 │\n", "│ 12 │ 22 │ 32 │ 52 │\n", "│ 13 │ 23 │ 33 │ 53 │\n", "│ 14 │ 24 │ 34 │ 54 │\n", "│ 15 │ 25 │ 35 │ 55 │\n", "│ 16 │ 26 │ 36 │ 56 │\n", "│ 17 │ 27 │ 37 │ 57 │\n", "│ 18 │ 28 │ 38 │ 58 │\n", "│ 19 │ 29 │ 39 │ 59 │\n", "│ 20 │ 30 │ 40 │ 60 │" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df3 = DataFrame(ID = collect(11:30), var4 = collect(21:40), var5 = collect(41:60))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "9. In a code cell below, do a join of DataFrames df2 and df3 on the id column and save the result as a new dataframe called df4." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "

" ], "text/latex": [ "\\begin{tabular}{r|ccccccc}\n", "\t& ID & var1 & var2 & var3 & Col1 & var4 & var5\\\\\n", "\t\\hline\n", "\t& Int64 & Float64 & Float64 & Int64 & String & Int64 & Int64\\\\\n", "\t\\hline\n", "\t1 & 11 & -0.560501 & 9.83968 & 15 & GroupA & 21 & 41 \\\\\n", "\t2 & 12 & -0.0192918 & 7.81756 & 14 & GroupB & 22 & 42 \\\\\n", "\t3 & 13 & 0.128064 & 8.83897 & 11 & GroupB & 23 & 43 \\\\\n", "\t4 & 14 & 1.85278 & 9.36913 & 10 & GroupB & 24 & 44 \\\\\n", "\t5 & 15 & -0.827763 & 7.2771 & 15 & GroupB & 25 & 45 \\\\\n", "\t6 & 16 & 0.110096 & 9.77109 & 15 & GroupA & 26 & 46 \\\\\n", "\t7 & 17 & -0.251176 & 10.3317 & 6 & GroupB & 27 & 47 \\\\\n", "\t8 & 18 & 0.369714 & 9.18312 & 5 & GroupA & 28 & 48 \\\\\n", "\t9 & 19 & 0.0721164 & 7.98043 & 12 & GroupB & 29 & 49 \\\\\n", "\t10 & 20 & -1.50343 & 8.91239 & 13 & GroupA & 30 & 50 \\\\\n", "\t11 & 21 & 1.56417 & 7.54655 & 14 & GroupB & 31 & 51 \\\\\n", "\t12 & 22 & -1.39674 & 8.91657 & 5 & GroupB & 32 & 52 \\\\\n", "\t13 & 23 & 1.1055 & 8.62701 & 8 & GroupA & 33 & 53 \\\\\n", "\t14 & 24 & -1.10673 & 8.57414 & 9 & GroupA & 34 & 54 \\\\\n", "\t15 & 25 & -3.21136 & 9.34588 & 5 & GroupA & 35 & 55 \\\\\n", "\t16 & 26 & -0.0740145 & 11.0297 & 9 & GroupA & 36 & 56 \\\\\n", "\t17 & 27 & 0.150976 & 14.8349 & 10 & GroupA & 37 & 57 \\\\\n", "\t18 & 28 & 0.769278 & 9.38405 & 14 & GroupA & 38 & 58 \\\\\n", "\t19 & 29 & -0.310153 & 12.4906 & 15 & GroupA & 39 & 59 \\\\\n", "\t20 & 30 & -0.602707 & 9.9001 & 7 & GroupA & 40 & 60 \\\\\n", "\\end{tabular}\n" ], "text/plain": [ "20×7 DataFrame\n", "│ Row │ ID │ var1 │ var2 │ var3 │ Col1 │ var4 │ var5 │\n", "│ │ \u001b[90mInt64\u001b[39m │ \u001b[90mFloat64\u001b[39m │ \u001b[90mFloat64\u001b[39m │ \u001b[90mInt64\u001b[39m │ \u001b[90mString\u001b[39m │ \u001b[90mInt64\u001b[39m │ \u001b[90mInt64\u001b[39m │\n", "├─────┼───────┼────────────┼─────────┼───────┼────────┼───────┼───────┤\n", "│ 1 │ 11 │ -0.560501 │ 9.83968 │ 15 │ GroupA │ 21 │ 41 │\n", "│ 2 │ 12 │ -0.0192918 │ 
7.81756 │ 14 │ GroupB │ 22 │ 42 │\n", "│ 3 │ 13 │ 0.128064 │ 8.83897 │ 11 │ GroupB │ 23 │ 43 │\n", "│ 4 │ 14 │ 1.85278 │ 9.36913 │ 10 │ GroupB │ 24 │ 44 │\n", "│ 5 │ 15 │ -0.827763 │ 7.2771 │ 15 │ GroupB │ 25 │ 45 │\n", "│ 6 │ 16 │ 0.110096 │ 9.77109 │ 15 │ GroupA │ 26 │ 46 │\n", "│ 7 │ 17 │ -0.251176 │ 10.3317 │ 6 │ GroupB │ 27 │ 47 │\n", "│ 8 │ 18 │ 0.369714 │ 9.18312 │ 5 │ GroupA │ 28 │ 48 │\n", "│ 9 │ 19 │ 0.0721164 │ 7.98043 │ 12 │ GroupB │ 29 │ 49 │\n", "│ 10 │ 20 │ -1.50343 │ 8.91239 │ 13 │ GroupA │ 30 │ 50 │\n", "│ 11 │ 21 │ 1.56417 │ 7.54655 │ 14 │ GroupB │ 31 │ 51 │\n", "│ 12 │ 22 │ -1.39674 │ 8.91657 │ 5 │ GroupB │ 32 │ 52 │\n", "│ 13 │ 23 │ 1.1055 │ 8.62701 │ 8 │ GroupA │ 33 │ 53 │\n", "│ 14 │ 24 │ -1.10673 │ 8.57414 │ 9 │ GroupA │ 34 │ 54 │\n", "│ 15 │ 25 │ -3.21136 │ 9.34588 │ 5 │ GroupA │ 35 │ 55 │\n", "│ 16 │ 26 │ -0.0740145 │ 11.0297 │ 9 │ GroupA │ 36 │ 56 │\n", "│ 17 │ 27 │ 0.150976 │ 14.8349 │ 10 │ GroupA │ 37 │ 57 │\n", "│ 18 │ 28 │ 0.769278 │ 9.38405 │ 14 │ GroupA │ 38 │ 58 │\n", "│ 19 │ 29 │ -0.310153 │ 12.4906 │ 15 │ GroupA │ 39 │ 59 │\n", "│ 20 │ 30 │ -0.602707 │ 9.9001 │ 7 │ GroupA │ 40 │ 60 │" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df4 = innerjoin(df2,df3,on = :ID)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Julia 1.2.0", "language": "julia", "name": "julia-1.2" }, "language_info": { "file_extension": ".jl", "mimetype": "application/julia", "name": "julia", "version": "1.2.0" } }, "nbformat": 4, "nbformat_minor": 4 }