An Introduction to the R Language

 

 

After starting R, you will see the following window environment:

 

 

 

 

How to create a vector

 
            c (x, y, z, ...)

    Examples:

              c(1,2,3)

              c (1,7:9)
 
              c (1:5, 10.5, "next")
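
A minimal sketch of what these three calls return, assuming a fresh R session. The variable names v1, v2 and v3 are illustrative only. Note that mixing numbers with a string, as in the last example, makes R coerce every element to character.

```r
# Illustrative names; they do not appear in the original text
v1 <- c(1, 2, 3)              # numeric vector: 1 2 3
v2 <- c(1, 7:9)               # 7:9 expands to 7 8 9, giving 1 7 8 9
v3 <- c(1:5, 10.5, "next")    # mixed types are coerced to character

print(v1)
print(v2)
print(class(v3))              # "character"
print(length(v3))             # 7 elements
```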
 

 

Vector operations:

 

      t (x) – transpose of x
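
As a rough illustration, the sketch below transposes a small vector and applies a few element-wise operations. The arithmetic examples are additions for illustration and are not listed in the original text.

```r
x <- c(1, 2, 3)

tx <- t(x)            # a plain vector is treated as a column, so t(x) is a 1 x 3 matrix
print(dim(tx))        # 1 3

print(x + 1)          # arithmetic is applied element by element: 2 3 4
print(x * x)          # 1 4 9
inner <- sum(x * x)   # inner product of x with itself
print(inner)          # 14
```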

 

 

 

 

 

Reading an Excel CSV file with read.csv:

 

    Suppose the file mlb.csv contains:

 

Teams,W,ERA,R,OBP,SLG,AVG

Baltimore Orioles,66,4.59,613,0.316,0.386,0.259

Boston Red Sox,89,4.2,818,0.339,0.451,0.268

……

Toronto Blue Jays,85,4.22,755,0.312,0.454,0.248

 

 

    The command that reads the data and stores it in a data frame:

 

 

> team_stats = read.csv ("d:\\hcwang\\R\\mlb.csv", header=TRUE, sep=",")

 

 

    Display the contents of the data frame variable team_stats (a data frame is one of the data types in R)

 

> team_stats

 

                Teams         W   ERA   R   OBP   SLG   AVG

1      Baltimore Orioles  66  4.59  613  0.316  0.386  0.259

2      Boston Red Sox   89  4.20  818  0.339  0.451  0.268

……

14    Toronto Blue Jays 85  4.22  755  0.312  0.454  0.248

 

> names ( team_stats )

 

[1] "Teams" "W"     "ERA"   "R"     "OBP"   "SLG"   "AVG" 

 

    Retrieve the data in a single column (for example, ERA):

 

> team_stats$ERA

 

     [1] 4.59 4.20 4.09 4.30 4.30 4.97 4.04 3.95 4.06 3.56 3.93 3.78 3.93 4.22

 

 

 

    Use attributes to obtain a description of the data frame variable team_stats

 

> attributes (team_stats)

 

$names

[1] "Teams" "W"     "ERA"   "R"     "OBP"   "SLG"   "AVG" 

 

$class

[1] "data.frame"

 

$row.names

 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14
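
The steps above can be tried without the original file. The sketch below writes only the three rows visible in the listing to a temporary file (the path comes from tempfile and is hypothetical), reads it back with read.csv, and inspects it. It is a minimal illustration, not the full 14-team data set.

```r
# Write the three rows shown in the listing to a temporary CSV file
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Teams,W,ERA,R,OBP,SLG,AVG",
  "Baltimore Orioles,66,4.59,613,0.316,0.386,0.259",
  "Boston Red Sox,89,4.2,818,0.339,0.451,0.268",
  "Toronto Blue Jays,85,4.22,755,0.312,0.454,0.248"
), csv_path)

team_stats <- read.csv(csv_path, header = TRUE, sep = ",")

print(names(team_stats))        # column names taken from the header row
print(team_stats$ERA)           # one column, returned as a numeric vector
mean_wins <- mean(team_stats$W) # a column can be passed straight to a function
print(mean_wins)                # 80
unlink(csv_path)                # remove the temporary file
```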

 

 


Writing data to a file

 

write (x, file = "filename", ncolumns = m, append = FALSE, sep = " ")

 

x - the data to be written out. If x is a two-dimensional matrix you need to transpose it to get the columns in the file the same as those in the internal representation.

 

sep - a string used to separate columns. Using sep = "\t" gives tab-delimited output; the default is " ".

 

       Example:

 

      > write ("\n  This is a test. \n\n", file="output.prn")    ## \n starts a new line

 

      > x = matrix (1:10, ncol = 5)

 

      > x

        

                    [,1]   [,2]  [,3]  [,4]  [,5]

      [1,]    1      3     5      7     9

      [2,]    2      4     6      8    10

 

 

       Use append=TRUE to add further data to the end of the file.

 

            > write ( t (x), file="output.prn", ncolumns=5, sep="  ", append=TRUE)

 

 

       Note: if t (x) is not used,

 

     > write ( x, file="output.prn", ncolumns=5, sep="  ", append=TRUE)

 

       the output will be

 

                      1  2  3  4  5

                      6  7  8  9  10
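
The difference comes from R storing a matrix column by column, which is the order in which write emits the elements. The sketch below writes both variants to a temporary file (a hypothetical path from tempfile, with a single-space separator for brevity) and reads the lines back for comparison.

```r
x <- matrix(1:10, ncol = 5)
out_path <- tempfile(fileext = ".prn")   # hypothetical temporary file

# Transposed: the file rows match the rows of x as displayed
write(t(x), file = out_path, ncolumns = 5, sep = " ")
# Not transposed: elements are emitted in column-major order
write(x, file = out_path, ncolumns = 5, sep = " ", append = TRUE)

lines_written <- readLines(out_path)
print(lines_written)
# "1 3 5 7 9"  "2 4 6 8 10"  "1 2 3 4 5"  "6 7 8 9 10"
unlink(out_path)
```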

 

 

Command for writing the results of executed commands to a file:

 
sink (file = NULL, append = FALSE, type = c("output", "message"), split=FALSE)
 
 
 
Example: 
 
> sink(file = "sink2.txt",type = c("output", "message"), split=TRUE)
> i=1:5
> i
 
 
> sink()     ## stop diverting output
 
To delete the file, use the unlink command
 
> unlink("sink2.txt") 
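
A small self-contained sketch of the same sequence, assuming only the default type = "output" is needed. The file path is a temporary one from tempfile, and the captured text is read back with readLines to show what was written.

```r
log_path <- tempfile(fileext = ".txt")  # hypothetical file name
sink(file = log_path)                   # start diverting printed output
i <- 1:5
print(i)                                # goes to the file, not the console
sink()                                  # stop diverting

captured <- readLines(log_path)
print(captured)                         # "[1] 1 2 3 4 5"
unlink(log_path)                        # delete the file again
```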
 


 

Probability commands:

 

 

Distribution                                Normal   t      Binomial   Chi-square

Probability density function                dnorm    dt     dbinom     dchisq

Cumulative distribution function            pnorm    pt     pbinom     pchisq

Inverse cumulative distribution function    qnorm    qt     qbinom     qchisq

Random numbers                              rnorm    rt     rbinom     rchisq

 

Example:

 

> dnorm (-5:5, mean = 0, sd = 1 )

 

 [1] 1.486720e-06 1.338302e-04 4.431848e-03 5.399097e-02 2.419707e-01 3.989423e-01

 [7] 2.419707e-01 5.399097e-02 4.431848e-03 1.338302e-04 1.486720e-06
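
To see how the four prefixes (d, p, q, r) relate, here is a brief sketch. The particular values (1.96, a Binomial(10, 0.5) variable, and the seed) are arbitrary choices for illustration.

```r
p <- pnorm(1.96)            # P(Z <= 1.96) for a standard normal, about 0.975
q <- qnorm(p)               # the q function inverts the p function, recovering 1.96
print(c(p, q))

# Summing d values gives the p value: P(X <= 3) for X ~ Binomial(10, 0.5)
from_density <- sum(dbinom(0:3, size = 10, prob = 0.5))
from_cdf     <- pbinom(3, size = 10, prob = 0.5)
print(c(from_density, from_cdf))   # both 0.171875

set.seed(1)                 # arbitrary seed, only to make the draw repeatable
z <- rchisq(1000, df = 10)  # the r function draws random values
print(mean(z))              # should be close to the theoretical mean, df = 10
```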

 

 

 

 

 

Plotting commands

 

 

  Example:

 

        > z = rchisq (1000, df = 10)

 

> hist (z)

 

   

 

   > x = seq (-4, 4, by = 0.2 )

 

> y = dt (x, df = 10 )

 

> plot (x, y, type = "o")

 

> title ( main = "t-distribution", sub = "for example" )

 
 
         type = "p" – points only
                "l" – lines only
                "b" – both points and lines
                "o" – both, over-plotted
 
> x=1:10

> boxplot (x, main="Test")
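
The plotting calls can also be combined. The sketch below is one possible way to overlay the theoretical chi-square density on a histogram of simulated values and save the figure to a file. The output path, the seed, and the use of the pdf device are assumptions made for illustration.

```r
set.seed(1)                                 # arbitrary seed for repeatability
z <- rchisq(1000, df = 10)

plot_path <- tempfile(fileext = ".pdf")     # hypothetical output file
pdf(plot_path)                              # draw to a file instead of the screen
hist(z, freq = FALSE, main = "Chi-square sample, df = 10")  # density scale
curve(dchisq(x, df = 10), add = TRUE)       # theoretical density drawn on top
invisible(dev.off())                        # close the device so the file is written

plot_size <- file.size(plot_path)
print(plot_size > 0)
```

Using freq = FALSE puts the histogram on a density scale, so the bars and the curve are directly comparable.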
 
   
   Computing a confidence interval from a normal distribution
 
    > x = 5

    > sigma = 2

    > n = 20

 

    > error = qnorm(0.975) * sigma / sqrt(n)  ## 1-0.05/2 = 0.975 

    > left  = x - error

    > right = x + error

 

    > left

 

       [1] 4.123477

 

    > right

 

       [1] 5.876523
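
The same calculation can be wrapped in a small function. The sketch below assumes a known sigma and a two-sided interval; the function name normal_ci and its argument names are hypothetical.

```r
# Hypothetical helper: two-sided confidence interval for a mean with known sigma
normal_ci <- function(sample_mean, sigma, n, conf_level = 0.95) {
  alpha <- 1 - conf_level
  error <- qnorm(1 - alpha / 2) * sigma / sqrt(n)   # half-width of the interval
  c(left = sample_mean - error, right = sample_mean + error)
}

ci95 <- normal_ci(5, 2, 20)         # same inputs as the worked example
print(ci95)                         # about 4.123477 and 5.876523
ci99 <- normal_ci(5, 2, 20, 0.99)   # a higher confidence level widens the interval
print(ci99)
```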

 

 

      Computing the p-value

 

    > xbar = 7

 

    > z = ( x - xbar )/ ( sigma / sqrt(n))

 

    > z

    [1] -4.472136

 

    > 2 * pnorm (-abs(z))         ## Large samples

 

     [1] 7.744216e-06

 

    > 2 * pt(-abs(z),df = n-1)   ## Small samples

 

    [1] 0.0002611934

 

    Because the p-value is less than 0.05, we reject the null hypothesis.
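
As a sketch, the two p-value formulas can be collected in one function. Treating 5 as the sample mean and 7 as the hypothesized mean is an assumption here (the two-sided p-value is the same either way), and the name two_sided_p and its arguments are hypothetical.

```r
# Hypothetical helper: two-sided p-value from a normal or t reference distribution
two_sided_p <- function(sample_mean, mu0, sigma, n, small_sample = FALSE) {
  z <- (sample_mean - mu0) / (sigma / sqrt(n))
  if (small_sample) {
    2 * pt(-abs(z), df = n - 1)    # t distribution with n - 1 degrees of freedom
  } else {
    2 * pnorm(-abs(z))             # standard normal
  }
}

p_large <- two_sided_p(5, 7, 2, 20)
p_small <- two_sided_p(5, 7, 2, 20, small_sample = TRUE)
print(c(p_large, p_small))         # about 7.744216e-06 and 0.0002611934
```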

 

 

 


 

Control statements:

 

Conditional Execution

 

 if ( condition_expression ) {
     if_expression_1;
     if_expression_2;
 } else {
     else_expression_1;
 }
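
A concrete instance, using the small-sample p-value from the earlier example and an assumed significance level of 0.05. When typing at the console, R needs else on the same line as the closing brace of the if block; otherwise the if statement is treated as complete before else is read.

```r
p_value <- 0.0002611934    # value taken from the p-value example
alpha <- 0.05              # assumed significance level

if (p_value < alpha) {
  decision <- "reject the null hypothesis"
} else {
  decision <- "do not reject the null hypothesis"
}
print(decision)            # "reject the null hypothesis"
```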

 

 

Loop control

 

for (var in seq) 

{

     expression_1;

     expression_2;

    

 }

 
Example: 
 
> x = 1:10
> s = 0
 
> for (i in 1:length (x)) {
+     s = s + x[i]
+ }
 
> s
[1] 55
 
 
 
while (condition) { expr … }
 
Use break to exit the loop, and use next to skip ahead to the next iteration of the loop.
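
A short sketch of while together with break and next. The task, summing the odd numbers from 1 to 10, is an arbitrary illustration.

```r
total <- 0
i <- 0
while (TRUE) {
  i <- i + 1
  if (i > 10) break          # leave the loop once 10 has been processed
  if (i %% 2 == 0) next      # skip even numbers and start the next iteration
  total <- total + i         # accumulates 1 + 3 + 5 + 7 + 9
}
print(total)                 # 25
```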

 


 

ifelse logical operation

 

 

Example:

 

> x = 1:5

> x > 3

 [1] FALSE FALSE FALSE  TRUE  TRUE

 

> ifelse ( x > 3, 1, 0 )

 

   [1]  0  0  0  1  1
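
Because ifelse works on whole vectors, it is handy for labelling or counting. The sketch below uses a few of the ERA values from the earlier team_stats output; the threshold of 4 and the labels are arbitrary choices.

```r
era <- c(4.59, 4.20, 4.09, 3.95, 3.56)   # a few ERA values from the earlier output

label <- ifelse(era < 4, "below 4", "4 or above")   # one label per element
print(label)

n_below <- sum(ifelse(era < 4, 1, 0))    # counting with a 0/1 indicator
print(n_below)                           # 2
```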

 

 

Function definitions

 

 name = function(arg_1, arg_2, ...) {

expression 1

expression 2

……

   }

 

Examples:

 

> hello = function(x,y)   {   x^2+y^2  }

 

 

> hello(2,3)

[1] 13

 

 

> hello ( 1:3, 3:5)

[1]  10  20  34
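
As a further sketch, a function can give an argument a default value, and the last evaluated expression becomes the return value. The function power_sum below is a hypothetical variant of hello, not part of the original examples.

```r
# Hypothetical variant of hello with a default exponent p
power_sum <- function(x, y, p = 2) {
  result <- x^p + y^p
  result                        # the last evaluated expression is returned
}

print(power_sum(2, 3))          # 13, same as hello(2, 3)
print(power_sum(2, 3, p = 3))   # 8 + 27 = 35
print(power_sum(1:3, 3:5))      # 10 20 34, computed element by element
```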