Data Science Week 03

Data Science Week 03 - 02

## The apply() family

- apply()

- lapply()

- sapply()

- vapply()

- mapply()

- rapply()

- tapply()
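
The list includes vapply(), mapply(), and rapply(). As a rough orientation, here is a minimal sketch with one call each; the input values are invented for illustration and are not from the lecture.

```r
# vapply(): like sapply(), but the third argument declares the expected
# type and length of each result, so a mismatch raises an error.
squares <- vapply(1:3, function(i) i^2, numeric(1))

# mapply(): walks several vectors in parallel, element by element.
sums <- mapply(function(a, b) a + b, 1:3, 4:6)

# rapply(): applies a function recursively to the leaves of a nested list.
nested <- list(1, list(2, 3))
scaled <- rapply(nested, function(v) v * 10, how = "unlist")

print(squares)  # 1 4 9
print(sums)     # 5 7 9
print(scaled)   # 10 20 30
```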

-----------

apply()

apply(X, MARGIN, FUN, ... )

- X is a matrix or data frame (a data frame is coerced to a matrix first)

- MARGIN selects the dimension over which the function is applied:

-- MARGIN = 1 applies FUN to each row

-- MARGIN = 2 applies FUN to each column

- FUN is the function to apply to each row or column

example

my_mat <- matrix(1:6, nrow = 3, byrow = TRUE)

## 1 2

## 3 4

## 5 6

# Return the mean of every row of my_mat as a vector (MARGIN = 1)

apply(my_mat, 1, mean)

[1] 1.5 3.5 5.5

# Return the mean of every column of my_mat as a vector (MARGIN = 2)

apply(my_mat, 2, mean)

[1] 3 4
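
The `...` in the signature apply(X, MARGIN, FUN, ...) forwards extra arguments to FUN, and FUN may also be an anonymous function. A minimal sketch of both, reusing my_mat from above; the names with_na, row_means, and row_spread are made up for this illustration.

```r
my_mat <- matrix(1:6, nrow = 3, byrow = TRUE)

# Arguments after FUN are forwarded to FUN: here na.rm = TRUE goes to mean().
with_na <- my_mat
with_na[1, 1] <- NA
row_means <- apply(with_na, 1, mean, na.rm = TRUE)

# FUN can also be an anonymous function; this one returns max - min per row.
row_spread <- apply(my_mat, 1, function(r) max(r) - min(r))

print(row_means)   # 2.0 3.5 5.5
print(row_spread)  # 1 1 1
```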

# runif(12) draws 12 random numbers from the uniform distribution on (0, 1)

set.seed(2018)

myMat <- matrix(runif(12), ncol=4)

myMat

apply(myMat, 1, mean)

apply(myMat, 2, mean)

-------------------

IRIS DATA

head(iris)

# Compute the column means for columns 1 through 4 of the iris data frame

apply(iris[, 1:4], 2, mean)

# Compute the mean of every row of the iris data frame (columns 1 through 4)

apply(iris[, 1:4], 1, mean)

# colMeans() returns the same column means as apply(iris[, 1:4], 2, mean)

colMeans(iris[, 1:4])
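
As a small check that the built-in shortcuts agree with apply(), the following sketch compares both forms on the numeric columns of iris. The variable names are illustrative, and unname() is used so that only the values are compared.

```r
num_cols <- iris[, 1:4]

col_by_apply <- apply(num_cols, 2, mean)
row_by_apply <- apply(num_cols, 1, mean)

# all.equal() compares numerically with a small tolerance.
cols_match <- all.equal(unname(col_by_apply), unname(colMeans(num_cols)))
rows_match <- all.equal(unname(row_by_apply), unname(rowMeans(num_cols)))

print(cols_match)
print(rows_match)
```

For plain means and sums, colMeans(), rowMeans(), colSums(), and rowSums() are usually the shorter choice; apply() is the general tool for any other function.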

---------------------

lapply()

- It applies a function to each element of a data frame, list, or vector

- It always returns a list

myList <- list(num = 3.14, chr = "char", logi = TRUE)

# Return the type of each list element, as a list

lapply(myList, typeof)

## $num

## [1] "double"

## $chr

## [1] "character"

## $logi

## [1] "logical"

------------------

myList2 <- list(vec = 1:5, mat = matrix(runif(12), ncol = 4), df = iris)

# The built-in length() returns the number of elements as an integer

# so this returns the length of each element of myList2 as an integer

result <- lapply(myList2, length)

result

## $vec

## [1] 5 # number of elements

## $mat

## [1] 12 # number of elements

## $df

## [1] 5 # number of columns

unlist(result) # list -> vector
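
The step from list to vector can be sketched as follows. The set.seed() call and the names lengths_list and lengths_vec are additions for this illustration only.

```r
set.seed(2018)  # only so the random matrix is reproducible
myList2 <- list(vec = 1:5, mat = matrix(runif(12), ncol = 4), df = iris)

lengths_list <- lapply(myList2, length)  # a named list of integers
lengths_vec  <- unlist(lengths_list)     # a named integer vector

print(lengths_vec)
# vec mat  df
#   5  12   5

# sapply() performs the same simplification in one step.
same_as_sapply <- identical(lengths_vec, sapply(myList2, length))
print(same_as_sapply)
```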

------------------

lapply() examples

lapply(c(1, 4, 9, 16), sqrt)

[[1]]

[1] 1

[[2]]

[1] 2

....

------------------

sapply()

- It applies a function to each element of a data frame, list, or vector

- It returns a vector or matrix when the results can be simplified, otherwise a list

sapply(iris[, 1:4], mean)

sapply(iris[, 1:4], is.numeric)

sapply(c(1, 3, 5, 7, 9), function(x) {x^2})

## [1] 1 9 25 49 81

## ANONYMOUS (INLINE) FUNCTION

my_vec <- c(1, 3, 7)

sapply(my_vec, function(x) {x^3})

[1] 1 27 343

myMat <- matrix(1:12, ncol = 4)

# sapply() treats the matrix as a plain vector, so the result is a vector of length 12

sapply(myMat, function(x) {x/2})
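
A short sketch of what happens to the matrix shape here, with one shape-preserving alternative. The names halved_vec and halved_mat are illustrative.

```r
myMat <- matrix(1:12, ncol = 4)

# sapply() iterates over the 12 elements and returns a plain vector.
halved_vec <- sapply(myMat, function(x) x / 2)
print(dim(halved_vec))   # NULL
print(length(halved_vec))  # 12

# Arithmetic in R is already vectorized, so this keeps the 3 x 4 shape.
halved_mat <- myMat / 2
print(dim(halved_mat))   # 3 4
```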

# pools: a data frame or list assumed to be loaded earlier (it is not defined in these notes)

sapply(pools, typeof)

-------------------

sapply() examples

# For columns 1 through 4 of iris, mark each value TRUE if it exceeds 3 and FALSE otherwise, and assign the result to x.

x <- sapply(iris[, 1:4], function(x) { x > 3})

head(x)

# Count the TRUE values in each column.

colSums(x)
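
Why colSums() counts the TRUE values can be sketched as follows: x is a logical matrix, and logical values are treated as 1 and 0 in arithmetic. The colMeans() call is an extra step added here for illustration.

```r
x <- sapply(iris[, 1:4], function(col) col > 3)

print(is.logical(x))  # TRUE
print(dim(x))         # 150 4

# TRUE counts as 1 and FALSE as 0, so sums give counts and means give proportions.
counts <- colSums(x)
shares <- colMeans(x)

print(counts)
print(shares)
```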

-------------------

tapply()

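- tapply(X, GRP_VAR, FUN, ...) (the grouping argument is named INDEX in R's documentation)

- It splits X into groups defined by GRP_VAR, then applies FUN to each group

- It returns a vector or array (a list when FUN does not return a single value)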

# Group iris$Sepal.Length by Species and compute the mean of each group.
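
The grouping described in this comment can be written as a single tapply() call. The variable name group_means is illustrative.

```r
# X = the values, second argument = the grouping factor, FUN = mean
group_means <- tapply(iris$Sepal.Length, iris$Species, mean)

print(group_means)
#     setosa versicolor  virginica
#      5.006      5.936      6.588
```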

----------------------------------

aggregate()

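- aggregate(var1 ~ var2, data = X, FUN = func, ...)

- Arguments: var1 (the column to summarize), var2 (the column that defines the groups), the data frame, and the function

- Example: group mpg by cyl in the mtcars data frame and apply the mean function

- Example: group Sepal.Length by Species in the iris data frame and apply the mean function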
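
Both examples described above can be sketched with the formula interface. The result names mpg_by_cyl and sepal_by_species are illustrative. Unlike tapply(), aggregate() returns a data frame with one row per group.

```r
# Mean mpg for each number of cylinders in mtcars
mpg_by_cyl <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
print(mpg_by_cyl)

# Mean Sepal.Length for each Species in iris
sepal_by_species <- aggregate(Sepal.Length ~ Species, data = iris, FUN = mean)
print(sepal_by_species)
#      Species Sepal.Length
# 1     setosa        5.006
# 2 versicolor        5.936
# 3  virginica        6.588
```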


혼밥맨
