[Week 06] Lectures

<64 data cleaning p1>

<66 tidyr part1>
tidy data? is ...

Gather columns into key-value pairs
library(tidyr)
wide_df <- data.frame(col = c('X', 'Y'), A = c(1, 4), B = c(2, 5), C = c(3, 6))

# Look at wide_df
wide_df

# Gather the columns of wide_df
# '-col' means 'gather every column except wide_df$col'
gather(wide_df, my_key, my_val, -col)

# spread() does the opposite of gather()
long_df <- gather(wide_df, my_key, my_val, -col)

# Look at long_df
long_df

# Spread the key-value pairs of long_df
spread(long_df, my_key, my_val)
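
Recent tidyr releases (1.0.0 and later) also provide pivot_longer() and pivot_wider(), which cover the same ground as gather() and spread(). The lecture does not use them, so the following is only a sketch of the same round trip, assuming such a tidyr version is installed:

```r
library(tidyr)

# Same toy data as in the lecture example
wide_df <- data.frame(col = c("X", "Y"), A = c(1, 4), B = c(2, 5), C = c(3, 6))

# Wide -> long: plays the role of gather(wide_df, my_key, my_val, -col)
long_df <- pivot_longer(wide_df, cols = -col,
                        names_to = "my_key", values_to = "my_val")

# Long -> wide: plays the role of spread(long_df, my_key, my_val)
wide_again <- pivot_wider(long_df, names_from = my_key, values_from = my_val)

dim(long_df)   # 6 rows (2 rows x 3 gathered columns), 3 columns
wide_again     # one row per value of col, with columns A, B, C restored
```

Both functions return a tibble rather than a plain data.frame, which mainly changes how the result prints.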

# SEPARATE

treatments <- data.frame(patient = rep(c('X', 'Y'), 3), treatment = rep(c('A', 'B'), each = 3), year_mo = rep(c('2010-10', '2012-08', '2014-12'), each = 2), response = c(1, 4, 2, 5, 3, 6))

# View the treatments data
treatments

# Separate year_mo into two columns
separate(treatments, year_mo, c("year", "month"))

# UNITE
treatments2 <- separate(treatments, year_mo, c("year", "month"))

# View treatments2 data
treatments2

# Unite year and month to form year_mo column

unite(treatments2, year_mo, year, month)
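
One detail worth noting: unite() joins values with "_" by default, so the call above produces values like "2010_10" rather than the original "2010-10". A minimal sketch of a lossless round trip, passing sep explicitly to both functions (the name treatments3 is made up for this illustration):

```r
library(tidyr)

treatments <- data.frame(
  patient   = rep(c("X", "Y"), 3),
  treatment = rep(c("A", "B"), each = 3),
  year_mo   = rep(c("2010-10", "2012-08", "2014-12"), each = 2),
  response  = c(1, 4, 2, 5, 3, 6),
  stringsAsFactors = FALSE
)

# Split on the hyphen explicitly instead of relying on the default separator
treatments2 <- separate(treatments, year_mo, c("year", "month"), sep = "-")

# sep = "-" rebuilds the column in its original form
treatments3 <- unite(treatments2, year_mo, year, month, sep = "-")

# Compare the rebuilt column with the original one
identical(treatments3$year_mo, treatments$year_mo)
```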

Summary of key tidyr functions
gather() - Gather columns into key-value pairs
spread() - Spread key-value pairs into columns
separate() - Separate one column into multiple
unite() - Unite multiple columns into one

Type Check-Up and Conversion
class("hello")
[1] "character"
class(3.844)
[1] "numeric"
class(77L)
[1] "integer"
class(factor("yes"))
[1] "factor"
class(TRUE)
[1] "logical"
as.character(2016)
[1] "2016"
as.numeric(TRUE)
[1] 1
as.integer(99)
[1] 99
as.factor("something")
[1] something
Levels: something
as.logical(0)
[1] FALSE
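
A common pitfall with these conversions, not covered in the list above, is calling as.numeric() on a factor: it returns the internal level codes, not the labels. A small base R sketch with made-up values:

```r
# A factor whose labels happen to look like numbers
f <- factor(c("10", "20", "10"))

codes  <- as.numeric(f)                 # level codes: 1 2 1
values <- as.numeric(as.character(f))   # labelled values: 10 20 10

# Text that cannot be parsed as a number becomes NA (R also raises a warning)
bad <- suppressWarnings(as.numeric("abc"))   # NA
```

Going through as.character() first is the usual way to recover the values a factor displays.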

library(lubridate)
temp_df <- data.frame(date_string = c('2010-01-20', '2011-03-21', '2010-07-11'), date_string2 = c('01/20/2010', '03/21/2011', '07/11/2010'), temperature = c(12, 20, 27), stringsAsFactors = FALSE)

temp_df$date <- as.Date(temp_df$date_string)
plot(x = temp_df$date, y = temp_df$temperature)

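# as.Date() expects year-first input by default, so it does not parse date_string2 correctly as written
# temp_df$date <- as.Date(temp_df$date_string2)
# mdy() from lubridate reads month/day/year strings directly
temp_df$date <- mdy(temp_df$date_string2)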
plot(temp_df$date, temp_df$temperature)
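
The month/day/year strings can also be parsed in base R by passing an explicit format to as.Date(). A minimal sketch using the same three dates as the example above:

```r
date_string2 <- c("01/20/2010", "03/21/2011", "07/11/2010")

# %m = month, %d = day, %Y = four-digit year
dates <- as.Date(date_string2, format = "%m/%d/%Y")

format(dates)   # "2010-01-20" "2011-03-21" "2010-07-11"
class(dates)    # "Date"
```

mdy() saves you from writing the format string by hand, which is the main convenience lubridate adds here.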

lubridate
Package to convert strings into dates

# Load the Lubridate package
library(lubridate)

# Experiment with basic lubridate functions
ymd("2015-08-25")
[1] "2015-08-25"
ymd("2015 August 25")
[1] "2015-08-25"
mdy("August 25, 2015")
[1] "2015-08-25"
hms("13:33:09")
[1] "13H 33M 9S"
ymd_hms("2015/08/25 13.33.09")
[1] "2015-08-25 13:33:09 UTC"
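
Once a string has been parsed, lubridate's accessor functions pull out the individual components. A short sketch (the variable name `when` is arbitrary):

```r
library(lubridate)

when <- ymd_hms("2015-08-25 13:33:09")

year(when)    # 2015
month(when)   # 8
day(when)     # 25
hour(when)    # 13
```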

[69 stringr]
- Package for string manipulation
- Key functions
  - str_trim() - Trim leading and trailing white space
  - str_pad() - Pad with additional characters
  - str_detect() - Detect a pattern
  - str_replace() - Find and replace a pattern

library(stringr)

str_trim(" this is a test ")
[1] "this is a test"

# Pad string with zeros
# To reach a fixed width of 7 characters, fill the empty positions with "0", starting from the left
str_pad("24493", width = 7, side = "left", pad = "0")
[1] "0024493"

# Create character vector of names
friends <- c("Sarah", "Tom", "Alice")

# search for string in vector
str_detect(friends, "Alice")
[1] FALSE FALSE TRUE

# Replace string in vector
str_replace(friends, "Alice", "David")
[1] "Sarah" "Tom" "David"
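
The four functions are often chained when cleaning a column. The sketch below uses a made-up vector of messy IDs (raw_ids is hypothetical, not from the lecture) together with the friends vector from above:

```r
library(stringr)

# Hypothetical messy ID column: stray spaces and inconsistent widths
raw_ids <- c(" 24493", "1021 ", " 7 ")

ids <- str_trim(raw_ids)                                   # "24493" "1021" "7"
ids <- str_pad(ids, width = 7, side = "left", pad = "0")   # "0024493" "0001021" "0000007"

# Replace "Alice" only if she actually appears in the vector
friends <- c("Sarah", "Tom", "Alice")
if (any(str_detect(friends, "Alice"))) {
  friends <- str_replace(friends, "Alice", "David")
}
friends   # "Sarah" "Tom" "David"
```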

[610 Missing Values]
Missing values
- Data are missing for many reasons
- Sometimes associated with variable/outcome of interest
- In R, represented as NA
- May appear in other forms, such as an empty string or a placeholder code

Special values
- Inf - "Infinite value"
- NaN - "Not a Number"
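
Base R has helpers for finding and handling these values. A minimal sketch on made-up data:

```r
x <- c(1, NA, 3, Inf, NaN)

is.na(x)         # FALSE TRUE FALSE FALSE TRUE  (NaN also counts as missing)
is.nan(x)        # FALSE FALSE FALSE FALSE TRUE
is.infinite(x)   # FALSE FALSE FALSE TRUE FALSE

n_missing <- sum(is.na(x))                      # 2
avg       <- mean(c(1, NA, 3), na.rm = TRUE)    # 2, NA is skipped

# Row-wise checks on a data frame
df <- data.frame(a = c(1, NA, 3), b = c("x", "y", NA))
complete.cases(df)      # TRUE FALSE FALSE
cleaned <- na.omit(df)  # keeps only the first row
```

Whether to drop rows with na.omit() or keep them and pass na.rm = TRUE depends on why the values are missing, which ties back to the second bullet above.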

혼밥맨
