[Week 06] Lectures
<64 data cleaning p1>
<66 tidyr part1>
tidy data? is ...
Gather columns into key-value pairs
library(tidyr)
wide_df <- data.frame(col = c('X', 'Y'), A = c(1, 4), B = c(2, 5), C = c(3, 6))
# Look at wide_df
wide_df
# Gather the columns of wide_df
# '-col' means "gather every column except wide_df$col"
gather(wide_df, my_key, my_val, -col)
# spread() does the opposite of gather()
long_df <- gather(wide_df, my_key, my_val, -col)
# Look at long_df
long_df
# Spread the key-value pairs of long_df
spread(long_df, my_key, my_val)
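For comparison, a minimal sketch of the same round trip with pivot_longer() and pivot_wider(), which the tidyr documentation recommends over gather() and spread() in recent versions. It assumes tidyr 1.0.0 or later; the name wide_again is only for illustration.

```r
library(tidyr)

wide_df <- data.frame(col = c("X", "Y"), A = c(1, 4), B = c(2, 5), C = c(3, 6))

# Same reshaping as gather(wide_df, my_key, my_val, -col)
long_df <- pivot_longer(wide_df, cols = -col,
                        names_to = "my_key", values_to = "my_val")

# Same reshaping as spread(long_df, my_key, my_val)
wide_again <- pivot_wider(long_df, names_from = my_key, values_from = my_val)
wide_again
```

The result is a tibble rather than a plain data.frame, and pivot_longer() orders rows by original row instead of by key, so the printed output looks slightly different from gather().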
# SEPARATE
treatments <- data.frame(patient = rep(c('X', 'Y'), 3), treatment = rep(c('A', 'B'), each = 3), year_mo = rep(c('2010-10', '2012-08', '2014-12'), each = 2), response = c(1, 4, 2, 5, 3, 6))
# View the treatments data
treatments
# Separate year_mo into two columns
separate(treatments, year_mo, c("year", "month"))
# UNITE
treatments2 <- separate(treatments, year_mo, c("year", "month"))
# View treatments2 data
treatments2
# Unite year and month to form year_mo column
unite(treatments2, year_mo, year, month)
Summary of key tidyr functions
gather() - Gather columns into key-value pairs
spread() - Spread key-value pairs into columns
separate() - Separate one column into multiple
unite() - Unite multiple columns into one
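One detail worth checking when pairing separate() and unite(): unite() joins with "_" by default, so the united column does not match the original "2010-10" style unless sep is set. A small sketch, using a hypothetical dates_df rather than the lecture's treatments data:

```r
library(tidyr)

# Hypothetical data with the same year-month format as the notes
dates_df <- data.frame(year_mo = c("2010-10", "2012-08", "2014-12"))

# separate() splits on non-alphanumeric characters by default
split_df <- separate(dates_df, year_mo, c("year", "month"))

# Default separator is "_", giving values such as "2010_10"
default_df <- unite(split_df, year_mo, year, month)

# sep = "-" restores the original format
restored_df <- unite(split_df, year_mo, year, month, sep = "-")
restored_df
```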
Type Check-Up and Conversion
class("hello")
[1] "character"
class(3.844)
[1] "numeric"
class(77L)
[1] "integer"
class(factor("yes"))
[1] "factor"
class(TRUE)
[1] "logical"
as.character(2016)
[1] "2016"
as.numeric(TRUE)
[1] 1
as.integer(99)
[1] 99
as.factor("something")
[1] something
Levels: something
as.logical(0)
[1] FALSE
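A common pitfall with these conversions is calling as.numeric() directly on a factor, which returns the internal level codes rather than the labels. A minimal sketch with a made-up factor f:

```r
# Hypothetical factor whose labels look like numbers
f <- factor(c("10", "20", "10"))

# Direct conversion returns the level codes: 1 2 1
codes <- as.numeric(f)

# Converting to character first recovers the values: 10 20 10
values <- as.numeric(as.character(f))

codes
values
```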
library(lubridate)
temp_df <- data.frame(date_string = c('2010-01-20', '2011-03-21', '2010-07-11'), date_string2 = c('01/20/2010', '03/21/2011', '07/11/2010'), temperature = c(12, 20, 27), stringsAsFactors = FALSE)
temp_df$date <- as.Date(temp_df$date_string)
plot(x = temp_df$date, y = temp_df$temperature)
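# as.Date() expects yyyy-mm-dd style strings by default, so it does not parse the mm/dd/yyyy values correctly
temp_df$date <- as.Date(temp_df$date_string2)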
temp_df$date <- mdy(temp_df$date_string2)
plot(temp_df$date, temp_df$temperature)
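Base R can also parse the mm/dd/yyyy strings if the format is given explicitly. A short sketch, reusing the date values from temp_df in a standalone vector:

```r
# Same strings as temp_df$date_string2
date_string2 <- c("01/20/2010", "03/21/2011", "07/11/2010")

# %m = month, %d = day, %Y = four-digit year
parsed <- as.Date(date_string2, format = "%m/%d/%Y")
parsed
```

mdy() saves writing the format string by hand, which is the main convenience lubridate adds here.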
lubridate
Package to convert strings into dates
# Load the lubridate package
library(lubridate)
# Experiment with basic lubridate functions
ymd("2015-08-25")
[1] "2015-08-25"
ymd("2015 August 25")
[1] "2015-08-25"
mdy("August 25, 2015")
[1] "2015-08-25"
hms("13:33:09")
[1] "13H 33M 9S"
ymd_hms("2015/08/25 13.33.09")
[1] "2015-08-25 13:33:09 UTC"
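Once a string is parsed, lubridate's accessor functions pull out individual components. A brief sketch; the variable names d and d2 are only for illustration:

```r
library(lubridate)

d <- ymd("2015-08-25")

# Extract components as numbers
year(d)
month(d)
day(d)

# dmy() handles day-first strings the same way mdy() handles month-first ones
d2 <- dmy("25/08/2015")
d2
```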
[69 stringr]
- Package for string manipulation
- Key functions
- str_trim() - Trim leading and trailing white space
- str_pad() - Pad with additional characters
- str_detect() - Detect a pattern
- str_replace() - Find and replace a pattern
library(stringr)
str_trim(" this is a test ")
[1] "this is a test"
# Pad string with zeros on the left so that it is 7 characters wide
str_pad("24493", width = 7, side = "left", pad = "0")
# Create character vector of names
friends <- c("Sarah", "Tom", "Alice")
# search for string in vector
str_detect(friends, "Alice")
[1] FALSE FALSE TRUE
# Replace string in vector
str_replace(friends, "Alice", "David")
[1] "Sarah" "Tom" "David"
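These functions are often chained when cleaning a column. The sketch below trims and pads a made-up vector raw_ids, and also shows that str_replace() changes only the first match in each string, while str_replace_all() changes every match:

```r
library(stringr)

# Hypothetical ID strings with stray white space
raw_ids <- c(" 24493", "113 ", "7")

# Trim first, then pad to a fixed width of 7
clean_ids <- str_pad(str_trim(raw_ids), width = 7, side = "left", pad = "0")
clean_ids

first_only <- str_replace("banana", "a", "o")       # first match only
all_matches <- str_replace_all("banana", "a", "o")  # every match
```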
[610 Missing Values]
Missing values
- Data are missing for many reasons
- Sometimes associated with variable/outcome of interest
- In R, represented as NA
- May appear in other forms
Special values
- Inf - "Infinite value"
- NaN - "Not a Number"
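A minimal base R sketch of how these values behave, using a made-up vector x. Note that is.na() is TRUE for both NA and NaN, while is.nan() is TRUE only for NaN:

```r
# Hypothetical vector mixing ordinary, missing, and special values
x <- c(1, NA, 3, Inf, NaN)

na_flags  <- is.na(x)        # NA and NaN both count as missing
nan_flags <- is.nan(x)       # only NaN
inf_flags <- is.infinite(x)  # only Inf / -Inf

# Special values arise from arithmetic
1 / 0   # Inf
0 / 0   # NaN

# na.rm = TRUE drops missing values before summarising
avg <- mean(c(1, NA, 3), na.rm = TRUE)
avg
```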