Populate column which a range of possible values in other columns: in R -
for example dataframe:
df <- structure(list(region = structure(1:8, .label = c("a", "b", "c", "d", "e", "f", "g", "h"), class = "factor"), y.2012 = c(5.5, na, 4.7, 3.6, na, na, 4.6, na), y.2013 = c(5.7, na, na, 3.8, na, 6.2, na, na), y.2014 = c(na, 6.3, na, 4.1, 5.1, na, na, na )), .names = c("region", "y.2012", "y.2013", "y.2014"), class = "data.frame", row.names = c(na, -8l))
i want add column (called 'variable') populated recent year of data available. example, 2014 preferable, followed 2013, , 2012. if no data available, na can left.
i want resulting dataframe like:
region y.2012 y.2013 y.2014 variable 1 5.5 5.7 na 5.7 2 b na na 6.3 6.3 3 c 4.7 na na 4.7 4 d 3.6 3.8 4.1 4.1 5 e na na 5.1 5.1 6 f na 6.2 na 6.2 7 g 4.6 na na 4.6 8 h na na na na
what best way tackle problem? advice appreciated.
we can use apply
margin=1
loop on rows , last non-na value (as columns ordered smallest largest years)
df$variable <- apply(df[-1], 1, function(x) { i1 <- tail(x[!is.na(x)],1) if(length(i1)>0) i1 else na}) df$variable #[1] 5.7 6.3 4.7 4.1 5.1 6.2 4.6 na
or option max.col
, rowsums
df[-1][cbind(1:nrow(df),max.col(!is.na(df[-1]), "last")* (na^!rowsums(!is.na(df[-1]))))] #[1] 5.7 6.3 4.7 4.1 5.1 6.2 4.6 na
Comments
Post a Comment