R: convert XML data to data frame
For a homework assignment I am attempting to convert an XML file into a data frame in R. I have tried many different things and have searched for ideas on the internet, but have been unsuccessful. Here is my code so far:
    library(XML)

    url <- 'http://www.ggobi.org/book/data/olive.xml'
    doc <- xmlParse(url)
    root <- xmlRoot(doc)

    # apply xmlValue to every child of every top-level node
    dataframe <- xmlSApply(root, function(x) xmlSApply(x, xmlValue))
    data.frame(t(dataframe), row.names = NULL)
The output is a giant vector of numbers. I am attempting to organize the data into a data frame, but I do not know how to adjust my code to obtain that.
It may not be as verbose as the XML package, but the xml2 package doesn't have memory leaks and is laser-focused on data extraction. I use trimws, which is a really recent addition to R core (it arrived in R 3.2.0).
    library(xml2)

    pg <- read_xml("http://www.ggobi.org/book/data/olive.xml")

    # get the <record>s
    recs <- xml_find_all(pg, "//record")

    # extract and clean the columns
    vals <- trimws(xml_text(recs))

    # extract and clean (if needed) the area names
    labs <- trimws(xml_attr(recs, "label"))

    # mine the column names from the 2 variable descriptions
    # this XPath construct lets us grab either the <categ…> or <real…> tags
    # and then grabs the 'name' attribute of them
    cols <- xml_attr(xml_find_all(pg, "//data/variables/*[self::categoricalvariable or self::realvariable]"), "name")

    # this converts each set of <record> columns to a data frame
    # after first converting each row to numeric and assigning
    # names to each column (making the matrix to data frame conversion easier)
    dat <- do.call(rbind, lapply(strsplit(vals, "\\ +"), function(x) {
      data.frame(rbind(setNames(as.numeric(x), cols)))
    }))

    # then assign the area name column to the data frame
    dat$area_name <- labs

    head(dat)
    ##   region area palmitic palmitoleic stearic oleic linoleic linolenic
    ## 1      1    1     1075          75     226  7823      672        NA
    ## 2      1    1     1088          73     224  7709      781        31
    ## 3      1    1      911          54     246  8113      549        31
    ## 4      1    1      966          57     240  7952      619        50
    ## 5      1    1     1051          67     259  7771      672        50
    ## 6      1    1      911          49     268  7924      678        51
    ##   arachidic eicosenoic    area_name
    ## 1        60         29 north-apulia
    ## 2        61         29 north-apulia
    ## 3        63         29 north-apulia
    ## 4        78         35 north-apulia
    ## 5        80         46 north-apulia
    ## 6        70         44 north-apulia
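To try the same steps without downloading anything, here is a minimal sketch that runs them on a tiny inline document. The XML string, its labels (area-a, area-b), and the "na" token for a missing value are made up for illustration and only mimic the layout the answer assumes. It also binds the rows as a matrix before converting to a data frame, which is a small variation on the code above.

```r
library(xml2)

# Hypothetical miniature document (not the real olive.xml), same assumed layout
xml_txt <- '<ggobidata>
  <data>
    <variables>
      <categoricalvariable name="region"/>
      <realvariable name="palmitic"/>
      <realvariable name="oleic"/>
    </variables>
    <records>
      <record label="area-a"> 1 1075 7823 </record>
      <record label="area-b"> 2  911   na </record>
    </records>
  </data>
</ggobidata>'

pg <- read_xml(xml_txt)

# one text blob and one label per <record>
recs <- xml_find_all(pg, "//record")
vals <- trimws(xml_text(recs))
labs <- xml_attr(recs, "label")

# column names come from the 'name' attribute of the variable descriptions
cols <- xml_attr(
  xml_find_all(pg, "//variables/*[self::categoricalvariable or self::realvariable]"),
  "name"
)

# split each record on whitespace, convert to numeric, and name the fields;
# tokens that are not numbers (the assumed "na") become NA
rows <- lapply(strsplit(vals, "[[:space:]]+"), function(x) {
  setNames(suppressWarnings(as.numeric(x)), cols)
})

# bind the named vectors into a matrix, then convert once to a data frame
dat <- as.data.frame(do.call(rbind, rows))
dat$area_name <- labs

print(dat)
```

Converting to a data frame once, after the rows are bound, avoids building one small data frame per record. That is a design choice in this sketch, not something the answer requires.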
Update

I'd probably do the last bit this way now:
    library(tidyverse)

    strsplit(vals, "[[:space:]]+") %>%
      map_df(~as_data_frame(as.list(setNames(., cols)))) %>%
      mutate(area_name = labs)
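Note that the pipeline above never calls as.numeric, so as written the value columns stay character strings. If numeric columns are wanted without the tidyverse, one possible base R variant is sketched below. The vals, cols and labs vectors here are short hypothetical stand-ins for the ones built from the XML, and the sketch assumes every record has the same number of fields.

```r
# Hypothetical stand-ins for the vals, cols and labs extracted earlier
vals <- c("1 1 1075", "1 2  911")
cols <- c("region", "area", "palmitic")
labs <- c("area-a", "area-b")

# character matrix with one row per record
m <- do.call(rbind, strsplit(vals, "[[:space:]]+"))

# convert the whole matrix to numeric in one step
storage.mode(m) <- "double"

dat <- as.data.frame(m)
names(dat) <- cols
dat$area_name <- labs

str(dat)
```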