R: convert XML data to a data frame


For a homework assignment, I am attempting to convert an XML file into a data frame in R. I have tried many different things and have searched the internet for ideas, but have been unsuccessful. Here is my code so far:

```r
library(XML)

url <- 'http://www.ggobi.org/book/data/olive.xml'
doc <- xmlParse(url)
root <- xmlRoot(doc)

dataframe <- xmlSApply(root, function(x) xmlSApply(x, xmlValue))
data.frame(t(dataframe), row.names = NULL)
```

The output is a giant vector of numbers. I am attempting to organize the data into a data frame, but I do not know how to adjust my code to obtain that.

It may not be as verbose as the XML package, but xml2 doesn't have XML's memory leaks and is laser-focused on data extraction. I use `trimws()`, which is a fairly recent addition to base R (it arrived in R 3.2.0).
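To see why `trimws()` matters here, the base-R sketch below uses a hypothetical record string padded with whitespace. The padding is an assumption about what `xml_text()` returns for a `<record>`, not something taken from olive.xml. When a string starts with whitespace, `strsplit()` returns an empty first element, so trimming first keeps exactly one field per variable.

```r
# Hypothetical text of one <record>, padded with whitespace the way
# XML element content often is (assumed, not taken from olive.xml)
raw <- "\n  1 1 1075 75 226 7823 672 NA 60 29\n"

# Splitting the untrimmed string: the leading whitespace yields an empty ""
# as the first field, which would push every value one column to the right
untrimmed <- strsplit(raw, "[[:space:]]+")[[1]]

# Trimming first gives exactly one field per variable
fields <- strsplit(trimws(raw), "[[:space:]]+")[[1]]

print(untrimmed)
print(fields)
```

A match at the end of a string does not produce a trailing empty element in `strsplit()`, so only the leading whitespace causes trouble.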

```r
library(xml2)

pg <- read_xml("http://www.ggobi.org/book/data/olive.xml")

# <record>s
recs <- xml_find_all(pg, "//record")

# extract and clean the columns
vals <- trimws(xml_text(recs))

# extract and clean (if needed) the area names
labs <- trimws(xml_attr(recs, "label"))

# mine the column names from the two variable descriptions;
# this XPath construct lets us grab either <categ…> or <real…> tags
# and grabs the 'name' attribute of each of them
cols <- xml_attr(xml_find_all(pg, "//data/variables/*[self::categoricalvariable or
                                                      self::realvariable]"), "name")

# convert each set of <record> columns to a data frame, after first
# converting each row to numeric and assigning names to each column
# (making the matrix-to-data-frame conversion easier)
dat <- do.call(rbind, lapply(strsplit(vals, "[[:space:]]+"),
                             function(x) {
                               data.frame(rbind(setNames(as.numeric(x), cols)))
                             }))

# assign the area name column to the data frame
dat$area_name <- labs

head(dat)
##   region area palmitic palmitoleic stearic oleic linoleic linolenic
## 1      1    1     1075          75     226  7823      672        NA
## 2      1    1     1088          73     224  7709      781        31
## 3      1    1      911          54     246  8113      549        31
## 4      1    1      966          57     240  7952      619        50
## 5      1    1     1051          67     259  7771      672        50
## 6      1    1      911          49     268  7924      678        51
##   arachidic eicosenoic    area_name
## 1        60         29 north-apulia
## 2        61         29 north-apulia
## 3        63         29 north-apulia
## 4        78         35 north-apulia
## 5        80         46 north-apulia
## 6        70         44 north-apulia
```
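The code above builds one single-row data frame per record and then binds them together. A minimal base-R variant, sketched here on two hypothetical record strings copied from the `head()` output (so it runs without network access or xml2), converts every row to numeric, stacks the rows into one matrix with a single `do.call(rbind, ...)` call, and converts to a data frame once at the end. The values assigned to `vals`, `cols` and `labs` are stand-ins for what the xml2 calls would return.

```r
# Stand-ins for vals, cols and labs; in the answer above these come from
# xml_text(), xml_attr() and the <variables> section of olive.xml
vals <- c("1 1 1075 75 226 7823 672 NA 60 29",
          "1 1 1088 73 224 7709 781 31 61 29")
cols <- c("region", "area", "palmitic", "palmitoleic", "stearic",
          "oleic", "linoleic", "linolenic", "arachidic", "eicosenoic")
labs <- c("north-apulia", "north-apulia")

# Split each record on runs of whitespace and convert the fields to numeric
# (the string "NA" becomes a numeric NA)
rows <- lapply(strsplit(vals, "[[:space:]]+"), as.numeric)

# Stack all rows into one numeric matrix, then name the columns
m <- do.call(rbind, rows)
colnames(m) <- cols

# Convert to a data frame once and attach the area names
dat <- as.data.frame(m)
dat$area_name <- labs

print(dat)
```

Building a single matrix avoids allocating one data frame per record, which tends to matter more as the number of records grows; the resulting measurement columns are numeric, as in the answer's version.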

Update

I'd probably do the last bit this way now:

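```r
library(tidyverse)

# reuses vals, cols and labs from the code above;
# as_tibble() replaces the deprecated as_data_frame()
strsplit(vals, "[[:space:]]+") %>%
  map_df(~as_tibble(as.list(setNames(as.numeric(.), cols)))) %>%
  mutate(area_name = labs)
```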
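For comparison, a base-R alternative (not part of the original answer) is to hand the cleaned record strings straight to `read.table()`, whose default whitespace separator and `"NA"` handling fit this data. The sketch below again uses hypothetical stand-ins for `vals`, `cols` and `labs` so that it runs on its own.

```r
# Stand-ins for the vals, cols and labs produced by the xml2 code
vals <- c("1 1 1075 75 226 7823 672 NA 60 29",
          "1 1 1088 73 224 7709 781 31 61 29")
cols <- c("region", "area", "palmitic", "palmitoleic", "stearic",
          "oleic", "linoleic", "linolenic", "arachidic", "eicosenoic")
labs <- c("north-apulia", "north-apulia")

# Each element of `text` is read as one line; fields are split on
# whitespace and the string "NA" is read as a missing value
dat <- read.table(text = vals, col.names = cols)
dat$area_name <- labs

str(dat)
```

Because `read.table()` guesses column types, whole-number columns come back as integer rather than double; wrap them in `as.numeric()` if double precision is needed downstream.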
