I'm working through Machine Learning for Hackers, and I'm stuck at this line:
from.weight <- ddply(priority.train, .(From.EMail), summarise, Freq = length(Subject))
It generates the following error:
Error in attributes(out) <- attributes(col) :
'names' attribute [9] must be the same length as the vector [1]
Here is the traceback():
> traceback()
11: FUN(1:5[[1L]], ...)
10: lapply(seq_len(n), extract_col_rows, df = x, i = i)
9: extract_rows(x$data, x$index[[i]])
8: `[[.indexed_df`(pieces, i)
7: pieces[[i]]
6: function (i)
{
piece <- pieces[[i]]
if (.inform) {
res <- try(.fun(piece, ...))
if (inherits(res, "try-error")) {
piece <- paste(capture.output(print(piece)), collapse = "\n")
stop("with piece ", i, ": \n", piece, call. = FALSE)
}
}
else {
res <- .fun(piece, ...)
}
progress$step()
res
}(1L)
5: .Call("loop_apply", as.integer(n), f, env)
4: loop_apply(n, do.ply)
3: llply(.data = .data, .fun = .fun, ..., .progress = .progress,
.inform = .inform, .parallel = .parallel, .paropts = .paropts)
2: ldply(.data = pieces, .fun = .fun, ..., .progress = .progress,
.inform = .inform, .parallel = .parallel, .paropts = .paropts)
1: ddply(priority.train, .(From.EMail), summarise, Freq = length(Subject))
The priority.train object is a data frame, and here is more info about it:
> mode(priority.train)
[1] "list"
> names(priority.train)
[1] "Date" "From.EMail" "Subject" "Message" "Path"
> sapply(priority.train, mode)
Date From.EMail Subject Message Path
"list" "character" "character" "character" "character"
> sapply(priority.train, class)
$Date
[1] "POSIXlt" "POSIXt"
$From.EMail
[1] "character"
$Subject
[1] "character"
$Message
[1] "character"
$Path
[1] "character"
> length(priority.train)
[1] 5
> nrow(priority.train)
[1] 1250
> ncol(priority.train)
[1] 5
> str(priority.train)
'data.frame': 1250 obs. of 5 variables:
$ Date : POSIXlt, format: "2002-01-31 22:44:14" "2002-02-01 00:53:41" "2002-02-01 02:01:44" "2002-02-01 10:29:23" ...
$ From.EMail: chr "removed@removed.ca" "removed@removed.net" "removed@removed.ca" "removed@removed.net" ...
$ Subject : chr "please help a newbie compile mplayer :-)" "re: please help a newbie compile mplayer :-)" "re: please help a newbie compile mplayer :-)" "re: please help a newbie compile mplayer :-)" ...
$ Message : chr " \n Hello,\n \n I just installed redhat 7.2 and I think I have everything \nworking properly. Anyway I want to in"| __truncated__ "Make sure you rebuild as root and you're in the directory that you\ndownloaded the file. Also it might complain of a few depen"| __truncated__ "Lance wrote:\n\n>Make sure you rebuild as root and you're in the directory that you\n>downloaded the file. Also it might compl"| __truncated__ "Once upon a time, rob wrote :\n\n> I dl'd gcc3 and libgcc3, but I still get the same error message when I \n> try rpm --rebuil"| __truncated__ ...
$ Path : chr "../03-Classification/data/easy_ham/01061.6610124afa2a5844d41951439d1c1068" "../03-Classification/data/easy_ham/01062.ef7955b391f9b161f3f2106c8cda5edb" "../03-Classification/data/easy_ham/01063.ad3449bd2890a29828ac3978ca8c02ab" "../03-Classification/data/easy_ham/01064.9f4fc60b4e27bba3561e322c82d5f7ff" ...
Warning messages:
1: In encodeString(object, quote = "\"", na.encode = FALSE) :
it is not known that wchar_t is Unicode on this platform
2: In encodeString(object, quote = "\"", na.encode = FALSE) :
it is not known that wchar_t is Unicode on this platform
I would post a sample, but the content is rather long and I don't think it is relevant here.
The same error also occurs here:
> ddply(priority.train, .(Subject))
Error in attributes(out) <- attributes(col) :
'names' attribute [9] must be the same length as the vector [1]
Does anyone have a clue about what is going on here? The error seems to be generated by an object other than priority.train, because the names attribute involved apparently has 9 elements.
I'd appreciate any help. Thanks!
Problem solved
I found the problem thanks to @user1317221_G's tip to use the dput function. The problem is with the Date field, which at this point is a POSIXlt value, that is, a list containing 9 fields (sec, min, hour, mday, mon, year, wday, yday, isdst). That explains the 9 in the error message: it is the length of the names attribute of the Date column, not of priority.train. To solve the problem I simply converted the dates to character vectors, ran ddply, and then restored the original dates:
> tmp <- priority.train$Date
> priority.train$Date <- as.character(priority.train$Date)
> from.weight <- ddply(priority.train, .(From.EMail), summarise, Freq = length(Subject))
> priority.train$Date <- tmp
> rm(tmp)
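An alternative that avoids the temporary copy might be to store the dates as POSIXct instead of POSIXlt, since POSIXct keeps each timestamp as a single number rather than a list of fields. The following is only a minimal sketch under assumptions: `toy` is a hypothetical three-row stand-in for priority.train, the timestamps are parsed as UTC, and the per-sender count uses base R's `table()` so it runs without plyr. I have not checked it against the book's full data set.

```r
# Hypothetical toy data standing in for priority.train (not the real data).
toy <- data.frame(
  From.EMail = c("a@example.com", "b@example.com", "a@example.com"),
  Subject    = c("first", "second", "third"),
  stringsAsFactors = FALSE
)

# strptime() returns POSIXlt; assigning with $<- keeps that class.
toy$Date <- strptime(
  c("2002-01-31 22:44:14", "2002-02-01 00:53:41", "2002-02-01 02:01:44"),
  format = "%Y-%m-%d %H:%M:%S", tz = "UTC"
)

# A POSIXlt value is a list of named components, not one vector of dates.
lt.fields <- names(unclass(toy$Date))
print(lt.fields)

# POSIXct stores each timestamp as one number, so the column becomes atomic.
toy$Date <- as.POSIXct(toy$Date)

# Count messages per sender with base R. With plyr loaded, the ddply() call
# from the question would be the equivalent on the converted data frame.
from.weight <- as.data.frame(
  table(From.EMail = toy$From.EMail),
  responseName = "Freq",
  stringsAsFactors = FALSE
)
print(from.weight)
```

If the individual fields (hour, wday, and so on) are needed later, `as.POSIXlt(toy$Date)` should recover them on demand, so nothing is lost by keeping POSIXct in the data frame.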