Test shows big data text analysis inconsistent, inaccurate
-
Thijs Versloot on 02 Feb 15: Big data analytic systems are reputed to be capable of finding a needle in a universe of haystacks without having to know what a needle looks like. One of the best ways to sort large databases of unstructured text is a technique called latent Dirichlet allocation (LDA). Unfortunately, LDA is also inaccurate enough at some tasks that the results of any topic model created with it are essentially meaningless, according to Luis Amaral, a physicist who specializes in the mathematical analysis of complex systems and networks in the real world. Amaral is one of the senior researchers on the multidisciplinary team from Northwestern University that wrote the paper reporting these results. Even for an easy case, big data analysis is proving to be far more complicated than many of the companies selling analysis software want people to believe.
-
Paul N on 04 Feb 15: Most of those companies are using outdated algorithms like this LDA and just applying them blindly to those huge datasets. Of course they're going to come out with bad solutions. No amount of data can make up for bad algorithms.