Figure 1 10-fold cross-validation results for the age and gender classifiers (accuracy, 40% to 100%, using style, content, and all features)

                Classed as
Actual       10s     20s     30s
10s         7036    1027     177
20s          916    6326     844
30s          178    1465    1351

Table 7 Confusion matrix for the age classifier using all features

For age, content features prove slightly more useful than style features, but, as with gender, the combination of the two is most useful. The confusion matrix indicates that, using content and style features together, 10s are distinguishable from 30s with accuracy above 96%, and 10s can be distinguished from 20s with an accuracy of 87.3%. However, many 30s are misclassified as 20s, which brings overall accuracy down to 76.2%.

CONCLUSIONS

We have assembled a large corpus of blogs labeled for a variety of demographic attributes. This large sample gives us insight into the demographic distribution of bloggers. We have found that teenage bloggers are predominantly female, while older bloggers are predominantly male. Moreover, within each age group, male and female bloggers blog about different things and use different blogging styles.
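The accuracy figures quoted for Table 7 can be recomputed directly from the confusion matrix. The sketch below is a minimal illustration, not the authors' evaluation code. Overall accuracy is the diagonal divided by the total. For the pairwise figures it assumes one particular reading: restrict the matrix to the two age groups being compared and ignore instances classed as the third group. The names `overall_accuracy` and `pairwise_accuracy` are hypothetical helpers introduced here. This reading reproduces the 76.2% and 87.3% figures. For 10s versus 30s it gives about 95.9%, so the text's "above 96%" may rest on a slightly different calculation that the text does not specify.

```python
# Confusion matrix from Table 7: rows = actual age group,
# columns = predicted ("classed as") age group.
LABELS = ["10s", "20s", "30s"]
CONFUSION = [
    [7036, 1027, 177],   # actual 10s
    [916, 6326, 844],    # actual 20s
    [178, 1465, 1351],   # actual 30s
]


def overall_accuracy(matrix):
    """Fraction of all instances on the diagonal (correctly classified)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def pairwise_accuracy(matrix, labels, a, b):
    """Accuracy on the 2x2 sub-matrix for groups a and b only.

    Assumption (hypothetical reading): instances of a or b that were
    classed as some third group are ignored, as if the task were a
    binary a-vs-b decision.
    """
    i, j = labels.index(a), labels.index(b)
    correct = matrix[i][i] + matrix[j][j]
    total = correct + matrix[i][j] + matrix[j][i]
    return correct / total


if __name__ == "__main__":
    print(f"overall:    {overall_accuracy(CONFUSION):.1%}")
    print(f"10s vs 20s: {pairwise_accuracy(CONFUSION, LABELS, '10s', '20s'):.1%}")
    print(f"10s vs 30s: {pairwise_accuracy(CONFUSION, LABELS, '10s', '30s'):.1%}")
```

Running it prints 76.2%, 87.3%, and 95.9% for the three lines. The totals behind them are 14713 of 19320 correct overall, 13362 of 15305 for 10s versus 20s, and 8387 of 8742 for 10s versus 30s.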