Tony Searl

What is data science? - O'Reilly Radar

  • how to use data effectively -- not just their own data, but all the data that's available and relevant
  • Increased storage capacity demands increased sophistication in the analysis and use of that data
  • Once you've parsed the data, you can start thinking about the quality of your data
  • It's usually impossible to get "better" data, and you have no alternative but to work with the data at hand
  • The most meaningful definition I've heard: "big data" is when the size of the data itself becomes part of the problem
  • Precision has an allure, but in most data-driven applications outside of finance, that allure is deceptive. Most data analysis is comparative:
  • Storing data is only part of building a data platform, though. Data is only useful if you can do something with it, and enormous datasets present computational problems
  • Hadoop has been instrumental in enabling "agile" data analysis. In software development, "agile practices" are associated with faster product cycles, closer interaction between developers and consumers, and testing
  • Faster computations make it easier to test different assumptions, different datasets, and different algorithms
  • It's easier to consult with clients to figure out whether you're asking the right questions, and it's possible to pursue intriguing possibilities that you'd otherwise have to drop for lack of time.
  • Machine learning is another essential tool for the data scientist.
  • According to Mike Driscoll (@dataspora), statistics is the "grammar of data science." It is crucial to "making data speak coherently."
  • Data science isn't just about the existence of data, or making guesses about what that data might mean; it's about testing hypotheses and making sure that the conclusions you're drawing from the data are valid.
  • The problem with most data analysis algorithms is that they generate a set of numbers. To understand what the numbers mean, the stories they are really telling, you need to generate a graph
  • Visualization is crucial to each stage of the data scientist's work
  • Visualization is also frequently the first step in analysis
  • Casey Reas' and Ben Fry's Processing is the state of the art, particularly if you need to create animations that show how things change over time
  • Making data tell its story isn't just a matter of presenting results; it involves making connections, then going back to other data sources to verify them.
  • Physicists have a strong mathematical background, computing skills, and come from a discipline in which survival depends on getting the most from the data. They have to think about the big picture, the big problem. When you've just spent a lot of grant money generating data, you can't just throw the data out if it isn't as clean as you'd like. You have to make it tell its story. You need some creativity for when the story the data is telling isn't what you think it's telling.
  • It was an agile, flexible process that built toward its goal incrementally, rather than tackling a huge mountain of data all at once.
  • we're entering the era of products that are built on data.
  • We don't yet know what those products are, but we do know that the winners will be the people, and the companies, that find those products.
  • They can think outside the box to come up with new ways to view the problem, or to work with very broadly defined problems: "here's a lot of data, what can you make from it?"
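
The annotation about "testing hypotheses and making sure that the conclusions you're drawing from the data are valid" can be illustrated with a small comparative check. What follows is only a minimal sketch, not anything taken from the article: it uses a permutation test (one common technique among many), and the function name `permutation_test`, the two groups, and all the numbers are hypothetical.

```python
import random


def permutation_test(a, b, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns (observed difference mean(a) - mean(b), estimated p-value).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n_a = len(a)
    pooled = list(a) + list(b)

    def mean(values):
        return sum(values) / len(values)

    observed = mean(a) - mean(b)
    extreme = 0
    for _ in range(n_resamples):
        # Shuffle the group labels: under the null hypothesis they are arbitrary.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        # Small tolerance guards against floating-point rounding on ties.
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    p_value = (extreme + 1) / (n_resamples + 1)
    return observed, p_value


# Hypothetical measurements for two groups (illustration only).
group_a = [12.1, 11.8, 12.6, 13.0, 12.4, 12.9]
group_b = [11.2, 11.5, 11.9, 11.1, 11.7, 11.4]

observed, p_value = permutation_test(group_a, group_b)
print(f"observed difference in means: {observed:.2f}")
print(f"estimated two-sided p-value: {p_value:.4f}")
```

A small p-value here suggests the gap between the groups would rarely arise from shuffled labels alone; it says nothing about whether the data were clean or the right question was asked, which is the point several of the annotations above make.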