
VirgoLab: Group items tagged "data mining"


Roger Chen

Data mining is not just a data recovery tool | Styx online

  • Data Mining is a process of discovering meaningful new correlations, patterns and trends by sifting through large amounts of data stored in repositories, using statistical, data analysis and mathematical techniques.
  • Data mining is the crucial process that helps companies better comprehend their customers. Data mining can be defined as ‘the nontrivial extraction of implicit, previously unknown, and potentially useful information from data’ and also as ‘the science of extracting useful information from large sets or databases’.
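To make the quoted definitions concrete, here is a minimal sketch of the kind of pattern discovery they describe: a toy frequent-pair scan that sifts a pile of transaction records for co-occurring items. The baskets and the support threshold are invented for illustration.

```python
# Sift stored records for a previously unknown, potentially useful pattern:
# which pairs of items appear together often? (Toy frequent-itemset scan;
# the transactions and threshold are made-up illustrative values.)
from collections import Counter
from itertools import combinations

transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"beer", "chips"},
    {"bread", "milk"},
    {"beer", "chips", "bread"},
]

pair_counts = Counter()
for basket in transactions:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

min_support = 0.4  # report pairs present in at least 40% of baskets
for pair, count in pair_counts.most_common():
    support = count / len(transactions)
    if support >= min_support:
        print(pair, f"support={support:.2f}")
```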
Roger Chen

Analysis: data mining doesn't work for spotting terrorists

  • Automated identification of terrorists through data mining (or any other known methodology) is neither feasible as an objective nor desirable as a goal of technology development efforts.
  • criminal prosecutors and judges are concerned with determining the guilt or innocence of a suspect in the wake of an already-committed crime; counter-terror officials are concerned with preventing crimes from occurring by identifying suspects before they've done anything wrong.
  • The problem: preventing a crime by someone with no criminal record
  • In fact, most terrorists have no criminal record of any kind that could bring them to the attention of authorities or work against them in court.
  • As the NRC report points out, not only is the training data lacking, but the input data that you'd actually be mining has been purposely corrupted by the terrorists themselves.
  • So this application of data mining bumps up against the classic GIGO (garbage in, garbage out) problem in computing, with the terrorists deliberately feeding the system garbage.
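The arithmetic behind that conclusion is worth spelling out. The numbers below are illustrative assumptions, not figures from the NRC report: even granting the classifier implausibly good accuracy, the tiny base rate of actual terrorists means nearly everyone flagged is innocent, and that is before the adversary corrupts the input data.

```python
# Base-rate sketch: why rare-event detection drowns in false positives.
# All numbers are illustrative assumptions.
population = 300_000_000          # people screened
actual_terrorists = 3_000         # assumed true positives in the population

sensitivity = 0.99                # fraction of real terrorists flagged
false_positive_rate = 0.01       # fraction of innocents flagged

flagged_terrorists = actual_terrorists * sensitivity
flagged_innocents = (population - actual_terrorists) * false_positive_rate

precision = flagged_terrorists / (flagged_terrorists + flagged_innocents)
print(f"innocents flagged: {flagged_innocents:,.0f}")               # ~3 million
print(f"chance a flagged person is a terrorist: {precision:.2%}")   # ~0.1%
```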
Roger Chen

Data Randomization

  • Attacks that exploit memory errors are still a serious problem. We present data randomization, a new technique that provides probabilistic protection against these attacks by xoring data with random masks. Data randomization uses static analysis to partition instruction operands into equivalence classes: it places two operands in the same class if they may refer to the same object in an execution that does not violate memory safety. Then it assigns a random mask to each class and it generates code instrumented to xor data read from or written to memory with the mask of the memory operand's class. Therefore, attacks that violate the results of the static analysis have unpredictable results. We implemented a data randomization prototype that compiles programs without modifications and can prevent many attacks with low overhead. Our prototype prevents all the attacks in our benchmarks while introducing an average runtime overhead of 11% (0% to 27%) and an average space overhead below 1%.
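The masking scheme in the abstract can be modeled in a few lines. This is a conceptual Python sketch, not the paper's compiler-based instrumentation: one random mask per equivalence class, instrumented stores and loads that XOR with the class's mask, and a direct overwrite (standing in for a memory-safety violation) that decodes to an unpredictable value.

```python
# Toy model of data randomization: instrumented accesses XOR with a
# per-class random mask, so writes that bypass the instrumentation
# (e.g., an out-of-bounds overwrite) read back as garbage.
import secrets

class RandomizedMemory:
    def __init__(self, classes):
        # One random mask per static equivalence class of operands.
        self.masks = {c: secrets.randbits(32) for c in classes}
        self.cells = {}

    def store(self, cls, addr, value):
        # Instrumented write: XOR with the class mask, then store.
        self.cells[addr] = value ^ self.masks[cls]

    def load(self, cls, addr):
        # Instrumented read: load, then XOR with the class mask.
        return self.cells[addr] ^ self.masks[cls]

mem = RandomizedMemory(classes=["user_buf", "return_addr"])
mem.store("return_addr", 0x1000, 0xDEADBEEF)

# An attacker overwrites the cell directly, without knowing the mask:
mem.cells[0x1000] = 0x41414141

# The instrumented load yields a value the attacker cannot predict.
print(hex(mem.load("return_addr", 0x1000)))
```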
Roger Chen

Semantic Library » Zotero and semantic principles

  • Our Zotero Server, connected to the client, will enable all kinds of new collaboration opportunities and data-mining of aggregated collections. We also plan to provide hooks into high-performance computing projects like the SEASR text-mining project based at UIUC.
  • Data mining is becoming a major trend in eResearch as computing power increases and more and more researchers have direct access to open data sets. In the future, we won’t just be citing articles, figures, images, movies, and books, we’ll also be citing specific data points.
Roger Chen

The End Of The Scientific Method… Wha….? « Life as a Physicist

  • His basic thesis is that when you have so much data you can map out every connection, every correlation, then the data becomes the model. No need to derive or understand what is actually happening — you have so much data that you can already make all the predictions that a model would let you do in the first place. In short — you no longer need to develop a theory or hypothesis - just map the data!
  • First, in order for this to work you need to have millions and millions and millions of data points. You need, basically, every single outcome possible, with all possible other factors. Huge amounts of data. That does not apply to all branches of science.
  • The second problem with this approach is you will never discover anything new. The problem with new things is there is no data on them!
  • Correlations are a way of catching a scientist’s attention, but the models and mechanisms that explain them are how we make the predictions that not only advance science, but generate practical applications. One only needs to look at a promising field that lacks a strong theoretical foundation—high-temperature superconductivity springs to mind—to see how badly the lack of a theory can impact progress.
  • Anderson is right — we are entering a new age where the ability to mine these large amounts of data is going to open up whole new levels of understanding.
  • This is a new tool, and it will open up all sorts of doors for us. But the end of the scientific method? No — because that implies an end of discovery. An end of new things.
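Both halves of that argument fit in a tiny sketch: over densely sampled data, a pure nearest-neighbor lookup predicts as well as the underlying law with no hypothesis at all (the data becomes the model), yet it is useless outside the data, which is exactly the "you will never discover anything new" objection. The quadratic law here is an assumption invented for the demo.

```python
# "The data becomes the model": predict by looking up the nearest
# recorded case, with no theory of the underlying mechanism.
def law(x):
    return 3 * x * x + 2               # hidden mechanism we never model

# Dense observations on [0, 1]: the huge-amounts-of-data regime.
data = [(i / 1000, law(i / 1000)) for i in range(1001)]

def predict(x):
    # No hypothesis, no fit: return the outcome of the closest data point.
    return min(data, key=lambda p: abs(p[0] - x))[1]

print(predict(0.5678), law(0.5678))   # inside the data: lookup ~= truth
print(predict(2.0), law(2.0))         # outside the data: lookup fails badly
```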
Roger Chen

Current Approaches to Data Mining Blogs - ESIWiki

  • Summary of the current direction of blog research using data mining.
Roger Chen

Data Mining Source Code Newsletter - Blogs

  • Download free data mining source code in C/C++, C#, Visual Basic, Visual Basic.NET, Java, and other programming languages.
Roger Chen

Data Mining in Course Management Systems: The Case of Moodle

  • Cristobal Romero, Sebastian Ventura, Enrique Garcia, "Data mining in course management systems: Moodle case study and tutorial", Computers & Education (2007, in press, corrected proof).
Roger Chen

ACM SIGKDD

  • ACM Special Interest Group on Knowledge Discovery and Data Mining.
Roger Chen

PAKDD 2009: 13th Pacific-Asia Conference on Knowledge Discovery and Data Mining

Roger Chen

CRISP-DM - Home

  • CRISP = Cross Industry Standard Process for Data Mining; its six phases are sketched below.
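For reference, CRISP-DM organizes a project into six phases: business understanding, data understanding, data preparation, modeling, evaluation, and deployment, with a loop from evaluation back to the start when the model does not meet the business objectives. The skeleton below just encodes that published phase order; the evaluation check is a placeholder, not part of the standard.

```python
# Skeleton of the CRISP-DM phase sequence, with the standard loop from
# evaluation back to business understanding. Step logic is a placeholder.
PHASES = [
    "business understanding",
    "data understanding",
    "data preparation",
    "modeling",
    "evaluation",
]

def evaluation_ok(iteration):
    # Placeholder: a real project checks model quality against the
    # business objectives fixed in the first phase.
    return iteration >= 2

def run_project(max_iterations=3):
    for iteration in range(1, max_iterations + 1):
        for phase in PHASES:
            print(f"iteration {iteration}: {phase}")
        if evaluation_ok(iteration):
            print("deployment")
            return
    print("stopped without deploying")

run_project()
```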
Roger Chen

KNIME - Konstanz Information Miner

shared by Roger Chen on 01 Aug 08
  • KNIME, pronounced [naim], is a modular data exploration platform that enables the user to visually create data flows (often referred to as pipelines), selectively execute some or all analysis steps, and later investigate the results through interactive views on data and models.
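KNIME builds these flows visually, but the underlying idea (named nodes in a pipeline, with the option to execute only some steps and inspect intermediate results) can be sketched in code. The node names and data below are invented for illustration.

```python
# A miniature dataflow: run the whole pipeline, or stop after a named
# node to inspect the intermediate result. Nodes and data are invented.
nodes = [
    ("read",      lambda _: [3, 1, 4, 1, 5, 9, 2, 6]),
    ("filter",    lambda rows: [r for r in rows if r > 2]),
    ("transform", lambda rows: [r * 10 for r in rows]),
    ("aggregate", lambda rows: sum(rows) / len(rows)),
]

def run(until=None):
    """Execute pipeline steps up to (and including) the named node."""
    result = None
    for name, step in nodes:
        result = step(result)
        if name == until:
            break
    return result

print(run(until="filter"))   # selectively execute: peek at filtered rows
print(run())                 # execute the full flow
```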
Roger Chen

The End of Theory: The Data Deluge Makes the Scientific Method Obsolete

  • Sixty years ago, digital computers made information readable. Twenty years ago, the Internet made it reachable. Ten years ago, the first search engine crawlers made it a single database.
  • Google's founding philosophy is that we don't know why this page is better than that one: If the statistics of incoming links say it is, that's good enough.
  • The scientific method is built around testable hypotheses. These models, for the most part, are systems visualized in the minds of scientists. The models are then tested, and experiments confirm or falsify theoretical models of how the world works. This is the way science has worked for hundreds of years.
  • Peter Norvig, Google's research director, offered an update to George Box's maxim: "All models are wrong, and increasingly you can succeed without them."
  • Once you have a model, you can connect the data sets with confidence. Data without a model is just noise.
    • Roger Chen: That's what Chris Anderson thinks is old-school.
  • But faced with massive data, this approach to science — hypothesize, model, test — is becoming obsolete.
    • Roger Chen: Come to a conclusion? I don't think so.
  • There is now a better way. Petabytes allow us to say: "Correlation is enough." We can stop looking for models. We can analyze the data without hypotheses about what it might show. We can throw the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns where science cannot.
  • What can science learn from Google?
  • This kind of thinking is poised to go mainstream.
    • Roger Chen: ???
Roger Chen

Data Mining, Analytics and Artificial Intelligence: Financial Services Business Analyti...

  • A universal evidence-based business analytics model for the financial services industry.