
Advanced Concepts Team / Group items tagged: data


Dario Izzo

NASA Brings Earth Science 'Big Data' to the Cloud with Amazon Web Services | NASA - 3 views

  •  
    NASA's answer to the big data hype
  •  
    "The service encompasses selected NASA satellite and global change data sets -- including temperature, precipitation, and forest cover -- and data processing tools from the NASA Earth Exchange (NEX)" Very good marketing move for just three types of selected data (MODIS, Landsat products) plus four model runs (past/projection) for the the four greenhouse gas emissions scenarios of the IPCC. It looks as if they are making data available to adress a targeted question (crowdsourcing of science, as Paul mentioned last time, this time climate evolution), not at all the "free scrolling of the user around the database" to pick up what he thinks useful, mode. There is already more rich libraries out there when it comes to climate (http://icdc.zmaw.de/) Maybe simpler approach is the way to go: make available the big data sets categorized by study topic (climate evolution, solar system science, galaxies etc.) and not by instrument or mission, which is more technical, so that the amateur user can identify his point of interest easily.
  •  
    They are taking a good leap forward with it, but it definitely requires a lot of post processing of the data. Actually it seems they downsample everything to workable chunks. But I guess the power is really in the availability of the data in combination with Amazon's cloud computing platform. Who knows what will come out of it if hundreds of people start interacting with it.
Guido de Croon

Will robots be smarter than humans by 2029? - 2 views

  •  
    Nice discussion about the singularity. Made me think of drinking coffee with Luis... It raises some issues such as the necessity of embodiment, etc.
  • ...9 more comments...
  •  
    "Kurzweilians"... LOL. Still not sold on embodiment, btw.
  •  
    The biggest problem with embodiment is that, since the passive walkers (with which it all started), it hasn't delivered anything really interesting...
  •  
    The problem with embodiment is that it's done wrong. Embodiment needs to be treated like big data. More sensors, more data, more processing. Just putting a computer in a robot with a camera and microphone is not embodiment.
  •  
    I like how he attacks Moore's Law. It always looks a bit naive to me if people start to (ab)use it to make their point. No strong opinion about embodiment.
  •  
    @Paul: How would embodiment be done RIGHT?
  •  
    Embodiment has some obvious advantages. For example, in the vision domain many hard problems become easy when you have a body with which you can take actions (like looking at an object you don't immediately recognize from a different angle) - a point already made by researchers such as Aloimonos and Ballard in the late '80s / early '90s. However, embodiment goes further than gathering information and "mental" recognition. In this respect, the evolutionary robotics work by, for example, Beer is interesting, where an agent discriminates between diamonds and circles by avoiding one and catching the other, without there being a clear "moment" in which the recognition takes place. "Recognition" is a behavioral property there, for which embodiment is obviously important. With embodiment, the effort of recognizing an object behaviorally can be divided between the brain and the body, resulting in less computation for the brain. The article "Behavioural Categorisation: Behaviour makes up for bad vision" is also interesting in this respect. In the field of embodied cognitive science, some say that recognition is constituted by the activation of sensorimotor correlations. I wonder to what extent this is true, and whether it holds from extremely simple creatures up to more advanced ones, but it is an interesting idea nonetheless. This being said, if "embodiment" implies having a physical body, then I would argue that it is not a necessary requirement for intelligence. "Situatedness", being able to take (virtual or real) "actions" that influence the "inputs", may be.
  •  
    @Paul While I completely agree about the "embodiment done wrong" (or at least "not exactly correct") part, what you say goes exactly against one of the major claims connected with the notion of embodiment (google for "representational bottleneck"). The fact is your brain does *not* have the resources to deal with big data. The idea therefore is that it is the body that helps deal with what to a computer scientist appears to be "big data". Understanding how this happens is key. Whether it is a problem of scale or of actually understanding what happens should be quite conclusively shown by the outcomes of the Blue Brain project.
  •  
    Wouldn't one expect that to produce consciousness (even in a lower form) an approach resembling that of nature would be essential? All animals grow from a very simple initial state (just a few cells) and have only a very limited number of sensors AND processing units. This would allow for a fairly simple way to create simple neural networks and to start up stable neural excitation patterns. Over time as complexity of the body (sensors, processors, actuators) increases the system should be able to adapt in a continuous manner and increase its degree of self-awareness and consciousness. On the other hand, building a simulated brain that resembles (parts of) the human one in its final state seems to me like taking a person who is just dead and trying to restart the brain by means of electric shocks.
  •  
    Actually, on a neuronal level all information gets processed. Not all of it makes it into "conscious" processing or attention. Whatever makes it into conscious processing is a highly reduced representation of the data you get. However, that data doesn't get lost. Basic, lightly processed data forms the basis of proprioception and reflexes. Every step you take is a macro command your brain issues to the intricate sensory-motor system that puts your legs in motion by actuating every muscle and correcting every deviation from the desired trajectory using the complicated system of nerve endings and motor commands. These are reflexes built up over the years, as those massive amounts of data slowly get integrated into the nervous system and the incipient parts of the brain. But without all those sensors scattered throughout the body, all the little inputs in massive amounts that slowly get filtered through, you would not be able to experience your body, and experience the world. Every concept that you conjure up from your mind is a sort of loose association of your sensorimotor input. How can a robot understand the concept of a strawberry if all it can perceive of it is its shape and color and maybe the sound that it makes as it gets squished? How can you understand the "abstract" notion of strawberry without the incredibly sensitive tactile feel, without the act of ripping off the stem, without the motor action of taking it to your mouth, without its texture and taste? When we as humans summon the strawberry thought, all of these concepts and ideas converge (distributed throughout the neurons in our minds) to form this abstract concept formed out of all of these many, many correlations. A robot with no touch, no taste, no delicate articulate motions, no "serious" way to interact with and perceive its environment, no massive flow of information from which to choose and reduce, will never attain human-level intelligence. That's point 1. Point 2 is that mere pattern recogn
  •  
    All information *that gets processed* gets processed, but now we have arrived at a tautology. The whole problem is that ultimately nobody knows what gets processed (not to mention how). In fact, the absolute statement that "all information" gets processed is very easy to dismiss, because the characteristics of our sensors are such that a lot of information is filtered out already at the input level (e.g. the eyes). I'm not saying it's not a valid and even interesting assumption, but it's still just an assumption, and the next step is to explore scientifically where it leads you. And until you show its superiority experimentally, it's as good as all the other alternative assumptions you can make. I only wanted to point out that "more processing" is not exactly compatible with some of the fundamental assumptions of embodiment. I recommend Wilson (2002) as a crash course.
  •  
    These deal with different things in human intelligence. One is the depth of the intelligence (how much of the bigger picture you can see, how abstract the concepts and ideas you form are), another is the breadth of the intelligence (how well you can actually generalize, how encompassing those concepts are, and at what level of detail you perceive all the information you have), and another is the relevance of the information (this is where embodiment comes in: what you do serves a purpose, is tied into the environment and is ultimately linked to survival). As far as I see it, these form the pillars of human intelligence, and of the intelligence of biological beings. They are quite contradictory to each other, mainly due to physical constraints (such as, for example, energy usage and training time). "More processing" is not exactly compatible with some aspects of embodiment, but it is important for human-level intelligence. Embodiment is necessary for establishing an environmental context of actions, a constraint space if you will; failure of human minds (e.g. schizophrenia) is ultimately a failure of perceived embodiment. What we do know is that we perform a lot of compression and a lot of integration on a lot of data in an environmental coupling. Imo, take any of these parts out and you cannot attain human+ intelligence. Vary the quantities and you'll obtain different manifestations of intelligence, from cockroach to cat to Google to a random Quake bot. Increase them all beyond human levels and you're on your way towards the singularity.
Thijs Versloot

The big data brain drain - 3 views

  •  
    Echoing this, in 2009 Google researchers Alon Halevy, Peter Norvig, and Fernando Pereira penned an article under the title The Unreasonable Effectiveness of Data. In it, they describe the surprising insight that, given enough data, the choice of mathematical model often stops being as important - that, particularly for their task of automated language translation, "simple models and a lot of data trump more elaborate models based on less data." If we make the leap and assume that this insight can be at least partially extended to fields beyond natural language processing, what we can expect is a situation in which domain knowledge is increasingly trumped by "mere" data-mining skills. I would argue that this prediction has already begun to pan out: in a wide array of academic fields, the ability to effectively process data is superseding other more classical modes of research.
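As a toy illustration of that claim, here is a minimal scikit-learn sketch comparing a flexible model trained on little data against a simple model trained on a lot of it. The dataset, model choices and sample sizes are all invented for the example, and the outcome of course depends on the problem; this is a sketch, not a proof.

```python
# Toy check of "simple models and a lot of data trump more elaborate
# models based on less data" on a synthetic classification task.
# Every number here is an arbitrary choice for the illustration.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=60_000, n_features=40,
                           n_informative=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=10_000, random_state=0)

# A more flexible model, but trained on only a small slice of the data.
tree = DecisionTreeClassifier(random_state=0).fit(X_train[:500], y_train[:500])

# A simple linear model trained on everything available.
logreg = LogisticRegression(max_iter=1000).fit(X_train, y_train)

print("flexible model, little data:", accuracy_score(y_test, tree.predict(X_test)))
print("simple model, lots of data: ", accuracy_score(y_test, logreg.predict(X_test)))
```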
Thijs Versloot

The challenges of Big Data analysis @NSR_Family - 2 views

  •  
    Big Data bring new opportunities to modern society and challenges to data scientists. On the one hand, Big Data hold great promise for discovering subtle population patterns and heterogeneities that are not possible with small-scale data. On the other hand, the massive sample size and high dimensionality of Big Data introduce unique computational and statistical challenges, including scalability and storage bottlenecks, noise accumulation, spurious correlation, incidental endogeneity and measurement errors. These challenges are distinctive and require new computational and statistical paradigms. This paper gives an overview of the salient features of Big Data and how these features impact the paradigm change in statistical and computational methods as well as in computing architectures.
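To make one of the listed challenges concrete, a small numpy sketch of spurious correlation: with enough purely random candidate features, some of them will correlate noticeably with a pure-noise target by chance alone. The sample sizes and dimensionalities are arbitrary.

```python
# Spurious correlation in high dimensions: with many purely random
# features, the best absolute correlation with a random target grows
# even though no real relationship exists. Sizes are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
n = 100                       # sample size
y = rng.standard_normal(n)    # "target", pure noise

for p in (10, 1000, 100000):  # number of candidate features
    X = rng.standard_normal((n, p))
    corr = (X - X.mean(0)).T @ (y - y.mean())   # per-feature covariance sums
    corr /= (n * X.std(0) * y.std())            # Pearson correlations
    print(f"p = {p:6d}  max |corr| with noise target: {np.abs(corr).max():.2f}")
```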
jcunha

'Superman memory crystal' that could store 360TB of data forever | ExtremeTech - 0 views

  •  
    A new so-called 5D data storage technique that could potentially survive for billions of years. The approach uses nanostructured glass that can record digital data in five dimensions using femtosecond laser writing.
  • ...2 more comments...
  •  
    Very scarce scientific info available... I'm very curious to see a bit more in the future. From https://spie.org/PWL/conferencedetails/laser-micro-nanoprocessing I made a back-of-envelope calculation (redone in the sketch after these comments): for 20 nm spacing, each laser spot in the 5D encoding encodes 3 bits (it seemed to me) written in 3 planes; to obtain the claimed 360 TB disk one needs very roughly 6000 mm2, which does not comply with the dimensions shown in the video. Only with a larger number of planes (an order of magnitude more) could it work. Also, at current commercial trends NAND flash and HDD allow for 1000 Gb/in2. This means 360 TB could hypothetically fit in 1800 mm2.
  •  
    I had the same issue with the numbers when I saw the announcement a few days back (https://www.southampton.ac.uk/news/2016/02/5d-data-storage-update.page). It doesn't seem to add up. Plus, the examples they show are super low amounts of data (the Bible probably fits on a few 1.44 MB floppy disks). As for the comparison with NAND and HDD, I think the main argument for their crystal is that it is supposedly more durable. HDDs are chronically bad at long-term storage, and NAND, as far as I know, needs to be refreshed frequently.
  •  
    Yes Alex, indeed, the durability is the point I think they highlight and focus on (besides the fact that the abstract says something like the extrapolated decay time being comparable to the age of the Universe..). Indeed, memories face problems with retention time. Most disks retain the information for up to 10 years. When enterprises want to store data for longer than this they use... yeah, magnetic tapes :-). Check an interesting article about the magnetic tape market revival here: http://www.information-age.com/technology/data-centre-and-it-infrastructure/123458854/rise-fall-and-re-rise-magnetic-tape I compared for fun, to have an idea of what we were talking about. I am also very curious to see the writing and reading times of this new memory :)
  •  
    But how can glass store the information so long? Glass is not even solid?!
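For completeness, a quick redo of the back-of-envelope arithmetic from the comments above. The spot spacing, bits per spot and plane count are the commenter's guesses, not published figures; the output is simply what those assumptions imply, and the fact that it is hard to reconcile with the announced figures is exactly the point raised in this thread.

```python
# Redo of the back-of-envelope estimate from the comments above.
# All physical parameters below are assumptions, not published values.
capacity_bits = 360e12 * 8            # claimed 360 TB, in bits
spacing_m     = 20e-9                 # assumed lateral spot spacing
bits_per_spot = 3                     # assumed bits per laser spot
planes        = 3                     # assumed number of writing planes

spots    = capacity_bits / (bits_per_spot * planes)
area_mm2 = spots * spacing_m**2 * 1e6          # 1 m^2 = 1e6 mm^2
print(f"area implied by these assumptions: {area_mm2:,.0f} mm^2")

# Comparison used in the comment: NAND/HDD areal density of ~1000 Gb/in^2.
nand_bits_per_m2 = 1000e9 / 0.0254**2
print(f"area at 1000 Gb/in^2: {capacity_bits / nand_bits_per_m2 * 1e6:,.0f} mm^2")
```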
Luís F. Simões

HP Dreams of Internet Powered by Phone Chips (And Cow Chips) | Wired.com - 0 views

  • For Hewlett Packard Fellow Chandrakant Patel, there’s a “symbiotic relationship between IT and manure.”
  • Patel is an original thinker. He’s part of a group at HP Labs that has made energy an obsession. Four months ago, Patel buttonholed former Federal Reserve Chairman Alan Greenspan at the Aspen Ideas Festival to sell him on the idea that the joule should be the world’s global currency.
  • Data centers produce a lot of heat, but to energy connoisseurs it’s not really high quality heat. It can’t boil water or power a turbine. But one thing it can do is warm up poop. And that’s how you produce methane gas. And that’s what powers Patel’s data center. See? A symbiotic relationship.
  • ...1 more annotation...
  • Financial house Cantor Fitzgerald is interested in Project Moonshot because it thinks HP’s servers may have just what it takes to help the company’s traders understand long-term market trends. Director of High-Frequency Trading Niall Dalton says that while the company’s flagship trading platform still needs the quick number-crunching power that comes with the powerhog chips, these low-power Project Moonshot systems could be great for analyzing lots and lots of data — taking market data from the past three years, for example, and running a simulation.
  •  
    of relevance to this discussion: Koomey's Law, a Moore's Law equivalent for computing's energetic efficiency http://www.economist.com/node/21531350 http://hardware.slashdot.org/story/11/09/13/2148202/whither-moores-law-introducing-koomeys-law
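As a rough reminder of what Koomey's law implies, a minimal sketch; the roughly 1.5-year doubling period is the commonly reported historical figure and is taken here as an assumption.

```python
# Rough projection with Koomey's law: computations per joule have
# historically doubled roughly every ~1.5 years (approximate figure,
# taken as an assumption here).
def efficiency_gain(years, doubling_period_years=1.57):
    """Multiplicative gain in computations per joule after `years`."""
    return 2 ** (years / doubling_period_years)

for years in (5, 10, 20):
    print(f"after {years:2d} years: ~{efficiency_gain(years):,.0f}x computations per joule")
```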
Thijs Versloot

Linked Open Earth Observation Data for Precision Farming - 1 views

  •  
    Lots of Earth Observation data has become available at no charge in Europe and the US recently and there is a strong push for more open EO data. With precision farming, advanced agriculture using GPS, satellite observations and tractors with on-board computers, the farming process is performed as accurately and efficiently as possible. This is achieved by combining data from earth observations with other geospatial sources such as cadastral data, data on the quality of the soil, vegetation and protected areas. This enables the farmer to find the optimal trade-off in maximizing his yield with minimal use of fertilizers and pesticides while respecting environmental protection.
santecarloni

[1105.1293] Eigengestures for natural human computer interface - 1 views

  •  
    We present the application of Principal Component Analysis for data acquired during the design of a natural gesture interface. We investigate the concept of an eigengesture for motion capture hand gesture data and present the visualisation of principal components obtained in the course of conducted experiments. We also show the influence of dimensionality reduction on reconstructed gesture data quality.
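A minimal sketch of the kind of analysis described in the abstract (PCA on flattened motion-capture gesture recordings). The dataset shape, component count and variable names are invented for the illustration.

```python
# Minimal "eigengesture" sketch: PCA on flattened motion-capture
# gesture recordings, then reconstruction from a few components to
# see the effect of dimensionality reduction. Shapes are invented.
import numpy as np
from sklearn.decomposition import PCA

n_gestures, n_frames, n_channels = 200, 50, 22    # hypothetical dataset
gestures = np.random.rand(n_gestures, n_frames, n_channels)

X = gestures.reshape(n_gestures, -1)              # one row per gesture

pca = PCA(n_components=10)
scores = pca.fit_transform(X)                     # low-dimensional representation
eigengestures = pca.components_                   # principal "gesture" directions

X_rec = pca.inverse_transform(scores)             # reconstruction from 10 components
err = np.linalg.norm(X - X_rec) / np.linalg.norm(X)
print("explained variance:", pca.explained_variance_ratio_.sum())
print("relative reconstruction error:", err)
```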
LeopoldS

Operation Socialist: How GCHQ Spies Hacked Belgium's Largest Telco - 4 views

  •  
    interesting story with many juicy details on how they proceed ... (similarly interesting nickname for the "operation" chosen by our British friends)

    "The spies used the IP addresses they had associated with the engineers as search terms to sift through their surveillance troves, and were quickly able to find what they needed to confirm the employees' identities and target them individually with malware. The confirmation came in the form of Google, Yahoo, and LinkedIn "cookies," tiny unique files that are automatically placed on computers to identify and sometimes track people browsing the Internet, often for advertising purposes. GCHQ maintains a huge repository named MUTANT BROTH that stores billions of these intercepted cookies, which it uses to correlate with IP addresses to determine the identity of a person. GCHQ refers to cookies internally as "target detection identifiers."

    Top-secret GCHQ documents name three male Belgacom engineers who were identified as targets to attack. The Intercept has confirmed the identities of the men, and contacted each of them prior to the publication of this story; all three declined comment and requested that their identities not be disclosed.

    GCHQ monitored the browsing habits of the engineers, and geared up to enter the most important and sensitive phase of the secret operation. The agency planned to perform a so-called "Quantum Insert" attack, which involves redirecting people targeted for surveillance to a malicious website that infects their computers with malware at a lightning pace. In this case, the documents indicate that GCHQ set up a malicious page that looked like LinkedIn to trick the Belgacom engineers. (The NSA also uses Quantum Inserts to target people, as The Intercept has previously reported.)

    A GCHQ document reviewing operations conducted between January and March 2011 noted that the hack on Belgacom was successful, and stated that the agency had obtained access to the company's
  •  
    I knew I wasn't using TOR often enough...
  •  
    Cool! It seems that after all it is best to restrict employees' internet access to work-critical areas only... @Paul: TOR works at the network level, so it would not help much here, as cookies (application level) were exploited.
Thijs Versloot

Test shows big data text analysis inconsistent, inaccurate - 1 views

  •  
    Big data analytic systems are reputed to be capable of finding a needle in a universe of haystacks without having to know what a needle looks like. One of the best ways to sort large databases of unstructured text is to use a technique called latent Dirichlet allocation (LDA). Unfortunately, LDA is also inaccurate enough at some tasks that the results of any topic model created with it are essentially meaningless, according to Luis Amaral, a physicist whose specialty is the mathematical analysis of complex systems and networks in the real world and one of the senior researchers on the multidisciplinary team from Northwestern University that wrote the paper. Even for an easy case, big data analysis is proving to be far more complicated than many of the companies selling analysis software want people to believe.
  •  
    Most of those companies are using outdated algorithms like this LDA and just applying them blindly to those huge datasets. Of course they're going to come out with bad solutions. No amount of data can make up for bad algorithms.
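For context, a minimal example of the LDA topic modelling under discussion, on a tiny invented corpus with scikit-learn. Real evaluations such as the one in the article use far larger collections, and rerunning with a different random seed can already show the kind of instability being criticised.

```python
# Minimal LDA topic-model sketch (the technique criticized above),
# using scikit-learn on a tiny invented corpus.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "satellite data cloud storage processing",
    "cloud computing big data analysis",
    "gesture recognition motion capture data",
    "laser communication data transmission moon",
]

vectorizer = CountVectorizer().fit(docs)
X = vectorizer.transform(docs)

# Note: different random_state values can produce quite different topics.
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

terms = vectorizer.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top = [terms[i] for i in topic.argsort()[-4:][::-1]]
    print(f"topic {k}: {top}")
```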
Luís F. Simões

Kaggle: Crowdsourcing Data Modeling - 2 views

  • Kaggle is an innovative solution for statistical/analytics outsourcing. We are the leading platform for data modeling and prediction competitions. Companies, governments and researchers present datasets and problems - the world's best data scientists then compete to produce the best solutions. At the end of a competition, the competition host pays prize money in exchange for the intellectual property behind the winning model.
Luís F. Simões

When Astronomy Met Computer Science | Cosmology | DISCOVER Magazine - 1 views

  • “That’s impossible!” he told Borne. “Don’t you realize that the entire data set NASA has collected over the past 45 years is one terabyte?”
  • The LSST, producing 30 terabytes of data nightly, will become the centerpiece of what some experts have dubbed the age of petascale astronomy—that’s 10^15 bits (what Borne jokingly calls “a tonabytes”).
  • A major sky survey might detect millions or even billions of objects, and for each object we might measure thousands of attributes in a thousand dimensions. You can get a data-mining package off the shelf, but if you want to deal with a billion data vectors in a thousand dimensions, you’re out of luck even if you own the world’s biggest supercomputer. The challenge is to develop a new scientific methodology for the 21st century.”
  •  
    Francesco please look at this and get back wrt to the /. question .... thanks
LeopoldS

Helix Nebula - Helix Nebula Vision - 0 views

  •  
    The partnership brings together leading IT providers and three of Europe's leading research centres, CERN, EMBL and ESA in order to provide computing capacity and services that elastically meet big science's growing demand for computing power.

    Helix Nebula provides an unprecedented opportunity for the global cloud services industry to work closely on the Large Hadron Collider through the large-scale, international ATLAS experiment, as well as with the molecular biology and earth observation communities. The three flagship use cases will be used to validate the approach and to enable a cost-benefit analysis. Helix Nebula will lead these communities through a two-year pilot phase, during which procurement processes and governance issues for the public/private partnership will be addressed.

    This game-changing strategy will boost scientific innovation and bring new discoveries through novel services and products. At the same time, Helix Nebula will ensure valuable scientific data is protected by a secure data layer that is interoperable across all member states. In addition, the pan-European partnership fits in with the Digital Agenda of the European Commission and its strategy for cloud computing on the continent. It will ensure that services comply with Europe's stringent privacy and security regulations and satisfy the many requirements of policy makers, standards bodies, scientific and research communities, industrial suppliers and SMEs.

    Initially based on the needs of European big-science, Helix Nebula ultimately paves the way for a Cloud Computing platform that offers a unique resource to governments, businesses and citizens.
  •  
    "Helix Nebula will lead these communities through a two year pilot-phase, during which procurement processes and governance issues for the public/private partnership will be addressed." And here I was thinking cloud computing was old news 3 years ago :)
tvinko

Big data or Pig data - 6 views

  •  
    my best Pakistani friend's blog (I recommend following it)
  •  
    Nice. Though would have liked a better example ..
  •  
    this is a parody of the Norvig-Chomsky debate that has been going on for the last year or so. You can read more about it here: http://norvig.com/chomsky.html
johannessimon81

Big data, bigger expectations? - 1 views

Thijs Versloot

Role of data visualization in the scientific community @britishlibrary - 1 views

  •  
    In a new exhibition titled Beautiful Science: Picturing Data, Inspiring Insight [bl.uk], the British Library pays homage to the important role data visualization plays in the scientific process. The exhibition can be visited from 20 February until 26 May 2014, and contains works ranging from John Snow's plotting of the 1854 London cholera infections on a map to colourful depictions of the Tree of Life.
Thijs Versloot

Record #Laser data transmission to the #moon achieved 622Mbps - 1 views

  •  
    Oct. 22, 2013: NASA Laser Communication System Sets Record with Data Transmissions to and from Moon. NASA's Lunar Laser Communication Demonstration (LLCD) has made history using a pulsed laser beam to transmit data over the 239,000 miles between the moon and Earth at a record-breaking download rate of 622 megabits per second (Mbps).
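Some quick numbers behind that figure (a back-of-envelope sketch; the 1 GB example payload is an arbitrary choice, not part of the announcement).

```python
# Back-of-envelope numbers for the 622 Mbps Earth-Moon laser link.
distance_m = 239_000 * 1609.344       # 239,000 miles in metres
c = 299_792_458.0                     # speed of light, m/s
rate_bps = 622e6                      # 622 megabits per second

print(f"one-way light time: {distance_m / c:.2f} s")

payload_bits = 1e9 * 8                # an arbitrary 1 GB example transfer
print(f"1 GB at 622 Mbps:   {payload_bits / rate_bps:.1f} s")
```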
Luís F. Simões

Inferring individual rules from collective behavior - 2 views

  •  
    "We fit data to zonal interaction models and characterize which individual interaction forces suffice to explain observed spatial patterns." You can get the paper from the first author's website: http://people.stfx.ca/rlukeman/research.htm
  •  
    PNAS? Didn't strike me as something very new though... We should refer to it in the roots study though: "Social organisms form striking aggregation patterns, displaying cohesion, polarization, and collective intelligence. Determining how they do so in nature is challenging; a plethora of simulation studies displaying life-like swarm behavior lack rigorous comparison with actual data because collecting field data of sufficient quality has been a bottleneck." For roots it is NO bottleneck :) Tobias was right :)
  •  
    Here they assume that all relevant variables influencing behaviour are being observed, namely the relative positions and orientations of all ducks in the swarm. So they make movies of the swarm's movements, process them, and then fit the models to that data. With the roots, although we can observe the complete final structure, or even obtain time-lapse movies showing how that structure came to be, getting measurements of all relevant soil variables (nitrogen, phosphorus, ...) throughout the soil, and over time, would be extremely difficult. So I guess a replication of the kind of work they did, but for the roots, would be hard. Nice reference though.
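For reference, a minimal sketch of the kind of zonal interaction model being fitted in the paper: each individual reacts to neighbours in concentric zones of repulsion, alignment and attraction. The zone radii, group size and all other parameters here are invented for the illustration, not taken from the paper.

```python
# Minimal zonal interaction model sketch: each individual reacts to
# neighbours in three concentric zones (repulsion, alignment,
# attraction). Radii and other numbers are arbitrary illustration values.
import numpy as np

rng = np.random.default_rng(1)
n = 30
pos = rng.uniform(0, 10, (n, 2))              # positions
vel = rng.standard_normal((n, 2))
vel /= np.linalg.norm(vel, axis=1, keepdims=True)

R_REP, R_ALI, R_ATT = 0.5, 2.0, 5.0           # assumed zone radii

def step(pos, vel, dt=0.1, speed=1.0):
    new_vel = vel.copy()
    for i in range(n):
        d = pos - pos[i]                      # vectors towards neighbours
        dist = np.linalg.norm(d, axis=1)
        dist[i] = np.inf                      # ignore self
        desired = np.zeros(2)
        if (dist < R_REP).any():              # repulsion dominates everything
            desired = -d[dist < R_REP].sum(axis=0)
        else:
            ali = dist < R_ALI                # align with close neighbours
            att = (dist >= R_ALI) & (dist < R_ATT)   # move towards farther ones
            if ali.any():
                desired += vel[ali].sum(axis=0)
            if att.any():
                desired += d[att].sum(axis=0)
        if np.linalg.norm(desired) > 0:
            new_vel[i] = desired / np.linalg.norm(desired)
    return pos + speed * dt * new_vel, new_vel

for _ in range(100):
    pos, vel = step(pos, vel)
print("mean polarization:", np.linalg.norm(vel.mean(axis=0)))
```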
Luís F. Simões

The 70 Online Databases that Define Our Planet - 0 views

  • an ambitious European plan to simulate the entire planet. The idea is to exploit the huge amounts of data generated by financial markets, health records, social media and climate monitoring to model the planet's climate, societies and economy. The vision is that a system like this can help to understand and predict crises before they occur so that governments can take appropriate measures in advance.
  •  
    website of the project working on the 'Living Earth Simulator': http://www.futurict.ethz.ch/FuturICT five page summary of the FuturICT Proposal: http://www.futurict.ethz.ch/data/FuturICT-FivePageSummary.pdf
jmlloren

Data.gov - 2 views

shared by jmlloren on 24 Feb 10
  •  
    Interesting databases to play with data mining algorithms