Data Working Group
Amy West

2011AGUworkshop - Federation of Earth Science Information Partners - 1 views

  •  
    All the presentations are good, but I found those on data formats, creating documentation and metadata, working with an archive, and preservation strategies particularly good. They offer solid examples of formats, metadata, and real-life preservation. Plus, as managers of UDC/AgEcon, and hopefully more archives over time, I think we should look hard at what they tell researchers to look for in an archive.
Lisa Johnston

Data Management - Dana Library's Data Support - Research Guides at Rutgers University - 1 views

  •  
    Lots of new materials here...we should add some of this to our web site!
Lisa Johnston

Got big data? Crunch it with Google's BigQuery | VentureBeat - 2 views

  •  
    BigQuery??  "the service is designed for large-scale internal data analytics, to companies of all sizes, and it's adding a web interface so you can do it all in the cloud."
Lisa Johnston

Michael Nielsen on Networked Science - WSJ.com - 1 views

  •  
    The New Einsteins Will Be Scientists Who Share. From cancer to cosmology, researchers could race ahead by working together, online and in the open.
Lisa Johnston

Digital Preservation Courses & Workshops - Digital Preservation Outreach and Education ... - 0 views

  •  
    more online training opportunities...the DPOE program 
Lisa Johnston

Open Online Research Data Management Course for Ph.D Students - 2 views

  •  
    If anyone plans on running through the online course, please do update the data management website with any new information. The content closely follows the structure of our pages. thanks
Lisa Johnston

BioMed Central Blog : Data sharing: lessons from the Wellcome Trust Sanger Institute - 1 views

  •  
    The Wellcome Trust Sanger Institute in the UK, a key player in the Human Genome Project, has often led the way in this area, and in the latest issue of Genome Medicine, Tim Hubbard and Stephanie Dyke from the Wellcome Trust Sanger Institute explain how they developed and implemented the Institute's policy.
Lisa Johnston

HPCwire: SDSC Cloud Supports New NSF Mandate for Data Management - 0 views

  •  
    Standard "on-demand" storage costs for UC researchers on the SDSC Cloud start at only $3.25 a month per 100GB (gigabytes) of storage. A "condo" option, which allows users to make cost-effective long term investment in hardware that becomes part of the SDSC Cloud, is also available. Full details can be found at https://cloud.sdsc.edu/hp/index.php.
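    To put the quoted rate in context, here is a rough sketch of the arithmetic. It assumes the $3.25 per 100GB per month starting rate scales linearly with no minimums or tiering, which the article does not confirm; the function name and the 2,500GB figure are hypothetical.

```python
def monthly_storage_cost(gigabytes: float, rate_per_100gb: float = 3.25) -> float:
    """Estimate a monthly bill in dollars.

    Assumption: cost scales linearly from the quoted starting rate
    of $3.25 per 100GB per month (no minimums, tiers, or transfer fees).
    """
    return round(gigabytes / 100 * rate_per_100gb, 2)


# Hypothetical example: a 2,500GB research data set.
monthly = monthly_storage_cost(2500)
yearly = round(monthly * 12, 2)
print(f"${monthly:.2f} per month, ${yearly:.2f} per year")
```

    Actual pricing may differ, so check the SDSC page before budgeting for a data management plan.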
Amy West

Interagency Data Stewardship/Citations/provider guidelines - Federation of Earth Scienc... - 0 views

    • Amy West
       
      Little confused by what's meant by "data sets should be cited like books" since they go on to provide really good reasons why data aren't like books, e.g. need subsetting information, access date for dynamic databases.
  • The guidelines build from the IPY Guidelines and are compatible with the DataCite Metadata Scheme for the Publication and Citation of Research Data, Version 2.2, July 2011.
  • In some cases, the data set authors may have also published a paper describing the data in great detail. These sorts of data papers should be encouraged, and both the paper and the data set should be cited when the data are used.
  • ...27 more annotations...
  • Ongoing updates to a time series do change the content of the data set, but they do not typically constitute a new version or edition of a data set. New versions typically reflect changes in sampling protocols, algorithms, quality control processes, etc. Both a new version and an update may be reflected in the release date.
  • Locator, Identifier, or Distribution Medium
  • Then it is necessary to include a persistent reference to the location of the data.
  • This may be the most challenging aspect of data citation. It is necessary to enable "micro-citation" or the ability to refer to the specific data used--the exact files, granules, records, etc.
  • Data stewards should suggest how to reference subsets of their data. With Earth science data, subsets can often be identified by referring to a temporal and spatial range.
  • A particular data set may be part of a compilation, in which case it is appropriate to cite the data set somewhat like a chapter in an edited volume.
  • Increasingly, publishers are allowing data supplements to be published along with peer-reviewed research papers. When using the data supplement one need only cite the parent reference.
  • Confusingly, a Digital Object Identifier is a locator. It is a Handle based scheme whereby the steward of the digital object registers a location (typically a URL) for the object. There is no guarantee that the object at the registered location will remain unchanged. Consider a continually updated data time series, for example.
  • While it is desirable to uniquely identify the cited object, it has proven extremely challenging to identify whether two data sets or data files are scientifically identical.
  • At this point, we must rely on location information combined with other information such as author, title, and version to uniquely identify data used in a study.
  • The key to making registered locators, such as DOIs, ARKS, or Handles, work unambiguously to identify and locate data sets is through careful tracking and documentation of versions.
  • how to handle different data set versions relative to an assigned locator.
  • Track major_version.minor_version.[archive_version].
  • Typically, something that affects the whole data set like a reprocessing would be considered a major version.
  • Assign unique locators to major versions.
  • Old locators for retired versions should be maintained and point to some appropriate web site that explains what happened to the old data if they were not archived.
  • A new major version leads to the creation of a new collection-level metadata record that is distributed to appropriate registries. The older metadata record should remain with a pointer to the new version and with explanation of the status of the older version data.
  • Major and minor version should be listed in the recommended citation.
  • Minor versions should be explained in documentation.
  • Ongoing additions to an existing time series need not constitute a new version. This is one reason for capturing the date accessed when citing the data.
  • we believe it is currently impossible to fully satisfy the requirement of scientific reproducibility in all situations
  • To aid scientific reproducibility through direct, unambiguous reference to the precise data used in a particular study. (This is the paramount purpose and also the hardest to achieve). To provide fair credit for data creators or authors, data stewards, and other critical people in the data production and curation process. To ensure scientific transparency and reasonable accountability for authors and stewards. To aid in tracking the impact of data set and the associated data center through reference in scientific literature. To help data authors verify how their data are being used. To help future data users identify how others have used the data.
  • The ESIP Preservation and Stewardship cluster has examined these and other current approaches and has found that they are generally compatible and useful, but they do not entirely meet all the purposes of Earth science data citation.
  • In general, data sets should be cited like books.
  • They need to use the style dictated by their publishers, but by providing an example, data stewards can give users all the important elements that should be included in their citations of data sets.
  • Access Date and Time--because data can be dynamic and changeable in ways that are not always reflected in release dates and versions, it is important to indicate when on-line data were accessed.
  • Additionally, it is important to provide a scheme for users to indicate the precise subset of data that were used. This could be the temporal and spatial range of the data, the types of files used, a specific query id, or other ways of describing how the data were subsetted.
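The highlighted versioning and citation advice above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the ESIP guidelines' own format: the class and function names, the element order in the citation string, and every sample value (author, title, steward, locator) are hypothetical. What it takes from the highlights is the major_version.minor_version.[archive_version] scheme, a new locator for each major version, major and minor version in the citation, and the access date and subset description.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class DataSetVersion:
    """major_version.minor_version.[archive_version], as in the highlights."""
    major: int
    minor: int
    archive: Optional[int] = None  # the optional archive_version

    def label(self) -> str:
        base = f"{self.major}.{self.minor}"
        return base if self.archive is None else f"{base}.{self.archive}"


def needs_new_locator(old: DataSetVersion, new: DataSetVersion) -> bool:
    """Unique locators are assigned to major versions only."""
    return new.major != old.major


def format_citation(authors: str, release_year: int, title: str,
                    version: DataSetVersion, steward: str, locator: str,
                    accessed: date, subset: Optional[str] = None) -> str:
    """Assemble a citation string; the element order here is an assumption."""
    parts = [
        f"{authors}. {release_year}. {title}.",
        # Major and minor version are both listed in the citation.
        f"Version {version.major}.{version.minor}.",
        f"{steward}.",
        f"{locator}.",
        # The access date captures updates that do not change the version.
        f"Accessed {accessed.isoformat()}.",
    ]
    if subset:
        parts.append(f"Subset used: {subset}.")
    return " ".join(parts)


# Hypothetical data set, steward, and locator, for illustration only.
v1 = DataSetVersion(1, 2)
v2 = DataSetVersion(2, 0)
citation = format_citation(
    authors="Doe, J.",
    release_year=2011,
    title="Example Sea Ice Extent",
    version=v2,
    steward="Example Data Center",
    locator="https://doi.org/10.0000/example",
    accessed=date(2011, 12, 5),
    subset="1979-01 to 2010-12, 60N to 90N",
)
print(needs_new_locator(v1, v2))
print(citation)
```

Ongoing additions to a time series would leave the version unchanged in this sketch, so only the access date and subset would distinguish what a particular study actually used.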
Amy West

The Enduring Value of Social Science Research: The Use and Reuse of Primary Research Da... - 2 views

  •  
    Paper on data sharing from social sciences perspective; also some analysis of sharing so far.
Amy West

Open access to research data a lot tougher than you think - 2 views

  • It means that researchers need to deal with the formatting and deposition of data, an annoying step when they would rather be focusing on their next project. Given the time lag, it's also difficult to associate the correct metadata with the material that's being a
  • According to the commentary, scientists view data deposition as a burden due to the extra work it involves. Research data is usually not in the correct format for submission to repositories when the project is completed, and so the scientist must take the time to convert it.
  • The authors here propose a new approach to data management, where each research institution should employ data managers to work with scientists and administer local, structured data storage. Local storage and support is the preference of most scientists, who would rather not hand off control of their data to remote strangers.
Amy West

Data Citation from the perspective of tracking data reuse - 3 views

  •  
    Heather Piwowar
Amy West

total-impact.org - 2 views

  •  
    Welcome to Total-Impact. This site allows you to track the impact of various online research artifacts. It grabs metrics from many different sites and displays them all in one place.