
Data Working Group: Group items tagged publishing

Amy West

In case you can't read…. | Prof-Like Substance - 1 views

  • When I am putting a talk together it would never occur to me not to include a healthy dose of unpublished data. The only times in my career that I have talked about mostly published data have been when I first started as a postdoc and in the early days of being a PI, when I didn't have enough new data to even make a coherent story, but that accounts for maybe three professional talks out of man
  • Is it a fear of being scooped or a penchant for keeping one's ideas close to the chest that promotes the Summary Talk?
  • I think it's field dependent. Personally, I can rarely get enough information from a talk to know whether to believe a result or not. This means that unpublished data usually ends up with me thinking "maybe, maybe not".
  • (A good talk like this has enough of a citation on the slide that I can jot down where to go if I want to know details on any particular result.)
  • I'm in a highly competitive biomed field, and I was taught never to present something unless it was either submitted or ready to be submitted.
  • I don't really spend any time worrying about being scooped because I collect my own data.
  • Why look at a poster or talk of 100% published work? I've already seen the stuff in a journal to start with.
  • Final year materials chemist = keeping cards close to my chest. Once bitten, never again.
  • In neuro, I'd say that at smaller conferences and less high-profile talks at big conferences (i.e. not keynotes or featured lectures), the bulk of what you're hearing is unpublished. ALL posters are unpublished--in fact, I think (?) it's a rule at SfN that the content of posters can't be published already.
  • In my field I'd guess that most talks include data that is in press or at some close to publication sta
  • A big name should be more generous, but then again they do have to safeguard the career of the student/postdoc who generated the data. Also the star or keynote speaker is expected to address a wider audience, and make their talk relevant to the overall theme of the conference.
  • In my (experimental) social science, most conferences explicitly say that you cannot submit to present already published or even accepted work.
  • In my field (Astronomy), I'd say 95% of the talks are about unpublished data.
  •  
    A blog post & comments on what's preferred in conference presentations: published or unpublished data. Interesting.
Lisa Johnston

Open Science Data Initiative (OSDI) - 0 views

  •  
    The Open Science Data Initiative is an initiative led by Oak Ridge National Laboratory in partnership with Microsoft's Public Sector Developer Evangelism team. OSDI is based on OGDI, which in turn uses the Azure Services Platform to make it easier to publish and use a wide variety of scientific data from government agencies. OSDI is a sample of OGDI's open source 'starter kit' (coming soon) with code that can be used to publish data on the Internet in a Web-friendly format with easy-to-use, open APIs. OSDI-based web APIs can be accessed from a variety of client technologies such as Silverlight, Flash, JavaScript, PHP, Python, Ruby, mapping web sites, etc. Whether you are a researcher wishing to use scientific data, a hobbyist developer, or a "budding scientist", these open APIs will enable you to build innovative applications, visualizations and mash-ups that empower people through access to scientific information. This site is built using the OGDI starter kit software assets and provides interactive access to some publicly-available data sets along with sample code and resources for writing applications using the OSDI APIs.
Amy West

Interagency Data Stewardship/Citations/provider guidelines - Federation of Earth Scienc... - 0 views

    • Amy West
       
      Little confused by what's meant by "data sets should be cited like books" since they go on to provide really good reasons why data aren't like books, e.g. need subsetting information, access date for dynamic databases.
  • The guidelines build from the IPY Guidelines and are compatible with the DataCite Metadata Scheme for the Publication and Citation of Research Data, Version 2.2, July 2011.
  • In some cases, the data set authors may have also published a paper describing the data in great detail. These sorts of data papers should be encouraged, and both the paper and the data set should be cited when the data are used.
  • Ongoing updates to a time series do change the content of the data set, but they do not typically constitute a new version or edition of a data set. New versions typically reflect changes in sampling protocols, algorithms, quality control processes, etc. Both a new version and an update may be reflected in the release date.
  • Locator, Identifier, or Distribution Medium
  • Then it is necessary to include a persistent reference to the location of the data.
  • This may be the most challenging aspect of data citation. It is necessary to enable "micro-citation" or the ability to refer to the specific data used--the exact files, granules, records, etc.
  • Data stewards should suggest how to reference subsets of their data. With Earth science data, subsets can often be identified by referring to a temporal and spatial range.
  • A particular data set may be part of a compilation, in which case it is appropriate to cite the data set somewhat like a chapter in an edited volume.
  • Increasingly, publishers are allowing data supplements to be published along with peer-reviewed research papers. When using the data supplement one need only cite the parent reference.
  • Confusingly, a Digital Object Identifier is a locator. It is a Handle-based scheme whereby the steward of the digital object registers a location (typically a URL) for the object. There is no guarantee that the object at the registered location will remain unchanged. Consider a continually updated data time series, for example.
  • While it is desirable to uniquely identify the cited object, it has proven extremely challenging to identify whether two data sets or data files are scientifically identical.
  • At this point, we must rely on location information combined with other information such as author, title, and version to uniquely identify data used in a study.
  • The key to making registered locators, such as DOIs, ARKs, or Handles, work unambiguously to identify and locate data sets is through careful tracking and documentation of versions.
  • how to handle different data set versions relative to an assigned locator.
  • Track major_version.minor_version.[archive_version].
  • Typically, something that affects the whole data set like a reprocessing would be considered a major version.
  • Assign unique locators to major versions.
  • Old locators for retired versions should be maintained and point to some appropriate web site that explains what happened to the old data if they were not archived.
  • A new major version leads to the creation of a new collection-level metadata record that is distributed to appropriate registries. The older metadata record should remain with a pointer to the new version and with explanation of the status of the older version data.
  • Major and minor version should be listed in the recommended citation.
  • Minor versions should be explained in documentation.
  • Ongoing additions to an existing time series need not constitute a new version. This is one reason for capturing the date accessed when citing the data.
  • we believe it is currently impossible to fully satisfy the requirement of scientific reproducibility in all situations
  • To aid scientific reproducibility through direct, unambiguous reference to the precise data used in a particular study. (This is the paramount purpose and also the hardest to achieve). To provide fair credit for data creators or authors, data stewards, and other critical people in the data production and curation process. To ensure scientific transparency and reasonable accountability for authors and stewards. To aid in tracking the impact of data set and the associated data center through reference in scientific literature. To help data authors verify how their data are being used. To help future data users identify how others have used the data.
  • The ESIP Preservation and Stewardship cluster has examined these and other current approaches and has found that they are generally compatible and useful, but they do not entirely meet all the purposes of Earth science data citation.
  • In general, data sets should be cited like books.
  • They need to use the style dictated by their publishers, but by providing an example, data stewards can give users all the important elements that should be included in their citations of data sets.
  • Access Date and Time--because data can be dynamic and changeable in ways that are not always reflected in release dates and versions, it is important to indicate when on-line data were accessed.
  • Additionally, it is important to provide a scheme for users to indicate the precise subset of data that were used. This could be the temporal and spatial range of the data, the types of files used, a specific query id, or other ways of describing how the data were subsetted.
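
The versioning and citation points highlighted above (track major_version.minor_version.[archive_version], assign unique locators to major versions, list major and minor version in the citation, record the access date and the subset used) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class and function names, the citation layout, and the example author, archive, and URL are all hypothetical and are not taken from the ESIP guidelines.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class DataSetVersion:
    """Hypothetical holder for major_version.minor_version.[archive_version]."""
    major: int
    minor: int
    archive: Optional[int] = None  # optional third component

    def __str__(self) -> str:
        base = f"{self.major}.{self.minor}"
        return base if self.archive is None else f"{base}.{self.archive}"


def needs_new_locator(old: DataSetVersion, new: DataSetVersion) -> bool:
    """Only a change of major version is given its own unique locator."""
    return new.major != old.major


def format_citation(authors, release_year, title, version, archive,
                    locator, accessed, subset=None):
    """Assemble a citation string; the layout is an assumption, not a standard."""
    # Major and minor version are listed; the archive version is left out.
    parts = [f"{authors} ({release_year}). {title}. "
             f"Version {version.major}.{version.minor}."]
    if subset:
        # Micro-citation: say which part of the data set was actually used.
        parts.append(f"Subset used: {subset}.")
    # The access date matters because a time series can grow without a new version.
    parts.append(f"{archive}. {locator}. Accessed {accessed.isoformat()}.")
    return " ".join(parts)


if __name__ == "__main__":
    v = DataSetVersion(2, 1)
    print(format_citation(
        authors="Example, A.",                      # hypothetical author
        release_year=2011,
        title="Example Sea Ice Time Series",        # hypothetical data set
        version=v,
        archive="Example Data Center",              # hypothetical archive
        locator="https://example.org/dataset/v2",   # placeholder locator
        accessed=date(2012, 3, 1),
        subset="1990-01 to 1999-12, 60N to 90N",
    ))
```

Authors would still follow their publisher's style; a helper like this only shows which elements a data steward might ask users to include.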
Amy West

Publishing Data - 3 views

  •  
    The Australian national data service site; cool because they distinguish between depositing data and registering its existence and each is a different service. Even though we're a single university, offering a registry might appeal to those sections of the U that are more interested in autonomy while still giving us some information about who does what here on campus.
Amy West

Liveblog: BRDI: Author Deposit Mandates for Federal Research Grantees : Gavin Baker - 0 views

  • DC Principles Coalition: We believe in free access to science, within the constraints of our business models.
  • The public doesn’t need access to the full articles
  • The problem is that consumers want everything for free.
  • Repositories can do all the functions of journals except quality control, and we don’t want government doing that.
  • Social sciences often left out of discussions about data curation, open access, etc.
  • We could argue that taxpayers paid for the research in general, not necessarily each publication.
  • But the Public Access Policy requires the peer-reviewed manuscript, not the one after which the publishers add value. The America COMPETES model, for un-peer-reviewed grant proposals, is almost useless to the public. In health, you want the refereed results, not the grantee’s report to the agency.
  • If journals can’t survive, from an economic perspective, that’s not harm — it’s just a failure to adapt.
  • Journal growth trends with funding for researchers. As universities want to be more prestigious, they aim to publish more. Trying to have access to everything requires too much money — you have to prioritize.
Amy West

PLoS Computational Biology: Defrosting the Digital Library: Bibliographic Tools for the... - 0 views

  • Presently, the number of abstracts considerably exceeds the number of full-text papers,
  • full papers that are available electronically are likely to be much more widely read and cited
  • Since all of these libraries are available on the Web, increasing numbers of tools for managing digital libraries are also Web-based. They rely on Uniform Resource Identifiers (URIs [25] or “links”) to identify, name, and locate resources such as publications and their authors.
  • We often take URIs for granted, but these humble strings are fundamental to the way the Web works [58] and how libraries can exploit it, so they are a crucial part of the cyberinfrastructure [59] required for e-science on the Web.
  • link to data (the full-text of a given article),
  • To begin with, a user selects a paper, which will have come proximately from one of four sources: 1) searching some digital library, “SEARCH” in Figure 4; 2) browsing some digital library (“BROWSE”); 3) a personal recommendation, word-of-mouth from colleague, etc., (“RECOMMEND”); 4) referred to by reading another paper, and thus cited in its reference list (“READ”)
  • There is no universal method to retrieve a given paper, because there is no single way of identifying publications across all digital libraries on the Web
  • Publication metadata often gets “divorced” from the data it is about, and this forces users to manage each independently, a cumbersome and error-prone process.
  • There is no single way of representing metadata, and without adherence to common standards (which largely already exist, but in a plurality) there never will be.
  • Where DOIs exist, they are supposed to be the definitive URI. This kind of automated disambiguation, of publications and authors, is a common requirement for building better digital libraries
  • Publication metadata are essential for machines and humans in many tasks, not just the disambiguation described above. Despite their importance, metadata can be frustratingly difficult to obtain.
  • So, given an arbitrary URI, there are only two guaranteed options for getting any metadata associated with it. Using http [135], it is possible for a human (or machine) to do the following.
  • This technique works, but is not particularly robust or scalable because every time the style of a particular Web site changes, the screen-scraper will probably break as well
  • This returns metadata only, not the whole resource. These metadata will not include the author, journal, title, date, etc., of
  • As it stands, it is not possible to perform mundane and seemingly simple tasks such as, “get me all publications that fulfill some criteria and for which I have licensed access as PDF” to save locally, or “get me a specific publication and all those it immediately references”.
  • Having all these different metadata standards would not be a problem if they could easily be converted to and from each other, a process known as “round-tripping”.
  • many of these mappings are non-trivial, e.g., XML to RDF and back again
  • more complex metadata such as the inbound and outbound citations, related articles, and “supplementary” information.
  • Personalization allows users to say this is my library, the sources I am interested in, my collection of references, as well as literature I have authored or co-authored. Socialization allows users to share their personal collections and see who else is reading the same publications, including added information such as related papers with the same keyword (or “tag”) and what notes other people have written about a given publication.
  • CiteULike normalizes bookmarks before adding them to its database, which means it calculates whether each URI bookmarked identifies an identical publication added by another user, with an equivalent URI. This is important for social tagging applications, because part of their value is the ability to see how many people (and who) have bookmarked a given publication. CiteULike also captures another important bibliometric, viz how many users have potentially read a publication, not just cited it.
  • Connotea uses MD5 hashes [157] to store URIs that users bookmark, and normalizes them after adding them to its database, rather than before.
  • The source code for Connotea [159] is available, and there is an API that allows software engineers to build extra functionality around Connotea, for example the Entity Describer [160].
  • Personalization and socialization of information will increasingly blur the distinction between databases and journals [175], and this is especially true in computational biology where contributions are particularly of a digital nature.
  • This is usually because they are either too “small” or too “big” to fit into journals.
  • As we move in biology from a focus on hypothesis-driven to data-driven science [1],[181],[182], it is increasingly recognized that databases, software models, and instrumentation are the scientific output, rather than the conventional and more discursive descriptions of experiments and their results.
  • In the digital library, these size differences are becoming increasingly meaningless as data, information, and knowledge become more integrated, socialized, personalized, and accessible. Take Postgenomic [183], for example, which aggregates scientific blog posts from a wide variety of sources. These posts can contain commentary on peer-reviewed literature and links into primary database sources. Ultimately, this means that the boundaries between the different types of information and knowledge are continually blurring, and future tools seem likely to continue this trend.
  • The identity of people is a twofold problem because applications need to identify people as users in a system and as authors of publications.
  • Passing valuable data and metadata onto a third party requires that users trust the organization providing the service. For large publishers such as Nature Publishing Group, responsible for Connotea, this is not necessarily a problem.
  • business models may unilaterally change their data model, making the tools for accessing their data backwards incompatible, a common occurrence in bioinformatics.
  • Although the practice of sharing raw data immediately, as with Open Notebook Science [190], is gaining ground, many users are understandably cautious about sharing information online before peer-reviewed publication.
  •  
    Yes, but Alexandria was also a lot smaller; not totally persuaded by analogy here...
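
The highlights above on CiteULike normalizing bookmarked URIs and Connotea storing MD5 hashes of them can be illustrated with a small Python sketch. The actual normalization rules used by either service are not described in these annotations, so the rules below (lowercase scheme and host, drop default ports and fragments) are assumptions chosen for illustration, and the function names are hypothetical.

```python
import hashlib
from urllib.parse import urlsplit, urlunsplit

# Default ports that can be dropped without changing which resource is named.
DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize_uri(uri: str) -> str:
    """Reduce superficially different URIs to one form (assumed rules only)."""
    parts = urlsplit(uri.strip())
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()  # note: any user:password@ part is dropped
    port = parts.port
    if port is None or port == DEFAULT_PORTS.get(scheme):
        netloc = host
    else:
        netloc = f"{host}:{port}"
    path = parts.path or "/"
    # The fragment (#...) points inside a page, so it is discarded here.
    return urlunsplit((scheme, netloc, path, parts.query, ""))


def uri_key(uri: str) -> str:
    """MD5 hex digest of the normalized URI, usable as a lookup key."""
    return hashlib.md5(normalize_uri(uri).encode("utf-8")).hexdigest()


if __name__ == "__main__":
    a = "HTTP://Example.ORG:80/paper?id=1#abstract"
    b = "http://example.org/paper?id=1"
    print(normalize_uri(a))
    print(uri_key(a) == uri_key(b))
```

With a key like this, two users who bookmark the same paper through slightly different URIs can be counted together. Whether to normalize before or after storing is a design choice; this sketch hashes the normalized form.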
David Govoni

Scratchpads | Biodiversity Online - 0 views

  •  
    "Scratchpads are an easy to use, social networking application that enable communities of researchers to manage, share and publish taxonomic data online. Sites are hosted at the Natural History Museum London, and offered free to any scientist that complet
Lisa Johnston

Data Archiving - The American Naturalist - 2 views

  •  
    Science depends on good data. Data are central to our understanding of the natural world, yet most data in ecology and evolution are lost to science, except perhaps in summary form, very quickly after they are collected. ... Yet these data, even after the main results for which they were collected are published, are invaluable to science, for meta‐analysis, new uses, and quality control.
Amy West

Liveblog: BRDI: Briefings from Federal Interagency Data and Information Groups : Gavin ... - 0 views

    • Amy West
       
      What are they talking about here?
  • But people in the libraries think that someone is supposed to do the work for them before they do anything — originally done by publishers.
Amy West

National Patterns of R and D Resources - 0 views

  •  
    Describes and analyzes current patterns of research and development (R&D) in the US. In years when the full report is not published, the Division of Science Resources Statistics makes available "data update tables" to provide public access to the most c
Amy West

IPYDIS: How to Cite a Data Set - 2 views

  •  
    maybe some more info for our Data citation page on the website...nice examples anyway