Group items tagged

Bowtie: An ultrafast, memory-efficient short read aligner

bowtie-bio.sourceforge.net

bowtie dna sequence genetics

shared by Mike Chelen on 06 Jan 09

Mike Chelen on 06 Jan 09

Bowtie is an ultrafast, memory-efficient short read aligner. It aligns short DNA sequences (reads) to the human genome at a rate of 25 million reads per hour on a typical workstation with 2 gigabytes of memory. Bowtie indexes the genome with a Burrows-Wheeler index to keep its memory footprint small: 1.3 GB for the human genome. It supports alignment policies equivalent to Maq and SOAP but is much faster: about 35x faster than Maq and over 350x faster than SOAP when aligning to the human genome.
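
To show why a Burrows-Wheeler index makes lookups cheap, here is a minimal Python sketch of the textbook Burrows-Wheeler transform plus FM-index backward search for exact matches. It is a toy under stated assumptions: the `$` sentinel, the function names (`bwt`, `build_fm_index`, `count_occurrences`) and the naive quadratic rotation sort are illustrative choices, not Bowtie's code. Bowtie's real index is compressed and its search also tolerates mismatches, which this sketch omits.

```python
# Toy sketch of BWT + FM-index exact-match counting; NOT Bowtie's implementation.


def bwt(text: str) -> str:
    """Burrows-Wheeler transform via sorted rotations (quadratic; toy inputs only)."""
    assert "$" not in text, "'$' is reserved as the end-of-text sentinel"
    s = text + "$"  # '$' sorts before A, C, G, T in ASCII
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def build_fm_index(bwt_str: str):
    """Return (C, occ): C[c] = number of characters smaller than c;
    occ[c][i] = number of occurrences of c in bwt_str[:i]."""
    alphabet = sorted(set(bwt_str))
    C, total = {}, 0
    for ch in alphabet:
        C[ch] = total
        total += bwt_str.count(ch)
    occ = {ch: [0] * (len(bwt_str) + 1) for ch in alphabet}
    for i, b in enumerate(bwt_str):
        for ch in alphabet:
            occ[ch][i + 1] = occ[ch][i] + (b == ch)
    return C, occ


def count_occurrences(pattern: str, C, occ) -> int:
    """Backward search: narrow the suffix-array interval [lo, hi) one character
    at a time, from the last pattern character to the first."""
    lo, hi = 0, len(occ["$"]) - 1
    for ch in reversed(pattern):
        if ch not in C:
            return 0
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    C, occ = build_fm_index(bwt("ACGTACGA"))
    print(count_occurrences("ACG", C, occ))  # prints 2
```

Each pattern character costs two table lookups regardless of genome size, which is the property that lets an FM-index-based aligner search a large reference quickly from a compact index.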

SourceForge.net: CloudBurst - cloudburst-bio

apps.sourceforge.net/...index.php

cloudburstbio mapreduce cloudburst bioinformatics genetics

shared by Mike Chelen on 17 Dec 08

Mike Chelen on 17 Dec 08

CloudBurst: Highly Sensitive Short Read Mapping with MapReduce. Michael Schatz, Center for Bioinformatics and Computational Biology, University of Maryland. Next-generation DNA sequencing machines are generating an enormous amount of sequence data, placing unprecedented demands on traditional single-processor read-mapping algorithms. CloudBurst is a new parallel read-mapping algorithm optimized for mapping next-generation sequence data to the human genome and other reference genomes, for use in a variety of biological analyses including SNP discovery, genotyping, and personal genomics. It is modeled after the short read mapping program RMAP and reports either all alignments or the unambiguous best alignment for each read with any number of mismatches or differences. This level of sensitivity could be prohibitively time-consuming, but CloudBurst uses the open-source Hadoop implementation of MapReduce to parallelize execution across multiple compute nodes. CloudBurst's running time scales linearly with the number of reads mapped, with near-linear speedup as the number of processors increases. In a 24-core configuration, CloudBurst is up to 30 times faster than RMAP executing on a single core, while computing an identical set of alignments. In a large remote compute cloud with 96 cores, CloudBurst reduces the running time from hours to mere minutes for typical jobs that map millions of short reads to the human genome. CloudBurst is available open-source as a model for parallelizing other bioinformatics algorithms with MapReduce.
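
As a rough illustration of how read mapping can be expressed in the map/shuffle/reduce pattern that Hadoop parallelizes, here is a single-process Python sketch of a seed-and-extend mapper. It assumes (i) all reads have the same length, (ii) only mismatches are allowed (no insertions or deletions, which the description above also counts as "differences"), and (iii) the pigeonhole rule: split a read into k + 1 non-overlapping seeds, and any alignment with at most k mismatches must match at least one seed exactly. All function names are hypothetical; this is not CloudBurst's Hadoop code.

```python
# Hypothetical single-process sketch of MapReduce-style seed-and-extend mapping.
# Mismatches only (no indels); equal-length reads assumed. NOT CloudBurst's code.
from collections import defaultdict


def map_reference(ref: str, seed_len: int):
    """Map step for the reference: emit (seed, ("ref", position)) for every window."""
    for pos in range(len(ref) - seed_len + 1):
        yield ref[pos:pos + seed_len], ("ref", pos)


def map_read(read_id: str, read: str, k: int):
    """Map step for a read: emit its k + 1 non-overlapping seeds with their offsets."""
    seed_len = len(read) // (k + 1)
    for i in range(k + 1):
        offset = i * seed_len
        yield read[offset:offset + seed_len], ("read", read_id, offset)


def count_mismatches(ref: str, read: str, start: int, k: int):
    """Extend a seed hit to the full read; return the mismatch count if <= k, else None."""
    if start < 0 or start + len(read) > len(ref):
        return None
    mm = sum(a != b for a, b in zip(read, ref[start:start + len(read)]))
    return mm if mm <= k else None


def map_reads(ref: str, reads: dict, k: int):
    """Return sorted (read_id, start, mismatches) for every alignment with <= k mismatches."""
    read_len = len(next(iter(reads.values())))
    assert all(len(r) == read_len for r in reads.values()), "sketch assumes equal-length reads"
    seed_len = read_len // (k + 1)
    assert seed_len >= 1, "reads must be longer than k"

    # Shuffle: group every emitted value by its seed key.
    groups = defaultdict(list)
    for key, value in map_reference(ref, seed_len):
        groups[key].append(value)
    for read_id, seq in reads.items():
        for key, value in map_read(read_id, seq, k):
            groups[key].append(value)

    # Reduce: within each seed group, pair read seeds with reference positions and extend.
    hits = set()  # a set drops duplicates found through more than one seed
    for values in groups.values():
        ref_positions = [v[1] for v in values if v[0] == "ref"]
        for _, read_id, offset in (v for v in values if v[0] == "read"):
            for pos in ref_positions:
                mm = count_mismatches(ref, reads[read_id], pos - offset, k)
                if mm is not None:
                    hits.add((read_id, pos - offset, mm))
    return sorted(hits)


if __name__ == "__main__":
    ref = "ACGTACGTTAGC"
    reads = {"r1": "ACGTTA", "r2": "ACGAAC"}
    print(map_reads(ref, reads, k=1))  # [('r1', 4, 0), ('r2', 0, 1)]
```

Because each seed group is reduced independently, the reduce loop is the part a framework like Hadoop could spread across nodes; here it simply runs in one process.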

genome.gov | A Catalog of Published Genome-Wide Association Studies

www.genome.gov/26525384

genetics snp pubmed

shared by Mike Chelen on 28 Jan 09

Mike Chelen on 28 Jan 09

The genome-wide association study (GWAS) publications listed here include only those attempting to assay at least 100,000 single nucleotide polymorphisms (SNPs) in the initial stage. Publications are organized from most to least recent date of publication, indexing from online publication if available. Studies focusing only on candidate genes are excluded from this catalog. Studies are identified through weekly PubMed literature searches, daily NIH-distributed compilations of news and media reports, and occasional comparisons with an existing database of GWAS literature (HuGE Navigator). SNP-trait associations listed here are limited to those with p-values < 1.0 × 10⁻⁵. Note that we are now including all identified SNP-trait associations meeting this p-value threshold. Multipliers of powers of 10 in p-values are rounded to the nearest single digit; odds ratios and allele frequencies are rounded to two decimals. Standard errors are converted to 95 percent confidence intervals where applicable. Allele frequencies, p-values, and odds ratios derived from the largest sample size, typically a combined analysis (initial plus replication studies), are recorded below if reported; otherwise, statistics from the initial study sample are recorded. Odds ratios < 1 in the original paper are converted to OR > 1 for the alternate allele. Where results from multiple genetic models are available, we prioritized effect sizes (ORs or beta-coefficients) as follows: 1) genotypic model, per-allele estimate; 2) genotypic model, heterozygote estimate; 3) allelic model, allelic estimate. Gene regions corresponding to SNPs were identified from the UCSC Genome Browser. Gene names are those reported by the authors in the original paper. Only one SNP within a gene or region of high linkage disequilibrium is recorded unless there was evidence of independent association.
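
The catalog's numeric rules (the p-value cutoff, standard-error-to-CI conversion, flipping OR < 1, rounding to two decimals) can be made concrete with a small Python sketch. It assumes the reported standard error is on the log-odds scale, so the 95% interval is exp(ln OR ± 1.96 × SE); the catalog text does not say exactly how each conversion was done, and all function names here are hypothetical.

```python
# Hypothetical sketch of the catalog's numeric conventions; not the catalog's own code.
import math

P_THRESHOLD = 1.0e-5        # inclusion cutoff: p < 1.0 x 10^-5
Z_95 = 1.959963984540054    # two-sided 95% standard-normal quantile


def passes_threshold(p_value: float) -> bool:
    """True if the association meets the catalog's p-value cutoff."""
    return p_value < P_THRESHOLD


def or_ci_from_se(odds_ratio: float, se_log_or: float):
    """95% CI for an odds ratio, ASSUMING se_log_or is the standard error of ln(OR)."""
    log_or = math.log(odds_ratio)
    return (math.exp(log_or - Z_95 * se_log_or),
            math.exp(log_or + Z_95 * se_log_or))


def flip_to_alternate_allele(odds_ratio: float, ci):
    """Express OR < 1 as OR > 1 for the alternate allele: invert OR and CI, swapping bounds."""
    if odds_ratio >= 1:
        return odds_ratio, ci
    lo, hi = ci
    return 1 / odds_ratio, (1 / hi, 1 / lo)


def round_for_catalog(odds_ratio: float, ci):
    """Round the OR and its CI bounds to two decimals."""
    return round(odds_ratio, 2), (round(ci[0], 2), round(ci[1], 2))


if __name__ == "__main__":
    or_, ci = flip_to_alternate_allele(0.8, or_ci_from_se(0.8, 0.05))
    print(passes_threshold(3e-6), round_for_catalog(or_, ci))
```

Note that inverting an odds ratio also inverts and swaps its interval bounds, since the lower bound of the original interval becomes the upper bound of the flipped one.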

SourceForge.net: Running CloudBurst on Amazon EC2 - cloudburst-bio

apps.sourceforge.net/...index.php

cloudburstbio bioinformatics ec2

shared by Mike Chelen on 17 Dec 08

Mike Chelen on 17 Dec 08

Hadoop comes bundled with launch scripts to simplify initializing an Amazon Elastic Compute Cloud (EC2) cloud for Hadoop. Once initialized, running CloudBurst is identical to running on a local cluster. If you use EC2 regularly with the same datasets (e.g., the human genome as a reference), you will probably want to copy the data to Amazon Simple Storage Service (S3) once, so you can quickly copy it from S3 to your cloud at low cost.

Welcome to BioConductor - bioconductor.org

bioconductor.org

shared by Mike Chelen on 12 Dec 08

Mike Chelen on 12 Dec 08

Bioconductor is an open source and open development software project for the analysis and comprehension of genomic data.

SourceForge.net: Vancouver Short Read Analysis Package

sourceforge.net/...vancouvershortr

shared by Mike Chelen on 11 Dec 08

Mike Chelen on 11 Dec 08

This package contains code for use with short-read DNA sequencing technologies and includes packages for ChIP-Seq, whole-transcriptome shotgun sequencing, whole-genome shotgun sequencing, SNP detection, transcript expression, and file conversion.

http://vancouvershortr.sourceforge.net/

vancouvershortr.sourceforge.net

shared by Mike Chelen on 11 Dec 08

Mike Chelen on 11 Dec 08

This package of tools encompasses many of the common pieces of software required for the analysis of short read sequences produced by second-generation DNA sequencing machines (e.g., Illumina/Solexa sequencers, ABI SOLiD and 454). The focus of this project is on post-alignment analysis, so the input for this process should be the files produced by sequence aligners such as MAQ, Eland or Exonerate. The output should be provided in several formats, including BED and WIG files, which are readable by the UCSC Genome Browser.
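
Since the package's outputs are BED and WIG files for the UCSC Genome Browser, a small Python sketch shows the coordinate bookkeeping involved: BED intervals are 0-based and half-open, while variableStep WIG positions are 1-based. The `Hit` record and both functions are hypothetical, and the sketch assumes the aligner reports a 1-based leftmost position; it does not parse MAQ, Eland or Exonerate output and is not the package's own code.

```python
# Hypothetical sketch: aligned reads -> BED lines and a variableStep WIG coverage track.
from collections import Counter
from typing import NamedTuple


class Hit(NamedTuple):
    """Hypothetical aligned-read record; start_1based is ASSUMED to be 1-based."""
    name: str
    chrom: str
    start_1based: int
    length: int
    strand: str  # "+" or "-"


def to_bed_line(hit: Hit) -> str:
    """BED: chrom, chromStart (0-based), chromEnd (exclusive), name, score, strand."""
    start0 = hit.start_1based - 1
    end0 = start0 + hit.length
    return f"{hit.chrom}\t{start0}\t{end0}\t{hit.name}\t0\t{hit.strand}"


def to_wig_lines(hits, chrom: str):
    """Per-base read coverage on one chromosome as variableStep WIG (1-based positions)."""
    cov = Counter()
    for h in hits:
        if h.chrom != chrom:
            continue
        for p in range(h.start_1based, h.start_1based + h.length):
            cov[p] += 1
    lines = [f"variableStep chrom={chrom}"]
    lines += [f"{p} {cov[p]}" for p in sorted(cov)]
    return lines


if __name__ == "__main__":
    hits = [Hit("r1", "chr1", 100, 36, "+"), Hit("r2", "chr1", 110, 36, "-")]
    print("\n".join(to_bed_line(h) for h in hits))
    print("\n".join(to_wig_lines(hits, "chr1")[:3]))
```

Keeping the off-by-one conversion in one place (`to_bed_line`) avoids the most common error when moving between 1-based aligner output and 0-based BED.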

Flags and Lollipops - Bioinformatics Blog

www.ghastlyfop.com/...index.html

biology blog genetics genomics informatics

shared by Mike Chelen on 12 Sep 08

Identifying statistical dependence in genomic sequences via mutual information estimates

portal.acm.org/citation.cfm

biology free genetics knowledge science statistics

shared by Mike Chelen on 07 Sep 08

OpenHelix Blog

www.openhelix.com/blog

biology blog genetics genomics open science

shared by Mike Chelen on 12 Sep 08

Science 2.0 - introduction and perspectives for Poland « Freelancing science

freelancingscience.com/...on-and-perspectives-for-poland

science2.0 elearning eresearch

shared by Mike Chelen on 20 May 09

transcript of Science 2.0 based on a presentation I gave at a conference on open science organized in Warsaw earlier this month
prepared for a mixed audience and focused on perspectives for Poland
new forms of communication between scientists
research becomes meaningful only after confronting results with the scientific community
peer-reviewed publication is the best communication channel we had so far
new communication channels complement peer-reviewed publication
two important attributes in which they differ from traditional models: openness and communication time
increased openness and shorter communication time are already happening in the publishing industry (via the Open Access movement and experiments with alternative/shorter ways of peer review)
say a few words about experiments that go a little or quite a lot beyond publication
myExperiment as an example of an important step towards openness
least radical idea you can find in the modern Science 2.0 world
virtual research environment
focus is put on sharing scientific workflows
use case
diagram of the “methods” sections from experimental (including bioinformatics analyses) publications
make it easier for others to understand what we did
can open towards other scientists, we can also open towards non-experts
people from all over the world compete in improving structural models of proteins
helps in improving protein structure prediction software and in understanding protein folding
combine teaching and data annotation
metagenome sequences in the first case and chemistry spectra in the second
interactive visualizations of chemical structures, genomes, proteins or multidimensional data
communicate some difficult concepts faster
new approaches in conference reporting
report in real time from the conference
followed by a number of people, including even the ones that were already at the conference
“open notebook science” which means conducting research using publicly available, immediately updated laboratory notebook
The reason I did a model for Cameron’s grant was that I subscribed to his feed before
I didn’t subscribe to Cameron because I knew his professional profile
I read his blog, I commented on it and he commented on mine, etc.
participation in online communities
important part of Science 2.0 is the fact that it has a human face
PhDs about the same time
first was from a major Polish institute, the second from a major European one
what a head of a lab both would apply to will see
gap we must fill: the one between current research and the lectures we give today
access to real-time scientific conversation
follow current research and decide what is important to learn
synthetic biology
not all universities in the world have synthetic biology courses
didn’t stop these students, and they plan to participate in iGEM again
not only scientists – there are librarians, science communicators, editors from scientific journals, people working in biotech industry
community of life scientists
even people without direct connection to science
diverse skills and backgrounds
online conference
interact with them and to learn from them