New home for Discovering Biology in a Digital World

Sometime in the next day or two, Scienceblogs will shut down.  We’ve enjoyed the opportunity to blog here for the past 10+ years. Not to worry, @digitalbio and @finchtalk will continue blogging, but more so from their own site at Digital World Biology.  The Scienceblogs posts have been reposted at Digital World Biology’s scienceblog archive, and new posts will be at Discovering Biology in a Digital World, now at Digital World Biology.
@digitalbio, @finchtalk

Synbiobeta: The Future is Now

@synbiobeta concluded its #sbbsf17 annual meeting on synthetic biology Oct 5, 2017. The progress companies are making in harnessing biology as a platform for manufacturing and problem solving is world changing.

Locations of Synbio Companies

What is Synthetic Biology?
Synthetic biology is a term that is used to describe the convergence of biotechnology and engineering. The dramatic cost decreases in our ability to read and write DNA (sequence and synthesize), combined with increasing capabilities in automation and informatics, have catalyzed a new sub-industry within biotechnology. Synthetic biology utilizes high-throughput combinatorial approaches to quickly evolve living systems to make new drugs, foods, and chemicals, to mention a few. It is the culmination of nearly 50 years of biotechnology discovery and refinement. The new products created by biological systems are poised to tackle many of the world’s greatest problems.
While the core activities of synthetic biology, gene cloning and constructing plasmids for gene expression, have been around for nearly 50 years, synthetic biology capitalizes on technologies that miniaturize and implement processes in massively parallel formats. Miniaturization, be it solid phase or micro/nano container formats, significantly reduces reagent costs. Massively parallel formats increase the numbers of reactions and operations that can run simultaneously from a few hundred, as is done with 96-well plates, to millions and billions, as are now routine in DNA sequencing and DNA synthesis. These capabilities allow researchers to test new designs to optimize enzymes, create specialty proteins, and control gene expression in random and directed ways.
The Synbiobeta Meeting
The @synbiobeta meeting celebrated its seventh year with more than 600 attendees and 50 exhibiting companies. Most of the presentations were by new companies demonstrating both the current and future impact of synthetic biology on the economy. A major theme was how biological systems will be able to produce materials that are currently produced from petroleum products. Replacing oil’s pervasive presence is an important goal in battling climate change, reducing environmental pollution, and promoting national security.
The presentations were organized around themes with several keynotes and “fireside” chats that involved interview formats. Jason Kelly, CEO of Ginkgo Bioworks, kicked the meeting off by highlighting his favorite synthetic biology companies. These included Bolt Threads, harnessing spider silk production in bacteria; Impossible Foods, working on “cellular agriculture” – the flavor is in the heme; and Twist Bioscience, leading the way with oligo and gene synthesis. Ginkgo recently agreed to purchase one billion bases of synthesized DNA from Twist, demonstrating the scale of synthetic biology. Kelly also talked about the CRISPR pig that recently appeared in Science Magazine. George Church’s group, in collaboration with many others, used CRISPR-based gene editing to remove endogenous retroviruses, which opens opportunities for future xeno-transplantation. Of course, there likely will be issues with immunity to those pesky pig proteins, and more CRISPieR pigs will be needed before the goal of xeno-transplantation can be fully realized. Kelly closed his talk by discussing how foundry concepts from industrial processes are being applied to biology, emphasizing the convergence of engineering and biotechnology.
The remaining talks were organized into themes that emphasize both what is possible in synthetic biology and what is needed to enable those possibilities. Sessions like Biomaterials and Consumer Products, Cell Factories for Biopharmaceuticals and Healthcare, Thought for Food, Innovations for Ocean Sustainability, Protein is the Killer App, Environmental Application of Synthetic Biology, and the keynotes had an obvious focus on what is being done with synthetic biology, whereas sessions like the Panel Discussion on the Future of DNA Synthesis, Computation and Synthetic Biology, and the sponsor presentations from Labcyte, IDT, Twist, and GenScript focused on the enabling technologies needed to do synthetic biology.
The next blogs will share details from these sessions. In the meantime, check out the companies engaged in synthetic biology.

What is Biotech?

The biotechnology (biotech) industry is incredibly diverse. Recently, I wrote about the size of the biotech industry, which is, of course, related to how biotechnology is defined. As a strict definition, biotechnology is the use of biology to turn raw materials into useful products. However, the act of developing a biotech product requires many enabling technologies, reagents, and services that form today’s modern industry.

The term biotechnology was first coined in 1919 by Károly Ereky, a Hungarian agricultural engineer, who foresaw a time when biology could be used for turning raw material into useful products. The emerging field of synthetic biology represents the natural progression of this idea as our ability to synthesize gene sequences and engineer biochemical pathways and even entire microorganisms in rational designs for a myriad of purposes from speciality chemicals, to food, to energy improves.
While biotechnology products such as bread, wine, and beer, have been around for millennia, the earliest biotechnology companies, as exemplified by Genentech, were founded in the late 1970s after the initial discoveries of restriction enzymes and the realization they could be harnessed for use in DNA cloning. Many of these companies focused on producing human therapeutic proteins, like human insulin, in cost-effective ways. To carry out this work, these companies also needed reagents such as restriction enzymes that were in themselves biotech products. Hence, an ecosystem of companies developed into a larger industry.
That industry today is diverse and includes companies with therapeutic missions; technology focused companies that provide analytical instrumentation, systems for automation, reagents for assays and production; companies that focus on diagnostics for determining appropriate therapeutic and medical interventions; service organizations that specialize in using advanced technology as well as providing clinical trial, regulatory, and other experience to groups; and software companies that specialize in different kinds of informatics. Some companies are very large aggregates of many specialties and others, such as startups and early commercial ventures, are narrowly focused on a specific disease, application, or technology.
For students gaining hands-on training at one of the more than 100 programs throughout the United States, or in degree programs in colleges and universities, the biotech industry provides many opportunities. Basic training in preparing solutions, working with DNA and proteins, performing immunoassays, and working with lab equipment provides a common set of skills that fulfill many job requirements. As the Bio-Link programs also emphasize the importance of record keeping in laboratory notebooks, such students are well-suited for positions in industry. In addition to general lab skills, Bio-Link programs may also offer specialized training that is suited to local industry needs.
The web site, with its database of over 5600 (and growing) biotechnology companies and employers, provides an overview of the industry. Each biotechnology company in the database has one or more assigned terms as a way to describe a business’ core activity. These terms can be used to filter companies, based on what they do, to understand opportunities for educational objectives, trends for instruction, and job prospects for those seeking employment. The data can also be used to characterize the industry in general and local (by state) ways.
Presently, nearly 400 terms are used to describe the industry. When the top 100 terms are visualized in a word cloud (above), where the size of a term indicates how many companies have that term, several themes can be observed. A large number of companies (667) are engaged in small molecule development. Who knew small molecules could be so big? A majority of these companies are traditional pharmaceutical companies, but as these companies can also have biotech products they are part of the biotechnology industry. The second largest category is medical device companies. These are in biotechnology because, like pharmaceutical companies, some device companies also make biotechnology products, and some devices are made from biological materials. Other terms in the word cloud emphasize the ecosystem nature of the biotechnology industry. Antibodies, for example, can be reagents in diagnostic assays. They can also be therapeutics. Contract services and research are activities that support other companies.
Finally, these terms show the vast diversity of the industry. As noted, nearly 400 terms are used to describe the industry. When the frequency of each term is examined, one sees that many terms are used only a few times; 279 terms are associated with 10 or fewer companies, and 115 terms are associated with only a single company. This long tail appears because biotechnology is concerned with solving new problems by translating research discoveries into useful products to benefit society. As such the biotechnology field is always evolving–just like biology.
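As a rough illustration of how a long tail like this can be measured, here is a minimal Python sketch. It assumes the company-to-term assignments are available as a dictionary; the function name, the company names, and the terms are all made up for the example and are not taken from the actual database.

```python
from collections import Counter


def long_tail_summary(company_terms, rare_cutoff=10):
    """Summarize how many terms are attached to only a few companies.

    company_terms maps a company name to the list of terms assigned to it.
    """
    term_counts = Counter()
    for terms in company_terms.values():
        term_counts.update(set(terms))  # count each term once per company
    rare = sum(1 for n in term_counts.values() if n <= rare_cutoff)
    singletons = sum(1 for n in term_counts.values() if n == 1)
    return {"terms": len(term_counts), "rare": rare, "singletons": singletons}


# Made-up companies and terms, for illustration only.
toy = {
    "Company A": ["small molecules", "antibodies"],
    "Company B": ["small molecules", "medical devices"],
    "Company C": ["small molecules", "spider silk"],
}
summary = long_tail_summary(toy, rare_cutoff=1)
print(summary)
```

With real data, the default cutoff of 10 would correspond to the "10 or fewer companies" figure quoted above, and the singleton count to the terms used by only one company.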

How Big is Biotech?

A simple web search says biotech is really big. One estimate indicates that the industry will have $400 billion in sales in 2017 with growth to over $775 billion by 2024 [1]. Another report suggests there are over 77,000 employers [2]. That’s big, but is it real, and what can you do with this information?

Worldwide locations of biotechnology employers.  Source

We’re interested in helping students and graduates of biotech programs at community and four-year colleges learn about the multitude of opportunities available in the biotech industry. To be helpful we need to know more about the industry than the big numbers. After all, telling someone they have great opportunity because there is a $400 billion industry with over 77,000 companies doesn’t help if we don’t have context.  We need to know how good those numbers are, where the opportunities are located, and many other details.
One challenge companies face is to understand their real market opportunity in the context of a $400 billion market.  Students who are considering a biotech career need to know about their long-term prospects as part of this industry.  These prospects are related to a company’s opportunities for sustainability and growth, which in turn are a function of the company’s addressable market.
As part of our consulting activities at Digital World Biology, we engage in market analyses for groups developing biotechnology businesses. Big numbers like $400 billion always shrink when specific addressable markets that are suitable for a given technology are considered. This is because biotechnology is a “long tail” industry.  In fact, because of the long tail, nearly 400 keywords are needed to characterize employers’ business activities in general and specific ways.
Biotech products and services are a part of a diverse continuum of enabling technologies (and reagents), diagnostic technologies and targets, therapeutic interventions, or biologic improvements. For example, DNA sequencing, protein purification, and mass spectrometry are enabling technologies that are used in specific applications and thus have their own addressable target market segments. As platform technologies they have larger addressable markets than services that employ these technologies for various applications, but compared to the entire biotechnology market, these segments represent a few percent of the total. Similarly, diagnostics can be a platform technology, like genotyping, or be disease related (genotyping for certain conditions, detecting particular infectious agents), or be quality-control related as in food safety. Therapeutics, synthetic biology, and other areas considered biotechnology follow similar patterns. In all, the addressable markets for companies in these areas can range from tens of millions to tens of billions of dollars.
A harder problem is defining the number of potential employers. As noted, one estimate claims there are 77,000 employers [2]. This value is based on NAICS codes. NAICS (North American Industry Classification System) codes are used by federal agencies to classify businesses. Over 15 million companies are listed in the NAICS database [3]. When you start a business you need to tell the government what your business does by picking a NAICS code. Sounds simple, right? The challenge is that NAICS codes often do not exist for innovative companies – the very kind that are continually emerging in the biotechnology industry. So, a new biotechnology company does its best to pick a NAICS code, which leads to overestimates because many NAICS codes include a large number of unrelated businesses. For example, NAICS codes associated with agriculture, for which agri-biotech would fit, also include traditional farming-based organizations. Other examples of overcounting occur in NAICS codes for hospital suppliers, various kinds of medical wholesalers, medical laboratories, and many others.
How can we tell if NAICS codes overestimate the number of biotechnology employers? One way is by looking at different directory resources. Such directories are compiled by individuals with experience in the field. One such directory presently lists over 5600 worldwide employers in over 8,000 locations. These data were compiled from several sources and are updated on a regular basis from industry-specific news feeds and other sources. Other sites advertising publicly available directories have similar numbers. The directory of one industry lobbying group lists approximately 1500 member organizations. This lower number is due to the fact that organizations have to self-identify, but it is helpful in understanding what kinds of organizations consider themselves to be biotechnology related. Finally, one organization claims a fee-based directory of nearly 13,000 bioscience, life sciences, biotechnology, pharmaceutical, and medical supply organizations.
Clearly, the size of the biotechnology industry is related to how biotechnology is defined. The broadest definitions include multiple sub industries that span the gamut of human health, energy production, and agriculture. The overall dollar value of the market is likely reasonable, but an expert-based consensus of the overall size of the industry appears to be closer to 10,000 organizations, rather than 77,000.
[2] The Value of Bioscience Innovation in Growing Jobs and Improving Quality of Life,

BioDatabases 2017 – What's out there?

It’s time for the annual blog about the annual Nucleic Acids Research (NAR) database issue. This is the 24th database issue for NAR and the seventh blog for @finchtalk. Like most years I have no idea what I’m going to write about until I start reading the new issue. Something always inspires me.
This year’s inspiration came from missing data.

In 2017, NAR lists 1662 databases or 23 fewer than last year.

As summarized in the database issue’s introduction, Galperin, Fernández-Suarez, and Rigden tell us this year’s issue has 152 papers. 54 of those describe new databases and 98 provide updates, 16 of which cover databases that were first published elsewhere.  18 duplicate entries and 30 obsolete databases have been removed.  But we are not told how many databases are in the catalog. That is an exercise for the reader.
Given that last year the authors stated that there were 1685 databases one would assume that this year’s total would be 1685+54+16-18-30=1707, or 1691 if the 16 updated databases were in the catalog and just described somewhere else. But, since we are not told that, we need to figure it out on our own.
Fortunately, the entire list of databases is available, so all you have to do is visit the page and count the entries. OK, that would be tedious and take forever because you’d have to check your work and would likely get lost several times doing so. Instead, one can capture the text and write a Perl script to count the entries.  When I did this, I got 1662 for an answer. This is neither 1707 nor 1691. As the catalog is maintained through the year, more databases have likely been removed than were reported in the article.
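For readers who want to check the arithmetic, here is a small Python sketch that reproduces the two expected totals and compares them with the counted value. The variable names are my own; the numbers are the ones quoted in this post.

```python
# Figures quoted in the post
last_year_total = 1685
new_databases = 54
published_elsewhere = 16
duplicates_removed = 18
obsolete_removed = 30
counted_entries = 1662

removed = duplicates_removed + obsolete_removed

# Case 1: the 16 databases published elsewhere are new to the catalog
expected_if_16_are_new = (
    last_year_total + new_databases + published_elsewhere - removed
)
# Case 2: the 16 databases were already in the catalog
expected_if_16_already_listed = last_year_total + new_databases - removed

print(expected_if_16_are_new)                           # 1707
print(expected_if_16_already_listed)                    # 1691
print(expected_if_16_already_listed - counted_entries)  # 29 unexplained removals
print(last_year_total - counted_entries)                # 23 fewer than last year
```

Even under the smaller estimate, 29 entries are unaccounted for, which fits the idea that the catalog was pruned during the year.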
As I counted the entries, I also looked at the titles and descriptions and thought about what we could learn from this information. After all, these 1662 databases are used to develop scientific knowledge. Can we use this data to learn about the kinds of things scientists are interested in?
Now my simple Perl script grew from a command line that counted empty lines to a script that had to grab the second line of each entry – triggered by an empty line using a state machine, with an initialization to get the first entry – parse that second line and count the words. For students interested in bioinformatics, this is a common exercise with data.
Once that was done, a review of the words indicated some clean up was in order. Common words, that added little value, were removed. Also, plurals were converted to singular forms to avoid duplication of terms. The last step was to use wordle™ to create a tag cloud of terms found in the database descriptions.
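The original script was written in Perl and is not shown here. As a minimal sketch of the same idea in Python, the code below splits the captured text into entries at blank lines, takes the second line of each entry as the description, and counts the words. The input layout, the stop-word list, the plural handling, and the sample entries are all assumptions made for illustration.

```python
from collections import Counter
import re

# Hypothetical stop-word list; the post does not say which words were removed.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}


def split_entries(text):
    """Split the captured catalog text into entries separated by blank lines."""
    entries = []
    current = []
    for line in text.splitlines():
        if line.strip():
            current.append(line.strip())
        elif current:  # a blank line closes the current entry
            entries.append(current)
            current = []
    if current:  # the last entry may not be followed by a blank line
        entries.append(current)
    return entries


def singular(word):
    """Very naive plural handling: drop one trailing 's' (but not 'ss')."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def count_description_words(entries):
    """Count words on the second line (the description) of each entry."""
    counts = Counter()
    for entry in entries:
        if len(entry) < 2:
            continue  # no description line to parse
        for word in re.findall(r"[a-z]+", entry[1].lower()):
            if word not in STOP_WORDS:
                counts[singular(word)] += 1
    return counts


# Invented sample entries, not taken from the NAR catalog.
sample = """ExampleDB
A database of protein structures

OtherDB
Genes and genomes of the human
"""
entries = split_entries(sample)
counts = count_description_words(entries)
print(len(entries))
print(counts.most_common(3))
```

The plural rule here is deliberately crude (it would turn "virus" into "viru"), so a real clean-up pass would need a hand-checked list of exceptions, much like the manual review described above.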
So, what did I learn?
First, database is the most common term. Nearly 25% of the descriptions use that term. The next most frequent term is protein, which is followed by gene, genome, human, sequence, and data. The term structure, something we’re interested in at Digital World Biology, is the eighth most frequent term. It is followed by genomic, interaction, and expression.
While DNA sequencing captures attention in the news, understanding how genotypes impact phenotypes requires that we deeply understand the relationship between sequence, structure, and function. Thus, it is not surprising that the most common terms describing biological databases would include words that describe this relationship.
The other interesting finding is the sheer number of unique words. The tag cloud above summarizes 150 of 2370 total words. To be listed in the tag cloud, a word had to be used at least nine times.  Over 1500 words were used only once. These are interesting and instructive too. A few of the words indicate that there are databases that include information on waterfleas, mites, exosomes, leptospira, paramecium, amoebazoa, honey, plexipus, bananas, and many others. The list of words used once also includes misspellings, word fragments, and words that add context to descriptions; many of these are chemical and biochemical terms.
The real importance of the number and variety of words used to describe the databases, however, is that biological databases store and organize data and information about biology.  And, the complexity of biology cannot be stored in a single source.

Understanding the CRISPR Cas9 system

On Sept. 30th, I’m going to be co-presenting a Bio-Link webinar on Genome Engineering with CRISPR-Cas9 with Dr. Thomas Tubon from Madison College.  If you’re interested, register here.  Since my part will be to help our audience understand the basics of this system, I prepared a short tutorial with Molecule World.  Enjoy!
A Quick CRISPR Tutorial

  1. Go to the Digital World Biology CRISPR Structure Collection.
  2. Download the second item in the list, 5F9R, by clicking the link in the Download structure column.
  3. Identify the three components of the CRISPR – Cas system:  The Cas9 protein, the guide RNA, and the target DNA.
  4. Use the camera button to capture an image of each component and paste it in your lab notebook.

To view the individual components, find and highlight the different kinds of sequences one at a time.

Use a combination of hiding structures and changing the coloring and drawing styles to make it easier to distinguish the parts.

You may want to zoom in and zoom out to see how the nucleic acids are positioned within the protein.

Suggestion – one nice way to view the structure is to select the protein, show all the residues, make it spacefill, and color it neutral.  Lock the chains.

Then select the nucleic acids to highlight them and change the coloring and drawing styles so you can see how they’re positioned in the protein.


5.  When you’re done exploring, use Reset View to restore the original drawing and coloring styles.


6.  Select the DNA and RNA and apply the residue coloring style.


7.  Deselect the DNA, hide the unselected residues, and hide the sequence viewer.


8.  Turn the structure around to view it from all angles.

Q1.  Do you see anything interesting about the structure of the RNA?

This is the guide RNA.  It has a special shape that’s recognized by the Cas protein.

9.  Select the DNA and hide everything else.

Q2.  Where do you think the DNA gets cut?

10.  Lock the DNA and protein chains.


11.  Select nearby residues to see how part of the RNA interacts with the target DNA.


Q3.  How does the selected part of the RNA interact with the target DNA?

12.  Hide the unselected residues.

Q4.  Now, where do you think the DNA gets cut?

13.  Look at the RNA sequence.  Select the base on the 3′ side of the last highlighted base.  (This should be a U).


14.  Open the color key.

Q5.  What kinds of bases are forming pairs?  Notice the RNA and DNA sequences are complementary.

The GUU sequence is a special motif that helps control where the DNA gets cut.

Q6.  What part of the RNA sequence would you change if you wanted to cut a different sequence of DNA?

Hint:  It may help to unlock the chains and apply Rainbow coloring to identify the 5′ and 3′ ends of the DNA and RNA molecules.

Teach Biology? We want to learn about your use of computers in the classroom

Computers, biological data (molecular sequences, structures, and other data), websites, and databases are integral to modern research. Innovations like precision, or personalized, medicine expect a certain level of patient participation, and our future food and environmental sustainability will require that society can access a multitude of computer-based resources. Thus, higher education has an important role in providing students with employable skills as well as the ability to use data to make important personal and societal decisions. Toward that goal it is worthwhile to understand how computers are being used in biology education today.

The Network for Integrating Bioinformatics into Life Sciences Education (NIBLSE; “nibbles”) is a National Science Foundation Research Coordination Network for Undergraduate Biology Education (RCN-UBE) devoted to establishing bioinformatics as essential to the undergraduate life sciences curriculum. To that end, we are asking the community to help us determine core bioinformatics competencies for the undergraduate curriculum.

We are asking you to complete a short, anonymous survey if you are in one or more of the following groups:

  • Educators who teach undergraduate life sciences at a 2-year or 4-year college, university, or technical school.
  • Educators who supervise graduate students and who expect, or would like to expect, graduate student familiarity with bioinformatics.
  • Biologists and/or bioinformaticians who teach/provide training in bioinformatics as part of their work at a company or organization, but not as part of a for-credit course at a college or university.

The survey should take you approximately 15 minutes to complete.

Follow this link to the Survey: Take the Survey

Or copy and paste the URL below into your Internet browser:

If you know someone who would be interested in taking the survey, please share the link below.  As the URL above will only work once, such respondents should follow this link to the survey.
We invite you to read more about our activities and other ways to contribute and provide feedback at our project website or contact us at the address below. Thank you in advance for your input.
The NIBLSE Leadership Team
Mark Pauley, University of Nebraska at Omaha
Elizabeth Dinsdale, San Diego State University
William Morgan, College of Wooster
Anne Rosenwald, Georgetown University
Eric Triplett, University of Florida
This survey is covered by IRB 161-16-EX. NIBLSE is supported by NSF Award #1539900.
NIBLSE is a proud partner of QUBES.

Zika virus, drug discovery, and student projects

It’s well understood in science education that students are more engaged when they work on problems that matter.  Right now, Zika virus matters.  Zika is a very scary problem that matters a great deal to anyone who might want to start a family and greatly concerns my students.
I teach a bioinformatics course where students use computational tools to research biology.  Since my students are learning how to use tools that can be applied to this problem, I decided to have them apply their new bioinformatics skills to identify drugs that work against Zika virus.
We don’t have the lab facilities to test drug candidates, but it’s nice for students to realize they’re learning skills that could be put to use.
Here’s what we’re doing:

  1. Looking at background information about Zika virus.
  2. Using blastp to identify related proteins that are also bound to drugs.
  3. Using molecular modeling to see if those drugs might also bind to Zika virus proteins.

Getting up-to-speed on Zika virus
We found a great compilation of Zika resources at the NCBI.  CIDRAP has a great set of Zika resources as well.
My students go to the NCBI Zika resource, select the link to publications, and scan the titles to see what’s new.  This list is a bit overwhelming, so I ask them to focus on the first and last sentences in the abstract from P. Brasil et al., Zika Virus Infection in Pregnant Women in Rio de Janeiro, and on this publication from Tang et al.  They need to identify birth defects associated with Zika virus infection and summarize two kinds of data that support the association between infection and birth defects.
Next, they use the Health Map link to see where infections are occurring.  It gets more personal when you see cases happening in your state.

Health Map shows Zika virus cases in real time.

We also look at the ViralZone page from Expasy to learn about the Zika life cycle and see how the Zika polyprotein gets chopped into smaller parts.  This has a link to an interesting Wikipedia page for a Zika virus receptor (DC-SIGN or CD209) that appears to be expressed in the uterus and on brain cells–at least that’s my interpretation of the RNA expression data.
But, it’s easy to get lost clicking too many links, so we go on to protein blast.
Identifying potential drug targets with BLAST
I think the easiest way to find a drug against a virus is to start by looking at compounds we already know about.  We know that many successful antiviral drugs target viral proteases and polymerases, so my students go to the Zika virus reference genome (thanks NCBI!) and get the protein sequences for the Zika virus protease NS3 and the Zika virus RNA dependent RNA polymerase.
Then they use protein blast to search the NCBI structure database and see if there are 3D structures from related viruses that are bound to drugs.
Once they’ve found a structure to work with, they reverse the search and use blastp to compare their new sequence to the sequence of the Zika protein.
Using molecular models to see if drugs might bind to Zika virus
Once our students have found structures that contain a drug, they look at amino acids that are near the drug to see if those residues are similar to those in Zika virus.
Would Sovaldi® (Sofosbuvir) work against Zika virus?
Whenever possible, I like to give examples to show an investigation might work.  When I noticed that some of my blast results included proteins from Hepatitis C virus, I decided to use this as an example.  There’s a drug that works by inhibiting the RNA polymerase in Hepatitis C  (Sovaldi® from Gilead), so I decided to find out if it might work against Zika as well.
Hepatitis C virus RNA polymerase bound to Sovaldi® (Sofosbuvir) from 4WTG colored by charge.

First, I did a blastp search and compared the protein sequence from the structure 4WTG against Zika virus RNA polymerase.
blastp results from comparing Zika virus RNA polymerase to the Hepatitis C virus polymerase in 4WTG

Only 25% of the amino acids are identical, but the E value is 0.007, so that’s encouraging.   I decided to take a closer look.
I used 4WTG as a query sequence in blastp to align it to the Zika virus polymerase sequence.  Then, I downloaded the 4WTG structure and opened it in Molecule World. I selected the drug and used the Select Nearby feature to identify amino acids that might be bound to the drug. Returning to the aligned sequences, I highlighted those amino acids in the alignment.
Interestingly, the drug binds to amino acids that are present in the same positions in both Zika virus RNA polymerase and in the Hepatitis C virus RNA polymerase.  Cool!
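The comparison above was done by eye in Molecule World and the blastp alignment. As a hedged sketch of how the same check could be scripted, the Python below walks two aligned sequences, reports whether chosen contact residues are identical, and computes percent identity over the alignment length. The function names and the toy alignment are invented for illustration; they are not the real 4WTG or Zika polymerase sequences.

```python
def conserved_contacts(aligned_query, aligned_subject, contact_positions):
    """Report which contact residues of the query are identical in the subject.

    aligned_query / aligned_subject: equal-length alignment rows, '-' for gaps.
    contact_positions: 1-based positions counted along the query residues in
    the alignment, starting at the first aligned query residue.
    """
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("alignment rows must have the same length")
    wanted = set(contact_positions)
    results = {}
    query_pos = 0
    for q, s in zip(aligned_query, aligned_subject):
        if q == "-":
            continue  # gap in the query: no query residue in this column
        query_pos += 1
        if query_pos in wanted:
            results[query_pos] = (q, s, q == s)
    return results


def percent_identity(aligned_query, aligned_subject):
    """Identical columns as a percentage of all alignment columns."""
    if not aligned_query:
        return 0.0
    same = sum(
        1 for q, s in zip(aligned_query, aligned_subject) if q == s and q != "-"
    )
    return 100.0 * same / len(aligned_query)


# Invented toy alignment, for illustration only.
query = "MDD-RKGA"
subject = "MDDTRHG-"
contacts = conserved_contacts(query, subject, [2, 3, 5])
identity = percent_identity(query, subject)
print(contacts)
print(identity)
```

With a real blastp alignment, the positions would need to be offset by where the alignment starts in the query, and residue numbers from the structure file may not match sequence positions, so both should be checked before trusting the result.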
I took a closer look.  In the top image, two manganese atoms bound to the drug are also bound to aspartic acid residues.  These are present in both proteins.
Amino acids that interact with Sovaldi® are colored by residue in Molecule World and drawn as tubes.

In the bottom image, I can see an arginine that’s present in both proteins.  Here, it appears to participate in an ionic interaction with the drug.
Amino acids that interact with Sovaldi® are drawn in a space-filling mode and colored by element in Molecule World.

Now, these models don’t prove that Sovaldi® would inhibit Zika virus replication.  But it might be worth taking a look.  If I were culturing brain stem cells like Tang et al. (3), I might take out a loan to buy some Sovaldi® and add it to the growth medium.   Just to see what happens.
For now, I’m looking forward to seeing what my students find.
Note:  All the molecular modeling work described here was carried out with the Molecule World iPad app from Digital World Biology.

  1.  The Zika Virus Resource at the National Center for Biotechnology Information
  2. Brasil P, et al. Zika Virus Infection in Pregnant Women in Rio de Janeiro. N Engl J Med. 2016 Mar 4. [Epub ahead of print]
  3. Tang et al. Zika Virus Infects Human Cortical Neural Progenitors and Attenuates Their Growth. Cell Stem Cell (2016).
  4. Stephen F. Altschul, Thomas L. Madden, Alejandro A. Schäffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), “Gapped BLAST and PSI-BLAST: a new generation of protein database search programs”, Nucleic Acids Res. 25:3389-3402.
  5. Appleby TC, Perry JK, Murakami E, Barauskas O, Feng J, Cho A, Fox D 3rd,
    Wetmore DR, McGrath ME, Ray AS, Sofia MJ, Swaminathan S, Edwards TE. Viral
    replication. Structural basis for RNA replication by the hepatitis C virus
    polymerase. Science. 2015 Feb 13;347(6223):771-5. doi: 10.1126/science.1259210.


DNA: it's in your blood

Did you know small fragments of DNA are circulating in your blood stream?
These short pieces of DNA are left behind after cells self-destruct. This self-destruction, or apoptosis, is a normal process. In the case of fetal development, certain cells in our hands die, leaving behind individual fingers. Immune system cells leave traces of DNA behind after they’ve tackled invading microbes. DNA can also appear in the blood when people have cancer.
I had the good fortune, last Monday, to hear Matthew Snyder describe this cell-free DNA in a fascinating talk and learn why DNA in the blood can be a useful thing. It turns out that cell free DNA is a potentially useful tool for evaluating fetal health, guiding cancer treatment, and monitoring organ transplants.
According to Snyder, the use of cell-free DNA for diagnosing trisomy 21 is one of the fastest growing molecular tests in the history of medicine.  Some of the rapid adoption of this test is driven by pregnant women who request it.
People are interested in the prospect of using cell free DNA for other kinds of tests as well. It could be used as a biomarker to indicate the presence of cancer, or perhaps other kinds of disease.
Snyder’s research involves sequencing this cell free DNA and trying to figure out where it came from. You might think that the DNA in one person would be pretty much the same from one cell to another. And with a few exceptions, like B and T cells, that’s the case. But the DNA fragments that float around in our blood aren’t random. We can only find DNA fragments in our blood because they were hidden from hungry nucleases during the self-destruction process. Normally, those enzymes would have chopped that DNA into tiny bits.
Cell free DNA exists because the proteins that transcribe DNA and the histone proteins that package it into nucleosomes also protect DNA from being digested.

Proteins protecting DNA from digestion.

The really interesting thing, in terms of cell-free DNA, is that nucleosomes and transcription factors sit on different regions of DNA in different cells.  Since different bits of DNA get protected in different cells, we can sequence the cell-free bits of DNA and figure out where they came from.  That information can tell us about a type of cancer or help us evaluate the health of multiple cell types.
This structure has been colored by charge. The negatively charged DNA (red) is wrapped around the positively charged histone proteins. Blue represents a positive charge.

Snyder MW, Kircher M, Hill AJ, Daza RM, Shendure J. Cell-free DNA Comprises an In Vivo Nucleosome Footprint that Informs Its Tissues-Of-Origin. Cell. 2016 Jan 14;164(1-2):57-68. doi: 10.1016/j.cell.2015.11.050.