Episode 3: “DNA Barcoding and Projectome Mapping”

Published Sun, 9 Sep 2018 | Transcript

Allen Sulzen: Welcome to the Carbon Copies Podcast, episode three. I'm your host Allen Sulzen, and today we'll be continuing our workshop series in this episode with a presentation from Professor Tony Zador, of Cold Spring Harbor Laboratory, who's presenting work using the most recent advances in DNA barcoding to map a brain's projectome. The projectome differs from the more widely recognized connectome in that the former traces the brain's general neurite extent, while the latter explicitly captures the individual's synaptic connections that underlie the brain's interconnected network. Zador's work with DNA barcoding injects randomly generated DNA snippets from a vast library of possible base pair combinations into localized populations of neurons. The DNA is replicated with a virus so as to fill a given neuron wherever that neuron's dendrites and axons may stretch, including far across the brain. The brain is then imaged such that each barcode is uniquely exposed via color or direct sequencing, thereby revealing the entire projected structure throughout the brain, of the population of neurons hosted locally around the injection site. This technique has already been utilized by the well known Allen Brain Atlas. Zador further presented a nascent extension of this work that offers the possibility of using the same basic technique to map connectomes. Let's jump right in. I give you professor Tony Zador.

Tony Zador: Thanks for the introduction. I'm indeed gonna be talking about a different approach to uploading the mind to connectomics, based on molecular approaches. Basically, our idea is that we can use molecular biology to uniquely label all the cells in the brain, grind up the brain, in one way or another, and use high throughput DNA sequencing, to read out the projections and connections of the brain. I should say as a caveat right now, that we haven't achieved that goal yet. It has the potential plus side of much higher throughput, but there's a lot of technology development that needs to happen before that works. We're getting very close to making it work in the mouse and there are some more fundamental but perhaps not insurmountable challenges to getting it to work in humans. I think the bad news is that it's going to be very, very hard to copy every neuron, synapse and protein. Even if we're able to measure all of this, it would be very challenging to know what to do with that.

Would we make a huge simulation? If we made a huge simulation we would probably have to have very strong opinions about which of those components was important in order to sort of make up for the properties we didn't measure. I think the good news is that maybe we don't need to do it exactly that way. So I'll point out that the brain that each of us walks around with today is not the same as the one we were born with. It's not the one we had 25, five or even one year ago; yet we have this illusion of identity continuity. We feel like we're at some level, a continuous extension of the person we were when we were born, five years ago and certainly yesterday; and so I think that suggests that there's a path to uploading a brain in which we introduced some kind of a system that interacts with and mirrors our brains during life.

Such a system would have the potential to sort of represent the substrate for permanent uploading of our minds into potentially, a very different kind of a system. I'll say that I think we have little hope of uploading a brain if we don't understand many of the fundamental principles of brain function. I would say that we know a great deal about individual neurons and synapses and we have very little understanding of how they work together to produce thoughts, feelings, identity, memories and consciousness. What we need to do is understand simple circuits; I like to sort of put a historical perspective on this and just remind people that the first person I think, to really study circuits in the way that many of us are trying to do these days and tease apart the circuit was Descartes, who proposed a circuit by which a person withdraws his foot from a fire; a reflex, if you like.

The mechanism that he proposed was that the flames of the fire displace the skin, which pulls on a tiny thread. The thread opens a pore in the ventricle, then the animal spirits flow from the ventricle through a hollow tube, and the animal spirits inflate the muscle of the leg, causing withdrawal. Many of you may recognize that our modern interpretation of what goes on differs in some ways. We no longer believe in a precise location of the source of the animal spirits. Nevertheless, I think Descartes did make an important move forward here by actually writing down a potential circuit that we could in principle copy and use to build a brain, and I think that's the goal of many of us working on this. So why is circuitry important? Again, I don't need to belabor that point for this group.

I'll just say that we want to be able to reverse engineer the brain. To do that, I think we need to not only make a lot of measurements (as I do think those measurements are important) and develop new technologies to make those measurements, but we have to also really think hard about how those circuits that we get by making those measurements give rise to the kinds of things that would ultimately be needed to build a brain. The other really immediate need for understanding circuits is that most neuropsychiatric disorders arise from disruptions of neural circuitry. Autism, schizophrenia, depression, addiction and bipolar disorder are probably in large part caused by disruptions of either structural or functional circuitry. The incredibly limited set of tools that we've been forced to work with until very recently has really limited progress, to the point that, as many of you may know, major pharmaceutical companies have largely given up the search in their own labs for new drugs.

This is because Big Pharma is good at making drugs, as opposed to finding targets, and we don't even understand what those targets are. That's really emphasized by the fact that in the last decade, the major frontier in treatments of neuropsychiatric disorders has been in circuit-based treatments such as deep brain stimulation, which exploit the little that we do know about the organization of circuits. If we could know more about those circuits in mammals such as humans, or in models of brain function, this would be incredibly useful. So why is circuitry hard? As shown here, the problem is simply that the axons are all tangled together.

That may not exactly be putting it simply, but that's really one of the big challenges. You'll see a bunch of neurons here; a bunch of wires, which are somehow related to neurons depicted here, along with neurons not on the screen that we really need to be able to track. Of course with serial electron microscopy that's increasingly possible, and yet that remains very slow. The challenges are not just in the data collection but also in the data analysis. Although that technology is improving rapidly, I'm gonna propose that there is a shortcut that complements the EM approach. Okay. Our core idea is to recast connectomics as a problem of high throughput DNA sequencing, by labeling each neuron with a unique sequence or DNA barcode. The idea is that we don't have to trace neurons, but merely read the identity of a neuron and its soma at any place we wish to look, without looking at all the intervening sections. That greatly simplifies the problem and it allows us to exploit the tremendous improvements in DNA sequencing technology, and more generally molecular biology.

So why sequencing? Simply because sequencing is fast and cheap. Many of you may be familiar with the fact that sequencing has gotten cheaper, and yet it's still striking when you look at this graph to see just how quickly DNA sequencing technology has improved. On the x axis is time since 2001, which is when the human genome was sequenced, and on the y axis is cost. For comparison, cost is shown on a log scale, so Moore's law on this plot is a straight line. Moore's law is a pretty healthy clip of improvement, and yet you see that the cost per genome has blown away Moore's law since about 2008. Any approach that you can hitch on to the incredible improvements of DNA sequencing seems to me to be a win.

What I'm going to focus on is the form of this technology that actually works really well in my lab today. We've set up a core facility and we now offer this as a service to collaborators. This is projectome mapping, which is the mapping of long range projections at high throughput at single neuron resolution. To put this into perspective and motivate it, I'm showing here a slide from the Allen Brain Atlas. What the Allen Institute did was a huge service to the entire mouse neuroscience community. They systematically went through and made injections throughout the mouse brain with a virus encoding GFP. If they make the injection on the right, they will wait about three or four weeks for the neurons that take up the virus to express it all through their axonal terminals.

Then you can look throughout the brain, and wherever there is green, the neurons that you've injected sent a projection. They did this systematically, one injection per mouse, localized as far as they could to a single brain area. Then, for each of those thousand plus mice that they injected, they sliced the brain, took pictures and put the pictures up online. It's been an incredibly useful resource to all of us who work on the system. With that being said, it has limitations. The challenge is that bulk injections don't distinguish single neuron projection patterns, as illustrated here. So on the left, you see an injection site; those three circles are three neurons. It is possible that if green is seen in the three target areas shown here, it would reflect the fact that neuron one projects exclusively to area one, neuron two to area two, and neuron three to area three. It could also be that instead of a one to one mapping, there is an all to all projection mapping, or, with the brain being the brain, most likely it's complicated, and that complexity actually is important to understanding what's going on.

That's the complexity that is missing when we use these kinds of bulk injections. The alternative to these bulk injections is to do single neuron fills, but that's incredibly low throughput. So here's a wonderful paper studying the projections from the cortex to the striatum in rats. For the purposes of this talk, what's important is that it's one neuron per rat. This very nice paper showed the reconstructions of 25 neurons from 28 rats. This is a very skilled group. Since then, throughput has increased a little bit, but it's still not going to allow us to map all the neurons in a single brain, at any throughput. The inspiration for our approach actually came from a technique developed by Jeff Lichtman and Josh Sanes, and published in 2007, called Brainbow.

The idea is that to make the tracing easier, it would be convenient if the axons had different colors. They had this brilliant idea of a very clever genetic trick for endowing each neuron with a randomly selected collection of three colors drawn from a palette of about a dozen. If you work out the combinatorics given by the genetics they did, that led to a theoretical diversity of about 200 colors in principle. That would actually be pretty nice, but unfortunately, color spaces were limited and hard to read. So although the theoretical diversity was about 200, the number of colors that you could actually resolve with real microscopes (actually it's not even an issue with the microscopes, it's the overlap of fluorophores) in general is less than 10, generously.

It is in actuality probably closer to five. That's shown here, where those three purple neurons are probably different purples, but they can't be resolved. The core idea here is to replace colors with DNA sequences. The reason is simply that there are a lot of possible DNA sequences. MAPseq is the name of the technique we developed as a kind of digital Brainbow, where you replace colors with numbers and have each neuron express a unique sequence of DNA, a genetic barcode, with a theoretical diversity that is essentially unbounded. Just to drive that home, there are four nucleotides: A, T, G, C. If you have a random barcode of length one, you have a potential diversity of four; of length two, 16. By the time you get up to 30, you'll get a potential diversity of 10 to the 18, which is vastly more than the number of neurons in the mouse brain, a human brain, or for that matter, even the number of cells in the human body. A random thirty-mer is sufficient to uniquely label every cell in your body with a very, very low chance of double labeling. More concretely, ten to the nine barcodes are sufficient to label all ten to the seven neurons in the mouse cortex with sufficient diversity that the chance that two neurons will be labeled the same is pretty small.
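Editor's note: the arithmetic in this passage can be checked with a few lines of Python. This is only a sketch under a simplifying assumption that is not stated in the talk, namely that each neuron draws its barcode uniformly and independently from the library. The function names are illustrative and do not come from the lab's software.

```python
import math


def barcode_diversity(length: int, alphabet_size: int = 4) -> int:
    """Number of distinct sequences of the given length over A, T, G, C."""
    return alphabet_size ** length


def expected_unique_fraction(n_cells: int, n_barcodes: int) -> float:
    """Probability that a given cell shares its barcode with no other cell.

    Assumes (for illustration only) uniform, independent barcode draws.
    Uses log1p for numerical stability when 1 / n_barcodes is tiny.
    """
    return math.exp((n_cells - 1) * math.log1p(-1.0 / n_barcodes))


# Diversity for the barcode lengths mentioned in the talk.
for length in (1, 2, 30):
    print(length, barcode_diversity(length))

# Ten to the seven cortical neurons labeled from ten to the nine barcodes.
print(expected_unique_fraction(10**7, 10**9))
```

Under that assumption, a library 100 times larger than the number of labeled neurons leaves roughly 99 percent of neurons with a barcode no other neuron carries, which is consistent with the "pretty small" chance described above.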

In practice, how do we do that? We've generated a virus. Right now we're working with a particular virus called Sindbis. Viruses are generated in the lab from a plasmid. To generate a viral library, all we do is take the plasmid that generates the virus, and we introduce these random sequences directly into the plasmid.

So we get a plasmid library, and those plasmids give rise to a viral library of sufficient diversity. There's the remaining challenge of getting the barcodes to the synapse, because typically most RNAs don't travel to distant axons. In this slide, I'm summarizing the PhD thesis of a very talented graduate student in my lab, Ian Peikon, who figured out a modification of a synaptic protein, shown on the far right, that allows that protein, which normally traffics to axon terminals, to grab onto the RNA barcode through a so-called nλ domain. It grabs onto the boxB component of the RNA barcode that we put in, and drags it out to the synapse. So our first proof of principle of this was in the locus coeruleus, which was published a year and a half ago at the time of this recording.

We studied the projections of the locus coeruleus throughout the brain. It was already known that the locus coeruleus projects throughout the cortex, relatively uniformly, but it was suspected and believed (but not really known) that the neurons in locus coeruleus really just kind of bathe the cortex in noradrenaline. The locus coeruleus is the sole source of noradrenaline to the cortex. Noradrenaline is a kind of wake up signal which tells the brain to go on alert. The usual view is that there's a cortex wide alert signal that goes off when the locus coeruleus is active. Alternatively, one can imagine that the locus coeruleus actually can specifically activate sub-parts of the cortex. So for example, if you need to specifically listen for sound, you might imagine that the auditory cortex is selectively turned on, and maybe if you need to selectively be on the alert for light, it might be the visual cortex.

For the most part in these experiments, it turns out that the RNA acts like GFP. It acts like a bulk signal that bathes the brain. The way the experiment goes is we inject in the locus coeruleus (that cross in the back there), then cut the brain into slices, after which we extract the RNA from each of the slices and figure out which barcodes appear in each of those slices. Then we can demultiplex that and we can get a picture that looks like this, for example. Here is the spatial profile of projections for barcode 28, which corresponds to neuron 28. The projection strength is measured in barcode molecules, which we quantify carefully through sequencing. The x axis shown here is the rostral caudal position from those slices.
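Editor's note: the demultiplexing step described here amounts to counting, for each barcode, how many barcode molecules were recovered from each slice. The sketch below assumes a hypothetical input format, a list of (slice index, barcode sequence) pairs with one entry per sequenced barcode molecule. It is meant only to illustrate the bookkeeping, not the lab's actual pipeline.

```python
from collections import defaultdict


def projection_profiles(molecules, n_slices):
    """Build a per-barcode projection profile across slices.

    molecules: iterable of (slice_index, barcode) pairs, one per barcode
               molecule recovered from that slice (hypothetical format).
    Returns a dict mapping barcode -> list of molecule counts per slice,
    ordered by slice index (for example, rostral to caudal).
    """
    profiles = defaultdict(lambda: [0] * n_slices)
    for slice_index, barcode in molecules:
        profiles[barcode][slice_index] += 1
    return dict(profiles)


# Toy data: two barcodes (two neurons) read out from three slices.
molecules = [
    (0, "ACGT"), (0, "ACGT"), (2, "ACGT"),
    (1, "TTGA"), (1, "TTGA"), (1, "TTGA"),
]
profiles = projection_profiles(molecules, n_slices=3)
for barcode, counts in profiles.items():
    print(barcode, counts)
```

Each list is the kind of spatial profile shown on the slide: molecule count on one axis, slice position on the other, one curve per barcode.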

It's important to recognize that with this technique, the spatial resolution is determined at the time of dissection. So we collapse all the barcodes that are found in whatever chunk of brain we collect, and cannot resolve them further. This particular neuron, you see, has a very specific spatial pattern of projections (and that's at odds with the traditional model). Another neuron, barcode 79, has a much broader one, and there are even more complicated ones, like bimodal projections. In all, we were able to get a thousand single neurons and their projections from just a handful of animals in just a couple of weeks. So this is orders of magnitude more throughput for this system. *Inaudible*

So it's more than orders of magnitude; infinitely more. In general, a thousand neurons in a handful of experiments is high throughput. On a technical note, many of you are probably worrying about double infections, i.e., what if you get two barcodes into the same neuron? In the interest of time, I'm not going to go over why that's not bad, but it isn't bad. You can ask me during the questions. What would be bad would be degenerate barcodes: insufficient barcode diversity, where the same barcode ends up in several different neurons, and that would be disastrous. So we go out of our way to avoid that, and then we confirm that we avoided it. If people have these technical questions, I invite you to ask me during the question period. I'll also say that more recently we have, in collaboration with Mrsic-Flogel's lab, who did a bunch of single neuron reconstructions in visual cortex, compared their reconstructions of 35 neurons in a couple of years to some MAPseq studies of those same projections, 533 neurons in a handful of experiments. For the purposes of this talk, what's important is that they agreed to essentially the resolution that we can determine in this experiment. There's a whole scientific story that I don't have time to go into, but that should be coming out sometime in the next few weeks, and is on bioRxiv right now.

We have a core facility which is up and running. We're now studying the projections in different animal models and different brain areas, and we're scaling up to do the whole brain in a single experiment. We tile the entire cortex with barcodes, then we take much finer slices than in the previous work, align them to the Allen Brain reference atlas, and then we can project these back to Allen reference coordinates. Therefore, in a single experiment, in about a week, for about 10,000 bucks, we can get the projection patterns of about 100,000 individual neurons. For each neuron, we can see the areas to which it projects. So here are the strongest projections of two of these 100,000 neurons (these are randomly selected neurons to give you a feel for what kind of data we have). This is so much data of a different form that, although we've had these data now for almost a year to a year and a half, we're still figuring out how to make sense of them and what to say about them. However, we're writing this up right now. Nevertheless, none of this gets us to the connectome so far. I've only told you about the projectome.

In the last couple of minutes, I'd like to talk about the connectome. So the limitations of MAPseq are that, first of all, it has a crude spatial resolution, as I mentioned, as it's limited by dissection. There is no notion of morphology, cell type or gene expression. My own background is in computation and in physiology. The fact that anatomy is traditionally very hard to correlate or relate to neural activity (knowing what the neurons did when the neurons were still inside the animal's head and the animal was doing interesting things) is distressing. As I mentioned, I haven't told you how we can get from the projectome to the actual synaptic connections that are needed to figure out the connectome. The solution to all of those is a technique that we call BARseq, which is modified from in situ sequencing, a technique that was developed in George Church's lab, which we learned from them through collaboration and then modified and adapted to the particular goals here.

Okay. So to tell you how that technology works, I have to actually take you through just a little bit of the nuts and bolts of conventional DNA sequencing. Of those people who do sequencing, only a few have even a vague idea of how the sequencing actually works. When I say conventional sequencing, I mean conventional short read, high throughput sequencing as done by Illumina. Here is how it works. On the left, you have a tube full of DNA, little bits of DNA that you want sequenced. You wash it onto a so-called flow cell. Those little black bars sticking up are sequences that grab on to the purple DNA that you want sequenced. Then the next step is to amplify each one of those single pieces of DNA into a little cluster on the slide of identical bits of DNA. So that's done by bridge amplification.

The key thing is that then the sequencing actually becomes an imaging problem. You take a snapshot of those clusters, but before taking the snapshot, you label each cluster with the next nucleotide in its sequence. So again, I'm not sure if you can see my pointer, but for example, this one in the left, the first nucleotide in that sequence might be a c and so it's labeled with a fluorophore that's white. This one on the top is an a, so it might be pink. Same with this one. This one here on the right is a t. Then you take a picture of this. Traditionally, they used a spinning disk confocal. They've actually changed that technology, but originally it was essentially a confocal microscope. Following the picture, you cleave those fluorophores off and add the next nucleotide. To build up the sequence of DNA associated with each one of those single molecules really is a matter of taking a bunch of pictures and aligning those pictures properly.

So the number of dots per picture represents the number of molecules that you're sequencing, and the number of cycles tells you the length of the DNA that you're able to get. So in this case, C, T, T are the first three bases, and then some dots. Typically, these technologies can nowadays go up to several hundred nucleotides. The key point here is that conventional high throughput next generation sequencing involves taking a picture. For in situ sequencing, we can do essentially the same thing, except that instead of first extracting the DNA, we first immobilize the RNA in the cell, and then, using a different set of molecular biology tricks, we convert each one of those individual molecules of RNA into a ball of cDNA (complementary DNA) that corresponds to the sequence we're interested in in the RNA, and from there on in, it's essentially the same problem of imaging.
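Editor's note: the idea that sequencing becomes "a matter of taking a bunch of pictures and aligning those pictures properly" can be illustrated with a toy base caller. The sketch assumes the pictures are already aligned, that each cluster appears in every cycle, and that each cycle yields four channel intensities per cluster. The channel-to-base assignment below is hypothetical and is not the chemistry of any real instrument.

```python
# Hypothetical mapping from fluorescence channel index to nucleotide.
CHANNEL_TO_BASE = {0: "A", 1: "C", 2: "G", 3: "T"}


def call_bases(cycles):
    """Read out one sequence per cluster from a series of imaging cycles.

    cycles: list with one dict per cycle, mapping cluster_id -> list of
            four channel intensities (assumed already aligned across cycles).
    Returns a dict mapping cluster_id -> called sequence, one base per cycle.
    """
    reads = {}
    for cycle in cycles:
        for cluster_id, intensities in cycle.items():
            # The brightest channel decides which base was incorporated.
            brightest = max(range(len(intensities)), key=intensities.__getitem__)
            reads[cluster_id] = reads.get(cluster_id, "") + CHANNEL_TO_BASE[brightest]
    return reads


# Toy data: two clusters imaged over three cycles.
cycles = [
    {"left": [0.1, 0.9, 0.1, 0.2], "top": [0.8, 0.1, 0.0, 0.1]},
    {"left": [0.0, 0.1, 0.2, 0.7], "top": [0.1, 0.0, 0.9, 0.1]},
    {"left": [0.1, 0.0, 0.1, 0.8], "top": [0.2, 0.7, 0.1, 0.0]},
]
reads = call_bases(cycles)
print(reads)
```

The number of clusters corresponds to the number of molecules sequenced and the number of cycles to the read length, as described in the talk. The same readout logic applies whether the dots sit on a flow cell or inside a cell body.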

Here's an example that I hope will clarify what I'm talking about. This is a picture of a bunch of neurons in culture. The bright circles are cell bodies. I'm showing you these pictures from cell culture because the pictures are prettier, but we also do this routinely now in slices. Here's the key for reading out what you see. Each neuron is full of one barcode, or in a couple of cases two barcodes, but typically one barcode, derived from infection by our virus. Just to show you how this plays out, here are two neurons that I'll show you the sequence of. The first base is an A, and we cleave that off. The next base is a G, and so on. So by taking a series of pictures, we can read out the sequence of the barcode in each neuron, and with different optics, modifying and optimizing a couple of things, we can also do the axons and dendrites.

We're now applying this to study the projections in auditory cortex. In the interest of time, I'm going to skip over most of this and just say that here is a picture of about 1500 neurons whose sequences we've determined in a brain slice. We first took a brain, injected the virus in auditory cortex, waited two days, did in situ sequencing (BARseq) at the injection site, and looked at the long range projections using conventional MAPseq. In that way, we're able to get the projections and cell body positions of about 1500 neurons. So for each one of those little dots, we can click on it, and I could tell you the strength of projections to the dozen or so areas that we've studied. Again, there's a whole story that we're working on here.

We can combine this with in vivo activity. We first used two photon imaging to figure out the activity of a bunch of cells and then ex vivo, we can find those same cells and then figure out their projections. We can combine this with gene expression. I'll skip over that. Here's what I think was most relevant to the focus of this meeting, which is how do we achieve synaptic resolution? Essentially I've already told you the answer, which is that we combine the axonal and dendritic labeling with barcodes, in this tangled mess that I showed you. Then we can figure out what the two barcodes are of the potential processes that are forming a potential synapse. We have to actually go one step further and figure out whether there is an actual synapse there. Currently, we're doing that by using a synaptic marker. Going forward we have some molecular tricks for actually entangling the barcodes associated with the two processes so that a single rolony will form that includes the sequences of the two barcodes, but only when the two barcodes are touching at a synapse.
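Editor's note: once each candidate contact carries the barcode of the axon and the barcode of the dendrite, plus a yes or no answer on the synaptic marker, assembling a connectivity table is simple bookkeeping. The sketch below assumes a hypothetical record format of (presynaptic barcode, postsynaptic barcode, marker present) and is an illustration of the idea rather than a description of the lab's analysis.

```python
from collections import Counter


def connectivity_counts(contacts):
    """Count confirmed synapses between pairs of barcoded neurons.

    contacts: iterable of (pre_barcode, post_barcode, has_marker) tuples
              (hypothetical format). A contact is counted only when the
              synaptic marker was detected at that location.
    Returns a Counter mapping (pre_barcode, post_barcode) -> synapse count.
    """
    return Counter(
        (pre, post) for pre, post, has_marker in contacts if has_marker
    )


# Toy data: four candidate contacts, one without a synaptic marker.
contacts = [
    ("AAC", "GTT", True),
    ("AAC", "GTT", True),
    ("AAC", "CCA", False),  # processes touch, but no synapse is confirmed
    ("GTT", "CCA", True),
]
connectome = connectivity_counts(contacts)
print(connectome)
```

With the joined-barcode approach mentioned at the end of this passage, the marker check would in effect be built into the molecule itself, since the joint sequence forms only at a synapse.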

In summary, MAPseq and BARseq open up the possibility of combining connectivity, gene expression and in vivo function at high throughput, in a low cost way. I'll just summarize by saying that if the brain were so simple we could understand it, we would be so simple we couldn't. I can't end without first thanking my collaborators, especially Eustace, Cheyenne, Lee, and former lab members including Ian Peikon. Also my funders, especially IARPA, NIH and the Paul Allen Family Foundation for some very early funding for this. With that, I will close.

Allen Sulzen: That's all for today's episode. To learn more about our past events, or to find out about upcoming live events, please visit us on the web at carboncopies.org.