I wrote this mostly for my own reference, so it might have some gaps or unexplained terms. But if you dig genetics and want to learn some new stuff, keep reading…
Did you know people share 7% of their genetic material with the E. coli bacterium, 21% with worms, 90% with mice and 98% with chimpanzees? For me, this makes genetics one of the most interesting subjects I’ve studied. The idea that all living organisms, of different sizes and shapes, somehow had one common ancestor (LUCA – Last Universal Common Ancestor) is mind-blowing.
Genetics is life. From complexity to diversity. From disease to death, sex and evolution.
Let’s start with LUCA. Thought to have lived some 3.5–3.8 billion years ago, it’s the most recent common ancestor of all life on Earth. This idea leads us directly to the tree of life, which I’m sure most of us have seen at some point in our learning. For the purposes of this article I will be sticking to the (oversimplified) three-domain tree of life illustrated in my diagram below:
Addendum: For the sake of accuracy I should state that the genetic tree is more along the lines of a tree with many interconnected branches representing a multitude of horizontal gene transfers. It’s nowhere near as simple and clear-cut as the diagram above.
Some key distinctions to note:
Bacteria & Archaea are prokaryotes, meaning they are mostly unicellular organisms that don’t have a distinct, membrane-bound nucleus. The key difference to note between Bacteria and Archaea is that many Archaea live in more extreme environments, the kind that would kill most other organisms. They can withstand high temperatures (thermophiles) and can thrive in salty habitats. They’re the extremists.
Eukaryotes, on the other hand, include all the multicellular organisms such as animals and plants (though many eukaryotes are single-celled too). They are known to contain a complex, membrane-bound nucleus, and their cells are usually much larger.
So what is a genome?
Genomes vary so much that the most workable definition would be something along the lines of “A set of genetic instructions within a biological compartment.”
Note the little blue corner in the diagram above? That makes up all the animals and land plants on Earth, the rest is all microbes. This goes to show that most of our genetic biodiversity is found within microorganisms, and we go about understanding genetics by studying these minuscule microorganisms.
Did you know 50% of our genes (human – yea! Because cats read my blog) come directly from bacteria? This takes us to genetic mergers: roughly 1.8 billion years ago a bacterial cell was engulfed by an Archaean cell to eventually make Eukaryota, and when I say “eventually” I’m talking a few hundred thousand years. And how this happened is easier to imagine than you might guess:
Imagine two cells (A & B) that live side by side. Cell A eats cell B, which gets digested within A. Then one day cell B remains undigested, and it turns out cell B can survive inside A. This goes back and forth for several thousand years, until the two start complementing each other and form a host-endosymbiont relationship (host = cell A, endosymbiont = cell B). But cell B could still live outside cell A.
Another few thousand years pass by, and a mutation results in cell B being unable to live outside cell A. This is a very important step in the process. Maybe it can no longer produce a protein necessary for its survival and cell A provides it. Since cells A and B were two whole & unique organisms, they carry many of the same genes, which do pretty much the same thing, and that is redundant. So both start losing genes from their genomes and become more dependent on each other. It’s important to note that cell B loses more of its genome than cell A, which is the host in this (what we humans might call unhealthy) relationship. They can never go back from this change, and we like to call it the evolutionary ratchet because it only goes one way.
In this process, cell A gains some of the genes of cell B and vice versa. Eventually cells A and B become one, with cell B becoming an organelle within cell A. A good example, one we all depend on as the powerhouse of our cells, is the mitochondrion, once an independent bacterium. The same process explains how eukaryotes gained chloroplasts (with a cyanobacterium acting as the endosymbiont and a eukaryote as the host).
This once non-exclusive relationship turned genetic merger gave rise to all the complex cells and organisms we see today. Tracking back to my introduction to genetic mergers, the bacterium that merged to create our mitochondrion left an endosymbiotic footprint of over 50%. This component carries most of the genes encoding metabolic enzymes, transporters and signal transduction, and it is complemented by the Archaean component, which contains mostly genes for information transmission.
So now when you think about human cells, we could say they have two distinct genomes: a small one in the mitochondrion and the main one in the nucleus.
| Note to self | Convergent evolution: This is where two different organisms independently evolve similar traits to survive in similar environments.
No. Not him. Lokiarchaeota: they’re an ancient group of single-celled organisms from the domain Archaea. What makes them interesting is that they share some characteristics with more complex life forms, including humans. While my simplification above may have made the jump from single-celled organisms to complex creatures seem almost hierarchical, it’s in fact a subject that baffles scientists even today. It’s hoped that the discovery of Loki (September 2015) could answer this great evolutionary mystery.
If you’re wondering why this is so hard to figure out, it’s because we can’t go back in time. The best scientists can do is study the genetic makeup of organisms to trace back and reconstruct the order of evolutionary events.
Fun fact: Lokiarchaeota were first discovered off the coast of Norway, 15 kilometers from deep-sea vents actually known as Loki’s Castle!
Let’s delve a bit deeper into the genetics and talk about genome size
If you come across a random cell and want to find out how big its genome is, what you would typically do is sequence its genome, assemble it using a computer and then count the number of nucleotides. This is usually done on a modern sequencing machine, and sequencing is also commercially available as a service (all you need to do is send in the DNA sample that needs measuring).
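The last step, counting nucleotides, is the easy part once an assembly exists. Here is a minimal sketch, assuming the assembly is plain FASTA text (header lines starting with `>`, sequence on the lines in between). The function name and the toy contigs are made up for illustration, not real data.

```python
def genome_size_from_fasta(fasta_text: str) -> int:
    """Count nucleotides in FASTA-formatted text, skipping header lines."""
    total = 0
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and ">" header lines
        total += len(line)  # every remaining character is one base
    return total


# Toy assembly with two made-up contigs (hypothetical, not real data).
toy_assembly = """>contig_1
ATGCGTAC
GGTA
>contig_2
TTAGC
"""

print(genome_size_from_fasta(toy_assembly))  # 17
```

A real assembly would be read from a file and might contain gap characters such as `N`, which you may or may not want to count.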
Genomes come in various lengths and we usually use the following units to measure them:
- 1 base pair = 1 bp
- 1,000 bp = 1 kb
- 1,000,000 bp = 1 Mb
- 1,000,000,000 bp = 1 Gb
- Finally, 1 pg (picogram) ≈ 1 Gb
If you were wondering, the human genome is about 3 pg, which is roughly 3 billion base pairs in length.
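The units above can be turned into a tiny converter. One thing that isn’t from my notes: treating 1 pg as 1 Gb is a rounding, and a commonly cited conversion is 1 pg ≈ 0.978 Gb. The sketch below assumes that factor; the function names are just illustrative.

```python
# Assumption: commonly cited conversion, 1 pg of DNA is about 978 million bp.
BP_PER_PICOGRAM = 0.978e9


def picograms_to_base_pairs(pg: float) -> float:
    """Convert a DNA mass in picograms to an approximate base-pair count."""
    return pg * BP_PER_PICOGRAM


def format_genome_size(bp: float) -> str:
    """Express a base-pair count in the largest fitting unit (Gb, Mb, kb, bp)."""
    for unit, factor in (("Gb", 1e9), ("Mb", 1e6), ("kb", 1e3)):
        if bp >= factor:
            return f"{bp / factor:.2f} {unit}"
    return f"{bp:.0f} bp"


# A roughly 3 pg genome, like the human one mentioned above.
print(format_genome_size(picograms_to_base_pairs(3)))  # 2.93 Gb
```

So “3 pg is 3 billion base pairs” holds to within a few percent, which is fine for back-of-the-envelope comparisons.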
But the sequencing process described above doesn’t work too well for big genomes. Sometimes repeats can confuse the computer algorithm, gaps may go unidentified, or there might just be too much data for the computer to process.
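To get a feel for why repeats are a problem, here is a rough sketch (my own toy illustration, not how any particular assembler works). It lists every stretch of length k that shows up more than once in a sequence. A short read that falls entirely inside one of those stretches could belong to either copy, and the assembler has no way to tell. The sequence and function name are made up.

```python
from collections import Counter


def repeated_kmers(sequence: str, k: int) -> dict[str, int]:
    """Return every length-k substring that occurs more than once, with its count."""
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    return {kmer: n for kmer, n in counts.items() if n > 1}


# Made-up toy sequence in which "GATTACA" appears twice.
toy_genome = "GATTACAGGGATTACATT"

print(repeated_kmers(toy_genome, 6))  # {'GATTAC': 2, 'ATTACA': 2}
```

In a big genome the repeats can be thousands of bases long and present in huge numbers of copies, which is where things really fall apart.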
So we use an alternative method called staining and imaging. This is where the DNA is stained with a fluorescent chemical and a high-powered microscope is used to capture an image of the sample onto a computer, which carries out a digital image analysis and determines the genome size based on the level of fluorescence. You should note, though, that mixing up haploid and diploid cells can lead to wrong interpretations with this method, because genome size shouldn’t count multiple copies of the chromosomes.
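The arithmetic behind this can be sketched as a simple ratio. This is a simplified sketch under assumptions of my own: that fluorescence scales linearly with the amount of DNA in a nucleus, and that a reference sample of known genome size is stained and imaged alongside. The function name and all the numbers are hypothetical.

```python
def estimate_genome_size_pg(
    sample_fluorescence: float,
    standard_fluorescence: float,
    standard_size_pg: float,
    sample_ploidy: int = 1,
    standard_ploidy: int = 1,
) -> float:
    """Estimate the haploid genome size of a sample from relative fluorescence.

    Assumes fluorescence is proportional to total DNA per nucleus.
    standard_size_pg is the haploid genome size of the reference sample.
    """
    if standard_fluorescence <= 0:
        raise ValueError("standard_fluorescence must be positive")
    # Total DNA in one reference nucleus, then scaled by the fluorescence ratio.
    standard_dna = standard_size_pg * standard_ploidy
    sample_dna = standard_dna * sample_fluorescence / standard_fluorescence
    # Divide out chromosome copies so a diploid cell is not counted twice.
    return sample_dna / sample_ploidy


# Hypothetical readings: a haploid 1.0 pg reference versus diploid sample nuclei.
print(estimate_genome_size_pg(1200, 200, 1.0, sample_ploidy=2))  # 3.0
```

Leave out `sample_ploidy=2` and the same readings give 6.0, which is exactly the haploid/diploid mix-up described above.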
The C-value Paradox
So the next sensible question to ask would be: is there a relationship between genome size and the complexity of the organism? It’s not far-fetched to assume that the more complex an organism is, the more genes it would need, right? Not really.
There is a discordance between complexity and genome size. There are some organisms that have more genes, and far bigger genomes, than humans. This was named the C-value Paradox by C.A. Thomas, Jr. in 1971.
So what does having a big genome size really mean?
It means you’ve got more introns and other “non-sense” DNA, stretches which don’t code for any protein. 99.9% of the 130 billion base pairs of the marbled lungfish are non-coding, whereas 90% of the genome of the microsporidian parasite, which has the smallest nuclear genome size, is coding.
So why do some organisms evolve to have large amounts of non-sense DNA? We can only hypothesize. One of the more promising explanations is the idea that the bigger the non-coding regions, the larger the nuclear volume and the larger the cell. So some organisms may benefit from having larger cells, which could help them survive better in their environment, or it could even do the exact opposite. It is widely known that cells with a larger genome take longer to divide.
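To put the 130 billion base pair figure in perspective, here is the arithmetic as a quick sketch (the function name is just for illustration):

```python
def split_genome(genome_size_bp: int, noncoding_fraction: float) -> tuple[float, float]:
    """Split a genome size into (coding, non-coding) base-pair counts."""
    noncoding = genome_size_bp * noncoding_fraction
    coding = genome_size_bp - noncoding
    return coding, noncoding


# Figures from the text: 130 billion bp, 99.9% of it non-coding.
coding, noncoding = split_genome(130_000_000_000, 0.999)
print(f"coding: {coding / 1e6:.0f} Mb, non-coding: {noncoding / 1e9:.2f} Gb")
# coding: 130 Mb, non-coding: 129.87 Gb
```

So only about 130 Mb of that giant genome actually codes for something, and the remaining 129.87 Gb or so is the part that makes it so big.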
While this new finding may have nullified the C-value paradox, it’s brought to light a new problem – the C-value enigma.
The C-value enigma asks the following important questions:
- What sorts of DNA make up these non-sense sections?
- How are they gained and lost over time?
- Do they have an effect or function in cells?
- Why are some genomes enormous while others are extremely streamlined?
There is research to suggest that in some invertebrate groups genome size is positively correlated with body size, but it’s far from being a universal trait. The same goes for developmental rate, which correlates negatively with genome size in some organisms.
The research is ongoing – you can read about it and take a look at some of the research data on Gregory Lab.
I look around now and all I see are remnants of genetic mergers and mutations, all walking around in this (arguably) perfect system we’ve built for ourselves. We’ve come a long way from LUCA, don’t you think?
To be continued…
 LUCA: https://en.wikipedia.org/wiki/Lastuniversalancestor
 About Loki: https://www.quantamagazine.org/20151029-lokiarchaeota-origin-complex-life/
 Credits to my Genetics Professor David Smith for introducing me to the world of genomic evolution.
 Gregory Lab http://www.gregorylab.org/research/