
Gene expression: which allele is considered?


Humans carry two copies of each gene, one inherited from each parent. The question is: when referring to gene expression, which copy (or allele) is considered?


Genome assemblies usually feature a haploid set of genes, unless particular effort is made to assemble the two homologous copies (haplotypes) separately. That said, the gene copy you have in the assembly is usually one of the two, and usually the one you had more genomic reads for.

In terms of gene expression, if you measure it by RNA-Seq you will get reads from both alleles, and both read pools will map to the same gene. Reads that come from the assembled copy will map with no mismatches (ideally), and reads that come from the copy on the homologous chromosome will map with a few mismatches (probably).

The expression of the "gene" will therefore be the expression of both alleles, summed together. If we consider it as the same gene, though, it doesn't really make a difference whether you sum them up or count them separately, unless we are talking about allele-specific silencing such as Xist-mediated X-inactivation, in which case it is important to remember that one of the two copies is off.


9.12: Imprinted Genes

  • Contributed by John W. Kimball
  • Professor (retired) at Tufts University & Harvard

Imprinted genes are genes whose expression is determined by the parent that contributed them. Imprinted genes violate the usual rule of inheritance that both alleles in a heterozygote are equally expressed.

Examples of the usual rule:

  • If a child inherits the gene for blood group A from either parent and the gene for group B from the other parent, the child's blood group will be AB.
  • If a child inherits the gene encoding hemoglobin A from either parent and the gene encoding hemoglobin S from the other parent, the child's red blood cells will contain roughly equal amounts of the two types of hemoglobin.

But there are a few exceptions to this rule. A small number of genes in mammals (80 of them at a recent count) and in angiosperms have been found to be imprinted. Because most imprinted genes are repressed, either

  • the maternal (inherited from the mother) allele is expressed exclusively because the paternal (inherited from the father) allele is imprinted or
  • vice versa.

The process begins during gamete formation when

  • in males certain genes are imprinted in developing sperm and
  • in females, others are imprinted in the developing egg.

All the cells in a resulting child will have the same set of imprinted genes from both its father and its mother EXCEPT for those cells ("germplasm") that are destined to go on to make gametes. All imprints, both maternal and paternal, are erased in them.


Incomplete Dominance

In Mendel's experiments, offspring always looked like one of their two parents due to the complete dominance of one allele over the other. This is not always the case, because some genes display incomplete dominance; that is, heterozygous individuals exhibit a phenotype intermediate between those of the two homozygotes. For example, this figure depicts the outcome of a cross between a snapdragon with red flowers and one with white flowers: the F1 hybrids have pink flowers. In this case neither of the alleles for flower color is completely dominant over the other. Therefore, heterozygous individuals have a phenotype unlike that of either homozygote.


Figure. Incomplete dominance in snapdragon color.

As demonstrated in this figure, the Punnett square for this cross is like that for any other monohybrid cross. However, the ratio of phenotypes in the F2 generation is not 3:1 (dominant:recessive), as seen with completely dominant alleles, but rather a 1:2:1 ratio of red:pink:white flowers. In this example the alleles are symbolized differently than in the previous examples. Since neither allele dominates the other, using an uppercase and a lowercase version of the same letter is inappropriate. Instead, the character (flower color) is indicated by a letter (C), and the alleles encoding the trait (white, blue or red) are written as uppercase superscripts (recall, they are both uppercase because neither is dominant to the other). You may see other symbolic representations for incomplete dominance, but don't let this confuse you. The important thing to know is that some genes are expressed in an incompletely dominant manner.
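The 1:2:1 ratio can be checked by enumerating the Punnett square directly. The sketch below is only an illustration: the allele labels `R` (red) and `W` (white) and the function name `cross` are assumptions made for this example, not the notation used in the figure.

```python
from collections import Counter
from itertools import product

# Hypothetical labels: "R" = red allele, "W" = white allele.
# With incomplete dominance the heterozygote has its own phenotype.
PHENOTYPE = {("R", "R"): "red", ("R", "W"): "pink", ("W", "W"): "white"}


def cross(parent1, parent2):
    """Count offspring phenotypes over all cells of the Punnett square."""
    counts = Counter()
    for allele1, allele2 in product(parent1, parent2):
        genotype = tuple(sorted((allele1, allele2)))  # ("W", "R") -> ("R", "W")
        counts[PHENOTYPE[genotype]] += 1
    return counts


f1 = cross("RR", "WW")  # red x white parents
f2 = cross("RW", "RW")  # pink F1 x pink F1
print(f1)
print(f2)
```

Every cell of the first cross is heterozygous, so all F1 plants are pink; the second cross gives one red, two pink and one white cell.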

At the following Web sites, find the correct answer to the multiple-choice monohybrid or dihybrid cross questions. Work out each problem for yourself. To view an explanation of the problem, select the "TUTORIAL" button. After viewing the correct answer, close the Monohybrid Cross Problem Set or Dihybrid Cross window to return to this page. (note: These sites are a part of the Monohybrid and Dihybrid Problem Sets provided by The Biology Project at the University of Arizona.)

Problem 9: Incomplete dominance - This problem is a part of the Monohybrid Cross Problem Set.

Problem 10: Disappearance of parental phenotypes in the F1 generation - This problem is also a part of the Monohybrid Cross Problem Set.

Problem 11: Incomplete dominance in a dihybrid cross - This problem is a part of the Dihybrid Cross Problem Set.


Law of Segregation

Based on his research, Mendel proposed the law of segregation. This law states that paired factors (genes) must segregate equally into gametes such that offspring have an equal likelihood of inheriting either factor. The law supports what Mendel observed in his F1 and F2 data. The equal segregation of alleles is the reason we can apply the Punnett square to accurately predict the offspring of parents with known genotypes. The physical basis of Mendel’s law of segregation is the first division of meiosis, where the homologous chromosomes with their different versions of each gene are segregated into daughter nuclei. Since meiosis was not understood by the scientific community during Mendel’s lifetime, his work could not be fully appreciated at the time.


Materials and methods

Genome sequence and annotation

Sequence and feature files (.gff files) for the S288c genome were obtained from the Saccharomyces Genome Database (http://www.yeastgenome.org) on 7 March 2007. The sequence for YJM789 was obtained from Wei et al (2007) and aligned to the S288c genome using the procedure described by Wei et al (2007).

Microarray data

Microarray data are available at ArrayExpress (http://www.ebi.ac.uk/microarray-as/ae/). The cDNA hybridizations are available under accession number E-TABM-569 and the array design is available under A-AFFY-116. We have also used genomic DNA hybridizations from Mancera et al (2008) (accession number E-TABM-470). See Supplementary information for details.

Array design

We designed a custom Affymetrix tiling array (product no. 520055) with a total of ∼6.5 million probes (25-mers) including perfect match and mismatch probes. The probes tile both strands of the S288c genome at a resolution of 8 bp, with a shift between the strands of 4 bp ( David et al, 2006 ). The array also includes ∼106 000 probes complementary to the YJM789 genome ( Wei et al, 2007 ) at positions of polymorphism between the strains. We also added 10 647 negative-control probes of randomly generated sequences with GC content ranging from 2 to 25 GCs.

Yeast strains and sample preparation

Laboratory and clinically derived S. cerevisiae strains used in this work were isogenic to S288c and YJM789 and were designated as ‘S’ and ‘Y’, respectively. Three independent heterozygous hybrid strains (designated as ‘Y/S’) were obtained by crossing Y and S strains. Reciprocal hemizygote strains for PHO84 alleles were constructed by crossing relevant Y and S background strains. Supplementary Table VI lists all strains used in this study.

Total RNA was extracted from yeast cultures grown at 30°C in YPD medium (2% peptone, 1% yeast extract and 2% dextrose) and processed for array hybridizations as described earlier (Perocchi et al, 2007). Importantly, to remove reverse transcription artifacts, first-strand cDNA was synthesized in the presence of 6.25 μg/ml actinomycin D. As cDNA is chemically the same as DNA, we did not expect any systematic differences between cDNA and genomic DNA labeling.

For making mixture series, cDNA from S and Y strains was mixed in the following proportions, according to mass: 0:1, 1:3, 1:1, 3:1 and 1:0.

Probe filtering and classification

Using the following procedure, we classified each probe as common, S-specific, Y-specific or control. Ungapped alignments of the probes to the S288c genome and the aligned portion of the YJM789 genome were produced using the software exonerate ( Slater and Birney, 2005 ). We considered all perfect matches and near matches (up to two mismatches). A common probe has a unique perfect match to both parental genomes at the same alignment position and no near match. An S-specific probe has a unique perfect match and no further near matches to the S288c genome. It has no perfect match to the YJM789 genome and no near match to the YJM789 genome, except possibly at the same aligned position as its perfect match position in S288c. Y-specific probes were defined analogously. Specific probes whose match overlaps a polymorphism at ±4 bp of its central base were called ‘centered specific probes (CSP)’. Finally, we ensured that each negative control probe had neither a perfect nor a near match in either genome.
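The classification rules above can be sketched as a small function. This is a minimal illustration, not the authors' code: the input representation (a list of `(aligned_position, n_mismatches)` tuples per genome, as one might derive from the exonerate output) and the names `classify_probe` and `_is_specific` are assumptions. Negative-control probes are handled separately in the text, so any probe that fits none of the three categories is simply returned as `"unclassified"` here.

```python
def _is_specific(perfect_own, near_own, perfect_other, near_other):
    """True if the probe is specific to the 'own' genome under the stated rules."""
    # Exactly one perfect match and no near match in the own genome,
    # and no perfect match in the other genome.
    if len(perfect_own) != 1 or near_own or perfect_other:
        return False
    # Near matches in the other genome are tolerated only at the same aligned position.
    return all(pos == perfect_own[0] for pos in near_other)


def classify_probe(matches_s, matches_y, max_mismatches=2):
    """Classify one probe from its matches to the S288c (S) and YJM789 (Y) genomes.

    Each match is a hypothetical (aligned_position, n_mismatches) tuple.
    """
    perfect_s = [pos for pos, mm in matches_s if mm == 0]
    near_s = [pos for pos, mm in matches_s if 0 < mm <= max_mismatches]
    perfect_y = [pos for pos, mm in matches_y if mm == 0]
    near_y = [pos for pos, mm in matches_y if 0 < mm <= max_mismatches]

    # Common: one perfect match in each genome at the same position, no near match.
    if (len(perfect_s) == 1 and len(perfect_y) == 1
            and perfect_s[0] == perfect_y[0] and not near_s and not near_y):
        return "common"
    if _is_specific(perfect_s, near_s, perfect_y, near_y):
        return "S-specific"
    if _is_specific(perfect_y, near_y, perfect_s, near_s):
        return "Y-specific"
    return "unclassified"


print(classify_probe([(100, 0)], [(100, 1)]))  # perfect in S, near match in Y at the same position
```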

Normalization and background subtraction

Calibration of intensities between arrays was done using a variant of quantile normalization ( Bolstad et al, 2003 ), as follows. The sets of cDNA and genomic DNA (gDNA) hybridizations were treated separately. As specific probes are expected to have different behavior depending on the strain, we restricted the quantile normalization to the set of common probes and used linear interpolation to normalize the intensities of the specific probes.
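One way to picture this normalization is sketched below, under stated assumptions: the reference distribution is taken as the mean of the sorted common-probe intensities across arrays (standard quantile normalization), and specific probes are mapped onto it by piecewise-linear interpolation, clamped at the ends. The function names and the exact handling of ties and of values outside the common-probe range are choices made for this example; the text does not specify them.

```python
from bisect import bisect_right


def interpolate(x, xs, ys):
    """Piecewise-linear map from sorted xs to ys, clamped outside the range of xs."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    k = bisect_right(xs, x)  # guarantees xs[k - 1] <= x < xs[k]
    x0, x1 = xs[k - 1], xs[k]
    return ys[k - 1] + (x - x0) / (x1 - x0) * (ys[k] - ys[k - 1])


def normalize_arrays(common, specific):
    """common[a] and specific[a] hold the intensities of array a.

    Common probes define the quantile mapping; specific probes only follow it.
    """
    n_arrays = len(common)
    sorted_common = [sorted(values) for values in common]
    n_probes = len(sorted_common[0])
    # Reference distribution: mean of the r-th smallest common intensity over arrays.
    reference = [sum(col[r] for col in sorted_common) / n_arrays for r in range(n_probes)]
    norm_common = [[interpolate(v, sorted_common[a], reference) for v in common[a]]
                   for a in range(n_arrays)]
    norm_specific = [[interpolate(v, sorted_common[a], reference) for v in specific[a]]
                     for a in range(n_arrays)]
    return norm_common, norm_specific


common = [[3.0, 1.0, 2.0], [2.0, 6.0, 4.0]]
specific = [[2.5], [5.0]]
print(normalize_arrays(common, specific))
```

In this toy input the second array is twice as bright as the first; after normalization both arrays share the same common-probe distribution, and the two specific probes, which sit halfway between the same pair of quantiles, receive the same value.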

The background of cDNA hybridizations was subtracted as described earlier ( Huber et al, 2006 ). Briefly, probes were binned into 10 groups according to their intensity level in the gDNA hybridizations. For each probe group and for each cDNA hybridization, probes falling outside annotated transcribed regions were used to estimate a background level. This level was then subtracted from the intensities of all probes within the group. To subtract the background of DNA hybridizations, we grouped probes by GC content. For each group and hybridization, we estimated the background level as the 10% trimmed mean of the negative control probes and subtracted it from all probes of the group.

New transcript identification and transcript probe sets

We ran a segmentation algorithm combining heterozygote cDNA hybridizations with parental cDNA hybridizations using the R package ‘tilingArray’ (Huber et al, 2006). Segmentation was carried out on the set of common probes, for which the assumption of a constant level across the transcript can be made. For each chromosome, the segmentation parameter S (number of segments) was set so that the average segment size was 1500 bp. Segments corresponding to unannotated transcripts were then categorized as unannotated intergenic or antisense as described earlier (David et al, 2006) (‘intergenic’ were termed ‘isolated’ in the earlier study). Segments with fewer than 20 probes were discarded. A subsequent manual inspection discarded six dubious antisense segments and recovered 10.

We subsequently inferred the expression of a transcript from the intensities of its probe set. We defined the probe set of a new transcript as the probes for which the match entirely falls within the boundaries of the segment. We defined the probe set of an annotated transcript as the probes whose match entirely falls within the boundaries of an annotated S288c exon.

Probe intensity model

We modeled $y_{ij}$, the normalized and background-subtracted intensity of probe $i$ in hybridization $j$, as

$$y_{ij} = \lambda_{1i} c_{1ij} + \lambda_{2i} c_{2ij} + \varepsilon_{ij} \qquad (1)$$

where $\lambda_{1i}$ and $\lambda_{2i}$ are the affinities of the probe to its matches in each genome, $c_{1ij}$ and $c_{2ij}$ are the expression levels of the respective complementary sequences in sample $j$, and $\varepsilon_{ij}$ are the errors. The affinities and the expression levels are non-negative real numbers expressed in arbitrary units. For common probes, we have $\lambda_{1i} = \lambda_{2i}$.

We considered five possible types of hybridization samples: genomic DNA (gDNA) of the two homozygous strains S and Y, their cDNA, and cDNA of the heterozygous Y/S. We set $c_{kij} = 2$ if sample $j$ is homozygous genomic DNA of genome $k$. Moreover, we fixed $c_{kij} = 0$ if sample $j$ is genomic DNA or cDNA of a homozygous strain different from $k$.

Following Rocke and Durbin (2001), we modeled the variance of the errors $\varepsilon_{ij}$ as a function of the expected intensity $I_{ij} = \lambda_{1i} c_{1ij} + \lambda_{2i} c_{2ij}$ (equation (2)).

The coefficients $a_j$, $b_j$ and $\gamma$ of this variance function were inferred using the R package vsn (Huber et al, 2002) by treating the cDNA and the gDNA hybridizations as two separate groups. We assumed the scaled errors to be independent and identically distributed with mean 0.

Least-squares regression

For the cDNA samples of each strain, we assumed a constant level of each allele across one transcript's probe set. The regression proceeds with each transcript separately using probes only of the transcript probe set.

We denote by $p_1$ and $p_2$ the nominal expression levels of the alleles in the homozygous strains, and by $h_1$ and $h_2$ the levels of each allele in the Y/S strain. From equation (1), we obtained a set of equations for all hybridizations $j$ and probes $i$ that depend on the hybridization sample type.
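As a sketch of what these equations look like, the cases below are derived only from the definitions given above (a level of 2 for homozygous gDNA of genome $k$, a level of 0 for the genome absent from a homozygous sample, and $p_k$, $h_k$ as cDNA levels). They are a reconstruction under those assumptions, with $k \in \{1, 2\}$ indexing the two genomes, and they do not cover the mixture-series samples.

$$
y_{ij} =
\begin{cases}
2\,\lambda_{ki} + \varepsilon_{ij} & \text{if } j \text{ is gDNA of the homozygous strain } k \\
\lambda_{ki}\, p_k + \varepsilon_{ij} & \text{if } j \text{ is cDNA of the homozygous strain } k \\
\lambda_{1i}\, h_1 + \lambda_{2i}\, h_2 + \varepsilon_{ij} & \text{if } j \text{ is cDNA of the heterozygous Y/S strain}
\end{cases}
$$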

We fitted the model by weighted least squares. More precisely, we searched for a set of affinities and expression levels that minimizes the sum of squared scaled residuals

$$\sum_{i,j} w_{ij} \left( y_{ij} - I_{ij} \right)^2$$

subject to $\lambda \ge 0$, $p \ge 0$, $h \ge 0$ and $\lambda_{1i} = \lambda_{2i}$ for common probes, where the weights $w_{ij} = 1/\mathrm{Var}(\varepsilon_{ij})$ were estimated using equation (2).

We took advantage of the form of the model for the optimization procedure. Indeed, assuming fixed weights, the cost function is a sum of squared terms bilinear in λ and (p, h). For a given expression-level vector (p, h), there is a closed-form solution to the unique optimal affinity vector λ and vice versa. We thus devised a component-wise optimization algorithm that iteratively optimizes expression levels given affinities and reciprocally, updating the weights at each step using equation (2). We considered that the algorithm had converged, if all fitted expression levels of the last 2 iterations differ by less than a value corresponding to 10% of the background level, and stopped the algorithm if convergence did not occur before the 30th iteration.
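The component-wise idea can be illustrated on a deliberately simplified version of the model. The sketch below assumes a single affinity per probe (as for common probes), one expression level per sample, and fixed weights; it does not re-estimate the weights at each step or handle two alleles, so it is not the procedure used in the paper. The function name `fit_rank_one` and its arguments are hypothetical. Fixing the level of a gDNA column (here at 2) removes the scale ambiguity between affinities and levels.

```python
def fit_rank_one(y, w, fixed_levels, max_iter=30, tol=1e-10):
    """Alternating weighted least squares for y[i][j] ~ affinity[i] * level[j].

    fixed_levels maps a sample index to a level held constant (e.g. gDNA at 2).
    Each one-dimensional update is the closed-form minimizer, clipped at zero
    to respect the non-negativity constraint.
    """
    n_probes, n_samples = len(y), len(y[0])
    levels = [fixed_levels.get(j, 1.0) for j in range(n_samples)]
    affinities = [1.0] * n_probes
    for _ in range(max_iter):
        # Step 1: optimal affinities for the current levels.
        for i in range(n_probes):
            num = sum(w[i][j] * y[i][j] * levels[j] for j in range(n_samples))
            den = sum(w[i][j] * levels[j] ** 2 for j in range(n_samples))
            affinities[i] = max(num / den, 0.0) if den > 0 else 0.0
        # Step 2: optimal free levels for the current affinities.
        previous = list(levels)
        for j in range(n_samples):
            if j in fixed_levels:
                continue
            num = sum(w[i][j] * y[i][j] * affinities[i] for i in range(n_probes))
            den = sum(w[i][j] * affinities[i] ** 2 for i in range(n_probes))
            levels[j] = max(num / den, 0.0) if den > 0 else 0.0
        # Stop once the levels no longer move between two iterations.
        if max(abs(a - b) for a, b in zip(levels, previous)) < tol:
            break
    return affinities, levels


true_affinities = [1.0, 2.0, 0.5]
true_levels = [2.0, 0.5, 1.0]  # column 0 plays the role of a gDNA sample
y = [[a * c for c in true_levels] for a in true_affinities]
w = [[1.0] * 3 for _ in range(3)]
print(fit_rank_one(y, w, fixed_levels={0: 2.0}))
```

On this noise-free toy input the alternating updates recover the generating affinities and levels to within numerical tolerance.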

Confidence intervals

We estimated confidence intervals per ORF probe set by resampling the scaled residuals with replacement. The regression results in fitted parameters and thus, according to the model, in an estimated intensity $\hat{I}_{ij}$, an estimated weight $\hat{w}_{ij}$ and a scaled residual $\varepsilon'_{ij}$ for each observed intensity:

$$\varepsilon'_{ij} = \sqrt{\hat{w}_{ij}} \left( y_{ij} - \hat{I}_{ij} \right)$$

We generated new synthetic data as noisy measurements of the fitted intensities:

$$y^{*}_{ij} = \hat{I}_{ij} + \varepsilon'_{\sigma(ij)} / \sqrt{\hat{w}_{ij}}$$

where the function $\sigma$ is a random sampling with replacement of the index pairs $ij$. We repeated this B=999 times and obtained B estimates of the parameters. For all statistics of interest (expression level, allelic differential expression, etc.), 95% equi-tailed confidence intervals were estimated according to the non-parametric basic confidence limit as described in Davison and Hinkley (1997).
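A minimal sketch of this residual bootstrap follows. It assumes that a scaled residual is the raw residual multiplied by the square root of its weight and that synthetic data are rebuilt by dividing a resampled scaled residual by the square root of the local weight; both follow from weights defined as inverse variances but are assumptions of this example. The `estimator` argument stands in for the full regression, and the function name is hypothetical. The interval is the basic bootstrap limit, $(2\hat\theta - \theta^*_{(975)},\ 2\hat\theta - \theta^*_{(25)})$ for B=999.

```python
import math
import random


def basic_bootstrap_ci(observed, fitted, weights, estimator,
                       n_boot=999, alpha=0.05, seed=0):
    """Basic (non-parametric) bootstrap interval from resampled scaled residuals."""
    rng = random.Random(seed)
    # Scaled residuals: raw residuals divided by their estimated standard deviation.
    scaled = [math.sqrt(w) * (y - f) for y, f, w in zip(observed, fitted, weights)]
    theta_hat = estimator(observed)
    boot = []
    for _ in range(n_boot):
        # Synthetic data: fitted value plus a resampled residual, rescaled locally.
        synthetic = [f + rng.choice(scaled) / math.sqrt(w)
                     for f, w in zip(fitted, weights)]
        boot.append(estimator(synthetic))
    boot.sort()
    hi_rank = round((n_boot + 1) * (1 - alpha / 2))  # 975 for B=999, alpha=0.05
    lo_rank = round((n_boot + 1) * (alpha / 2))      # 25 for B=999, alpha=0.05
    return 2 * theta_hat - boot[hi_rank - 1], 2 * theta_hat - boot[lo_rank - 1]


def mean(values):
    return sum(values) / len(values)


observed = [4.8, 5.1, 5.3, 4.9, 5.2, 4.7]
fitted = [mean(observed)] * len(observed)  # the "model" here is just a constant level
weights = [1.0] * len(observed)
print(basic_bootstrap_ci(observed, fitted, weights, mean))
```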

P-values and false discovery rates

  • H1: Levels in parent equal: p1=p2
  • H2: Levels in hybrid equal: h1=h2

We fitted an appropriately constrained model for each probe set and for each hypothesis (p1=p2 and h1=h2). Similar to the procedure for estimating confidence intervals, we generated B=999 new synthetic data as noisy measurements of those fitted intensities. Here again we sampled scaled residuals of the primary unconstrained fit, because they reflect the true noise better than those of the constrained fits. On each simulated dataset, we performed an unconstrained regression. For each hypothesis respectively, we considered the T-statistic.

The P-value is then approximated by

$$P = \frac{1 + \#\{\, t_i^{*} \ge t \,\}}{B + 1}$$

where $t$ is the statistic value for the primary, unconstrained fit and $t_i^{*}$, $i = 1, \ldots, B$, are the bootstrap statistic values (Davison and Hinkley, 1997).
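For illustration, a bootstrap P-value of this kind can be computed as below. The sketch assumes the standard Monte Carlo form from the bootstrap literature, $(1 + \#\{t_i^* \ge t\})/(B + 1)$, and that large values of the statistic count as evidence against the hypothesis; the function name is hypothetical.

```python
def bootstrap_p_value(t_observed, t_bootstrap):
    """Monte Carlo P-value: share of bootstrap statistics at least as large as observed."""
    n_exceed = sum(1 for t in t_bootstrap if t >= t_observed)
    return (1 + n_exceed) / (len(t_bootstrap) + 1)


# With B = 999 bootstrap values, the smallest attainable P-value is 1/1000.
print(bootstrap_p_value(10.0, [0.0] * 999))
```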

Treating each hypothesis H1 and H2 separately, q-values, i.e. false discovery rates (FDR), were obtained using the R package qvalue ( Storey and Tibshirani, 2003 ) with default parameters.

Sequence validation of differentially expressed transcripts

Quantitative estimates of allelic expression ratios by sequencing were obtained using the method described by Ge et al (2005). Primers (Supplementary Table VII) were synthesized such that they spanned multiple SNPs between the two alleles of a transcript. From two independent Y/S strains, XHS768 and XHS769, cDNA was synthesized using random hexamers and PCR was carried out on the resulting cDNA for sequence analysis. PCR products obtained with the same primers on genomic DNA of a Y/S strain, XHS768, were used to provide reference traces for the situation of 1:1 allelic concentrations. The resulting sequence traces were analyzed with the software PeakPicker (Ge et al, 2005), which estimates allelic expression ratios from relative peak heights at SNP positions. We calculated the allelic ratios of transcripts as the median over all SNPs and traces (Supplementary Table VII). Out of the 24 transcripts tested, one (HOP1) did not confirm polymorphic positions in the genomic DNA. Two others (ICL2 and YDL237W) were rejected from further analysis because their ratio estimates were derived from fewer than two SNPs.

ADE coefficient

We defined the ADE coefficient as $|h_Y - h_S| / (h_Y + h_S)$, where $h_Y$ and $h_S$ are the expression levels of the Y allele and S allele, respectively, in the heterozygote.

Proportion of cis- and trans-regulatory effects

The ratio of cis-regulatory divergence to the total regulatory divergence (Wittkopp et al, 2008) is computed as $|C| / (|C| + |T|)$, where $C$, the cis-regulatory effect, is the log ratio of the allelic expression levels in the hybrid and $T$, the trans-regulatory effect, is the difference between the log ratio of the parental gene expression levels and $C$.
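Both quantities are simple enough to compute directly. The sketch below follows the two definitions as stated; the function names are hypothetical, and base-2 logarithms are an arbitrary choice because the cis proportion does not depend on the base. It assumes strictly positive expression levels, and the cis proportion is undefined when both effects are zero.

```python
import math


def ade_coefficient(h_y, h_s):
    """Allelic differential expression: |hY - hS| / (hY + hS), between 0 and 1."""
    return abs(h_y - h_s) / (h_y + h_s)


def cis_proportion(h_y, h_s, p_y, p_s):
    """Share of regulatory divergence attributed to cis effects: |C| / (|C| + |T|)."""
    cis = math.log2(h_y / h_s)          # allelic log ratio in the hybrid
    trans = math.log2(p_y / p_s) - cis  # parental log ratio minus the cis effect
    return abs(cis) / (abs(cis) + abs(trans))


print(ade_coefficient(3.0, 1.0))           # Y allele three times the S allele
print(cis_proportion(2.0, 1.0, 4.0, 1.0))  # hybrid ratio 2:1, parental ratio 4:1
```

In the second call the parental difference is twice as large (on the log scale) as the allelic difference in the hybrid, so half of the divergence is attributed to cis effects and half to trans effects.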

Analysis of PHO84 reciprocal hemizygote strains hybridizations

The hybridizations of the two PHO84 reciprocal hemizygote strains were analyzed using the same model as described above. Total transcript expression levels (i.e., hY+hS, the sum of the two allele levels for each transcript) were considered for comparison.


Adult and Fetal

Yvonne A. Evrard, Sharon Y.R. Dent, in Handbook of Stem Cells, 2004

Histone Hypoacetylation Is Linked to Genomic Imprinting

Allele-specific imprinting is another embryonic event associated with changes in the patterns and levels of histone acetylation. Imprinting involves the selective inactivation of either the maternal or paternal allele during gametogenesis, resulting in monoallelic gene expression in the embryo. Differential expression of the imprinted genes is then maintained through subsequent cellular divisions and differentiation. Maintenance of the imprinted signal depends on preserving the epigenetic marks inherited from either the maternal or paternal germ line. Differential methylation of CpG islands has long been associated with imprinted genes. More recently, specific post-translational histone modifications have also been found to be associated with maternally and paternally imprinted genes. Specifically, HDAC activity has been found to be associated with the methylated CpG islands found with silenced alleles. 55

CpG-rich regulatory elements associated with the imprinted mouse genes Snrpn, Igf2r, and U2af1-rs1 are methylated on the silenced maternal allele but are not methylated on the transcribed paternal allele. 56 This differentially methylated DNA region also has allelic differences in histone H3 modifications. Histone H3 is acetylated and H3-K4 is methylated on the transcribed paternal allele, whereas the silenced maternal allele is methylated on H3-K9. Further examination of the U2af1-rs1 and Snrpn alleles showed that all H3 lysines associated with the silenced maternal allele were hypoacetylated, but only H4-K5 was underacetylated on the paternal allele. The deacetylation of H3 on the maternal allele appears to be targeted by CpG methylation. 7 Altogether these studies indicate that allele-specific imprinting involves changes in the histone modification states that establish heritable active or repressed chromatin structures.


Summary – Gene vs Allele

The genome holds an organism's genetic information in the form of genes. A gene is a specific nucleotide sequence that carries the genetic code for a product, typically a protein. Many genes are arranged along each chromosome, and each has a specific location (locus) where it can be identified. A gene can exist in alternative forms called alleles. A diploid organism carries two alleles of each gene, one inherited from each parent. Often one allele is dominant and the other recessive; when the dominant allele is present, its phenotype is expressed over that of the recessive allele. This summarizes the difference between a gene and an allele.



Acknowledgements

We thank P. Mieczkowski, A. Brandt, E. Malc, M. Vernon, J. Brennan and M. Calabrese for helpful discussions. Major funding was provided by National Institute of Mental Health/National Human Genome Research Institute Center of Excellence for Genome Sciences grants (P50MH090338 and P50HG006582, co-principal investigators F.P.-M.d.V. and P.F.S.). This work was also supported by grants R01GM074175 (principal investigator F.Z.) from the National Institute of General Medical Sciences and K01MH094406 (principal investigator J.J.C.) from the National Institute of Mental Health.


Genetic Variation and Change

Genetic variation describes naturally occurring genetic differences among individuals of the same species. All organisms differ from one another, slightly or greatly. This variation permits flexibility and survival of a population in the face of changing environmental circumstances and can also produce variation in the gene pools. This variation is important, especially in New Zealand, where the habitat is constantly changing: living (biotic) and non-living (abiotic) factors change populations' gene pools and the selection pressures acting on them.

This standard is about what brings on this variation in populations and how this leads to different frequencies of traits and eventually natural selection. What leads to the variation in a species? Make sure you watch the videos and look at the animations, they will all help.

These have been made by Benjamin Himme from https://www.pathwayz.org/ (Another great learning site)

DNA from the Beginning: a good introduction to DNA and genetics.

Population genetics is the study of genetic variation within populations, and involves the examination and modelling of changes in the frequencies of genes and alleles in populations over space and time. Many of the genes found within a population will be polymorphic - that is, they will occur in a number of different forms (or alleles). Mathematical models are used to investigate and predict the occurrence of specific alleles or combinations of alleles in populations, based on developments in the molecular understanding of genetics, Mendel's laws of inheritance and modern evolutionary theory. The focus is the population or the species - not the individual.


Look at the level 2 Gene expression page for notes, animations and videos explaining the structure of DNA.

A gene pool is the complete set of unique alleles in a population.

Genetic drift is the change in the relative frequency with which an allele occurs in a population due to random sampling and chance. Allele frequencies in the gene pool change because of random events not related to the fitness of the allele in that environment.

Migration is the transfer of alleles from one population to another. Allele frequencies in the gene pool change because new alleles are brought into the population (immigration) or lost from the population (emigration).

The effects of both genetic drift and migration are particularly apparent in a small population where the relatively small changes in allele numbers can have a bigger impact on the ratio of those alleles in the population.
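The stronger effect of drift in small populations can be seen in a short simulation. The sketch below uses a Wright-Fisher style model as an assumed simplification (non-overlapping generations, constant size, diploid individuals, no selection, mutation or migration); the function name and parameters are made up for this example.

```python
import random


def simulate_drift(pop_size, start_freq, generations, seed=None):
    """Track one allele's frequency when each generation is a random sample of the last."""
    rng = random.Random(seed)
    n_copies = 2 * pop_size  # diploid: two allele copies per individual
    freq = start_freq
    history = [freq]
    for _ in range(generations):
        # Each copy in the next generation is drawn at random from the current pool.
        count = sum(1 for _ in range(n_copies) if rng.random() < freq)
        freq = count / n_copies
        history.append(freq)
    return history


small = simulate_drift(pop_size=10, start_freq=0.5, generations=50, seed=1)
large = simulate_drift(pop_size=1000, start_freq=0.5, generations=50, seed=1)
print("small population, final frequency:", small[-1])
print("large population, final frequency:", large[-1])
```

Running this with several seeds typically shows the small population wandering far from 0.5, often until the allele is lost (frequency 0) or fixed (frequency 1), while the large population stays close to its starting frequency.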

A mutation is a permanent, random change in the DNA (genetic material). A mutation must occur in gamete-producing cells to enter the gene pool of the population.

It can also be defined as a permanent change in the nucleotide sequence of a gene or a chromosome.

A mutation is a permanent (unrepaired) change in an organism's DNA.

They introduce new alleles into a population. Most mutations are harmful.

Mutations are caused by mutagens.

Beneficial ones tend to occur more often in organisms with short generation times.

Many may be silent – not observed – and may only be selected for or against at a later date.

Neutral mutations make no change at all.

Mutations must happen in gamete producing cells to enter the gene pool of a population. This is important!!

Genes mutate at known rates. This rate varies depending on the gene involved. Some genes have high spontaneous mutation rates.

Mutation rates for genes within a species are probably similar, but the viability of mutations varies greatly. Mutant genes in the human population:

With approximately 30,000 genes in the human genome and two copies of each gene, each diploid cell carries a total of about 60,000 gene copies.

In higher organisms, a mutation for a specific gene will occur in one gamete in 300,000.

Somatic (Body cell mutations)

· somatic mutations occur in any cells of the body other than in the gametes

Alterations in DNA that occur after conception. Somatic mutations can occur in any of the cells of the body except the germ cells (sperm and egg) and therefore are not passed on to the offspring.

Gametic (sex cell mutations)

· gametic mutations only occur in gametes, eg, sperm / eggs (accept pollen).

· somatic mutations are not passed on from one generation to the next

· somatic mutations only affect the individual organism in which the cells have mutated

· gametic mutations are (heritable) transferred to the next (& possibly subsequent) generations

· gametic mutations are not limited to the individual in which the original mutation has occurred

· the new alleles created by gametic mutation are available to the gene pool and may become established in that gene pool.

Gametic: (may be called germ line, which is acceptable). A heritable change in the DNA that occurred in a gamete (germ cell) – a cell destined to become an egg or sperm. When transmitted to the offspring, a gametic mutation is incorporated in every cell of their body.

Substitution of a single base, e.g. A → G. Affects 1 gene.

Addition or subtraction of a single base - causes a frame shift. Seriously affects one gene.

Chromosome mutations - A chunk of chromosome can be deleted, added or moved to a different chromosome. Affects a number of genes.

Aneuploidy - A whole chromosome, or a whole set of chromosomes, is added or lost.

If a mutation occurs in a gamete it will affect the entire organism produced (they are inherited) = Gametic mutation.

If the mutation occurs in a body cell it will only affect one area (it is not inherited) = Somatic mutation.


Genetics

The study of heredity or, more generally, the flow of biologically encoded information through both space and time.

Sub-disciplines of genetics include Mendelian genetics, molecular genetics, cytogenetics, microbial genetics, and population genetics. The study of heredity substantially preceded the formal study of genetics by millennia while the formal study of genetics predated any molecular appreciation of how heredity occurs by decades.

Figure legend: Variation on the study of genetics. Note the indication of a building up of sub-disciplines starting with the molecular and going towards the population wide. Cytogenetics is the study of eukaryotic chromosomes, especially visually (i.e., such as one does when studying mitosis and meiosis). Mendelian genetics is the study of the genetics of diploid organisms, especially in terms of offspring-parent phenotypic relationships. Molecular genetics is the study of genetic information flow at the molecular level. Population genetics, in turn, is the study of genetic variation among organisms within populations.

A flawed appreciation of genetics hampered Darwin's efforts to understand evolution, and even with the rediscovery of Mendelian genetics in the early years of the Twentieth Century, its proper application to evolutionary theory lagged behind by years.

Today we are all but overwhelmed by genetic information, as gleaned from high-throughput DNA sequencing technologies, but there nonetheless remain substantial gaps in our understanding of how an organism's genetics (genotype) translates into its phenotype, except in the broadest sense of transcription followed by translation, which in turn is followed by complex considerations of biochemistry, developmental biology, physiology, and even ecology. This gap in our knowledge, and indeed the current state of quite a bit of modern biology, is ironic given that genetics began as a solely phenotype-based discipline from which genotype information could only be inferred.

The following is a list of important concepts associated with the science of genetics: