
RNAs arising from intergenic regions


Which type of RNA molecule is coded for in intergenic regions? I think it must be a non-coding RNA but I'm unsure which type.


The term intergenic is more or less obsolete now. In fact, it is ironic to say that a gene, which can give rise to a functional protein or RNA, is expressed from an intergenic region. However, the usage continues for both lncRNAs and miRNAs (the other major type of ncRNA in metazoans¹; piRNAs are classified differently). lncRNAs are loosely classified as sense-overlapping (expressed in the same direction as a known gene but not completely overlapping it), antisense (expressed from the antisense strand of a known gene) or intergenic; similarly, miRNAs are classified as intronic or intergenic.

An intergenic RNA need not be strictly non-coding; it can also be coding and give rise to a novel protein.

In all these cases, the term intergenic means not overlapping with a known gene. Better nomenclature will be needed as the field of genomics progresses.
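The positional classification described in this answer can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Interval` type, the `classify` function and the half-open coordinate convention are hypothetical choices made here, and real annotation pipelines apply additional rules (for example, handling intronic or divergent transcripts).

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Interval:
    """A genomic feature; coordinates are 0-based and half-open (assumed convention)."""
    chrom: str
    start: int
    end: int
    strand: str  # "+" or "-"


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two features share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def classify(transcript: Interval, genes: List[Interval]) -> str:
    """Label a transcript by its position relative to known genes."""
    hits = [g for g in genes if overlaps(transcript, g)]
    if not hits:
        # No overlap with any known gene: "intergenic" in the sense used above.
        return "intergenic"
    if any(g.strand == transcript.strand for g in hits):
        return "sense-overlapping"
    return "antisense"


# Hypothetical annotation with a single known gene on the plus strand.
known_genes = [Interval("chr1", 1000, 5000, "+")]
```

With this toy annotation, a transcript at chr1:6000-7000 is labelled intergenic, while one on the minus strand at chr1:2000-3000 is labelled antisense. The labels depend entirely on which genes are already annotated, which is the weakness of the term that the answer points out.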

See this article.

1 Most metazoans.


Intergenic region

An intergenic region (IGR) is a stretch of DNA sequence located between genes. [1] Intergenic regions are a subset of noncoding DNA. Occasionally some intergenic DNA acts to control nearby genes, but most of it has no currently known function. It is one of the kinds of DNA sequence sometimes referred to as junk DNA, though it is only one of the phenomena so labeled, and the term is less used in scientific studies today. RNA transcribed from DNA in intergenic regions has more recently been referred to as "dark matter" or "dark matter transcripts". [2]
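Because an intergenic region is defined purely by what lies between genes, it can be computed as the set of gaps left after merging the annotated gene intervals. The following Python sketch assumes a single chromosome, half-open 0-based coordinates, and a hypothetical function name `intergenic_regions`; none of these details come from the text above.

```python
from typing import List, Tuple


def intergenic_regions(
    genes: List[Tuple[int, int]], chrom_length: int
) -> List[Tuple[int, int]]:
    """Return the gaps between genes on one chromosome.

    `genes` holds (start, end) pairs, 0-based and half-open; overlapping
    genes are merged implicitly by tracking the furthest end seen so far.
    """
    regions = []
    cursor = 0  # first position not yet covered by any gene
    for start, end in sorted(genes):
        if start > cursor:
            regions.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < chrom_length:
        regions.append((cursor, chrom_length))
    return regions
```

For genes at (100, 200), (150, 300) and (500, 600) on a chromosome of length 1000, the gaps are (0, 100), (300, 500) and (600, 1000). Adding a newly annotated gene to the input shrinks or splits these regions, so the intergenic set changes as annotation improves.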


MINI REVIEW article

  • Department of Microbiology and Immunology, Geisel School of Medicine at Dartmouth, Hanover, NH, United States

In eukaryotic organisms, transfer RNA (tRNA)-derived fragments have diverse biological functions. Considering the conserved sequences of tRNAs, it is not surprising that endogenous tRNA fragments in bacteria also play important regulatory roles. Recent studies have shown that microbes secrete extracellular vesicles (EVs) containing tRNA fragments and that the EVs deliver tRNA fragments to eukaryotic hosts where they regulate gene expression. Here, we review the literature describing microbial tRNA fragment biogenesis and how the fragments secreted in microbial EVs suppress the host immune response, thereby facilitating chronic infection. Also, we discuss knowledge gaps and research challenges for understanding the pathogenic roles of microbial tRNA fragments in regulating the host response to infection.


Intergenic RNA mainly derives from nascent transcripts of known genes

Background Eukaryotic genomes undergo pervasive transcription, leading to the production of many types of stable and unstable RNAs. Transcription is not restricted to regions with annotated gene features but includes almost any genomic context. Currently, the source and function of most RNAs originating from intergenic regions in the human genome remain unclear.

Results We hypothesised that many intergenic RNAs can be ascribed to the presence of as-yet unannotated genes or to the ‘fuzzy’ transcription of known genes that extends beyond the annotated boundaries. To elucidate the contributions of these two sources, we assembled a dataset of >2.5 billion publicly available RNA-seq reads across 5 human cell lines and multiple cellular compartments to annotate transcriptional units in the human genome. About 80% of transcripts from unannotated intergenic regions can be attributed to the fuzzy transcription of existing genes; the remaining transcripts originate mainly from putative long non-coding RNA loci that are rarely spliced. We validated the transcriptional activity of these intergenic RNAs using independent measurements, including transcriptional start sites, chromatin signatures, and genomic occupancies of RNA polymerase II in various phosphorylation states. We also analysed the nuclear localisation and sensitivities of intergenic transcripts to nucleases to illustrate that they tend to be rapidly degraded either ‘on-chromatin’ by XRN2 or ‘off-chromatin’ by the exosome.

Conclusions We provide a curated atlas of intergenic RNAs that distinguishes alternative processing of well-annotated genes from independent transcriptional units based on the combined analysis of chromatin signatures, nuclear RNA localisation and degradation pathways.
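The distinction drawn in this abstract, between transcription that spills past the annotated boundaries of a known gene and transcription from an independent unit, can be illustrated with a deliberately simple distance rule. This is not the authors' method, which combines chromatin signatures, localisation and degradation data; the function name `attribute`, the tuple layout and the 10 kb `max_gap` threshold are all assumptions made for this sketch.

```python
from typing import List, Optional, Tuple

Gene = Tuple[str, int, int, str, str]      # chrom, start, end, strand, name
Transcript = Tuple[str, int, int, str]     # chrom, start, end, strand


def attribute(
    transcript: Transcript, genes: List[Gene], max_gap: int = 10_000
) -> Tuple[str, Optional[str]]:
    """Assign a transcript to a nearby same-strand gene, or call it independent.

    `max_gap` is an arbitrary illustrative threshold, not a published value.
    """
    chrom, start, end, strand = transcript
    for g_chrom, g_start, g_end, g_strand, name in genes:
        if g_chrom != chrom or g_strand != strand:
            continue
        # Distance between the two intervals; 0 if they touch or overlap.
        gap = max(g_start - end, start - g_end, 0)
        if gap <= max_gap:
            return ("gene-associated", name)
    return ("independent", None)


# Hypothetical annotation used only for the example.
example_genes = [("chr1", 1000, 5000, "+", "GENE_A")]
```

Under these assumptions, a plus-strand transcript at chr1:5500-6500 is attributed to `GENE_A`, whereas one at chr1:50000-51000 is treated as an independent unit. A fixed distance cutoff is a crude proxy, which is why the study relies on independent measurements instead.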


Long Intergenic Non-Protein Coding RNA Variants and Disease Susceptibility

The occurrence of complex diseases (e.g., cancer) is related to multiple factors, including genetic, environmental, and lifestyle factors. Among them, genetic factors are of particular interest, as GWASs and next-generation sequencing studies have greatly broadened the understanding of genetic variants that confer disease risk. Numerous genetic variants in lincRNA regions have been found to be associated with susceptibility to heterogeneous diseases, especially multiple types of cancer. Herein, we review some lincRNAs that encompass disease- or trait-associated variants (Tables 1, 2).

Table 1. Overviews of trait-associated variants on the chr8q24 locus.

Table 2. Overviews of other lincRNAs encompassing trait-associated variants.

Long Intergenic Non-protein Coding RNA Variants on the chr8q24 Locus

Genome-wide association studies have pointed to the chr8q24 genomic locus as a hotspot for cancer-associated variants owing to the high density, strength, and allele frequency of these variants (Yeager et al., 2007; Tuupanen et al., 2009). Chromosome 8q24 has been considered a “gene desert” region owing to the absence of functionally annotated genes, with the only notable exception being the frequently amplified MYC (a proto-oncogene involved in tumorigenesis) (Chung et al., 2011). Surprisingly, large-scale studies have revealed that several lincRNAs are transcribed from the chr8q24 locus, such as CCAT1 (Kim et al., 2014), CCAT2 (Ling et al., 2013), PVT1 (Hanson et al., 2007), PCAT1 (Guo et al., 2016), and PRNCR1 (Li et al., 2013); all of these encompass multiple cancer-associated variants. For instance, lincRNA CCAT2 (Colon Cancer-Associated Transcript 2, also termed LINC00873), a transcript spanning SNV rs6983267, is associated with an increased risk for prostate, breast, colon, and colorectal cancers (Yeager et al., 2007; Tuupanen et al., 2009; Ling et al., 2013). CCAT2 is overexpressed in various types of cancers and may contribute to tumor growth, metastasis, and chromosomal instability by increasing MYC expression (Ling et al., 2013). LincRNA PRNCR1 has been reported to be involved in prostate carcinogenesis and may play an oncogenic role by modulating the androgen receptor (Chung et al., 2011). PRNCR1 variants, especially rs1456315, are associated with susceptibility to prostate and colorectal cancers (Li et al., 2013; Teerlink et al., 2016). Through an integrative analysis of the lncRNA transcriptome and GWAS data, Guo et al. (2016) identified a prostate cancer-associated transcript, PCAT1, and 10 risk loci on chr8q24.21, including PCAT1 variants rs10086908 and rs7463708, which are significantly associated with prostate cancer susceptibility.
As for PVT1 (also termed LINC00079), a GWAS analysis has identified that its variants rs13255292 and rs4733601 are associated with susceptibility to diffuse large B cell lymphoma (Cerhan et al., 2014). Other independent SNVs (e.g., rs2720709 and rs2648875), which map to PVT1, contribute in particular to the development of end-stage renal disease (ESRD) in patients with type 2 diabetes (Hanson et al., 2007). A recent meta-analysis has summarized the relationship between two common variants (rs10505477 and rs7837328) in the intronic region of CASC8 (LINC00860) at the 8q24 locus and the risk of cancers (Cui et al., 2018), including colorectal, gastric, and lung cancers (Ma et al., 2015; Hu et al., 2016). Another intronic locus, rs378854, is related to adiposity in individuals of African ancestry (Ng et al., 2017).

Single-Nucleotide Variants in Long Intergenic Non-protein Coding RNA H19 Locus

H19 (also termed LINC00008), located on chromosome 11p15.5, is a paternally imprinted onco-fetal gene that is typically down-regulated in adult tissues but can be overexpressed in multiple types of solid cancer. LincRNA H19 expression is closely related to tumor growth, metastasis, recurrence, and clinical prognosis (Ge et al., 2018). H19 variants are involved in susceptibility to multiple diseases. A meta-analysis has indicated that the variant T allele of rs2107425 is correlated with a decreased risk of developing cancers (e.g., breast, ovarian, lung, and bladder cancers) (Chu et al., 2016; Wu et al., 2017), whereas variant rs2839698 is associated with an increased risk of digestive cancers (colorectal and gastric cancers) via up-regulation of H19 expression; of note, no significant association was observed between the rs217727 variant and cancer susceptibility (Chu et al., 2016). However, in other reports, H19 rs217727 has been linked to the risk of hepatocellular carcinoma (HCC) (Ge et al., 2018), oral squamous cell carcinoma (OSCC), and bladder cancer in the Chinese population (Guo Q. Y. et al., 2017). For coronary artery disease (CAD), the T variant of rs217727 is associated with an increased risk, whereas the rs2067051 A variant is linked to a decreased risk (Gao et al., 2015). The H19 rs217727 variant, but not rs2107425, is associated with susceptibility to preeclampsia (PE) in women (Harati-Sadegh et al., 2018). Additionally, maternally transmitted fetal H19 variants (e.g., rs217727, rs2071094, and rs10732516), along with paternal IGF2 variants, are independently correlated with placental DNA methylation levels (Marjonen et al., 2018) and the birth weight of newborns (Petry et al., 2011).

Single-Nucleotide Variant in MALAT1 and MIAT Regions

LincRNA MALAT1 (metastasis-associated lung adenocarcinoma transcript 1, also termed LINC00047) carries the rs619586 A > G variant, which is significantly associated with susceptibility to pulmonary arterial hypertension (PAH); carriers of the variant G genotypes have a decreased PAH risk (Zhuo et al., 2017). A recent study has suggested that rs619586 AG/GG genotypes could reduce the risks of coronary atherosclerotic heart disease and congenital heart disease (CHD) by regulating MALAT1 expression (Li et al., 2018b). Another report has shown that MALAT1 is overexpressed in colorectal cancers and that SNV rs1194338, which maps to its promoter region, is significantly associated with a decreased risk of colorectal cancer (Li et al., 2017). Moreover, large-scale case-control association studies have identified a novel myocardial infarction-associated transcript, MIAT (also termed LINC00066), which encompasses rs2331291 and other variants that confer susceptibility to myocardial infarction (Ishii et al., 2006). As a component of the nuclear matrix, MIAT is mainly expressed in neurons. Rao et al. (2015) have reported that SNV rs1894720 is correlated with paranoid schizophrenia susceptibility and that MIAT may contribute to the pathogenesis of schizophrenia.

Other Long Intergenic Non-protein Coding RNA Variants in Human Cancers

In addition to the above lincRNA molecules, recent studies have identified many other cancer-associated variants within lincRNA regions. For example, the tissue differentiation-inducing non-protein coding RNA (TINCR), also termed LINC00036, is essential for somatic tissue differentiation and tumor progression (Kretz et al., 2013). It has been demonstrated that two variants of TINCR (rs2288947 and rs8105637) are significantly correlated with the susceptibility and lymph node metastasis of colorectal cancer (Zheng et al., 2017); the TINCR rs2288947 G allele and rs8113645 A allele genotypes could reduce the risk of gastric cancer. HULC, an HCC up-regulated lncRNA also termed LINC00078, has variants (rs7763881 and rs1041279) that are linked to susceptibility to HCC (Wang et al., 2018a). In thyroid carcinoma, papillary thyroid carcinoma susceptibility candidates carry risk variants: PTCSC2 contains rs965513 and PTCSC3 encompasses rs944289; the expression of both lincRNAs is strongly down-regulated in thyroid carcinoma tissues (Jendrzejewski et al., 2012; He et al., 2015). Additionally, GWAS analyses have identified five tag-SNVs, including rs944289 located in PTCSC3, that are associated with large-vessel ischemic stroke (Lee et al., 2016). Xue et al. (2013) have reported that a prostate cancer gene expression marker, PCGEM1 (LINC00071), contains two risk-SNVs (rs6434568 C and rs16834898 A alleles) that are associated with a decreased risk of prostate cancer. Another prostate cancer risk-associated allele, rs75823044, which maps to the promoter of LINC00676, is almost exclusively found in African ancestry populations (Conti et al., 2017). In a GWAS analysis, five common variants, including rs3803662 in an exon of CASC16 (LINC00918), have been identified as contributing to susceptibility to lung and breast cancers (Orr et al., 2011).
Furthermore, the colorectal cancer risk-SNV rs11776042 is located in the promoter of LNC00964, a lincRNA whose expression is significantly decreased in colorectal cancer tissues (Chu et al., 2015). For the tumor suppressor lncRNA GAS5, the insertion/deletion variant rs145204276 is associated with susceptibility to HCC (Tao et al., 2015) and to colorectal and gastric cancers (Li et al., 2018a).

Other Disease-Associated Variants in Long Intergenic Non-protein Coding RNA Regions

Beyond cancer susceptibility, some lincRNA variants have been found to be associated with the risk of other heterogeneous diseases. GWAS and expression quantitative trait locus (eQTL) analyses have identified a risk factor for pathological inflammatory responses in leprosy, SNV rs1875147, which is an eQTL variant for lincRNA LOC105378318 located in chromosome 10p21.2 (Fava et al., 2017). Rautanen et al. have found a variant, rs140817150, in the intron of LOC107986770, which may be correlated with bacteremia susceptibility in African children (Kenyan Bacteraemia Study Group et al., 2016). A systematic analysis highlights some variant loci in lncRNA regions linked to cardiometabolic disorders; one of them, lincRNA LOC157273 harboring rs4841132, is linked to the regulation of serum lipid cholesterol (Ghanbari et al., 2018). A GWAS analysis by Shyn et al. (2011) identified a major depressive disorder (MDD) risk-associated variant, rs12526133, which resides in an exon of LINC01108, a lincRNA that is overexpressed in patients with MDD. Moreover, the maternally expressed imprinted gene MEG3 (also termed LINC00023) contains variants rs941576 (Wallace et al., 2010) and rs34552516 (Westra et al., 2018), which have been found to be associated with susceptibility to type 1 diabetes. Comprehensive GWAS meta-analyses by Nikpay et al. (2015) reported an association of CAD susceptibility with several SNVs, such as rs1870634, which is located downstream of LINC00841; its GG genotype is strongly linked to CAD risk and has a higher frequency in CAD patients.


Long intergenic non-coding RNAs in hepatocellular carcinoma—a focus on Linc00176

Provenance: This is a Guest Editorial commissioned by Section Editor Meiyi Song (Division of Gastroenterology and Hepatology, Digestive Disease Institute, Tongji Hospital, Tongji University School of Medicine, Shanghai, China).

Received: 06 February 2018 Accepted: 13 February 2018 Published: 19 March 2018.

Liver cancer is a common type of cancer that ranks second in cancer-related mortality world-wide (1). Hepatocellular carcinoma (HCC) is the predominant type of liver cancer and it usually arises on the background of cirrhosis. Risk factors associated with the incidence of HCC include viral hepatitis (hepatitis B and C viruses), the metabolic syndrome including non-alcoholic fatty liver diseases, autoimmune hepatitis as well as aflatoxin-B ingestion (2,3). The multikinase inhibitor, sorafenib, is the only first-line medical therapy widely available, with its sister drug regorafenib approved for second-line use in selected patients (4,5). These drugs have only a limited impact on life expectancy. Transcriptomic changes accompanying cancer development and progression have been extensively explored in patients with HCC, with subgroups created based on transcription profile and mutational status (6-9). These studies to date have not adequately informed selection of candidate drugs for clinical trials, the vast majority of which have failed to show any survival benefit. Although trial design and toxicities have contributed to failures (8), our lack of global understanding of the tumour biology is likely also to have played a role. Recently, rather than sticking to the central transcriptome dogma of DNA-mRNA-protein, researchers have started to appreciate the many other transcripts outside this workflow that are cancer specific and functional, as alternative sources for the identification of novel candidate therapeutic targets. High-throughput sequencing techniques, in combination with advanced computational prediction software and epigenetic tools, have identified novel RNA transcripts and are even capable of predicting specific functions. These newly identified transcripts include “long non-coding RNAs (lncRNAs)”, a term generating more than 13,500 hits in a PubMed search at the end of 2017, three quarters of which were published in the last decade.
These include a recent article by Tran and colleagues, published in Oncogene, in which the lncRNA Linc00176 has been interrogated in publicly available datasets. It has been reported as an upregulated transcript in HCC that is worthy of pursuit as a biomarker-directed target for cancer therapy (10).

The term lncRNA defines a category of RNA transcripts more than 200 nucleotides in length that do not encode proteins. These features differentiate lncRNAs from short transcripts like miRNAs, tRNAs or snoRNAs, as well as from mRNAs. LncRNAs and mRNAs share similarities in the regulation of their transcription and post-transcriptional processing, but lncRNAs are shorter in length and have fewer exons and less conserved primary structures than protein-coding transcripts (11). Integration of the human RNA transcriptome with advanced bioinformatics analysis has indicated that over 60% of transcripts in the so-called “MiTranscriptome” of human long poly-adenylated RNA transcripts are lncRNAs (12). LncRNAs are located within intergenic regions and are expressed at lower levels than protein-coding RNAs, and unsupervised clustering of the differentially expressed transcripts has identified distinct tissue-specific lncRNA signatures. Moreover, lncRNAs are differentially expressed between cancer and normal tissue within the same organ (12). In the liver, RNA sequencing of 60 primary HCCs in Chinese patients, matched with tissue from portal vein tumour thrombosis (PVTT) and adjacent non-tumour liver, has identified a deregulated pool of lncRNAs in the primary and metastatic tumours (13). Remarkably, more than 75% of sequenced lncRNAs identified in HCC had not been previously annotated in either the MiTranscriptome (12) or the GENCODE transcriptome (14). In this cohort of Chinese patients with HCC, DNA methylation and copy number variation (CNV) were associated with the deregulated lncRNA pool, and the lncRNA transcripts appeared to have regulatory roles in cellular functions such as the immune response and cell adhesion. The tissue and cancer specificity of lncRNAs, alongside key regulatory roles, implies importance in cancer pathophysiology and highlights their potential as novel cancer-specific targets.

The transcription/export (TREX) mRNA export complex is a master regulator of mRNA biogenesis, being involved in different steps of mRNA transcription, processing and export. The mammalian TREX complex includes THOC1 (hHpr1), THOC2, THOC7, THOC5 (FMIP), THOC6, THOC3 (hTEX1), Uap56, DDX39c and Aly (15). The Hannover research group led by Teruko Tamura has previously shown that THOC5 null mice have drastically reduced numbers of hematopoietic system and myeloid progenitor cells, without any effect in adult kidney, heart and liver (16). The group hypothesised that THOC5 was essential for the maintenance of stem cells/progenitor cells but not for the terminally differentiated cells like hepatocytes. The same group demonstrated elevated levels of THOC5 in HCC tissues and cell lines, alongside an induction of apoptosis and cell cycle deregulation in vitro, after THOC5 knock down (17). Hypothesising that THOC5 target genes were important in liver cancer cell survival, Tamura’s team went on to study one target gene in particular—namely the lncRNA 00176 (Linc00176).

Linc00176 (NR_027686.1) is located on chromosome 20 and its HCC transcript is reported to have 4 exons and be 5,264 nucleotides long. In the HepG2 HCC cell line an alternatively spliced Linc00176 transcript, lacking exon 1, 1,601 nucleotides from exon 2 as well as 962 nucleotides from exon 4, has been identified (18). Apart from this information, little was known about the role of Linc00176 in HCC. Tran et al. (10) have now explored HCC patient data available from The Cancer Genome Atlas (TCGA) and the ENCODE consortium (18). Alternatively spliced Linc00176 transcripts were present in HepG2, Hep3B, Huh7, HLE and HLF HCC cell lines, with HepG2 cells showing the highest expression levels. Linc00176 expression was not evident in any normal human tissues (including liver, pancreas, lung, skin, brain, adipose tissue, muscle, heart and bone marrow) or other malignancies (leukaemia, melanoma, breast cancer, neuroblastoma, rhabdomyosarcoma and cervical human cancer cell lines) (10). In human primary HCC data in TCGA, high Linc00176 expression, compared with low expression, was significantly associated with both poorly differentiated tumour grade and poorer patient survival.

Studying the putative Linc00176 promoter region (500 nucleotides upstream of the initiation site), Tran et al. integrated human hepatocyte and HepG2 cell DNase-Seq, RNA-Seq, ChIP-seq and cap analysis of gene expression (CAGE) data (18) and applied PROMO software (ALGGEN, a program that predicts transcription factor binding sites). The team focused on a number of candidate regulatory transcription factors (Myc, MAZ and AP-4). Myc was reported to preferentially regulate the transcription machinery of lncRNAs (19) and Tran et al. expanded this observation in vitro by showing that Myc/AP-4 double knock down synergistically depleted Linc00176 expression (10). Functionally, depletion of Linc00176 from Huh7 and HepG2 cells abolished their proliferation and converted them to TUNEL-positive cells after 2 days compared to control cells. Unlike THOC5 depletion-induced apoptosis (17), Linc00176 reportedly exerts its role through necroptosis via mixed lineage kinase domain-like protein (MLKL) (10). The bioinformatics ingenuity pathway analysis (IPA) performed on the genes differentially expressed in HepG2 cells after knock down of Linc00176 supported this finding, ranking cancer and necrosis as the top deregulated pathways.
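Extracting a window such as the 500 nucleotides upstream of the initiation site depends on the strand of the transcript, because "upstream" means lower coordinates on the plus strand and higher coordinates on the minus strand. The Python sketch below illustrates that arithmetic only; the function name `promoter_window`, the 0-based half-open convention and the example coordinates are assumptions and do not reproduce the analysis by Tran et al.

```python
from typing import Optional, Tuple


def promoter_window(
    tss: int, strand: str, length: int = 500, chrom_length: Optional[int] = None
) -> Tuple[int, int]:
    """Return the half-open (start, end) window of `length` bases upstream of a TSS.

    `tss` is the 0-based position of the first transcribed base.
    """
    if strand == "+":
        # Upstream lies at lower coordinates; clip at the chromosome start.
        return max(0, tss - length), tss
    if strand == "-":
        # Upstream lies at higher coordinates; clip at the chromosome end if known.
        end = tss + 1 + length
        if chrom_length is not None:
            end = min(end, chrom_length)
        return tss + 1, end
    raise ValueError(f"strand must be '+' or '-', got {strand!r}")
```

For a hypothetical initiation site at position 1000, the window is (500, 1000) on the plus strand and (1001, 1501) on the minus strand. The resulting interval is what would be passed to a binding-site prediction tool or intersected with DNase-Seq and ChIP-seq peaks.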

Having identified upstream regulators of Linc00176, as well as the impact of its knockdown, Tran et al. considered the three main mechanisms by which lncRNAs bring about their regulatory functions (20). Some lncRNAs act as physical scaffolds, forming flexible macromolecular complexes with nuclear matrix, chromatin regulatory and DNA methylation proteins to control the chromatin state. Other lncRNAs recruit chromatin regulatory machinery to specific DNA loci, either through their affinity for some regulatory proteins or through 3D proximity-guided localisation to certain gene loci. Alternatively, many lncRNAs shape the nuclear structure in a way that encourages or discourages gene expression. Tran et al. went on to demonstrate that Linc00176 binds to two anti-tumour, Myc-regulated miRNAs, namely miRNA-9 and miRNA-185 (10). Depleting the concentrations of these miRNAs diminished their anti-proliferative effects on the tumour cells. Moreover, depletion of these two miRNAs using specific inhibitors rescued the anti-proliferative effect of Linc00176 knock down in tumour cell lines.

This work by Tamura’s group (10) exploits publicly available datasets, combined with confirmatory and exploratory laboratory studies, in a fashion that brings lncRNAs to the heart of the HCC field. Not only are these transcripts highly specific for HCC, suggesting potential roles in HCC-specific diagnosis, but their functional roles also raise the potential for intervention to either prevent or treat cancers. Linc00176 joins other lncRNAs, like the previously reported RP11-166D19.1 (13), as a candidate diagnostic biomarker. In addition, this study suggests that targeting Linc00176 would not only selectively inhibit the proliferation of tumour cells but would also favour the development of an anti-tumour niche by increasing the availability of tumour-limiting miRNAs like miRNA-9 and miRNA-185. The development of diagnostic and monitoring biomarker assays, measuring Linc00176 and its targets in tissues, alongside the means to target its functional effects, may be worthy of consideration for patients with HCC. There may be other lncRNAs, or combinations, which have relevance to different patients, possibly with cancers arising in different etiological backgrounds. This is a landmark study not just because it highlights the potential importance of Linc00176, but because it leads the way in HCC, demonstrating the translational value of systematic interrogation and integration of publicly available data (Figure 1).



Postepska-Igielska, A. et al. LncRNA Khps1 regulates expression of the proto-oncogene SPHK1 via triplex-mediated changes in chromatin structure. Mol. Cell 60, 626–636 (2015).

Boque-Sastre, R. et al. Head-to-head antisense transcription and R-loop formation promotes transcriptional activation. Proc. Natl Acad. Sci. USA 112, 5785–5790 (2015).

Azzalin, C. M., Reichenbach, P., Khoriauli, L., Giulotto, E. & Lingner, J. Telomeric repeat containing RNA and RNA surveillance factors at mammalian chromosome ends. Science 318, 798–801 (2007).

Graf, M. et al. Telomere length determines TERRA and R-loop regulation through the cell cycle. Cell 170, 72–85.e14 (2017).

Marchese, F. P. et al. A long noncoding RNA regulates sister chromatid cohesion. Mol. Cell 63, 397–407 (2016).

Mariner, P. D. et al. Human Alu RNA is a modular transacting repressor of mRNA transcription during heat shock. Mol. Cell 29, 499–509 (2008).

Espinoza, C. A., Allen, T. A., Hieb, A. R., Kugel, J. F. & Goodrich, J. A. B2 RNA binds directly to RNA polymerase II to repress transcript synthesis. Nat. Struct. Mol. Biol. 11, 822–829 (2004).

Yang, Z., Zhu, Q., Luo, K. & Zhou, Q. The 7SK small nuclear RNA inhibits the CDK9/cyclin T1 kinase to control transcription. Nature 414, 317–322 (2001).

Calo, E. et al. RNA helicase DDX21 coordinates transcription and ribosomal RNA processing. Nature 518, 249–253 (2015).

Sleutels, F., Zwart, R. & Barlow, D. P. The non-coding Air RNA is required for silencing autosomal imprinted genes. Nature 415, 810–813 (2002).

Latos, P. A. et al. Airn transcriptional overlap, but not its lncRNA products, induces imprinted Igf2r silencing. Science 338, 1469–1472 (2012).

Tseng, Y. Y. et al. PVT1 dependence in cancer with MYC copy-number increase. Nature 512, 82–86 (2014).

Staněk, D. & Fox, A. H. Nuclear bodies: news insights into structure and function. Curr. Opin. Cell Biol. 46, 94–101 (2017).

Chujo, T., Yamazaki, T. & Hirose, T. Architectural RNAs (arcRNAs): a class of long noncoding RNAs that function as the scaffold of nuclear bodies. Biochim. Biophys. Acta 1859, 139–146 (2016).

Hutchinson, J. N. et al. A screen for nuclear transcripts identifies two linked noncoding RNAs associated with SC35 splicing domains. BMC Genomics 8, 39 (2007).

Clemson, C. M. et al. An architectural role for a nuclear noncoding RNA: NEAT1 RNA is essential for the structure of paraspeckles. Mol. Cell 33, 717–726 (2009).

Chen, L. L. & Carmichael, G. G. Altered nuclear retention of mRNAs containing inverted repeats in human embryonic stem cells: functional role of a nuclear noncoding RNA. Mol. Cell 35, 467–478 (2009).

Bond, C. S. & Fox, A. H. Paraspeckles: nuclear bodies built on long noncoding RNA. J. Cell Biol. 186, 637–644 (2009).

Sasaki, Y. T., Ideue, T., Sano, M., Mituyama, T. & Hirose, T. MENε/β noncoding RNAs are essential for structural integrity of nuclear paraspeckles. Proc. Natl Acad. Sci. USA 106, 2525–2530 (2009).

Naganuma, T. et al. Alternative 3′-end processing of long noncoding RNA initiates construction of nuclear paraspeckles. EMBO J. 31, 4020–4034 (2012).

Mao, Y. S., Sunwoo, H., Zhang, B. & Spector, D. L. Direct visualization of the co-transcriptional assembly of a nuclear body by noncoding RNAs. Nat. Cell Biol. 13, 95–101 (2011).

Hirose, T. et al. NEAT1 long noncoding RNA regulates transcription via protein sequestration within subnuclear bodies. Mol. Biol. Cell 25, 169–183 (2014).

Wang, Y. et al. Genome-wide screening of NEAT1 regulators reveals cross-regulation between paraspeckles and mitochondria. Nat. Cell Biol. 20, 1145–1158 (2018).

Souquere, S., Beauclair, G., Harper, F., Fox, A. & Pierron, G. Highly ordered spatial organization of the structural long noncoding NEAT1 RNAs within paraspeckle nuclear bodies. Mol. Biol. Cell 21, 4020–4027 (2010).

West, J. A. et al. Structural, super-resolution microscopy analysis of paraspeckle nuclear body organization. J. Cell Biol. 214, 817–830 (2016).

Yamazaki, T. & Hirose, T. The building process of the functional paraspeckle with long non-coding RNAs. Front. Biosci. (Elite Ed.) 7, 1–41 (2015).

Imamura, K. et al. Long noncoding RNA NEAT1-dependent SFPQ relocation from promoter region to paraspeckle mediates IL8 expression upon immune stimuli. Mol. Cell 53, 393–406 (2014).

Jiang, L. et al. NEAT1 scaffolds RNA-binding proteins and the microprocessor to globally enhance pri-miRNA processing. Nat. Struct. Mol. Biol. 24, 816–824 (2017).

Prasanth, K. V. et al. Regulating gene expression through RNA nuclear retention. Cell 123, 249–263 (2005).

Chen, L. L., DeCerbo, J. N. & Carmichael, G. G. Alu element-mediated gene silencing. EMBO J. 27, 1694–1705 (2008).

Hu, S. B. et al. Protein arginine methyltransferase CARM1 attenuates the paraspeckle-mediated nuclear retention of mRNAs containing IRAlus. Genes Dev. 29, 630–645 (2015).

Torres, M. et al. Circadian RNA expression elicited by 3'-UTR IRAlu-paraspeckle associated elements. ELfie 5, e14837 (2016).

Adriaens, C. et al. p53 induces formation of NEAT1 lncRNA-containing paraspeckles that modulate replication stress response and chemosensitivity. Nat. Med. 22, 861–868 (2016).

Mello, S. S. et al. Neat1 is a p53-inducible lincRNA essential for transformation suppression. Genes Dev. 31, 1095–1108 (2017).

Nakagawa, S., Naganuma, T., Shioi, G. & Hirose, T. Paraspeckles are subpopulation-specific nuclear bodies that are not essential in mice. J. Cell Biol. 193, 31–39 (2011).

Nakagawa, S. et al. The lncRNA Neat1 is required for corpus luteum formation and the establishment of pregnancy in a subpopulation of mice. Development 141, 4618–4627 (2014).

Standaert, L. et al. The long noncoding RNA Neat1 is required for mammary gland development and lactation. RNA 20, 1844–1849 (2014).

Valgardsdottir, R. et al. Transcription of Satellite III non-coding RNAs is a general stress response in human cells. Nucleic Acids Res. 36, 423–434 (2008).

Mannen, T., Yamashita, S., Tomita, K., Goshima, N. & Hirose, T. The Sam68 nuclear body is composed of two RNase-sensitive substructures joined by the adaptor HNRNPL. J. Cell Biol. 214, 45–59 (2016).

Caudron-Herger, M. et al. Alu element-containing RNAs maintain nucleolar structure and function. EMBO J. 34, 2758–2774 (2015).

Spector, D. L. & Lamond, A. I. Nuclear speckles. Cold Spring Harb. Perspect. Biol. 3, a000646 (2011).

Nakagawa, S. et al. Malat1 is not an essential component of nuclear speckles in mice. RNA 18, 1487–1499 (2012).

Zhang, B. et al. The lncRNA Malat1 is dispensable for mouse development but its transcription plays a cis-regulatory role in the adult. Cell Rep. 2, 111–123 (2012).

Fei, J. et al. Quantitative analysis of multilayer organization of proteins and RNA in nuclear speckles at super resolution. J. Cell Sci. 130, 4180–4192 (2017).

Änkö, M. L. et al. The RNA-binding landscapes of two SR proteins reveal unique functions and binding to diverse RNA classes. Genome Biol. 13, R17 (2012).

Tripathi, V. et al. The nuclear-retained noncoding RNA MALAT1 regulates alternative splicing by modulating SR splicing factor phosphorylation. Mol. Cell 39, 925–938 (2010).

Latorre, E. et al. The ribonucleic complex HuR-MALAT1 represses CD133 expression and suppresses epithelial-mesenchymal transition in breast cancer. Cancer Res. 76, 2626–2636 (2016).

Ji, Q. et al. Long non-coding RNA MALAT1 promotes tumour growth and metastasis in colorectal cancer through binding to SFPQ and releasing oncogene PTBP2 from SFPQ/PTBP2 complex. Br. J. Cancer 111, 736–748 (2014).

Malakar, P. et al. Long noncoding RNA MALAT1 promotes hepatocellular carcinoma development by SRSF1 upregulation and mTOR activation. Cancer Res. 77, 1155–1167 (2017).

Michalik, K. M. et al. Long noncoding RNA MALAT1 regulates endothelial cell function and vessel growth. Circ. Res. 114, 1389–1397 (2014).

Arun, G. et al. Differentiation of mammary tumors and reduction in metastasis upon Malat1 lncRNA loss. Genes Dev. 30, 34–51 (2016).

West, J. A. et al. The long noncoding RNAs NEAT1 and MALAT1 bind active chromatin sites. Mol. Cell 55, 791–802 (2014).

Engreitz, J. M. et al. RNA-RNA interactions enable specific targeting of noncoding RNAs to nascent pre-mRNAs and chromatin sites. Cell 159, 188–199 (2014).

Sun, Q., Hao, Q. & Prasanth, K. V. Nuclear long noncoding RNAs: key regulators of gene expression. Trends Genet. 34, 142–157 (2018).

Kopp, F. & Mendell, J. T. Functional classification and experimental dissection of long noncoding RNAs. Cell 172, 393–407 (2018).

Sridhar, B. et al. Systematic mapping of RNA-chromatin interactions in vivo. Curr. Biol. 27, 602–609 (2017).

Salmena, L., Poliseno, L., Tay, Y., Kats, L. & Pandolfi, P. P. A ceRNA hypothesis: the Rosetta Stone of a hidden RNA language? Cell 146, 353–358 (2011).

Poliseno, L. et al. A coding-independent function of gene and pseudogene mRNAs regulates tumour biology. Nature 465, 1033–1038 (2010).

Bosson, A. D., Zamudio, J. R. & Sharp, P. A. Endogenous miRNA and target concentrations determine susceptibility to potential ceRNA competition. Mol. Cell 56, 347–359 (2014).

Denzler, R., Agarwal, V., Stefano, J., Bartel, D. P. & Stoffel, M. Assessing the ceRNA hypothesis with quantitative measurements of miRNA and target abundance. Mol. Cell 54, 766–776 (2014).

Hansen, T. B. et al. Natural RNA circles function as efficient microRNA sponges. Nature 495, 384–388 (2013).

Kleaveland, B., Shi, C. Y., Stefano, J. & Bartel, D. P. A network of noncoding regulatory RNAs acts in the mammalian brain. Cell 174, 350–362.e317 (2018).

Kim, Y. K., Furic, L., Desgroseillers, L. & Maquat, L. E. Mammalian Staufen1 recruits Upf1 to specific mRNA 3'UTRs so as to elicit mRNA decay. Cell 120, 195–208 (2005).

Gong, C. & Maquat, L. E. lncRNAs transactivate STAU1-mediated mRNA decay by duplexing with 3′ UTRs via Alu elements. Nature 470, 284–288 (2011).

Zamore, P. D., Williamson, J. R. & Lehmann, R. The Pumilio protein binds RNA through a conserved domain that defines a new class of RNA-binding proteins. RNA 3, 1421–1433 (1997).

Miller, M. A. & Olivas, W. M. Roles of Puf proteins in mRNA degradation and translation. Wiley Interdiscip. Rev. RNA 2, 471–492 (2011).

Lee, S. et al. Noncoding RNA NORAD regulates genomic stability by sequestering PUMILIO proteins. Cell 164, 69–80 (2016).

Tichon, A. et al. A conserved abundant cytoplasmic long noncoding RNA modulates repression by Pumilio proteins in human cells. Nat. Commun. 7, 12209 (2016).

Munschauer, M. et al. The NORAD lncRNA assembles a topoisomerase complex critical for genome stability. Nature 561, 132–136 (2018).

Ingolia, N. T., Lareau, L. F. & Weissman, J. S. Ribosome profiling of mouse embryonic stem cells reveals the complexity and dynamics of mammalian proteomes. Cell 147, 789–802 (2011).

Ingolia, N. T. et al. Ribosome profiling reveals pervasive translation outside of annotated protein-coding genes. Cell Rep. 8, 1365–1379 (2014).

Yoon, J. H. et al. LincRNA-p21 suppresses target mRNA translation. Mol. Cell 47, 648–655 (2012).

Carrieri, C. et al. Long non-coding antisense RNA controls Uchl1 translation through an embedded SINEB2 repeat. Nature 491, 454–457 (2012).

Wang, P. et al. The STAT3-binding long noncoding RNA lnc-DC controls human dendritic cell differentiation. Science 344, 310–313 (2014).

Liu, B. et al. A cytoplasmic NF-κB interacting long noncoding RNA blocks IκB phosphorylation and suppresses breast cancer metastasis. Cancer Cell 27, 370–381 (2015).

Gutschner, T., Baas, M. & Diederichs, S. Noncoding RNA gene silencing through genomic integration of RNA destabilizing elements using zinc finger nucleases. Genome Res. 21, 1944–1954 (2011).

Liu, S. J. et al. CRISPRi-based genome-scale identification of functional long noncoding RNA loci in human cells. Science 355, eaah7111 (2017).

Joung, J. et al. Genome-scale activation screen identifies a lncRNA locus regulating a gene neighbourhood. Nature 548, 343–346 (2017).

Bester, A. C. et al. An integrated genome-wide CRISPRa approach to functionalize lncRNAs in drug resistance. Cell 173, 649–664.e620 (2018).

Konermann, S. et al. Transcriptome engineering with RNA-targeting type VI-D CRISPR effectors. Cell 173, 665–676.e614 (2018).

Abudayyeh, O. O. et al. RNA targeting with CRISPR-Cas13. Nature 550, 280–284 (2017).

Simon, M. D. et al. High-resolution Xist binding maps reveal two-step spreading during X-chromosome inactivation. Nature 504, 465–469 (2013).

Engreitz, J. M. et al. The Xist lncRNA exploits three-dimensional genome architecture to spread across the X chromosome. Science 341, 1237973 (2013).

Murigneux, V., Saulière, J., Roest Crollius, H. & Le Hir, H. Transcriptome-wide identification of RNA binding sites by CLIP-seq. Methods 63, 32–40 (2013).

Van Nostrand, E. L. et al. Robust transcriptome-wide discovery of RNA-binding protein binding sites with enhanced CLIP (eCLIP). Nat. Methods 13, 508–514 (2016).

Cabili, M. N. et al. Localization and abundance analysis of human lncRNAs at single-cell and single-molecule resolution. Genome Biol. 16, 20 (2015).

Li, R., Harvey, A. R., Hodgetts, S. I. & Fox, A. H. Functional dissection of NEAT1 using genome editing reveals substantial localization of the NEAT1_1 isoform outside paraspeckles. RNA 23, 872–881 (2017).

Shah, S. et al. Dynamics and spatial genomics of the nascent transcriptome by intron seqFISH. Cell 174, 363–376.e316 (2018).

Chen, K. H., Boettiger, A. N., Moffitt, J. R., Wang, S. & Zhuang, X. RNA imaging. Spatially resolved, highly multiplexed RNA profiling in single cells. Science 348, aaa6090 (2015).

Chen, M. et al. A molecular beacon-based approach for live-cell imaging of RNA transcripts with minimal target engineering at the single-molecule level. Sci. Rep. 7, 1550 (2017).

Nelles, D. A. et al. Programmable RNA tracking in live cells with CRISPR/Cas9. Cell 165, 488–496 (2016).

Colognori, D., Sunwoo, H., Kriz, A. J., Wang, C. Y. & Lee, J. T. Xist deletional analysis reveals an interdependency between Xist RNA and Polycomb complexes for spreading along the inactive X. Mol. Cell. https://doi.org/10.1016/j.molcel.2019.01.015 (2019).


Methods

Databases

Five sets of RNA-seq data were downloaded from the NCBI Sequence Read Archive (SRA) database (Additional file 1). These datasets include two single-read samples and three paired-end samples, with reads 36 to 100 bp long, sequenced on Illumina platforms (100 million reads in total). The Bos taurus UMD3.1 reference genome FASTA file and the Gene Transfer Format (GTF) file were downloaded from the Ensembl website (http://asia.ensembl.org). The UniRef90 (UniProt Reference Clusters) database was downloaded from the UniProt website (http://www.ebi.ac.uk/uniprot/database/download.html).

Alignment of RNA-seq reads and assembly of transcripts

Quality control of the downloaded RNA-seq reads was performed with FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc, version 0.11.2). Adaptors were filtered using Trimmomatic (http://www.usadellab.org/cms/?page=trimmomatic, version 0.33). RNA-seq reads from bovine mammary glands were aligned to the Bos taurus UMD3.1 reference genome with TopHat2 (version 2.0.12) [17]. Mapped reads were assembled with Cufflinks (version 2.2.1) [18]. All assembled transcripts were then merged using Cuffcompare (version 2.2.1) [19].
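The align, assemble and merge steps described above can be sketched as a small Python helper that only builds the command lines. This is a minimal illustration, not the authors' actual invocation: the file names, output layout and the choice to pass the annotation with `-G` (TopHat2) and `-g` (Cufflinks) are assumptions, and the paper does not report which options were used.

```python
from typing import List


def build_sample_commands(reads: List[str], index: str, gtf: str, out: str) -> List[List[str]]:
    """Return argv lists for aligning and assembling one sample.

    reads: one FASTQ path (single-read) or two (paired-end).
    All paths here are hypothetical placeholders.
    """
    # TopHat2 writes accepted_hits.bam into its output directory.
    align = ["tophat2", "-o", f"{out}/tophat", "-G", gtf, index, *reads]
    # Cufflinks assembles transcripts from the alignments, guided by the annotation.
    assemble = ["cufflinks", "-o", f"{out}/cufflinks", "-g", gtf,
                f"{out}/tophat/accepted_hits.bam"]
    return [align, assemble]


def build_merge_command(ref_gtf: str, assembled_gtfs: List[str], prefix: str) -> List[str]:
    """Return the argv list comparing all per-sample assemblies against the reference."""
    return ["cuffcompare", "-r", ref_gtf, "-o", prefix, *assembled_gtfs]


if __name__ == "__main__":
    for cmd in build_sample_commands(["s1_1.fq", "s1_2.fq"], "UMD3.1", "genes.gtf", "s1"):
        print(" ".join(cmd))
    print(" ".join(build_merge_command("genes.gtf", ["s1/cufflinks/transcripts.gtf"], "merged")))
```

Each list could be passed to `subprocess.run(cmd, check=True)` once the tools are installed; keeping command construction separate from execution makes the pipeline easy to inspect before running it.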

Identification of putative LincRNAs

There is no generally accepted or standard methodology that allows for easy discovery of lincRNAs. Thus, to identify true intergenic lincRNAs and avoid false positives, stringent conditions were applied in this study to filter the mapped reads, with the following criteria:

(1) Only unknown intergenic transcripts (UITs) were used to identify putative lincRNAs.
(2) UITs that had only one exon or were shorter than 200 bp were discarded [20].
(3) UITs with low expression levels (fragments per kilobase of transcript per million fragments mapped, FPKM < 1) and a minimal read coverage below three were discarded.
(4) UITs located less than 1 kb from the closest coding gene were discarded.
(5) The coding potential of the remaining UITs, which are not annotated in the bovine genome, was predicted with four programs used concurrently: CNCI, CPC, CPAT and hmmscan.

CPC (version 0.9-r2), an SVM-based algorithm (http://cpc.cbi.pku.edu.cn/), uses UniRef90 to classify UITs as protein-coding or noncoding. UITs with a coding potential score > 0 were classified as coding and UITs with a score < 0 as noncoding.

CPAT (version 1.2.2) uses a logistic regression model to classify transcripts as coding or noncoding. We downloaded 10,000 known bovine protein sequences from the Ensembl website and 10,000 bovine noncoding RNAs from the NONCODE database (version 4) (http://www.noncode.org/index.php). These sequences were used to train CPAT: the software calculated the hexamer tables, and a bovine-specific logistic regression model was built. Because the CPAT coding probability score differs between species, a cut-off value of 0.348 was chosen for reliability and sensitivity, according to a previous similar study of cows [11]. UITs with scores < 0.348 were retained as putative noncoding RNAs.

The CNCI (version 2) program profiles adjoining nucleotide triplets (ANT) to differentiate coding and noncoding sequences. The CNCI software was downloaded from the web (https://github.com/www-bioinfo-org/CNCI) and used with the default settings.

Finally, all 886 UITs were translated in six possible open reading frames (ORFs): three frames for the sense strand and three for the antisense strand. The six translations were compared against the Gene3D, Pfam, TIGRFAM and Superfamily databases using the hmmscan algorithm. If one or more motifs were found in any of the six frames, the UIT was considered a coding UIT and was discarded. Only UITs identified as noncoding by all four software programs were considered putative lincRNAs.
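The filtering criteria above can be expressed compactly as two predicates. The following is a minimal sketch under stated assumptions: the `Transcript` record and its field names are hypothetical, criterion (3) is read as discarding a UIT that fails either the FPKM or the coverage threshold, and the `"noncoding"` label is a placeholder for whatever the CNCI output is parsed into.

```python
from dataclasses import dataclass


@dataclass
class Transcript:
    tid: str
    is_unknown_intergenic: bool  # criterion (1): the transcript is a UIT
    n_exons: int
    length_bp: int
    fpkm: float
    coverage: float
    dist_to_coding_bp: int       # distance to the closest coding gene


def passes_structural_filters(t: Transcript) -> bool:
    """Criteria (1) to (4): keep multi-exon, long, expressed, isolated UITs."""
    return (
        t.is_unknown_intergenic
        and t.n_exons >= 2
        and t.length_bp >= 200
        and t.fpkm >= 1
        and t.coverage >= 3
        and t.dist_to_coding_bp >= 1000
    )


def is_noncoding_by_all_tools(cpc_score: float, cpat_prob: float, cnci_label: str,
                              n_domain_hits: int, cpat_cutoff: float = 0.348) -> bool:
    """Criterion (5): a UIT is kept only if all four tools call it noncoding."""
    return (
        cpc_score < 0                  # CPC: negative score means noncoding
        and cpat_prob < cpat_cutoff    # CPAT: below the bovine cut-off
        and cnci_label == "noncoding"  # CNCI call (placeholder label)
        and n_domain_hits == 0         # hmmscan: no motif in any of six frames
    )
```

Requiring agreement of all four tools trades sensitivity for specificity: a single coding call from any program is enough to discard a transcript.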

Localization of lincRNAs in quantitative trait loci

The positions of the 184 putative lincRNAs were compared with the positions of known quantitative trait loci (QTL) on the Bos taurus UMD3.1 reference genome according to AnimalQTLdb, a public QTL database covering animal species including cattle, chicken, horse, pig, rainbow trout and sheep (http://www.animalgenome.org/QTLdb/) [21].
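Comparing lincRNA positions with QTL positions amounts to an interval-overlap test per chromosome. The sketch below illustrates that idea only; the tuple layout, the use of closed coordinates and the example trait names are assumptions, and the paper does not state how the comparison was implemented.

```python
from typing import Dict, List, Tuple

Interval = Tuple[str, str, int, int]  # (name, chromosome, start, end), closed coordinates


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """Two closed intervals overlap if each starts before the other ends."""
    return a_start <= b_end and b_start <= a_end


def lincrnas_in_qtl(lincrnas: List[Interval], qtls: List[Interval]) -> Dict[str, List[str]]:
    """Map each lincRNA id to the QTL traits whose region it overlaps."""
    hits: Dict[str, List[str]] = {}
    for lid, lchrom, lstart, lend in lincrnas:
        for trait, qchrom, qstart, qend in qtls:
            if lchrom == qchrom and overlaps(lstart, lend, qstart, qend):
                hits.setdefault(lid, []).append(trait)
    return hits
```

The nested loop is quadratic, which is adequate for a few hundred lincRNAs; for genome-scale sets an interval tree or a sorted sweep would be the usual replacement.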

Prediction of lincRNA–RNA interactions and pathway analyses

To predict the targets of lincRNAs in mammary glands, and thus understand the potential functions of the lincRNAs, the LncTar tool was used to predict lincRNA–RNA interactions. LncTar was run with its first type of input file format, which consists of two files: a lincRNA sequence file and an mRNA sequence file. During the analysis, only interactions whose lowest normalized free energy was < −0.14 were retained, and the corresponding mRNAs were treated as possible lincRNA target genes. Predicted target genes were used for gene ontology (GO) functional annotation and KEGG pathway analysis with the R package clusterProfiler [22].
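The threshold step can be illustrated with a short filter over predicted pairs. This is a sketch under assumptions: the `(lincRNA, mRNA, normalized free energy)` tuple layout is hypothetical and does not reproduce LncTar's actual output format, which would need to be parsed first.

```python
from typing import Dict, Iterable, List, Tuple

Pair = Tuple[str, str, float]  # (lincRNA id, mRNA id, normalized free energy)


def select_targets(pairs: Iterable[Pair], cutoff: float = -0.14) -> Dict[str, List[str]]:
    """Group mRNA targets by lincRNA, keeping pairs strictly below the cutoff."""
    targets: Dict[str, List[str]] = {}
    for linc_id, mrna_id, ndg in pairs:
        if ndg < cutoff:  # more negative means a more stable predicted duplex
            targets.setdefault(linc_id, []).append(mrna_id)
    return targets
```

The resulting gene lists are the kind of input that would then be passed on to enrichment analysis.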

Validation by RT-PCR

The PCR primers for six randomly selected lincRNAs were designed with Primer Premier 5 (PREMIER Biosoft International, Palo Alto, CA, USA). Primer sequences are given in Additional file 2. Expected lengths of the PCR products ranged from 206 to 336 bp. Total RNA was extracted from bovine mammary epithelial cells (MAC-T) with Trizol according to the manufacturer’s instructions (Invitrogen, Carlsbad, CA). First-strand cDNA was synthesized using the PrimeScript™ RT reagent kit (Takara, Dalian, China) according to the manufacturer’s instructions. PCR was performed using the 2 × EasyTaq PCR SuperMix (TransGen Biotech, Beijing, China) under the following cycling conditions: 94 °C for 5 min, followed by 35 cycles of 94 °C for 30 s, annealing for 30 s (annealing temperatures for the lincRNAs are given in Additional file 2) and 72 °C for 30 s, with a final extension step at 72 °C for 10 min. Five µl of each PCR product was analyzed on a 1% agarose gel.


Susan Gottesman, Ph.D.

Dr. Gottesman has pioneered studies on post-transcriptional mechanisms of regulation in bacterial systems, with a focus on the role of energy-dependent proteolysis in regulation and the role of small non-coding RNAs in regulating translation and mRNA stability. One focus of her work has been on how these regulatory inputs affect the bacteria’s response to stress.

1) microbial genetics, 2) bacterial regulatory RNAs, 3) ATP-dependent proteolysis

Small Regulatory RNAs and Energy-Dependent Proteolysis: Novel Modes for the Regulation of Gene Expression

Our laboratory has been interested in novel mechanisms for gene regulation and how these mechanisms contribute to global control circuits in Escherichia coli (E. coli). For many years, the focus of the laboratory was energy-dependent proteolysis. In the past decade, much of the lab has shifted to studying small regulatory RNAs, although we continue to investigate the mechanisms for regulating energy-dependent proteolysis. We first encountered small RNAs when studying the regulation of synthesis of a substrate for the energy-dependent proteases, and we continue to see significant overlap between mRNAs regulated by small RNAs and the products of these mRNAs that are regulated by proteolysis.

This is exemplified in the regulation of RpoS, a stress sigma factor of E. coli. RpoS is rapidly degraded during exponential growth by the ClpXP protease but not when the cell is starved, stressed, or enters stationary phase. This degradation requires the response regulator protein RssB. RssB, an adaptor protein for RpoS degradation, affects degradation only of RpoS, and not of other ClpXP substrates, in vivo and in vitro. The primary question has been how environmental signals regulate RpoS degradation. Using a genetic screen, we identified multiple small anti-adaptor proteins that are made in response to specific stress signals and interfere with the ability of RssB to deliver RpoS to the protease. In collaboration with Dr. Sue Wickner, in vivo and in vitro studies are providing insight into how many anti-adaptors there are, how the anti-adaptors work, and how RssB functions.

In addition to this complex regulation of RpoS degradation, the translation of RpoS is positively regulated by multiple small RNAs. One of these, DsrA, is synthesized preferentially at low temperatures and is necessary for the low-temperature expression of RpoS. DsrA stimulates RpoS synthesis by pairing with parts of the RpoS untranslated leader, which positively affects translation of the protein. A second small RNA regulator of RpoS, RprA, was also identified. RprA acts by a mechanism similar to that of DsrA in stimulating RpoS synthesis, but it is regulated not by low temperature but by a two-component regulatory system responsive to cell surface status. We identified a third small RNA regulator of RpoS, ArcZ, that also positively regulates RpoS but is controlled by regulators that sense aerobic vs. anaerobic growth. Each of these small RNAs also negatively regulates other mRNAs, providing complex combinatorial regulation. The small RNAs link RpoS regulation to multiple environmental and metabolic inputs.

In collaboration with Gisela Storz (National Institute of Child Health and Human Development) and others, we carried out a number of genome-wide searches for other small regulatory RNAs. Initially, highly conserved stretches within intergenic regions were found to be reliable hallmarks of small RNAs. In another genome-wide collaborative study, we defined small RNAs that bind the RNA chaperone Hfq, which is used by fully 1/3 of the small RNAs in the cell. This led to the identification of yet other, less conserved small RNAs. This work, combined with studies by others, has defined more than 80 small RNAs in E. coli, and studies of these demonstrate that small RNAs are important and previously underappreciated components of many regulatory circuits.

A large number of the small RNAs that bind Hfq have now been studied. Small RNAs that bind Hfq act by pairing to target mRNAs to change mRNA stability and translation. In addition to defining targets, we need to define their upstream regulators so we can understand when they act. One of these, RyhB, is synthesized only when iron is limiting (it is repressed by the Fur repressor). RyhB redirects cellular metabolism to respond to the iron limitation by down-regulating synthesis of non-essential proteins that use Fe. A number of small RNAs were found to regulate major cell surface proteins, including proteins implicated in attachment of bacteria to surfaces. Another, controlled by a two-component system important for virulence, regulates modification of LPS. These small RNA effects on the cell surface may modulate interactions with hosts, affecting the immune response and/or pathogenesis. The lab has now developed rapid and flexible methods to scan all Hfq-binding small RNAs for their ability to regulate genes of interest.

While the bacterial small RNAs parallel microRNAs in their ability to affect the stability and translation of target mRNAs, the pathway for their function is rather different. We are studying how Hfq, a member of the Sm/Lsm family of proteins, acts to bring RNAs together, and what other functions are necessary for small RNA function.

Collaborators on this research include Gisela Storz, Sue Wickner, and Xinhua Ji at NIH, and Sarah Woodson at Johns Hopkins University.


Long non-coding RNAs and their functions in plants

Plant lncRNAs participate in regulation of transcription, splicing, and nuclear structure.

In flowering, multiple lncRNAs act in silencing of FLC via diverse mechanisms.

Plant lncRNAs function in RNA-directed DNA methylation and epigenetic functions.

The RNA-processing machinery probably plays a key role in the regulation of lncRNAs.

Eukaryotic genomes encode thousands of long noncoding RNAs (lncRNAs), which play important roles in essential biological processes. Although lncRNAs function in the nuclear and cytoplasmic compartments, most of them occur in the nucleus, often in association with chromatin. Indeed, many lncRNAs have emerged as key regulators of gene expression and genome stability. Emerging evidence also suggests that lncRNAs may contribute to the organization of nuclear domains. This review briefly summarizes the major types of eukaryotic lncRNAs and provides examples of their mechanisms of action, with focus on plant lncRNAs, mainly in Arabidopsis thaliana, and describes current advances in our understanding of the mechanisms of lncRNA action and the roles of lncRNAs in RNA-dependent DNA methylation and in the regulation of flowering time.

