
Multiplex PCR, shorter amplicon inhibiting longer amplicon?


I want to run multiplex PCRs for my genotyping, with a primer pair targeting my construct and a primer pair targeting some housekeeping gene (sort of a built-in control).

I designed the control amplicon to be very short (I tested 3 primer pairs with ~120, 85, and ~50bp amplicons respectively). The main rationale behind this being that usually my amplicons are ~200-400bp, and I want something clearly distinguishable, but shorter rather than longer (I want to keep my options open for longer target amplicons).

In any case, regardless which control primer pair I use, whenever I try to multiplex I only end up seeing the shorter amplicon. See the gel picture below for an example (lanes: 1-ladder, 2,3-target+control, 4,5-control, 5,6- target).

I see that lanes 2,3 have a very faint additional smear compared to 4,5, but I want a band for my target amplicon. Also, my shorter bands are quite blurred.

Also, just for reference, my amplicons do not overlap at all.

So, I guess my questions are:

  • Why is the longer band disappearing?
  • Why is the shorter band so blurred?
  • What can I do to prevent this from happening?

It turns out that the $T_m$ was in fact the issue, or rather, that the computation algorithm overestimated the $T_m$ of the primer pair for the longer amplicon (or underestimated that of the shorter one). At any rate, it seems that at 60°C the shorter amplicon primers hijack the polymerization mechanism.

Setting the annealing temperature to 55°C solved my issue, and I now consistently get better results.
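For anyone hitting the same problem, a rough cross-check of primer $T_m$ values can flag a mismatch between the two primer pairs before the gel does. The sketch below uses two common rule-of-thumb formulas (the Wallace rule and a simple GC-content formula). These are not the nearest-neighbour calculations that most primer-design tools use, and the primer sequences are hypothetical placeholders, not the ones from my experiment.

```python
def wallace_tm(primer: str) -> float:
    """Wallace rule: 2 °C per A/T plus 4 °C per G/C (rough, short oligos only)."""
    s = primer.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    return 2.0 * at + 4.0 * gc


def gc_tm(primer: str) -> float:
    """Basic GC-content formula: 64.9 + 41 * (G + C - 16.4) / N."""
    s = primer.upper()
    gc = s.count("G") + s.count("C")
    return 64.9 + 41.0 * (gc - 16.4) / len(s)


# Hypothetical primers, for illustration only.
primers = {
    "target_fwd": "AGCTGACCTGAAGTTCATCTGC",
    "target_rev": "TTGAAGTCGATGCCCTTCAG",
    "control_fwd": "GGCATTGCTCTCAATGACAA",
    "control_rev": "ATGTAGGCCATGAGGTCCAC",
}

for name, seq in primers.items():
    print(f"{name}: Wallace {wallace_tm(seq):.1f} °C, GC formula {gc_tm(seq):.1f} °C")

# The lowest estimate across all four primers is the one that limits the
# shared annealing temperature in a multiplex reaction.
lowest = min(gc_tm(seq) for seq in primers.values())
print(f"lowest GC-formula estimate: {lowest:.1f} °C")
```

If the two formulas (or two different calculators) disagree by several degrees for one primer pair, that pair is a good candidate for an annealing-temperature gradient rather than trusting a single computed value.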



Multiplex PCR is a widespread molecular biology technique for amplification of multiple targets in a single PCR experiment. In a multiplexing assay, more than one target sequence can be amplified by using multiple primer pairs in a reaction mixture. As an extension to the practical use of PCR, this technique has the potential to produce considerable savings in time and effort within the laboratory without compromising on the utility of the experiment.


Multiplex PCR coupled with direct amplicon sequencing for simultaneous detection of numerous waterborne pathogens

The current water quality monitoring and regulation approaches use fecal indicator bacteria (FIB) to indirectly assess health risks from fecal pathogens. Direct detection of waterborne pathogens is expected to provide more accurate and comprehensive risk assessment, which however has been hindered by the lack of methods for simultaneous detection of the numerous waterborne pathogens. This study aimed to develop a mPCR-NGS approach that uses the high sequencing depth of NGS and sequence-based detection to significantly increase the multiplex level of mPCR for direct pathogen detection in water. Individual PCR primers were designed for 16 target marker genes of nine different bacterial pathogens, and an optimal combination of primers with least primer complementarities was identified for the multiplex setting. Using an artificial tester sample, the mPCR system was optimized for annealing temperature and primer concentration, and bioinformatic procedures were developed to directly detect the target marker gene amplicons in NGS sequence reads, which showed simultaneous detection of 14 different target genes in one reaction. The effectiveness of the developed mPCR-NGS approach was subsequently demonstrated on DNA extracts from stream water samples and their counterparts that were spiked with various target pathogen DNA, and all target genes spiked into the environmental water samples were successfully detected. Several key issues for further improving the mPCR-NGS approach were also identified and discussed.



Methods

Sample

Participants were selected from a database of the Mental Health Research Center (MHRC) in Moscow. There were 83 schizophrenia patients from the MHRC or Moscow Psychiatric Hospital No. 1 and 71 healthy controls. All the participants provided written informed consent and donated blood samples for DNA extraction. Smoking was assessed through oral interviews, and the smoking status of patients was double-checked with their psychiatrists. Current smokers and never smokers, hereinafter referred to as smokers and non-smokers, respectively, participated in this study. The sample consisted of 50 smokers (mean age 28.0 ± 7.5 years, 40% women, 54% patients) and 104 non-smokers (mean age 26.0 ± 5.9 years, 54% women, 54% patients).

DNA extraction and bisulfite conversion

Genomic DNA was extracted with the DNeasy Blood and Tissue Kit (Qiagen, USA) according to the manufacturer’s instructions. The bisulfite-converted DNA samples were obtained with the EpiGentek Methylamp DNA Modification Kit (Epigentek Group Inc., USA) in agreement with the manufacturer’s protocol. Our experience supported the original conclusion of Yang et al. [26] that this particular kit works better for long bisulfite PCR than the Epitect Fast DNA Bisulfite Kit (Qiagen, USA).

Bisulfite primer design

Primers were designed with the primer3 software [33] to amplify approximately 1.3 Kbp PCR products of converted genome sequences. Primers were designed to be of 25–35 bp length, Tm = 60 °C and no CpGs allowed. The designed primer sequences are listed in the Additional file 1: Table S1. The summary information surrounding the amplicons is found in Table 1.
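The two sequence-level constraints above (length of 25–35 bp and no CpGs) can be checked with a few lines of code. The following is a minimal sketch, not the primer3 configuration used in the study: the function names are hypothetical, the in silico conversion simply turns every non-CpG cytosine into thymine, and the Tm constraint is not modelled.

```python
def bisulfite_convert(seq: str) -> str:
    """In silico conversion: every C becomes T unless it is followed by G (CpG)."""
    s = seq.upper()
    out = []
    for i, base in enumerate(s):
        is_cpg = base == "C" and i + 1 < len(s) and s[i + 1] == "G"
        out.append("T" if base == "C" and not is_cpg else base)
    return "".join(out)


def site_allowed(site: str, min_len: int = 25, max_len: int = 35) -> bool:
    """True if a candidate priming site has an allowed length and contains no CpG.

    Note: a C at the last position followed by a G outside the site is not
    detected here; a fuller check would look one base past the site.
    """
    s = site.upper()
    return min_len <= len(s) <= max_len and "CG" not in s


# Hypothetical genomic site, for illustration only.
site = "ATCCTGAACTGGTTCCAAGTCCTTA"
if site_allowed(site):
    print(bisulfite_convert(site))
```

Because an allowed site contains no CpG, its converted sequence does not depend on the methylation state of the template, which is the point of excluding CpGs from primers.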

Bisulfite PCR

For the bisulfite PCR, we utilised 20 ng of the converted DNA, 1 μM of the “panhandle” 5′-phosphorylated primer “U1” GCAGTCGAACATGTAGCTGACTCAGGTCAC, 5 nM of each of the specific primers with the identical U1 sequence on the 5′ end and 200 nM dNTP, 1 mg/ml BSA, 2.5 U HotTaq polymerase with the corresponding buffer (Sileks, Russia) in a total volume of 12.5 μl. The choice of polymerase is important: the polymerase should be a simple hot-start polymerase that, unlike specialised high-fidelity polymerases, is not capable of overcoming the suppression effect. We have routinely verified the PCR kinetics with the 20× EVA Green DNA intercalating dye (Biotium Inc., USA), which apparently did not affect the reaction. The PCR programme was as follows: (1) initial denaturation, 94 °C, 10 min; (2) 5 cycles of specific PCR (94 °C, 20 s; 55 °C, 1 min; 64 °C, 4 min); (3) 37 cycles of “panhandle” PCR (94 °C, 20 s; 64 °C, 2 min); and (4) final incubation, 64 °C, 10 min.
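As a worked illustration of the programme above, the sketch below encodes the four stages as data and sums the programmed hold times. The data layout and function name are assumptions made for this example, and ramp times between temperatures are ignored, so the real run will take longer.

```python
# Each stage: (label, number of repeats, [(temperature in °C, hold in seconds), ...])
PROGRAMME = [
    ("initial denaturation", 1, [(94, 600)]),
    ("specific PCR", 5, [(94, 20), (55, 60), (64, 240)]),
    ("panhandle PCR", 37, [(94, 20), (64, 120)]),
    ("final incubation", 1, [(64, 600)]),
]


def total_hold_seconds(programme) -> int:
    """Sum of all programmed hold times, excluding temperature ramps."""
    return sum(
        repeats * sum(seconds for _, seconds in steps)
        for _, repeats, steps in programme
    )


total = total_hold_seconds(PROGRAMME)
print(f"{total} s = {total / 60:.0f} min of programmed hold time")
```

Under these assumptions the holds add up to 7,980 s, or 133 min, most of which (5,180 s) is spent in the 37 “panhandle” cycles.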

Barcoding

For the creation of the Y-adapters, we employed 96 unique combinations of two sets of oligonucleotides: a first set of eight oligonucleotides CGAGTAGTGTTC-unique five-letter sequence-CAAGGCACACAGGGGATAGG and a second set of 12 oligonucleotides 5′-CCATCTCATCCCTGCGTGTC-unique five-letter sequence-CTACACTACTCGT. Combining one oligonucleotide from each set gives 8 × 12 = 96 unique Y-adapters. The oligonucleotides from the first set were 5′-phosphorylated. The sequences of oligonucleotides bearing molecular barcodes are found in Additional file 1: Table S2. Each Y-adapter was formed by pairing 10 nM of a single oligonucleotide from each of these two sets in an annealing reaction within 25 μl of the annealing buffer AB (10 mM Tris-HCl (pH 8.0), 50 mM NaCl, 0.1 mM EDTA). The annealing reactions were set in the PCR thermal cycler with the following programme: incubation at 98 °C for 1 min; cooling down to 70 °C (1.6 °C/s); and cooling down to 10 °C (0.1 °C/s). The reactions were then diluted fivefold with AB, stored at −20 °C and utilised as a stock solution. Immediately prior to ligation, these stocks were diluted 10-fold with 1× T4 ligase buffer with 5% PEG 4000. The ligation reactions were set in 10 μl: 2 μl diluted Y-adapters stock solution, 2 μl of PCR products and 6 μl of the ligation master mix (1.33× T4 ligase buffer with 6.67% PEG 4000, 1.2 w.u. T4 DNA ligase, Thermo, USA). The ligation reactions were performed at 20 °C for 2.5 h followed by incubation at 65 °C for 10 min. Then, the reactions were mixed in libraries (two libraries, up to 96 samples per library). The libraries (500 μl) were washed twice with 10 mM Tris-HCl (pH 8.0) and concentrated down to 50 μl by Amicon Ultra-0.5 30K Device columns (Merck, USA). Next, the libraries were washed twice to eliminate the primers, unligated Y-adapters, etc., with 0.7 volume of AMPure XP magnet beads (Beckman Coulter Inc., USA).
The purified DNA solution was employed for amplification of the libraries with additional PCR. The PCR was performed with 250 nM primers, specific to the end of the Y-adapters: “emPCR_A” 5′-CCATCTCATCCCTGCGTGTC and “emPCR_B” 5′-CCTATCCCCTGTGTGCCTTG with the HiFi HotStart Uracil+ 2× master mix (Kapa Biosystems, Republic of South Africa). The PCR was performed with the following programme: initial denaturation, 95 °C, 5 min; 20 cycles of 98 °C, 20 s; 60 °C, 15 s; and 72 °C, 2 min. The PCR product was then length-selected through agarose electrophoresis and purified with the QIAquick Gel Extraction Kit (Qiagen, USA).
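The combinatorial barcoding described above can be sketched in a few lines. The constant regions and the two library primers are taken from the text; the five-letter barcodes are hypothetical placeholders (the real ones are in Additional file 1: Table S2), and the helper names are assumptions made for this example. The last lines check how the library primers relate to the constant regions of the adapter oligonucleotides.

```python
from itertools import product

# Constant regions from the text; {bc} marks the unique five-letter sequence.
SET1_TEMPLATE = "CGAGTAGTGTTC{bc}CAAGGCACACAGGGGATAGG"   # 8 oligos, 5'-phosphorylated
SET2_TEMPLATE = "CCATCTCATCCCTGCGTGTC{bc}CTACACTACTCGT"  # 12 oligos

# Hypothetical barcodes, for illustration only.
SET1_BARCODES = ["AACCG", "ACGTA", "AGTCC", "CATGA", "CCAGT", "GATTC", "GTCAA", "TGCAT"]
SET2_BARCODES = ["AAGGT", "ACTGC", "AGCAT", "CAGTT", "CCTAG", "CGATA",
                 "GACTA", "GGTAC", "GTTGA", "TACGG", "TCAGC", "TTGCA"]


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


# Every pairing of one set-1 oligo with one set-2 oligo defines one Y-adapter.
adapter_pairs = [
    (SET1_TEMPLATE.format(bc=bc1), SET2_TEMPLATE.format(bc=bc2))
    for bc1, bc2 in product(SET1_BARCODES, SET2_BARCODES)
]
print(len(adapter_pairs))  # 8 * 12 combinations

# emPCR_A matches the 5' constant region of set 2; emPCR_B is the reverse
# complement of the 3' constant region of set 1.
EMPCR_A = "CCATCTCATCCCTGCGTGTC"
EMPCR_B = "CCTATCCCCTGTGTGCCTTG"
print(SET2_TEMPLATE.startswith(EMPCR_A))
print(reverse_complement("CAAGGCACACAGGGGATAGG") == EMPCR_B)
```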

CCS library preparation and sequencing

The CCS library preparation (ligation of “SMRTBell” adapters with SMRTbell Template Prep Kit, Pacbio, USA) and sequencing was performed with Pacbio RSII (P6/C4 chemistry) in the facility of the Washington University Pacbio Sequencing Services. The final volume of raw data used in this paper is approximately equal to a single SMRT cell of the Pacbio RSII device.

Post-sequencing data preparation

Only reads with a quality score of no less than Q30 (average quality score was Q40) were utilised in the following analysis. After adapter trimming, we obtained 56,581 reads with correct adapters and primer sequences. The reads were demultiplexed with no errors in the barcode sequences allowed, discarding 11% of the reads. The median amount of reads per barcode was 202 (Q1:128, Q3:315). Adapter trimming and barcode demultiplexing were performed with the cutadapt programme [34]. The alignment of the filtered reads to the reference human genome (hg19) was obtained using the bismark programme with 88% mapping efficiency [35] (together with bowtie2 [36]). Filtration of under-converted DNA (threshold of unconverted CpH < 5%, H = A/C/T) and de-duplication were performed with the perl script (see Additional file 1: Supplementary Note 2 on de-duplication procedure). The final conversion rate was no less than 0.98 for each of the analysed targets. The number of reads for each target with the different stages of data preparation is presented in Additional file 1: Table S5.
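The “no errors allowed” demultiplexing rule can be illustrated with a small sketch. This is not the cutadapt invocation used in the study: it is plain Python, it assumes a single barcode at the start of each read (the real design combines two barcodes, one per adapter end), and the sample names, barcodes and reads are hypothetical.

```python
from statistics import median


def demultiplex(reads, barcodes):
    """Assign a read to a sample only if it starts with exactly one known barcode.

    Returns (assigned, discarded); assigned reads have the barcode trimmed off.
    """
    assigned = {name: [] for name in barcodes}
    discarded = []
    for read in reads:
        hits = [name for name, bc in barcodes.items() if read.startswith(bc)]
        if len(hits) == 1:
            name = hits[0]
            assigned[name].append(read[len(barcodes[name]):])
        else:
            discarded.append(read)  # no exact match: the read is dropped
    return assigned, discarded


# Hypothetical data, for illustration only.
barcodes = {"sample_1": "AACCG", "sample_2": "ACGTA"}
reads = ["AACCGTTTT", "ACGTAGGGG", "AACCTTTTT"]

assigned, discarded = demultiplex(reads, barcodes)
reads_per_barcode = [len(v) for v in assigned.values()]
print(f"discarded {len(discarded)} of {len(reads)} reads; "
      f"median reads per barcode = {median(reads_per_barcode)}")
```

Exact matching trades yield for certainty: a single sequencing error in a barcode discards the read instead of risking assignment to the wrong sample.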

ASM data

Each read in the data (files in SAM format) was sorted with perl script based on CIGAR string parsing by alleles of easily identifiable polymorphisms in each target. The list of used polymorphisms is presented in Additional file 1: Table S3. The rate of methylation of individual CpGs per haplotype for each sample was defined by the bismark software. Only samples with the minimum 5× read depth per haplotype were used, leading to discarding of the CACNA1D target owing to insufficient amounts of data. Missing values were mean imputed. Methylation signals in sites of known CpG-SNPs were not employed in the following analysis. The methylation rate for each of the CpGs was logit-transformed according to the equations $M=\log\left(\tilde{m}/\left(1-\tilde{m}\right)\right)$ and $\tilde{m}=\left(m\left(n-1\right)+0.5\right)/n$, where m is the raw methylation rate and n is the sample size.
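The transformation can be sketched as follows. The adjustment step pulls raw rates of exactly 0 or 1 slightly inside the open interval, so that the logit stays finite. The function names are assumptions made for this example, and n = 154 is simply the total number of participants from the Sample section, used for illustration; the n applied to a given target may differ.

```python
import math


def adjust_rate(m: float, n: int) -> float:
    """Shrink a raw methylation rate m in [0, 1] to (m * (n - 1) + 0.5) / n."""
    return (m * (n - 1) + 0.5) / n


def logit_methylation(m: float, n: int) -> float:
    """Logit of the adjusted methylation rate (natural logarithm)."""
    adjusted = adjust_rate(m, n)
    return math.log(adjusted / (1.0 - adjusted))


n = 154  # illustrative sample size
for raw in (0.0, 0.25, 0.5, 1.0):
    print(f"m = {raw:.2f} -> adjusted {adjust_rate(raw, n):.4f}, "
          f"M = {logit_methylation(raw, n):+.3f}")
```

A raw rate of 0.5 maps to M = 0, and fully unmethylated or fully methylated sites map to large but finite negative and positive values.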

Statistical analysis

Three smoking status predictive models were tested on the prepared ASM dataset, hereinafter referred to as “index”, “boruta” and “boruta.adjusted”. Age, gender and diagnosis were regressed out, and the regression residuals were employed for subsequent analysis. For the “boruta.adjusted” model, the haplotype information was also used. The Boruta algorithm was used to determine important CpGs (“important” in the sense of the Boruta algorithm) inside each of the targets for the “boruta” and “boruta.adjusted” models. Original CpGs from smoking EWAS were selected for the “index” model (Table 1). The dataset was randomly split 1:1 into training and test sets. The logistic model with selected CpGs was trained on the training set. The combined prediction logistic model was built on top of prediction values of individual target models. In the case of heterozygous samples, the prediction values were averaged. The performance of the combined models was evaluated on the test set. The analysis was conducted with the R statistical software programme with the “Boruta” package [37].


Results

Testing the multiplex PCR and Nanopore sequencing on RNA reference material

Amplicons were generated from the four DENV RNA samples using the multiplex PCR approach and sequenced on the Nanopore MinION. The sequencing run generated 8604–16,654 reads per sample passing the quality filters (Q-score ≥ 7) (Table 1).

Alignment of the reads to the appropriate RefSeq genome demonstrated that full coverage of the coding region was achieved for all four serotypes (Table 2). The resulting consensus sequences were on average 99.49% identical to the Illumina-generated consensus sequences used as references.

However, sequencing coverage depth was uneven, with only 87.56–96.51% of the DENV1–4 genomes covered by 20 or more reads. These regions of low coverage coincided with drops in consensus sequence accuracy (Fig. 1). Masking regions of the genome with less than 20X coverage depth improved the overall accuracy of the consensus sequence in 3/4 cases, increasing the average consensus identity to 99.78%. However, masking these regions also resulted in a loss of genome coverage ($\bar{x}$ = 9.01%) (Table 2).
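The masking step trades coverage for accuracy, which the following minimal sketch makes concrete. It assumes a consensus string and a per-position depth list of equal length; the function names, the mask character N and the toy data are illustrative choices, not the pipeline used in the study.

```python
def fraction_covered(depths, min_depth: int = 20) -> float:
    """Fraction of positions with at least min_depth reads."""
    return sum(d >= min_depth for d in depths) / len(depths)


def mask_low_coverage(consensus: str, depths, min_depth: int = 20,
                      mask_char: str = "N") -> str:
    """Replace consensus bases at positions below min_depth with mask_char."""
    if len(consensus) != len(depths):
        raise ValueError("consensus and depths must have the same length")
    return "".join(
        base if depth >= min_depth else mask_char
        for base, depth in zip(consensus, depths)
    )


# Toy example, for illustration only.
consensus = "ACGTACGT"
depths = [250, 40, 19, 3, 20, 0, 75, 1000]

print(mask_low_coverage(consensus, depths))
print(f"{100 * fraction_covered(depths):.1f}% of positions at >= 20X")
```

Positions supported by only a handful of reads are where single-read errors are most likely to end up in the consensus, so masking them removes errors at the cost of leaving gaps.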

Nanopore sequencing coverage for DENV control RNA samples, using the multiplex (400 bp amplicon) approach. Nanopore sequencing coverage depth for dengue virus control RNA samples, using the multiplex PCR approach is plotted in black against the left-hand y-axis, with the read depth threshold of 20X indicated by the dotted line. Coverage depth is capped at 1000X. The nucleotide similarity of the Nanopore-generated sequence to the Illumina-generated reference sequence is shown in red against the right-hand y-axis. The shaded areas identify regions where the coverage depth fell below the low coverage threshold of 20X, to allow comparison with the similarity plot

The Nanopore MinION was also used to sequence amplicons generated using the single-plex PCR approach. The resulting reads generated consensus sequences 99.93–99.99% identical to the Illumina references ($\bar{x}$ = 99.97%), containing an average of 4 mismatches (range = 1–7) (Table 3). These consensus sequences were on average 0.48% more accurate than their multiplex-generated counterparts.

Testing on Indonesian clinical samples

The multiplex PCR approach was next tested on a set of clinical samples from Indonesia (n = 10). These samples included representatives of each of the four DENV serotypes across a range of viral loads (Ct values 15.2–37.9). All of the samples produced PCR products of the expected size (~400 bp) and the resulting amplicons were sequenced on the Nanopore MinION, producing 33,908–82,891 reads ($\bar{x}$ = 57,948).

The average coding-region coverage at 1X read depth was 99.80% across the 10 samples (Table 4), and complete coverage was achieved for 8 of the 10. At 20X read depth, average coverage fell to 95.84%. Drops in coverage below 20X were more frequent in samples with higher Ct values (Fig. 2). Those with a Ct value of 25 or less (n = 3) generated an average of 100% coverage. The average coverage fell to 98.55% for samples with a Ct value between 25 and 30 (n = 4), and was further reduced to 88.06% for those with Ct values greater than 30 (n = 3).
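The three Ct-group averages can be cross-checked against the overall 20X figure with a sample-weighted mean. The sketch below uses only the numbers quoted above; because the group means are themselves rounded, agreement is expected only to about two decimal places.

```python
# (number of samples, mean coding-region coverage at 20X in %) per Ct group
CT_GROUPS = [
    (3, 100.00),  # Ct <= 25
    (4, 98.55),   # Ct between 25 and 30
    (3, 88.06),   # Ct > 30
]

total_samples = sum(n for n, _ in CT_GROUPS)
weighted_mean = sum(n * cov for n, cov in CT_GROUPS) / total_samples

print(f"{total_samples} samples, weighted mean coverage = {weighted_mean:.2f}%")
```

The weighted mean comes to 95.84%, matching the overall average reported for the 10 samples.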

Nanopore Sequencing Coverage of Indonesian Clinical Isolates using the Multiplex PCR Approach. The multiplex PCR approach was used to amplify DENV1–4 from 10 clinical samples from Indonesia. These samples were selected to cover a range of viral loads, as estimated by Ct values from the diagnostic qRT-PCR. The resulting amplicons were sequenced on the Nanopore MinION. Coverage depth for each sample is plotted, with the read depth threshold of 20X indicated by the dotted line. Coverage depth is capped at 1000X

The same 10 samples were also amplified using the single-plex PCR approach as a comparison (Table 4). On average, the approach only produced 2.6 of the 5 amplicons required to cover the DENV coding-region. As with the multiplex approach, samples with the lowest Ct values (< 20) were the most successful, producing 93% of amplicons (14/15), whereas those with Ct values greater than 25 produced only 34% (12/35).

Testing on non-Indonesian clinical samples

The multiplex method was next applied to four clinical samples from DENV-infected patients from The Philippines, in order to test how the method performed when working with viral strains from countries outside of Indonesia. All of the samples produced PCR products of the expected size (~400 bp) and the resulting amplicons were sequenced on the Nanopore MinION, producing 6852–12,972 reads ($\bar{x}$ = 8,048).

The average coding-region coverage at 1X read depth was 99.90% across the 4 samples (Table 5). At 20X read depth, average coverage fell to 88.40%. DENV-1 produced several particularly large drops in coverage depth compared to the other isolates (Fig. 3), resulting in only 79.22% of the coding-region being covered at 20X read depth. Consensus sequences were again generated and compared to Illumina-generated reference sequences by pairwise-nucleotide alignment (Table 5). Consensus sequences produced using all regions covered by 1 or more read were found to be 99.17–99.80% identical to the Illumina-generated sequence ($\bar{x}$ = 99.45%). Masking regions with a read depth below 20X improved consensus sequence accuracies to 99.70–99.92% ($\bar{x}$ = 99.80%), at the expense of coverage.
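To make the identity figures concrete, the sketch below computes percent identity between two sequences that have already been aligned to equal length. How masked positions and gaps are counted is a design choice that the text does not specify: here columns containing a masked base (N) are skipped, and a gap opposite a base counts as a mismatch. The function name and example sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str,
                     mask_char: str = "N", gap_char: str = "-") -> float:
    """Percent identity over aligned columns, skipping masked columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if mask_char in (x, y):
            continue  # masked position: excluded from the comparison
        if x == gap_char and y == gap_char:
            continue  # gap in both sequences carries no information
        compared += 1
        matches += x == y
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# Hypothetical aligned fragments, for illustration only.
nanopore = "ACGTNACGT-"
illumina = "ACGTAACCTA"
print(f"{percent_identity(nanopore, illumina):.2f}% identical")
```

Skipping masked columns is what lets masking raise the reported identity while lowering coverage: errors in low-depth regions are removed from the numerator and the denominator alike.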

Nanopore Sequencing Coverage of Pilipino Clinical Isolates using the Multiplex PCR Approach. The multiplex PCR approach was used to amplify DENV1–4 from 4 clinical samples from The Philippines. The resulting amplicons were sequenced on the Nanopore MinION. Coverage depth for each sample is plotted, with the read depth threshold of 20X indicated by the dotted line. Coverage depth is capped at 250X

The single-plex PCR approach was again used to produce the amplified product for Illumina sequencing; however, 3 of the 4 samples failed to generate one of the expected products (Table 5). Complete coverage of DENV-1 and 2 was therefore achieved by replacing the published primer sets with primers taken from the multiplex set. Following several attempts, only a truncated version of the most 3′ DENV-4 amplicon could be produced, and so the accuracy of the Nanopore-generated consensus could be assessed for 10,117 of the 10,163 coding bases only.

Assessment of the Nanopore-generated consensus sequences by phylogenetic analysis

Phylogenies were constructed using the Nanopore- and Illumina-generated coding region sequences and a set of reference sequences for each DENV serotype. Separate phylogenies were constructed using Nanopore consensus sequences masked below 20X coverage depth (Fig. 4) and below 1X coverage depth (Fig. 5). The Nanopore consensus sequences generated at 20X depth all formed monophyletic clusters with their Illumina-generated counterparts. The Nanopore-generated sequences for DENV-1–3 at 1X depth also formed monophyletic clades with their Illumina-generated counterparts; however, the DENV-4 sequence was separated from its Illumina counterpart by GQ868594, a sequence generated from the same viral isolate. Pairwise phylogenetic distance between the Nanopore- and Illumina-generated sequence tips averaged 0.001975 for those generated using regions of >20X coverage, and 0.005685 for those generated using 1X coverage (Table 6).

Phylogenetic analysis of Illumina- and Nanopore-generated consensus sequences. Bootstrap phylogenies of complete DENV coding regions were constructed using Nanopore and Illumina consensus sequences and a selection of genotype reference sequences. Nanopore consensus sequences were generated for all samples using the short amplicon approach, with regions below 20X coverage depth masked. Illumina consensus sequences were generated for the RNA standards and Pilipino samples using the long-amplicon approach. Sequence names are coloured to denote geographical origin, and internal nodes of the tree are coloured to demonstrate bootstrap values (blue = 100%, green = 90–99% and red = < 90%). Monophyletic clades formed by the Nanopore and Illumina-generated consensus sequences are highlighted in yellow for the RNA standard samples, and red for the Pilipino clinical samples

Phylogenetic analysis of Nanopore (1x) and Illumina-generated consensus sequences. Bootstrap phylogenies of complete DENV coding regions were constructed using Nanopore and Illumina consensus sequences and a selection of genotype reference sequences. Nanopore consensus sequences were generated for all samples using the short amplicon approach with only regions below 1X coverage depth masked. Illumina consensus sequences were generated for the RNA standards and Pilipino samples using the long-amplicon approach. Sequence names are coloured to denote geographical origin, and internal nodes of the tree are coloured to demonstrate bootstrap values (blue = 100%, green = 90–99% and red = < 90%). Clades formed by the Nanopore and Illumina-generated consensus sequences are highlighted in yellow for the reference samples, and red for Pilipino samples

Sequences from the Indonesian samples formed distinct clades containing the majority of reference sequences from Indonesia (green labels). The Indonesian clades for DENV-3 and DENV-4 were exclusively composed of Indonesian sequences, whilst the Indonesian DENV-1 clade also contained one sequence from the neighbouring country of Singapore (FJ469907). The Indonesian DENV-2 clade included sequences from several neighbouring South East Asian countries including Brunei (EU179859), Singapore (EU081177, EU081179, EU081180, KM279597) and the Philippines (110394). The Indonesian consensus sequences also clustered by region whenever multiple samples from the same region of Indonesia were included. DENV-1 and DENV-3 samples from Banjarmasin (BJM) in Central Indonesia were clustered, whilst samples from Batam (BTM) in the West, and Ambon (AMB) in the East, clustered separately.


Capture of assay template by multiplex PCR of long amplicons for genotyping SNPs and InDels with MALDI-TOF mass spectrometry

Mis-priming associated with uncharacterised single nucleotide polymorphisms (SNPs) may lead to failure of PCR for genotyping. This is particularly troublesome in high-throughput SNP genotyping applications relying on multiplex PCR (2–40-plex) generating many short amplicons (80–120 bp) of similar size, an approach best suited for whole genome scans. However, if the target SNPs are clustered within a few target genes, one option to ameliorate this is to increase the amplicon length, effectively reducing the potential for primer/template interactions and mis-priming. We tested this approach in a diverse population of 372 Eucalyptus pilularis individuals ($\pi = 8.11 \times 10^{-3}$, $H_e = 0.75$) using a modified Sequenom iPLEX gold assay. Four candidate genes (MYB1, MYB2, CAD and CCR) were amplified in a single long range multiplex capture PCR generating 6 long amplicons ranging in size from 907 to 2,225 bp. This contrasts with the standard approach which would have required the amplification of 98 short amplicons in 4 multiplex reactions. These 6 long amplicons provided the assay template for 98 assays (87 SNP and 11 InDel) within the 4 candidate genes. Reaction results indicated that longer amplicons could provide a suitable template for genotyping assays, with 90.8% of assays functional and 84.3% of assays suitable for downstream analysis. Additional advantages of this approach were the capacity for troubleshooting using gel electrophoresis and savings of 94% in capture primer synthesis costs. This approach will have the greatest relevance for candidate gene approaches for association testing in uncharacterised populations of organisms with high sequence diversity.
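One plausible reading of the 94% figure is a simple primer count, sketched below. It assumes one forward and one reverse capture primer per amplicon and an equal synthesis cost per primer; neither assumption is stated in the abstract, and the actual costing may have been done differently.

```python
SHORT_AMPLICONS = 98  # standard approach: one short amplicon per assay
LONG_AMPLICONS = 6    # long-amplicon capture PCR
PRIMERS_PER_AMPLICON = 2  # assumption: one forward and one reverse primer

primers_standard = SHORT_AMPLICONS * PRIMERS_PER_AMPLICON
primers_long = LONG_AMPLICONS * PRIMERS_PER_AMPLICON

# Relative saving under the equal-cost-per-primer assumption.
saving = 1 - primers_long / primers_standard
print(f"{primers_standard} vs {primers_long} capture primers: "
      f"{100 * saving:.1f}% fewer")
```

Under these assumptions the reduction is 93.9%, which rounds to the 94% quoted.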



Introduction

Genome sequencing of viruses has been used to study the spread of disease in outbreaks 1 . Real-time genomic surveillance is important in managing viral outbreaks, as it can provide insights into how viruses transmit, spread and evolve 1,2,3,4 . Such work depends on rapid sequencing of viral material directly from clinical samples—i.e., without the need to isolate the virus in pure culture. During the Ebola virus epidemic of 2013–2016, prospective viral genome sequencing was able to provide critical information on virus evolution and help inform epidemiological investigations 3,4,5,6 . Sequencing directly from clinical samples is faster, less laborious and more amenable to near-patient work than time-consuming culture-based methods. Metagenomics, the process of sequencing the total nucleic acid content in a sample (typically cDNA or DNA), has been successfully applied to both virus discovery and diagnostics 7,8,9 . Metagenomic approaches have seen rapid adoption over the past decade, fueled by relentless improvements in the yield of high-throughput sequencing instruments 5,10,11,12 . Whole-genome sequencing of Ebola virus directly from clinical samples without amplification was possible because of the extremely high virus copy numbers found in acute cases 13,14,15 . However, direct metagenomic sequencing from clinical samples poses challenges with regard to sensitivity: genome coverage may be low or absent when attempting to sequence viruses that are present at low abundance in a sample with high levels of host nucleic acid background.

Development of the protocol

During recent work on the Zika virus epidemic 16 , we found that it was difficult to generate whole-genome sequences directly from clinical samples using metagenomic approaches (Table 1). These samples had cycle threshold (Ct) values between 33.9 and 35.9 (equivalent to 10–48 genome copies per microliter). Before sequencing, these samples were depleted of human rRNA and prepared for metagenomic sequencing on the Illumina MiSeq platform as previously described 2,17 . In these cases, sequences from Zika virus comprised <0.01% of the data set, resulting in incomplete coverage. Greater coverage and depth are critical for accurate genome reconstruction and subsequent phylogenetic inference. In addition, there are substantial sequencing, analysis and storage costs associated with generating large sequencing data sets; therefore, metagenomic approaches currently do not lend themselves to the cost-effective use of lower-throughput portable sequencing devices such as the Oxford Nanopore MinION.

To generate complete viral genome coverage from clinical samples in an economic manner, target enrichment is often required 18 . Enrichment can be achieved directly through isolation in culture or the use of oligonucleotide bait probes targeting the virus of interest, or indirectly via host nucleic acid depletion. Amplification may also be required to generate sufficient material for sequencing (>5 ng for typical Illumina protocols and 100–1,000 ng for MinION). PCR can provide both target enrichment and amplification in a single step, and is relatively cheap, available and fast as compared with other methods. To generate coding-sequence complete coverage, a tiling amplicon scheme is commonly used 19,20,21 . During our work with Ebola virus, we were able to reliably recover >95% of the genome by sequencing 11 long amplicons (1–2.5 kb in length) on the MinION 5 .

The likelihood of long fragments being present in the sample, however, reduces with lower virus abundance. Therefore, we anticipated that, for viruses such as Zika that are present at low abundance in clinical samples, we would be more likely to amplify shorter fragments. As an extreme example of this approach, a recent method termed 'jackhammering' was used to amplify degraded HIV-1 samples stored for >40 years; this approach used 200–300 nt amplicons to help maximize sequence recovery 22 . Using shorter amplicons necessitates a larger number of products to generate a tiling path across a target genome. Doing this in individual reactions requires a large number of manual pipetting steps and therefore increases the potential for mistakes, with a heightened risk of cross-contamination, as well as a greater cost in time and consumables. To solve these problems, we designed a multiplex assay to carry out tens of reactions in an individual tube. This method has been subsequently used to perform Zika sequencing in order to understand the spread of Zika virus in the Americas 16,23,24,25,26 . Our resulting step-by-step protocol, described here, allows any researcher to successfully amplify and sequence viruses of low abundance directly from clinical samples. The method also has other potential uses that are not demonstrated here. One potential application is multilocus sequencing typing approaches, which could be carried out by amplifying conserved genes from bacteria, fungi and yeasts. Simultaneously, antibiotic-resistance-determining genes or key virulence genes could also be targeted in the same assay. The scheme could also be used to sequence chloroplast and mitochondrial genomes.
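The relationship between amplicon length and the number of products needed for a tiling path can be sketched as follows. This lays out coordinates only and is not how primer-design software picks primers, since real tools choose positions based on primer sequence properties. The genome length, amplicon length and overlap used here are hypothetical values chosen for illustration.

```python
import math


def tile_amplicons(genome_length: int, amplicon_length: int = 400,
                   overlap: int = 75):
    """Return half-open (start, end) intervals that tile the genome.

    Neighbouring amplicons overlap by at least `overlap` bases; the last
    amplicon is shifted back so that it ends exactly at the genome end.
    """
    if not 0 <= overlap < amplicon_length:
        raise ValueError("overlap must be in [0, amplicon_length)")
    if genome_length <= amplicon_length:
        return [(0, genome_length)]
    step = amplicon_length - overlap
    tiles = []
    start = 0
    while start + amplicon_length < genome_length:
        tiles.append((start, start + amplicon_length))
        start += step
    tiles.append((genome_length - amplicon_length, genome_length))
    return tiles


# Hypothetical 10 kb genome, for illustration only.
genome_length = 10_000
for length in (2_000, 400, 250):
    tiles = tile_amplicons(genome_length, amplicon_length=length, overlap=75)
    expected = math.ceil((genome_length - 75) / (length - 75))
    print(f"{length} bp amplicons: {len(tiles)} products (formula: {expected})")
```

Shortening the amplicons multiplies the number of reactions, which is the practical motivation for pooling them into a multiplex assay rather than pipetting each one separately.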

Comparison with other approaches

The three most common approaches for sequencing viruses are metagenomic sequencing, PCR amplicon sequencing and target enrichment sequencing, recently reviewed in detail by Houldcroft et al. 27 . The main benefits of the PCR-based approach described here are cost and sensitivity. In theory, both PCR and cell culture require only one viral copy, making them both exquisitely sensitive. In practice, however, the reaction conditions do not allow single-genome amplification, and, typically, multiple starting molecules are required. PCR also has limited sensitivity in cases in which the template sequence is divergent from the expected because of primer-binding kinetics. However, in an outbreak situation in which isolates are highly related, and low cost per sample and rapid turnaround time are required, PCR is particularly suitable. Sequencing amplicons on the Oxford Nanopore MinION is a popular method for determining viral genomes and has been used for diverse viruses, including Ebola, influenza and poxvirus, using either single primer pair reactions generating long amplicons (>1 kb) or multiple reactions that are pooled before sequencing 5,28,29,30 . However, these approaches are laborious to scale up when many small amplicons are required (because of low viral copy numbers), or when multiple samples are sequenced on a single sequencing run, as in this protocol.

The most similar alternative approach to the one described here is AmpliSeq (Life Technologies), which was previously used for Ebola sequencing on the Ion Torrent PGM 6 . However, this method is specific to the Ion Torrent platform, and primer schemes must be ordered directly from the manufacturer; thus, it may be more expensive per sample. Alternative software packages for designing primer schemes are available, some of which cater specifically to multiplex or tiling amplicon schemes 20,21,31,32 , and these may perform better when dealing with divergent genomes because of an increased emphasis on oligonucleotide degeneracy. Primers generated with such software may also be compatible with this protocol, although PCR conditions may require optimization, as the Primal Scheme software used in this protocol is designed with an emphasis on monitoring short-term evolution of known lineages, and primer conditions have been optimized for multiplex PCR amplification efficiency.

Propagation in cell culture is another method that has been widely used for virus enrichment 33,34,35 . This process is time-consuming, and requires specialist expertise and high containment laboratories for especially dangerous pathogens. There is also concern that viral passage can introduce mutations that are not present in the original clinical sample, potentially confounding analysis 36,37 .

Oligonucleotide bait probes have also shown promise as an alternative to metagenomics and amplicon sequencing 38,39,40,41,42 . These isolate viral nucleic acid sequences by hybridizing target-specific biotinylated probes to the DNA/RNA sample and then separating them using magnetic streptavidin-coated beads. Such methods, however, are limited by the efficiency of the capture step because of the kinetics of nucleic acid hybridization in complex samples such as those containing the human genome. The complete hybridization of all probes to targets can take hours (typical protocols suggest a 24-h incubation, although shorter times may be possible) and may never be achieved because of competitive binding by the host DNA. These methods suffer from a coverage bias, which worsens at lower viral abundances, resulting in increasingly incomplete genomes, as demonstrated by recent work on the Zika virus 43 . They work best on samples with higher viral abundances and may not have the sensitivity to generate near-complete genomes for the majority of isolates in an outbreak. Probes for hybridization capture are also more expensive than PCR primers because they are usually designed in a fully overlapping 75-nt scheme, which can run to hundreds of probes per virus and thousands for panels of viruses.

Direct sequencing of RNA has been recently demonstrated on the Oxford Nanopore MinION 44,45 . This method is attractive because it eliminates the need for reverse transcription, and so potentially may reduce biases resulting from nonrandom priming and copying errors introduced by reverse transcriptase. However, this method currently requires 500 ng of RNA as starting material and would suffer from the same sensitivity issues associated with cDNA metagenomics approaches when applied to samples containing very low viral copy numbers.

Limitations of tiling amplicon sequencing

Our method is not suitable for the discovery of new viruses or for sequencing highly diverse or recombinant viruses because primer schemes are virus-genome-specific. This protocol has not been validated for discovery of intra-host nucleotide variants, and we expect that minor allele frequencies will not be reliably recovered when amplifying from very small amounts of starting virus, as shown by Metsky et al. 25 . We expect that this method will work for larger virus genomes, but we have not tested this protocol with viral genomes longer than 12 kb. The protocol is designed for infections resulting from single clones, and may not perform well with mixed infections of diverse viruses. We have not tested performance of the method in chronic infections in which large amounts of diversity may have evolved within a patient (for example, viral quasispecies during HIV infection). Amplicon sequencing is prone to coverage dropouts that may result in incomplete genome coverage, especially at lower abundances, and the loss of both 5′ and 3′ regions that fall in regions not covered by primer pairs. Sequencing of complete 5′- and 3′-UTR regions may require alternative techniques such as RACE 46 . Targeted methods are also highly sensitive to amplicon contamination from previous experiments. Extreme caution should be taken to keep pre-PCR areas, reagents and equipment free of contaminating amplicons.

Experimental design

Description of the protocol. We describe a fully integrated end-to-end protocol for rapid sequencing of viral genomes directly from clinical samples. The protocol proceeds in four stages: (i) multiplex primer pool design, (ii) multiplex PCR, (iii) sequencing on MinION or Illumina instruments and (iv) bioinformatic analysis and quality control (QC) (Fig. 1).

Workflow for tiling amplicon sequencing on MinION/Illumina platforms, with associated Procedure step numbers indicated.

Primer design. We developed a web-based primer design tool called Primal Scheme (http://primal.zibraproject.org), which provides a complete pipeline for the development of efficient multiplex primer schemes. Each scheme is a set of oligonucleotide primer pairs that generate overlapping products, the size of which is determined by the target genome length, amplicon length and overlap required, as discussed below. For Zika, we use 35 primer pairs, amplifying products of ∼ 400-nt length with a 100-nt overlap for the ∼ 11-kb viral genome. Together, the amplicons generated by the pairs span the target genome or region of interest (Fig. 2).

(a) Submission box for online primer design tool. (b) Primer table of results. (c) Schematic showing expected amplicon products for each pool in genomic context for the ZikaAsian and ChikAsianECSA schemes.

As input, Primal Scheme requires a FASTA file containing one or more reference genomes. The user specifies a desired PCR amplicon length (default = 400 nt, suggested values between 200 and 2,000 nt) and the desired length of overlap between neighboring amplicons (default = 75 nt). Using a shorter amplicon length may be useful for samples in which longer products fail to amplify (e.g., when the virus nucleic acid is highly degraded). However, if amplicon lengths become too short (e.g., <300 nt), it may not be possible to find suitable primer pairs; reducing the overlap parameter may help with this.
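The relationship between genome length, amplicon length and overlap can be illustrated with a small calculation. This is a minimal sketch under an idealised assumption (evenly spaced amplicons covering the genome end to end); the function name is hypothetical and this is not how Primal Scheme itself places primers, so the count will not exactly match a real scheme.

```python
import math


def estimate_amplicon_count(genome_length, amplicon_length=400, overlap=75):
    """Estimate how many evenly tiled amplicons are needed to span a genome.

    Every amplicon after the first advances the covered region by
    (amplicon_length - overlap) nucleotides.
    """
    if overlap >= amplicon_length:
        raise ValueError("overlap must be shorter than the amplicon length")
    if genome_length <= amplicon_length:
        return 1
    step = amplicon_length - overlap
    return 1 + math.ceil((genome_length - amplicon_length) / step)


# Idealised counts for an ~11-kb genome with a 100-nt overlap.
print(estimate_amplicon_count(11_000, 400, 100))    # short amplicons
print(estimate_amplicon_count(11_000, 2_000, 100))  # long amplicons
```

Shorter amplicons or larger overlaps both shrink the step size and so increase the number of primer pairs that must share a reaction.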

The Primal Scheme software performs the following processes:

Generation of candidate primers: The first sequence listed in the FASTA file should be the most representative genome, with further sequences spanning the expected interhost diversity. Primal Scheme uses the Primer3 software to generate candidate primer pairs (five, by default) 47 . It selects primers based on thermodynamic modeling, which takes into account length, annealing temperature, %GC, 3′ stability, estimated secondary structure and likelihood of primer–dimer formation, maximizing the chance of a successful PCR reaction. Primers are designed with a high annealing temperature within a narrow range (65–68 °C) that allows PCR to be performed as a 2-step protocol (95 °C denaturation, 65 °C combined annealing and extension) for highly specific amplification from clinical samples without the need for nested primers.

Testing of candidate primers: Subsequent reference genomes in the file are used to help choose primer pairs that maximize the likelihood of successful amplification of known virus diversity. A semi-global alignment score between each candidate primer and all supplied references is calculated to ensure that the most 'universal' candidate primers are picked for the scheme. Mismatches at the 3′ end are severely penalized, as they have a disproportionate effect on the likelihood of successful extension 48,49 . The alignment scores are summed, and the single best-scoring pair for each region is selected. If no candidates are returned by Primer3 for a region, most likely because all primers had insufficient annealing temperature, an error message prompting you to adjust the amplicon length or the overlap parameter will appear.
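The idea of summing per-reference scores while penalizing 3′ mismatches heavily can be sketched as follows. This is a simplified illustration, not Primal Scheme's scoring: it compares a primer with an already-located, equal-length binding site rather than performing a semi-global alignment, it scores single primers rather than pairs, and the penalty values and function names are assumptions.

```python
def primer_site_score(primer, site, three_prime_window=3, three_prime_penalty=5):
    """Score an ungapped primer/site comparison; higher is better.

    Matches add 1 and mismatches subtract 1, except that mismatches in the
    last `three_prime_window` bases (the 3' end) subtract `three_prime_penalty`.
    """
    if len(primer) != len(site):
        raise ValueError("primer and site must have the same length")
    score = 0
    for i, (p, s) in enumerate(zip(primer.upper(), site.upper())):
        if p == s:
            score += 1
        elif i >= len(primer) - three_prime_window:
            score -= three_prime_penalty
        else:
            score -= 1
    return score


def pick_best_candidate(candidates, sites_by_candidate):
    """Return (best_name, totals), summing each candidate's score over all references.

    `candidates` maps a candidate name to its primer sequence;
    `sites_by_candidate` maps the same name to its binding-site sequence
    in each reference genome (same orientation as the primer).
    """
    totals = {
        name: sum(primer_site_score(seq, site) for site in sites_by_candidate[name])
        for name, seq in candidates.items()
    }
    return max(totals, key=totals.get), totals


# Hypothetical sequences: candidate A has a 3' mismatch in the second
# reference, candidate B has a mismatch near its 5' end instead.
candidates = {"A": "ACGTACGTAC", "B": "TTGCAGGCTA"}
sites = {
    "A": ["ACGTACGTAC", "ACGTACGTAT"],
    "B": ["TTGCAGGCTA", "TAGCAGGCTA"],
}
best, totals = pick_best_candidate(candidates, sites)
print(best, totals)
```

With these made-up inputs, the candidate whose mismatch sits away from the 3′ end scores higher, which is the behaviour the weighting is meant to produce.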

Output of primer pairs: Output files include a table of primer sequences to be ordered, a BED file of primer locations that can be used subsequently for primer trimming and a diagram of the primer scheme.

Choice of amplicon length. The choice of amplicon length when designing primer pools for sequencing is important. There is an inverse relationship between amplicon length and the number of primer pairs. It is believed that increasing the number of primer pairs reduces the likelihood of successful amplification of each region, owing to interaction between primers 18 . It is plausible that as the number of primer pairs increases, competitive inhibition may decrease PCR efficiency, although the high annealing temperature used in this protocol should reduce this risk. Longer amplicons are preferred, as they mean fewer primer pairs are needed per reaction. They also increase the amount of linkage information that can be recovered as haplotypes, which is of importance for investigation of within-host diversity. On the Illumina platform, 600 bases is the maximum size of amplicon that can be obtained using this protocol without an additional fragmentation step (using 600-cycle kits in paired-end mode, i.e., paired 300-nucleotide reads without any overlap), although read accuracy may degrade during the last 50 cycles. On the Oxford Nanopore MinION, there is no limit to the maximum amplicon length that can be sequenced; the maximum length is effectively limited by the performance of the reverse transcription and PCR (practically to ∼ 5 kb). However, longer amplicons are less likely to amplify successfully when viral copy number is low or there is sample degradation (e.g., because of inadequate storage).

Optimization of primer schemes. The majority of primers are expected to work even when pooled in equimolar amounts, meaning largely complete genomes can be recovered without optimization. For example, the chikungunya virus data shown in Table 2 were generated without any optimization. However, to achieve coding-sequence-complete genomes, problem primers causing inefficient amplification of certain regions may need to be replaced or their concentrations adjusted relative to other primers in an iterative manner. Complete coverage of the genome covered by the scheme (i.e., all amplicons successfully amplified) should be achievable for the majority of samples using this protocol; however, coverage is still expected to correlate with viral abundance (Table 3).

Multiplex PCR Protocol. Next, we developed a multiplex PCR protocol using novel reaction conditions: specifically low individual primer concentrations, high primer annealing temperatures (>65 °C) and long annealing times, which allows amplification of products covering the whole genome in two reactions (Fig. 3). In comparison with single-plex methods, this markedly reduces the cost of reagents and minimizes potential sources of laboratory error. We assign alternate target genome regions to one of two primer pools, so that neighboring amplicons do not overlap within the same pool (which would result in a short overlap product being generated preferentially). By screening reaction conditions based on the concentration of cleaned-up PCR products and specificity as determined by gel electrophoresis, we determined that lower primer concentrations and a longer annealing/extension time were optimal. Given the low cost of the assay, this step could also be performed alongside standard diagnostic quantitative PCR as a quality control measure to help reveal potential false positives 50 .
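The alternating assignment of neighboring regions to two pools can be sketched in a few lines. This is an illustrative sketch only: amplicons are represented as hypothetical (name, start, end) tuples with half-open coordinates, and the function names are not taken from the protocol's software.

```python
def assign_alternating_pools(amplicons):
    """Assign amplicons, ordered by start position, alternately to pools 1 and 2."""
    ordered = sorted(amplicons, key=lambda a: a[1])
    pools = {1: [], 2: []}
    for index, amplicon in enumerate(ordered):
        pools[1 if index % 2 == 0 else 2].append(amplicon)
    return pools


def pool_has_overlap(pool):
    """Return True if any two amplicons in the pool overlap.

    After sorting by start, checking consecutive amplicons is sufficient.
    """
    ordered = sorted(pool, key=lambda a: a[1])
    return any(prev[2] > cur[1] for prev, cur in zip(ordered, ordered[1:]))


# Hypothetical 400-nt amplicons tiled with a 100-nt overlap.
amplicons = [
    ("amp1", 0, 400),
    ("amp2", 300, 700),
    ("amp3", 600, 1000),
    ("amp4", 900, 1300),
]
pools = assign_alternating_pools(amplicons)
print(pools)
print(pool_has_overlap(pools[1]), pool_has_overlap(pools[2]))
```

Putting all four amplicons in one reaction would place overlapping neighbors together, which is the situation the two-pool design avoids.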

(a) Schematic showing the regions amplified in pools 1 (upper track) and 2 (lower track), and the intended overlap between pools (as determined in Step 1). (b) Products generated by PCR in Step 9 from pools 1 (left tube) and 2 (right tube) for the hypothetical scheme shown in a. (c) In Step 12A(ii), the input amount is normalized based on the number of samples and the scheme length; pool 1 and 2 products can be pooled at this stage (shown) or kept separate if you wish to barcode them individually. In Step 12A(iv), products for each sample are then barcoded by ligation of a unique barcode. In Step 12A(vi), all barcoded products are pooled together before sequencing adaptor ligation, yielding a sequenceable library.

Sequencing protocol optimizations. Optimized library preparation methods for both the MinION and Illumina MiSeq platforms are provided and should be readily adaptable to other sequencing platforms, if required. The MinION system is preferred when portability and ease of setup in harsh environments are important 5 . The Illumina platform is more suited to sequencing very large numbers of samples, because of greater sequence yields and the ability to barcode and accurately demultiplex hundreds of samples. Both platforms use ligation-based methods to add the required sequencing adaptors and barcodes.

For the MinION, we used the native barcoding kit (Oxford Nanopore Technologies) to allow up to 12 samples to be sequenced per flow cell. As the manufacturer's protocol is designed for 6–8 kb of fragmented genomic DNA, we have adjusted the input mass to achieve an equivalent number of moles of DNA ends; this improves the efficiency of barcode/adaptor ligation and improves run yields. In the development of the protocol, we used R9 or R9.4 flow cells (FLO-MIN105/FLO-MIN106) and the 2D barcoded library preparation kit (EXP-NBD002/SQK-LSK208). The protocol is also compatible with the current 1D barcoded library preparation kit (EXP-NBD103/SQK-LSK108). Because of the regular revisions of the kits, we have avoided including any specific component names or volumes; be sure to follow the appropriate protocol for your chosen kit version. Depending on the number of reads required, the number of samples multiplexed and the performance of the flow cell, sequencing on the MinION can take from a few minutes up to 72 h. Typically, 2–4 h of sequencing is sufficient for 12 samples. For the MiSeq platform, we used the Agilent SureSelect xt2 adaptors and the KAPA Hyper library preparation kit, allowing up to 96 samples to be sequenced per MiSeq run. Other library prep kits (e.g., Illumina TruSeq) and dual-indexed adaptors could also be used on the MiSeq. For the MiSeq, we recommend using the 2 × 250-nt read-length for 400-nt amplicons, which takes 48 h to complete.
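The adjustment of input mass to keep the number of DNA ends constant follows from a simple mass-to-moles conversion. The sketch below assumes the common approximation of 650 g/mol per base pair of double-stranded DNA; the 1,000-ng reference input is a made-up figure for illustration and is not taken from any kit protocol.

```python
AVERAGE_BP_MASS = 650.0  # g/mol per base pair of dsDNA (common approximation)


def dsdna_pmol(mass_ng, length_bp):
    """Convert a mass of double-stranded DNA (ng) into picomoles of molecules."""
    return mass_ng * 1000.0 / (length_bp * AVERAGE_BP_MASS)


def equivalent_mass_ng(reference_mass_ng, reference_length_bp, amplicon_length_bp):
    """Mass of amplicons carrying the same number of DNA ends as the reference input.

    Each linear molecule has two ends, so equal moles of molecules means
    equal moles of ends; the required mass scales with fragment length.
    """
    return reference_mass_ng * amplicon_length_bp / reference_length_bp


# Hypothetical example: 1,000 ng of 8-kb fragments versus 400-nt amplicons.
mass_400 = equivalent_mass_ng(1000, 8000, 400)
print(mass_400, dsdna_pmol(mass_400, 400), dsdna_pmol(1000, 8000))
```

Because short amplicons have far more ends per nanogram than long fragments, a much smaller mass is needed to present the same number of ends for ligation.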

Bioinformatics workflow: MinION pipeline. We developed bioinformatic pipelines consisting of primer trimming, alignment, variant calling and consensus generation for both the Oxford Nanopore and Illumina platforms. The MinION pipeline was developed by building upon tools previously developed for Ebola virus sequencing in Guinea and is freely available with components developed under the permissive MIT open source license at https://github.com/zibraproject/zika-pipeline. The pipeline runs under the Linux operating system and is available as a Docker image, which means that it can also be run on Mac and Windows operating systems. The MinION version of the pipeline can process the data from basecalled reads to consensus sequences on the instrument laptop, given the correct primer scheme (a BED file).

FAST5 reads containing raw nanopore signal data may be basecalled in real time using MinKNOW (accessible via the MinION Community Portal for registered users at http://community.nanoporetech.com) or off-line using Albacore. Albacore is a recurrent neural network (RNN) basecaller developed by Oxford Nanopore Technologies and also made available through the MinION Community Portal. Reads are extracted into a FASTA file using the poretools fasta command. This FASTA file may be demultiplexed by a script, demultiplex.py, into separate FASTA files for each barcode, as specified in a config file. By default, these are set to the barcodes NB01–12 from the native barcoding kit. Alternatively, the Metrichor online service (https://www.metrichor.com) and versions of Albacore 1.0.1 or later may be used to basecall read files and demultiplex samples. Each file is then mapped to the reference genome using bwa mem with the -x ont2d flag and converted to BAM format using samtools view. Alignments are preprocessed using a script (align_trim.py) that performs primer trimming and coverage normalization. Primer trimming is performed by reference to the expected coordinates of sequenced amplicons, and therefore requires no knowledge of the sequencing adaptor (Fig. 3). Signal-level events are aligned and variants are called using nanopolish variants. Low-quality or low-coverage variants are filtered out and consensus sequences are generated using a script, margin_cons.py. Variant calls and frequencies can be visualized using vcfextract.py and pdf_tree.py.
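Coordinate-based primer trimming can be illustrated with plain interval arithmetic. The sketch below is not the align_trim.py script; it only shows the idea of clipping an aligned read to the region between the primers of the amplicon it best matches. The dictionary layout, function names and coordinates (half-open, reference-based) are assumptions for illustration.

```python
def find_amplicon(read_start, read_end, amplicons):
    """Return the amplicon sharing the most reference bases with the read, or None."""
    def shared(amplicon):
        return min(read_end, amplicon["end"]) - max(read_start, amplicon["start"])

    best = max(amplicons, key=shared)
    return best if shared(best) > 0 else None


def trim_to_insert(read_start, read_end, amplicon):
    """Clip aligned coordinates to the insert, i.e. the region between the primers."""
    new_start = max(read_start, amplicon["insert_start"])
    new_end = min(read_end, amplicon["insert_end"])
    return (new_start, new_end) if new_start < new_end else None


# Hypothetical scheme: two 400-nt amplicons with 22-nt primers at each end.
scheme = [
    {"name": "amp1", "start": 100, "end": 500, "insert_start": 122, "insert_end": 478},
    {"name": "amp2", "start": 400, "end": 800, "insert_start": 422, "insert_end": 778},
]
amplicon = find_amplicon(100, 500, scheme)
print(amplicon["name"], trim_to_insert(100, 500, amplicon))
```

Because the clipping depends only on where the read aligns and where the primers are expected to sit, no adaptor or primer sequence needs to be searched for in the read itself.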

Bioinformatics workflow: Illumina pipeline. First, we use Trimmomatic 51 to remove primer sequences (first 22 nt from the 5′ end of the reads) and bases at both ends with Phred quality scores <20. Reads are aligned to the genome of a Zika virus isolate from the Dominican Republic, 2016 (GenBank: KU853012), using Novoalign v3.04.04 (http://www.novocraft.com/support/download/). SAMtools is used to sort the aligned BAM files and to generate alignment statistics 52 . The code and reference indexes for the pipeline can be found at https://github.com/andersen-lab/zika-pipeline. Snakemake is used as the workflow management system 53 .
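The effect of the read-trimming step can be shown on a single read. This is a plain-Python illustration of the two operations described (fixed-length 5′ clipping, then end trimming at Phred <20), not a reimplementation of Trimmomatic; the order of the two operations and the function name are assumptions.

```python
def trim_read(sequence, qualities, primer_length=22, min_quality=20):
    """Remove a fixed-length 5' primer, then low-quality bases from both ends.

    `qualities` holds one Phred score per base of `sequence`.
    """
    if len(sequence) != len(qualities):
        raise ValueError("sequence and qualities must have the same length")
    # Step 1: drop the primer-derived bases at the 5' end.
    sequence, qualities = sequence[primer_length:], qualities[primer_length:]
    # Step 2: walk inwards from each end until a base meets the threshold.
    start, end = 0, len(sequence)
    while start < end and qualities[start] < min_quality:
        start += 1
    while end > start and qualities[end - 1] < min_quality:
        end -= 1
    return sequence[start:end], qualities[start:end]


# Hypothetical read: 22 primer bases followed by 8 bases of insert.
read = "A" * 22 + "CGTACGTA"
quals = [30] * 22 + [10, 30, 30, 30, 30, 30, 15, 5]
print(trim_read(read, quals))
```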

Alignment-based consensus generation. We have used an alignment-based consensus approach to generate genomes as opposed to de novo assembly. Although de novo assembly could in theory be used with this protocol, the use of a tiling amplicon scheme already assumes that the viral genome is present in a particular fixed order. This assumption may be violated in the presence of large-scale recombination. Some de novo assemblers, such as SPAdes, use a frequency-based error correction preprocessing stage, and this may result in primer sequences being artificially introduced into the reference if primer sequences are not removed in advance 54 . Importantly, when we compared alignment with de novo-based analysis methods for our generated Zika virus genomes, we found that we always obtained the same consensus sequences.

Preparing sequencing controls. We recommend that positive sample controls be included in each sequencing run. To check that the protocol is generating the expected results, we recommend choosing a positive sample with an established, trusted reference sequence. For the Zika virus, we used the previously sequenced World Health Organization reference strain PF13/251013-18 (GenBank accession: KX369547), which can be obtained on request from the Paul-Ehrlich-Institut 55,56 . Sample archives such as the National Collection of Pathogenic Viruses in the United Kingdom can provide high-quality reference materials for other viruses. Positive controls should have viral copy numbers similar to those of the clinical samples on the same run. This may require the positive control to be heavily diluted until the Ct values are comparable. Negative sequencing controls should be processed in a manner as similar as possible to that used for clinical samples and should not be simply water controls; for example, if samples are collected by swabs, then the same type of unused swab should be subjected to RNA extraction and PCR. Additional negative water controls may be added at each step (e.g., reverse transcription, PCR and library preparation) to detect the sources of contaminants. Even if amplification is not detected (e.g., by gel electrophoresis) or DNA quantity is low or undetectable by fluorimetry, a sequencing library should still be prepared as normal using the total available amount, as contamination may still be detectable by sequencing.

Contamination. Cross-contamination is a serious potential problem when working with amplicon sequencing. Contamination risk is minimized by maintaining physical separation between pre- and post-PCR areas, and performing regular decontamination of work surfaces and equipment—e.g., by UV exposure or with 1% (vol/vol) sodium hypochlorite solution. Contamination becomes harder to mitigate with decreasing viral copy numbers. Processing high-viral-count samples can lead to overamplification during PCR (e.g., generation of unnecessarily high numbers of amplicons), which can increase the risk of amplicon contamination in subsequently processed samples with low viral counts. Such 'between-sample amplification' can occur during sequencing library preparation, or may result from barcode misidentification or 'barcode hopping' (incorporation of incorrect barcode sequences during sequence library preparation) during sequencing. When determining how many PCR cycles to use, begin with a lower number and increase gradually to minimize this contamination risk.

The best safeguard for helping to detect contamination is the use of negative controls. These controls should be sequenced even if no DNA is detected by quantification or no visible band is present on a gel. Negative control samples should be analyzed through the same software pipeline as is used for the other samples, and you should assume that any contaminating amplicons in the negative control will also be present in your other samples. The relative number of reads as compared with positive samples gives a simple guide to the extent of contamination, and inspection of coverage plots can help identify any specific region involved.
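The comparison of negative-control reads with positive-sample reads amounts to a simple ratio, and per-amplicon read counts can point to the region involved. The sketch below is only an illustration; the function names, the per-amplicon count dictionary and the 10-read threshold are assumptions rather than values recommended by the protocol.

```python
def contamination_fraction(negative_control_reads, sample_reads):
    """Reads in the negative control as a fraction of the reads in a sample."""
    if sample_reads <= 0:
        raise ValueError("sample_reads must be positive")
    return negative_control_reads / sample_reads


def regions_with_reads(negative_counts, min_reads=10):
    """Names of amplicons with at least `min_reads` reads in the negative control."""
    return sorted(name for name, count in negative_counts.items() if count >= min_reads)


# Hypothetical read counts.
print(contamination_fraction(50, 10_000))
print(regions_with_reads({"amp1": 0, "amp2": 48, "amp3": 2}))
```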


Affiliations

Life Science Research and Foundation, QIAGEN Sciences, Inc., Frederick, Maryland, USA

Quan Peng, Ravi Vijaya Satya, Marcus Lewis, Pranay Randad & Yexun Wang




The 64-bit Mac version should work on most modern Macs (OS X 10.7 or newer).

FreeBSD users may simply type pkg install pooler

For all other systems (GNU/Linux, older Mac, Solaris, etc.) please compile from source (below).

Example primers file

Source code

  • a C compiler (GCC or Clang) and basic Unix tools,
  • MingW compiler(s) if you want to cross-compile for Windows.

Usage

  • If you opt to use Score when your primers and/or tags are very long, you will be asked if you are really sure you don't want to use deltaG instead.
  • If you opt for deltaG, the following questions will be asked:
      • Temperature: Enter a number (decimal fractions are allowed). You can enter it in Celsius, Kelvin, Fahrenheit or Rankine. Do not enter the suffix C or K or F or R: Primer Pooler will determine for itself which unit was meant, and ask you to confirm. (Recent versions of Primer Pooler offer 5 additional obscure temperature scales if you decline all of the more probable ones.)
      • Magnesium concentration in mM (0 for no correction): Enter your concentration of magnesium in millimoles per litre (decimal fractions are allowed). Enter 0 if you don't mind the deltaG figures not being corrected for magnesium concentration.
      • Monovalent cation (e.g. sodium) concentration in mM: Enter your concentration of sodium etc. in millimoles per litre (decimal fractions are allowed). If in doubt, try 50.
      • dNTP concentration in mM (0 for no correction): Enter your concentration of deoxynucleotide (dNTP) in millimoles per litre (decimal fractions are allowed). Enter 0 if you don't mind the deltaG figures not being corrected for dNTP concentration.
  • If you answered yes to this question, the summary will be displayed on screen, and you will be asked if you also want to save it to a file. If you answer yes to this, you will be asked for a filename.
  • These up-front counts will include self-interactions (a primer interacting with itself), and interactions between the pair of primers in any given set. Self-interactions and in-set interactions are not counted when summarizing the counts of each pool (below).
  1. Go to http://hgdownload.cse.ucsc.edu/downloads.html
  2. Choose a species (e.g. Human)
  3. Choose "Genome sequence files"
  4. If you're under hg38, choose "Standard genome sequence files"
  5. Scroll down to the links, and choose the one that ends .2bit (e.g. hg38.2bit)
  • After the overlap scan is complete, Primer Pooler will then have enough data to write an input file for MultiPLX if you wish to run that software as well for comparison. If you decline this, it will ask if you want it to write a simple text file with the locations of all amplicons, which you may accept or decline.
  • If you do not opt to check for overlaps in the genome, then Primer Pooler will not take overlaps into account when generating its pools. This is rarely useful unless you have already ensured there are no overlaps in the set of amplicons under consideration. Even then, I would recommend performing a scan anyway, just to double-check: an early version found 11 overlaps in a supposedly overlap-free batch drawn up by an experienced academic---we all make mistakes. But bypassing the overlap check might be useful if you are sure there are no overlaps and you don't want to download a very large genome file to the workstation you're using.

You will not be allowed to set the maximum size of each pool lower than the average size of each pool, since that would make it logically impossible to fit all primer-sets into all pools. It is not advisable to set it just above the average either, since being overly strict about the evenness of the pools could hinder Primer Pooler from finding a solution with lower dimer formation. You might want to experiment with different maxima; you will be able to come back to this question and try again.

Do you want to give me a time limit? (y/n): If you answer y, you will be asked to set a time limit in minutes. Normally 1 or 2 is enough, although you may wish to let it run a long time to see if it can find better solutions. You don't have to set a time limit: you may manually interrupt the pooling process at any time and have it give the best solution it has found so far, whether a time limit is in place or not. Additionally, Primer Pooler will stop automatically when it detects better solutions are unlikely to be found.

Do you want my "random" choices to be 100% reproducible for demonstrations? (y/n): If you answer y, Primer Pooler's random choices will be generated in a way that merely looks random but is in fact completely reproducible. This is useful for demonstration purposes, as you'll know how long it will take to find the solution you want. Otherwise, the random choices will be less predictable, as a different sequence will be chosen depending on the exact time at which the pooling was started.

Pooling display: While pooling is in progress, Primer Pooler will periodically display a brief summary of the best solution found so far, showing the pool sizes, and the counts of interactions (by deltaG range or score) within each pool. As instructed on screen, you may press Ctrl-C (i.e. hold down Ctrl while pressing and releasing C, then release Ctrl) to cancel further exploration and use the best solution found so far.

Do you want to see the statistics of each pool? (y/n): After the pooling is complete, or after you have interrupted it (by pressing Ctrl-C as instructed on screen), you will be asked if you wish to see the interaction counts of each pool (rather than a simple summary of all pools as appeared during pooling). If you want this, you will also be asked if you wish to save them to a file, and, if so, what file name.

Do you want to see the highest bonds of these pools? (y/n): If you answer Yes, you will be asked for a deltaG or score threshold, and all interactions worse than that threshold will be displayed on-screen with bonds diagrams. You will then be asked if you wish to save the output to a file, and, if so, what file name. You will then be asked if you would like to try another threshold.

Shall I write each pool to a different result file? (y/n): If you answer y to this, you will be asked for a prefix, which will be used to name the individual results files. Otherwise, you will be asked if you wish to save all results to a single file. If you decline saving all results to a single file, the results will not be saved at all. This is for when you weren't happy with the solution and want to go back to try a different number of pools or a different maximum pool size.

Do you want to try a different number of pools? (y/n): This question is self-explanatory. You can go back as many times as you like, trying different numbers of pools. But many researchers have a pretty good idea of how many pools they want to use, or else are happy with the computer's initial suggestion.

Would you like another go? (y/n): If you answered No to trying a different number of pools, or if you didn't want the program to do pooling at all, then you will be asked if you want to start the program again. Answering No to this question will exit.
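The constraint on the maximum pool size described at the start of this section (it may not be lower than the average pool size) can be expressed as a short check. This is a hypothetical helper written for illustration, not code from Primer Pooler.

```python
def check_max_pool_size(num_primer_sets, num_pools, max_per_pool):
    """Raise ValueError if the primer-sets cannot fit under the given maximum.

    A maximum below the average (num_primer_sets / num_pools) leaves too
    few slots in total to hold every primer-set.
    """
    if num_pools < 1:
        raise ValueError("there must be at least one pool")
    average = num_primer_sets / num_pools
    if max_per_pool < average:
        raise ValueError(
            f"maximum of {max_per_pool} is below the average pool size of {average:g}"
        )
    return average


# Hypothetical example: 100 primer-sets split into 4 pools (average 25).
print(check_max_pool_size(100, 4, 30))
```

A maximum only slightly above the average passes this check but, as noted above, leaves the pooling little room to trade evenness for fewer interactions.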

Command-line usage

The only mandatory argument (if not running interactively) is a filename for the primers file. This should be a text file in multiple-sequence FASTA format. Degenerate bases are allowed using the normal letters, and both upper and lower case are allowed. Names of amplicons' primers should end with F or R, and otherwise match. Optionally include tags (tails, barcoding) to apply to all primers: >tagF and >tagR (tags can also be changed part-way through the file).
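The naming convention for the primers file (names that match apart from a final F or R) can be checked with a short script before running the program. This is a minimal sketch, not part of Primer Pooler: the sequences and names in SAMPLE are invented and do not represent real primers, the function names are hypothetical, and for simplicity only F and R suffixes are recognised.

```python
SAMPLE = """\
>example1F
ACGTACGTACGTACGTAC
>example1R
tgcatgcatgcatgcatg
>example2F
GGRACCTTYGACCTGAAC
>example2R
CCATGGTTAACCGGTTAA
"""


def read_fasta(text):
    """Parse multi-sequence FASTA text into a list of (name, sequence) tuples."""
    records, name, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def pair_primers(records):
    """Group primers whose names match apart from a final F or R."""
    sets = {}
    for name, sequence in records:
        if name in ("tagF", "tagR"):
            continue  # tags apply to other primers and are not paired themselves
        if name[-1:] not in ("F", "R"):
            raise ValueError(f"primer name {name!r} does not end with F or R")
        sets.setdefault(name[:-1], {})[name[-1]] = sequence
    return sets


primer_sets = pair_primers(read_fasta(SAMPLE))
unpaired = sorted(n for n, pair in primer_sets.items() if set(pair) != {"F", "R"})
print(sorted(primer_sets), unpaired)
```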

Processing options should be placed before this filename. Options are as follows:

  • --help (or /help or /?): Show a brief help message and exit.
  • --counts: Show score or deltaG-range pair counts for the whole input. deltaG will be used if the --dg option is set (see below). This option produces a fast summary of how many primer pairs (in the entire collection, before pooling) have what range of interaction strengths. This could be used for example to check a pool that you have already chosen manually, or if you want a rough idea of the worst-case scenario that pooling aims to avoid.
  • --self-omit: Causes the --counts option to avoid counting self-interactions (a primer interacting with itself), and interactions between the pair of primers in any given set.
  • --print-bonds=THRESHOLD: Similar to --counts, this can be useful for checking a manual selection or for a rough idea. All interactions worse than the given threshold (deltaG if --dg is in use, otherwise score) will be written to standard output, with bonds diagrams.
  • --dg[=temperature[,mg[,cation[,dNTP]]]]: Set this option to use deltaG instead of score. Optional parameters are the temperature (default is human blood heat), the concentration of magnesium (default 0), the concentration of monovalent cation (e.g. sodium, default 50), and the concentration of deoxynucleotide (dNTP, default 0). Decimal fractions are allowed in all of these. Temperature is specified in kelvin, and all concentrations are specified in millimoles per litre.
  • --suggest-pools: Outputs a suggested number of pools. This is the approximate lowest number of pools needed to achieve no worse than a deltaG of -7 (or a score of 7) in each.
  • --pools[=NUM[,MINS[,PREFIX]]]: Splits the primers into pools. Optional parameters are the number of pools (if omitted or set to ? then the suggested number will be calculated and used), a time limit in minutes, and a prefix for the filenames of each pool (set this to - to write all to standard output).
  • --max-count=NUM: Set the maximum number of pairs per pool. This is optional but can make the pools more even. A maximum lower than the average is not allowed, and it's usually best to allow a generous margin above the average.
  • --genome=PATH: Check the amplicons for overlaps in the genome, and avoid these overlaps during pooling. The genome file may be in .2bit format as supplied by UCSC, or in .fa (FASTA) format.
  • --scan-variants: When searching for amplicons in a genome file, scan variant sequences in that file too, i.e. sequences with _ and - in their names. By default such sequences are omitted as they're not normally needed if using hg38.
  • --amp-max=LENGTH: Sets maximum amplicon length for the overlap check. The default is 220.
  • --multiplx=FILE: Write a MultiPLX input file after the --genome stage, to assist comparisons with MultiPLX's pooling etc.
  • --seedless: Don't seed the random number generator.
  • --version: Just show the program version number and exit.
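Since the --dg temperature is given in kelvin while annealing temperatures are usually quoted in Celsius, it can help to assemble the argument list programmatically. The sketch below only builds and prints the list; it does not run anything. It assumes the executable is called pooler (the name of the FreeBSD package mentioned above), takes concentrations in mM as the note on versions prior to 1.17 indicates, and uses hypothetical file names and values.

```python
def celsius_to_kelvin(celsius):
    """Convert a temperature from degrees Celsius to kelvin."""
    return celsius + 273.15


def build_pooler_args(primers_file, temp_celsius, mg_mm=0, cation_mm=50,
                      dntp_mm=0, pools="?", genome=None, max_count=None):
    """Build an argument list; options come first and the primers file last."""
    kelvin = celsius_to_kelvin(temp_celsius)
    args = ["pooler", f"--dg={kelvin:.2f},{mg_mm},{cation_mm},{dntp_mm}"]
    if max_count is not None:
        args.append(f"--max-count={max_count}")
    if genome is not None:
        args.append(f"--genome={genome}")
    args.append(f"--pools={pools}")
    args.append(primers_file)
    return args


# Hypothetical values: 60 degrees C annealing, 1.5 mM Mg, 50 mM Na, 0.2 mM dNTP.
args = build_pooler_args("primers.txt", 60, mg_mm=1.5, dntp_mm=0.2,
                         genome="hg38.2bit", max_count=30)
print(" ".join(args))
```

The resulting list could be passed to subprocess.run once the executable name and option values have been confirmed against the installed version's --help output.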

Changes

Defects fixed

  1. an error in incremental-update logic sometimes had the effect of generating suboptimal solutions (in particular, pools could be unnecessarily empty, and/or full beyond any limit that was set)
  2. an error in the user-interface loop meant that if you use tags, run interactively, and answer "yes" to the question "Do you want to try a different number of pools", the second run will have been done without the tags, and its results will have been de-tagged twice, removing some bases from the output; moreover, the resulting truncated versions of your primers will have made it into the interaction calculations for any third run.

Versions prior to 1.17 also had a display bug: the concentrations for the deltaG calculation are in millimoles per litre, not nanomoles as stated on-screen in interactive mode (please ignore the on-screen instruction and enter millimoles, or upgrade to the latest version which fixes that instruction).

Versions prior to 1.34 would round down any decimal fraction you typed in interactive mode (for deltaG temperature, concentration and threshold settings). Internal calculation and command-line use were not affected by this bug.

Versions prior to 1.37 did not ignore whitespace characters after FASTA labels.

Notable additions

Version 1.2 added the MultiPLX output option, and Version 1.33 fixed a bug when MultiPLX output was used with tags and multiple chromosomes. Version 1.3 added genome reading from FASTA (not just 2bit), auto-open browser, and suggest number of pools.

Version 1.36 clarified the use of Taq probes, and allowed these to be in the input file during the overlap check. It's consequently stricter about the requirement that reverse primers must end with R or B: previous versions would accept any letter other than F for these.

Version 1.4 allows tags to be changed part-way through a FASTA file. For example, if there are two >tagF sequences, the first >tagF will set the tags for all F primers between the beginning of the file and the point at which the second >tagF is given; the second >tagF will set the tags for all F primers from that point forward. You can change tags as often as you like.
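The scoping rule for repeated >tagF entries can be illustrated with a short sketch. It is not the program's own code: the function name is hypothetical, the records are assumed to be already parsed into (label, sequence) pairs, only F primers are handled, and prepending the tag to the 5' end of the primer is an assumption made for illustration.

```python
def apply_forward_tags(records):
    """Apply >tagF entries to F primers, following the scoping rule above.

    records: list of (label, sequence) pairs in file order.
    Assumption: a tag is prepended to the primer sequence.
    """
    tagged = []
    current_tag = None
    waiting = []  # positions of F primers seen before the first tagF
    for label, seq in records:
        if label == "tagF":
            if current_tag is None:
                # The first tagF also covers F primers from the start of the file
                for i in waiting:
                    name, primer = tagged[i]
                    tagged[i] = (name, seq + primer)
            current_tag = seq
        elif label.endswith("F"):
            if current_tag is None:
                waiting.append(len(tagged))
                tagged.append((label, seq))
            else:
                tagged.append((label, current_tag + seq))
        else:
            tagged.append((label, seq))  # non-F entries are left untouched here
    return tagged


records = [("a-F", "AAA"), ("tagF", "GG"), ("b-F", "CCC"),
           ("b-R", "TTT"), ("tagF", "TT"), ("c-F", "ACG")]
print(apply_forward_tags(records))
```

In this toy input, a-F and b-F both receive the first tag and c-F receives the second, which matches the behaviour described for Version 1.4.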

Version 1.5 allows primer sets to be "fixed" to predetermined pools by specifying these as primer name prefixes, e.g. @2:myPrimer-F fixes myPrimer-F to pool 2.

Version 1.6 detects and warns about alternative products of non-unique PCR. It was followed within hours by Version 1.61 which fixed a regression in the amplicon overlap check.

Version 1.7 makes the ignoring of variant sequences in the genome optional, and warns if primers not being found might be due to variant sequences having been ignored.


Amplification-Based Methods

Marina N. Nikiforova, ... Yuri E. Nikiforov, in Clinical Genomics, 2015

Primer Design for Multiplex PCR

Multiplex PCR is a commonly used approach for amplification-based target enrichment. Targeted amplification-based sequencing has several strong advantages over whole genome and exome sequencing, or targeted sequencing by a hybrid capture approach. It requires only a small amount of DNA (10–200 ng) as the starting template, can be performed on specimens with suboptimal DNA quality, is time- and cost-effective, and provides high depth of sequencing and straightforward data analysis.

PCR assays are a mainstay of molecular pathology and represent the most convenient and cost-effective method for target selection and amplification using specimens with limited DNA and low abundance targets. However, critical performance issues arise with pooling (multiplexing) of progressively larger numbers of PCR primers and reactions. Specifically, (i) amplification artifacts are introduced due to polymerase editing mistakes during annealed oligomer extension, and (ii) thermal damage to genomic targets takes place during high temperature cycling, resulting in modification of the native nucleic acid sequence [7]. In addition, reaction biases emerge associated with primer–dimer formation, substrate competition, and sequence-dependent differences in PCR efficiency [8]. The maximum achievable pooling using conventional PCR is estimated to be 10 targets [9]; however, for next-generation sequencing approaches a significantly larger number of primers is necessary in a multiplex reaction in order to sequence large genomic regions. Therefore, primer design for multiplex PCR is one of the main factors crucial for successful amplification-based target enrichment.

PCR amplification includes repetitive cycles of DNA denaturation, primer annealing, and sequence extension. The oligonucleotide primers are designed to be complementary to a known genomic sequence of interest. When designing amplification primers for multiplex PCR, several factors must be considered, including primer length (18–25 nucleotides), melting temperature (Tm), which should be either identical across primers or within 1–2°C, appropriate GC content (50–55%), and lack of primer cross-complementarity. In addition, regions with repetitive sequences, known germ line single nucleotide polymorphisms (SNPs), and regions with high homology should be avoided because they may affect the efficiency of PCR amplification and create amplification bias.
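The length, GC content, and Tm criteria can be screened with a few lines of code. The following is a minimal sketch, not a replacement for a design tool: the function names are hypothetical, the sequences are arbitrary placeholders rather than real primers, and Tm is estimated with the simple Wallace rule (2°C per A/T, 4°C per G/C), which is a rough approximation chosen here for brevity. Cross-complementarity is not checked.

```python
def gc_percent(primer: str) -> float:
    """Percentage of G and C bases in the primer."""
    p = primer.upper()
    return 100.0 * (p.count("G") + p.count("C")) / len(p)


def wallace_tm(primer: str) -> int:
    """Rough Tm estimate in degrees C using the Wallace rule (an approximation)."""
    p = primer.upper()
    return 2 * (p.count("A") + p.count("T")) + 4 * (p.count("G") + p.count("C"))


def check_primer_set(primers: dict, max_tm_spread: float = 2.0):
    """Check each primer against the stated ranges and the set against the Tm spread."""
    report = {}
    for name, seq in primers.items():
        report[name] = {
            "length_ok": 18 <= len(seq) <= 25,       # 18-25 nucleotides
            "gc_ok": 50 <= gc_percent(seq) <= 55,    # 50-55% GC
            "tm": wallace_tm(seq),
        }
    tms = [entry["tm"] for entry in report.values()]
    tm_spread_ok = (max(tms) - min(tms)) <= max_tm_spread
    return report, tm_spread_ok


# Placeholder sequences for illustration only
primers = {
    "target-F": "ATGCATGCATGCATGCATGC",
    "control-F": "GCATGCATGCATGCATGCGA",
}
report, tm_spread_ok = check_primer_set(primers)
print(report)
print("Tm values within 2 degrees:", tm_spread_ok)
```

A nearest-neighbour Tm model with salt correction would be more accurate than the Wallace rule; the structure of the check stays the same.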

The most common type of amplification bias arises from unequal amplification of alleles due to sequence variation in the primer binding site [10]. Therefore, designed primers should be checked against SNP databases (dbSNP at www.ncbi.nlm.nih.gov/SNP) or the 1000 genomes project (www.1000genomes.org) to ensure that primer binding sites do not contain highly variable SNPs. If binding site sequence variation is impossible to avoid, primers should be modified to include several possible nucleotide variations in the primer design. In addition, primers also need to be checked against sequence databases (http://blast.ncbi.nlm.nih.gov/) to evaluate primer specificity to the region of interest. This will avoid amplification of pseudogenes and other regions with high sequence homology that may result in erroneous sequence alignment and generation of false positive calls [11,12]. There are a number of software programs available for assisting with primer design (e.g., Primer3: http://frodo.wi.mit.edu/cgibin/primer3/primer3_www.cgi and PrimerBLAST: http://www.ncbi.nlm.nih.gov/tools/primer-blast).
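One common way to include several possible nucleotide variations in a primer is to write it with IUPAC ambiguity codes and order it as a degenerate mix. The sketch below, with a hypothetical function name and a placeholder sequence, lists every concrete sequence such a degenerate primer stands for, which is useful when each variant has to be checked for specificity or Tm.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_degenerate(primer: str) -> list:
    """Return every concrete sequence encoded by a degenerate primer."""
    choices = (IUPAC[base] for base in primer.upper())
    return ["".join(combo) for combo in product(*choices)]


# Placeholder primer with a C/T position (Y) and an A/G position (R)
print(expand_degenerate("AYR"))  # ['ACA', 'ACG', 'ATA', 'ATG']
```

The number of variants grows multiplicatively with each degenerate position, so degeneracy is usually kept to the few sites where variation cannot be avoided.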

Highly multiplexed PCR permits amplification of thousands of short genomic sequences in a single tube and does not require a large amount of DNA. Depending on the platform, as little as 5–10 ng of DNA is sufficient for producing a high complexity library. Therefore, this approach has been successfully used for samples in which only a limited amount of DNA is available (i.e., from small tumor biopsies or FNA samples). However, it is necessary to understand that a very small tissue sample and correspondingly low amount of DNA (picograms) may misrepresent the cell composition in the specimen and affect library complexity by producing biased amplification of one cell population versus another (e.g., nonneoplastic vs. neoplastic cells). In addition, low DNA input can produce bias toward propagation of errors incorporated during early cycles of the PCR, mostly because no excess of DNA is available to compete with the erroneous sequence. Replication errors can be reduced through the use of polymerases with 3′–5′ exonucleolytic proofreading and mismatch repair capabilities, but at the cost of slower extension rates and lower thermostability. For example, Pfu polymerase (from Pyrococcus furiosus) exhibits <2% of the errors of Taq polymerase (from Thermus aquaticus) but has a much lower elongation rate (20 nt/s vs. 80 nt/s, respectively, at 72°C), increasing exposure time for thermal damage [7]. Thermal modifications associated with PCR are characteristically reflected in depurination (A or G), deamination (C>U), and oxidation of G to 8-oxoG. Users should be aware of the potential for overrepresentation of these PCR-specific artifacts, which can be miscalled as genetic variants. At a minimum, failure to control for these errors during amplicon sequencing results in overestimation of sample diversity while reducing sensitivity for detection of true genetic variants [13].
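To see what the difference in elongation rate means in practice, the nominal time to copy one amplicon can be worked out from the rates quoted above. This is a back-of-the-envelope sketch: the 150 nt amplicon length and the 30 cycles are assumed example values, and real protocols use fixed extension steps that are longer than this theoretical minimum.

```python
def extension_seconds(amplicon_nt: int, rate_nt_per_s: float) -> float:
    """Nominal time to extend across one amplicon at a constant rate."""
    return amplicon_nt / rate_nt_per_s


AMPLICON_NT = 150   # assumed example amplicon length
CYCLES = 30         # assumed example cycle count

# Elongation rates at 72 degrees C as quoted in the text
for name, rate in (("Pfu", 20), ("Taq", 80)):
    per_cycle = extension_seconds(AMPLICON_NT, rate)
    print(f"{name}: {per_cycle:.3f} s per cycle, {per_cycle * CYCLES:.2f} s over {CYCLES} cycles")
```

At these rates the proofreading enzyme needs four times as long at 72°C for the same product, which is the extra exposure to thermal damage that the passage refers to.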

Another advantage of multiplex PCR is the amplification of relatively short genomic regions (80–150 base pairs), which allows successful sequencing of DNA and RNA of suboptimal quality, such as from FFPE tissue samples. However, sequencing of large consecutive genomic regions by multiplex PCR can create cross-reactions between primer pairs due to primer overlap and, therefore, may require separation of closely located primers into several multiplex pools (and consideration of whether a capture-based method is better suited to the analysis).
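Separating closely located primers into pools can be sketched as a greedy assignment in which overlapping amplicons never share a pool. This is only an illustration under simplifying assumptions: the function name and coordinates are hypothetical, all amplicons are taken to lie on one chromosome with half-open (start, end) coordinates, and primer–primer interactions are ignored. It is not the method of any particular pooling tool.

```python
def pool_non_overlapping(amplicons):
    """Assign amplicons to pools so that no two in the same pool overlap.

    amplicons: list of (name, start, end) with end exclusive.
    Greedy strategy: process by start position and reuse the first pool
    whose last amplicon has already ended.
    """
    pools = []  # each pool tracks where its last amplicon ends
    for name, start, end in sorted(amplicons, key=lambda a: a[1]):
        for pool in pools:
            if start >= pool["end"]:
                pool["names"].append(name)
                pool["end"] = end
                break
        else:
            # Overlaps every existing pool, so open a new one
            pools.append({"end": end, "names": [name]})
    return [pool["names"] for pool in pools]


# Four tiled amplicons, each overlapping its neighbour by 50 bases
tiles = [("a1", 0, 150), ("a2", 100, 250), ("a3", 200, 350), ("a4", 300, 450)]
print(pool_non_overlapping(tiles))  # [['a1', 'a3'], ['a2', 'a4']]
```

For a tiled design like this one, alternating amplicons end up in two pools, which is the usual layout for covering a continuous region.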

As with other amplification-based methods, targeted amplification-based massively parallel sequencing (MPS) requires strict measures to avoid sample contamination with amplification products. Laboratories should physically separate the preamplification area, used for specimen processing and nucleic acid extraction, from postamplification areas, develop a unidirectional workflow, and ensure decontamination of work surfaces.

