First draft genome assembly of the desert locust, Schistocerca gregaria.

1. Laboratory of Molecular Developmental Physiology and Signal Transduction, KU Leuven, Leuven, 3000, Belgium.
Authors
Verlinden H¹
Verdonck R¹
Holtof M¹
Vanden Broeck J¹
2. Laboratory of Bioinformatics and Evolutionary Genomics, Ghent University, Ghent, 9000, Belgium.
Authors
Sterck L²
Li J²
Li Z²
Van de Peer Y²,³
3. Department of Biochemistry, Genetics and Microbiology, University of Pretoria, Pretoria, 0002, South Africa.
Authors
Yssel A³
Van de Peer Y²,³
4. Laboratory of Pharmaceutical Biotechnology, Ghent University, Ghent, 9000, Belgium.
Authors
Gansemans Y⁴
Deforce D⁴
Van Nieuwerburgh F⁴
5. Department of Entomology, Texas A&M University, College Station, Texas, TX 77843-2475, USA.
Authors
Song H⁵
Behmer ST⁵
Sword GA⁵


F1000Research, 27 Jul 2020, 9:775
https://doi.org/10.12688/f1000research.25148.2 PMID: 33163158 PMCID: PMC7607483

This article is in the Europe PMC Open access subset. Refer to the copyright information in the article for licensing details.


This article is based on a previously available preprint.

Abstract

Background: At the time of publication, the most devastating desert locust crisis in decades is affecting East Africa, the Arabian Peninsula and South-West Asia. The situation is extremely alarming in East Africa, where Kenya, Ethiopia and Somalia face an unprecedented threat to food security and livelihoods. Most of the time, however, locusts do not occur in swarms, but live as relatively harmless solitary insects. The phenotypically distinct solitarious and gregarious locust phases differ markedly in many aspects of behaviour, physiology and morphology, making them an excellent model to study how environmental factors shape behaviour and development. A better understanding of the extreme phenotypic plasticity in desert locusts will offer new, more environmentally sustainable ways of fighting devastating swarms.

Methods: High molecular weight DNA derived from two adult males was used for Mate Pair and Paired End Illumina sequencing and PacBio sequencing. A reliable reference genome of Schistocerca gregaria was assembled using the ABySS pipeline, and scaffolding was improved using LINKS.

Results: In total, 1,316 Gb Illumina reads and 112 Gb PacBio reads were produced and assembled. The resulting draft genome consists of 8,817,834,205 bp organised in 955,015 scaffolds with an N50 of 157,705 bp, making the desert locust genome the largest insect genome sequenced and assembled to date. In total, 18,815 protein-coding genes are predicted in the desert locust genome, of which 13,646 (72.53%) obtained at least one functional assignment based on similarity to known proteins.

Conclusions: The desert locust genome data will contribute greatly to studies of phenotypic plasticity, physiology, neurobiology, molecular ecology, evolutionary genetics and comparative genomics, and will promote the desert locust's use as a model system. The data will also facilitate the development of novel, more sustainable strategies for preventing or combating swarms of these infamous insects.


Version 2. F1000Res. 2020; 9: 775.

Published online 2021 May 21. https://doi.org/10.12688/f1000research.25148.2



Heleen Verlinden, Conceptualization, Funding Acquisition, Investigation, Methodology, Project Administration, Resources, Supervision, Visualization, Writing – Original Draft Preparation,¹ Lieven Sterck, Conceptualization, Data Curation, Formal Analysis, Methodology, Software, Supervision, Visualization, Writing – Original Draft Preparation,²,³ Jia Li, Data Curation, Formal Analysis, Methodology, Software, Visualization, Writing – Original Draft Preparation,²,³ Zhen Li, Data Curation, Formal Analysis, Methodology, Software, Visualization, Writing – Original Draft Preparation,²,³ Anna Yssel, Data Curation, Formal Analysis, Writing – Review & Editing,⁴ Yannick Gansemans, Data Curation, Formal Analysis, Methodology, Software, Writing – Original Draft Preparation,⁵,⁶ Rik Verdonck, Data Curation, Formal Analysis, Investigation, Visualization, Writing – Review & Editing,¹,⁷ Michiel Holtof, Investigation, Writing – Review & Editing,¹ Hojun Song, Conceptualization, Funding Acquisition, Writing – Review & Editing,⁸ Spencer T. Behmer, Conceptualization, Funding Acquisition, Writing – Review & Editing,⁸ Gregory A. Sword, Conceptualization, Funding Acquisition, Writing – Review & Editing,⁸ Tom Matheson, Conceptualization, Funding Acquisition, Writing – Review & Editing,⁹ Swidbert R. Ott, Conceptualization, Funding Acquisition, Writing – Review & Editing,⁹ Dieter Deforce, Resources, Writing – Review & Editing,⁵,⁶ Filip Van Nieuwerburgh, Conceptualization, Resources, Supervision, Writing – Review & Editing,⁵,⁶ Yves Van de Peer, Conceptualization, Funding Acquisition, Project Administration, Resources, Supervision, Writing – Review & Editing,ᵃ,²,³,⁴ and Jozef Vanden Broeck, Conceptualization, Funding Acquisition, Project Administration, Resources, Supervision, Writing – Review & Editing,ᵇ,¹

¹Laboratory of Molecular Developmental Physiology and Signal Transduction, KU Leuven, Leuven, 3000, Belgium

²Laboratory of Bioinformatics and Evolutionary Genomics, Ghent University, Ghent, 9000, Belgium

³Center for Plant Systems Biology, Ghent University - VIB, Ghent, 9052, Belgium

⁴Department of Biochemistry, Genetics and Microbiology, University of Pretoria, Pretoria, 0002, South Africa

⁵Laboratory of Pharmaceutical Biotechnology, Ghent University, Ghent, 9000, Belgium

⁶NXTGNT, Ghent University, Ghent, 9000, Belgium

⁷Station d'Ecologie Théorique et Expérimentale, UMR 5321 CNRS et Université Paul Sabatier, Moulis, 09200, France

⁸Department of Entomology, Texas A&M University, College Station, Texas, TX 77843-2475, USA

⁹Department of Neuroscience, Psychology and Behaviour, University of Leicester, Leicester, LE1 7RH, UK


Associated Data

Data Availability Statement

Underlying data

European Nucleotide Archive: First draft genome of Schistocerca gregaria, a swarm forming grasshopper species. Accession number PRJEB38779; https://identifiers.org/ena.embl:PRJEB38779.

This accession contains all genome and transcriptome data. The annotations are also available via the ORCAE platform (https://bioinformatics.psb.ugent.be/orcae/overview/Schgr).

Extended data

Figshare: First draft genome assembly of the desert locust, Schistocerca gregaria - extended data. https://doi.org/10.6084/m9.figshare.12654026.v2 (Verlinden et al., 2020).

This project contains the following extended data:

Supplementary Methods (DOCX). Details of the animal material, genomic DNA extraction, library construction and sequencing for RNA-Seq, and de novo transcriptome assembly.
Supplementary Table S1 (DOCX). Available Polyneopteran genomes (incl. Schistocerca gregaria for comparison).
Supplementary Table S2 (DOCX). Software parameter settings.
Supplementary Table S3 (DOCX). Transfer RNA (tRNA), microRNA (miRNA), small nuclear RNA (snRNA) and ribosomal RNA (rRNA) content of the desert locust genome.
Supplementary Table S4 (DOCX). Desert locust genome annotation details.
Supplementary Table S5 (DOCX). BUSCO assessments for the genomes of the desert locust, Schistocerca gregaria, and the migratory locust, Locusta migratoria (Wang et al., 2014).
Supplementary Table S6 (DOCX). Functional annotation of the proteome of the desert locust.

Extended data are available under the terms of the Creative Commons Attribution 4.0 International license (CC-BY 4.0).

Version Changes

Revised. Amendments from Version 1

Based on feedback from the reviewers and other readers, we made some minor changes to the manuscript to clarify certain points. Additional references were added to the legend of Figure 1. We also specified that the range of the non-swarming sub-species S. gregaria flaviventris is not shown, and changed the description of the gregarious male to brightly coloured. We added references supporting our expectation of a large non-coding part and of repetitive regions in the desert locust genome. As suggested by Reviewer 3, we added the results of the BUSCO assessment of the Trinity assembly. Moreover, more information on the precautions taken against contamination, and on verifying its absence, was included in the Supplementary Methods.

Peer Review Summary

Review date	Reviewer name(s)	Version reviewed	Review status
2021 Jun 14	Surya Saha	Version 2	Approved
2021 May 24	Uwe Homberg	Version 2	Approved
2020 Nov 2	Joshua B. Benoit	Version 1	Approved
2020 Oct 27	Surya Saha	Version 1	Approved with Reservations
2020 Oct 5	Uwe Homberg	Version 1	Approved


Keywords: Eco-devo, large genome size, locust plague, Orthoptera, pest insect, phenotypic plasticity, polyphenism, swarm

Introduction

Locust plagues have been recorded since Pharaonic times in ancient Egypt. In the Bible (Exodus 10), locust swarms are described as one of the major destructive plagues, and still today they pose a serious threat to crops and to the food security of over 60 countries, covering more than 20% of the world’s total land surface (Figure 1a). Swarms can cover areas of up to several hundred square kilometres and migrate up to 200 km per day. A swarm covering one square kilometre contains about 40 million locusts, which can eat as much food in one day as about 35,000 people. The damage done by a locust plague is comparable to that of a major drought (FAO Locust Watch; De Vreyer et al., 2012). The long-term socio-economic impact of these swarms is significant. The loss of harvest is disastrous for local farmers and leads to soaring local food prices, which also affect non-farming families. The poorest households are often hit hardest. Malnourishment of children and expectant mothers endangers their long-term health and growth. In Mali, school enrolment rates fell by a quarter during the plagues of 1987–89, with girls being particularly affected (Courcoux, 2012). Human activities in turn affect the propensity of locusts to swarm through factors such as land use (e.g. agriculture, wood extraction, urbanization), political relations between affected countries and the effects of climate change (FAO Locust Watch, http://www.fao.org/ag/locusts/en/info/info/index.html; Cullen et al., 2017; Meynard et al., 2020).


Figure 1.

Geographical distribution of the desert locust and a picture of two adult male desert locusts, one in the solitarious phase and the other in the gregarious phase.

(a) Geographic distribution of the desert locust. During ‘recession’ periods, desert locusts are restricted to the semi-arid and arid regions of Africa, the Arabian Peninsula and South-West Asia that receive less than 200 mm of annual rain. The recession area covers about 16 million km² in 30 countries. Within this recession area, locusts move seasonally between winter/spring and summer breeding areas. During outbreaks, desert locusts may spill into more fertile adjacent regions, threatening an area of some 29 million km² comprising 60 countries as outbreaks escalate into upsurges and further into plagues. The recession breeding areas and migration patterns may have predictive value for understanding how swarms will migrate. The range of the non-swarming southern sub-species S. gregaria flaviventris is not shown. Figure based on information from FAO Locust Watch (Cressman, 2016; Symmons & Cressman, 2001); map derived from Google Map Data ©2020 Google. (b) Phase polyphenism in desert locusts, using the example of sexually mature males. The gregarious male (right) is brightly coloured, while the solitarious male relies on camouflage colours. In this staged scene, the solitarious male was forced into close proximity to the gregarious male and is seen retreating from its conspecific. Photo by H. Verlinden and R. Verdonck.

Desert locusts (Schistocerca gregaria Forskål) are grasshoppers (Orthoptera: Acrididae) that exhibit ‘phase polyphenism’, an extreme form of phenotypic plasticity that evolved as an adaptation to the drastic changes that can occur in their environment. Locusts can develop into two strikingly divergent, population density-dependent phenotypes, which are tailored to very different ecological requirements. At low population densities, locusts occur in the solitarious phase and live a solitary life in which they avoid each other. In periods with abundant rainfall, rapid vegetation growth creates a favourable habitat that permits large increases in local population sizes. However, when food becomes scarce again, solitarious locusts are forced to aggregate on the remaining plants. This crowding triggers the transformation into the swarming gregarious phase, beginning with rapid changes in behaviour that include a switch to increased locomotion and mutual attraction. Prolonged crowding drives slower but equally profound changes in colouration, morphology (Figure 1b) and physiology. As these effects compound across multiple generations, locust populations can aggregate further into huge, ruinous swarms capable of crossing continents and oceans in search of food. Populations may crash in the absence of sufficient resources or following human intervention, leading once more to scattered, low-density solitarious populations. The transition between locust phases is thus reversible and occurs gradually through intermediate phenotypic states (Cullen et al., 2017; Pener & Simpson, 2009; Symmons & Cressman, 2001; Verlinden et al., 2009).

Orthoptera (grasshoppers, crickets and allies) belong to the Polyneoptera, a clade that represents one of the major lineages of winged insects (Pterygota) and comprises around 40,000 known species in ten orders of hemimetabolous insects (Misof et al., 2014; Wipfler et al., 2019). The other major lineages of Neoptera (Pterygota that can fold their wings over their abdomen) are Acercaria (mostly sucking insects such as lice or true bugs) and Holometabola (insects with complete metamorphosis). At present, only 25 sequenced polyneopteran genomes are reported on NCBI and i5k (http://i5k.github.io/arthropod_genomes_at_ncbi), unequally distributed over five orders (Extended data, Supplementary Table S1 (Verlinden et al., 2020)). Including S. gregaria, the genomes of five orthopteran species, representing five different subfamilies, are now available. In addition to being a paradigmatic example of phenotypic plasticity, the desert locust is an important research model in a wide variety of fundamental and applied scientific areas, including biomechanics, ecology, pest control, neurobiology and physiology. For instance, the relatively large body size of locusts has been instrumental in the discovery of a multitude of insect neuropeptides (Schoofs et al., 1997). Moreover, the growing global interest in insects as food or feed also extends to the desert locust, a nutrient-rich edible insect that is attracting considerable attention as a potential climate-friendly food source (van Huis et al., 2013).

The devastating socio-economic impact of locust swarms, together with the opportunity this species offers to investigate how molecular processes and environmental cues interact to shape the phenotype, highlights the importance of sequencing the desert locust genome. However, the extremely large estimated genome size of 8.55 Gb (Camacho et al., 2015; Fox, 1970; John & Hewitt, 1966; Wilmore & Brown, 1975) posed a formidable challenge. Moreover, previous transcriptomic and chromosome size data from the desert locust (Badisco et al., 2011; Camacho et al., 2015), as well as comparisons with the genome of the distantly related migratory locust, Locusta migratoria (6.5 Gb; Wang et al., 2014), suggested that the non-coding part and repetitive regions of the desert locust genome might be greatly expanded compared with other insect genomes, presenting additional challenges to sequencing and assembly (Dominguez Del Angel et al., 2018; Tørresen et al., 2019). Our team has overcome these hurdles, and here we present the ~8.8 Gb genome of the desert locust, assembled from short Illumina Mate Pair (MP) and Paired End (PE) reads and long PacBio reads. This new genomic resource, the largest insect genome yet sequenced and assembled, will complement decades of research on this species and strengthen the desert locust’s role as an important comparative model system. The genome will open exciting new opportunities to examine the mechanisms of phenotypic plasticity, social behaviour, and physiological and morphological specialization. It should also help in finding better ways to fight the notorious swarms this species forms. The desert locust genome will also enable a better understanding of genome size evolution and the early phylogeny of winged insects.

Methods

Sequencing strategy

A hybrid sequencing approach was adopted, combining Illumina short-read sequencing, to obtain sufficient coverage for accurate contig assembly, with complementary PacBio long-read sequencing, to allow efficient scaffolding of the contig assembly. The Illumina and first PacBio sequencing runs were performed on high-molecular-weight DNA derived from the central nervous system (central brain, optic lobes, ventral nerve cord), fat body and testes of one adult male from a line inbred for seven generations. A second round of PacBio sequencing used DNA from another male of the same lineage, after two additional generations of inbreeding (for details on the animal material and genomic DNA extraction, see Extended data, Supplementary Methods (Verlinden et al., 2020)).

Illumina sequencing

The concentration of the S. gregaria high molecular weight DNA sample was measured with PicoGreen (Invitrogen) fluorimetry, after which DNA integrity was confirmed by gel electrophoresis (1% E-Gel; Invitrogen). The sample was divided for Illumina MP and PE sequencing library preparation.

The MP sequencing library was prepared from 1 µg of the sample with a “Nextera Mate Pair Library prep kit” (Illumina). The PE library was prepared with a “NEBNext Ultra II library prep kit” (NEB) from 2 µg of the sample, sheared to 500 bp fragments using an S2 focused-ultrasonicator (Covaris). Size selection (600–700 bp) was performed for both libraries in a 2% E-Gel (Invitrogen). Library quality was confirmed with a Bioanalyzer High Sensitivity DNA Kit (Agilent). The MP and PE libraries were quantified by qPCR according to Illumina's “Sequencing Library qPCR Quantification protocol guide” (version February 2011) and pooled at a molar ratio of 25% MP to 75% PE for sequencing on a HiSeq 3000 (2 × 150 cycles, 16 lanes; Illumina).
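The pooling step above fixes the proportion of molecules, not of mass, that each library contributes. A minimal sketch of that arithmetic, assuming qPCR-derived concentrations in nM (1 nM equals 1 fmol/µL); the concentrations and the 200 fmol total are hypothetical values for illustration, not figures from this study:

```python
from dataclasses import dataclass


@dataclass
class Library:
    name: str
    conc_nm: float         # qPCR-derived molarity in nM (1 nM == 1 fmol/µL)
    molar_fraction: float  # desired share of library molecules in the pool


def pooling_volumes(libraries, total_fmol):
    """Return the µL of each library needed for a pool with the requested molar fractions."""
    if abs(sum(lib.molar_fraction for lib in libraries) - 1.0) > 1e-9:
        raise ValueError("molar fractions must sum to 1")
    return {lib.name: lib.molar_fraction * total_fmol / lib.conc_nm for lib in libraries}


# Hypothetical concentrations; 25% MP : 75% PE molar ratio as in the text
libs = [Library("MP", conc_nm=4.0, molar_fraction=0.25),
        Library("PE", conc_nm=10.0, molar_fraction=0.75)]
print(pooling_volumes(libs, total_fmol=200))  # {'MP': 12.5, 'PE': 15.0}
```

Note that the more dilute MP library needs proportionally more volume per femtomole, which is why pooling is done on molarity rather than on mass or volume.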

PacBio sequencing

The library preparation for PacBio sequencing was performed with a "SMRTbell Template Prep Kit 1.0" according to the PacBio protocol (version 100-286-000). For each of the two libraries, 10 µg of the S. gregaria high-molecular-weight DNA was used as input in two parallel 50-µl reactions.

For library size selection, a “0.75% Dye-Free Agarose Gel Cassette” (ref: BLF7510) was used on a BluePippin (Sage Science) with the “0.75% DF Marker S1 high-pass 15–20kb” protocol for a lower cut-off of 12 kb. Fragment size distribution was determined with a “DNA 12000 kit” (ref: 5067-1508) for the first library and a “Fragment Analyzer (Agilent) - High Sensitivity Large Fragment 50 kb kit” (ref: DNF-464-0500) for the second library. The resulting libraries had average lengths of 16.5 kb and 22 kb, respectively.

As recommended for size-selected libraries in the “Quick Reference Card 101-461-600 version 07”, no extension time was used for sequencing. The first run was performed on a PacBio RSII system (V4.0 chemistry, polymerase P6). Fifteen additional runs were performed on a PacBio Sequel system with version 2.0 chemistry, polymerase and SMRT Cells. The same conditions were used to sequence 20 further SMRT Cells with the second library on the PacBio Sequel system.

Genome assembly

PE short-read data were pre-processed with bbduk v38.20 from the BBTools package to remove adapters and low-quality reads. Illumina MP read data were cleaned and separated into true MP data and likely MP data with nxTrim (O’Connell et al., 2015). The long-read PacBio data were pre-processed with CANU v1.7 (Koren et al., 2017) to obtain trimmed and corrected reads. The cleaned short-read PE and MP data were then assembled up to the scaffold stage with the ABySS v2.1.1 pipeline (Simpson et al., 2009), using a k-mer size of 120. ABySS parameters were tuned away from their default values to improve assembly performance (for all parameter settings, see Extended data, Supplementary Table S2 (Verlinden et al., 2020)). The assembly was further improved by using the PacBio data as input for LINKS (Warren et al., 2015).
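Because ABySS builds its de Bruijn graph from k-mers rather than whole reads, the choice of k = 120 interacts with read length and sequencing depth. The sketch below assumes 150 bp reads (the HiSeq 3000 cycle length; trimmed reads will be shorter) and uses the standard approximation C_k = C × (L − k + 1)/L to show how many k-mers one read contributes and roughly what k-mer depth the 1,009 Gb of cleaned Illumina data imply over the 8.82 Gb assembly. It is a back-of-the-envelope illustration, not part of the published pipeline:

```python
def kmers(read: str, k: int):
    """Yield every overlapping k-mer of a read; a read of length L yields L - k + 1 of them."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def kmer_coverage(base_coverage: float, read_length: int, k: int) -> float:
    """Approximate k-mer depth C_k = C * (L - k + 1) / L for reads of length L."""
    if k > read_length:
        return 0.0  # a read shorter than k contributes no k-mers
    return base_coverage * (read_length - k + 1) / read_length


read = "ACGT" * 37 + "AC"  # hypothetical 150 bp read
print(sum(1 for _ in kmers(read, 120)))  # 31

# Assumption: all cleaned reads are 150 bp; depth uses the final assembly size
base_depth = 1009e9 / 8_817_834_205
print(f"base depth ~{base_depth:.0f}x, k-mer depth ~{kmer_coverage(base_depth, 150, 120):.0f}x")
```

Under these assumptions the expected k-mer depth is only about a fifth of the base depth, which illustrates the trade-off of a large k: repeats shorter than k no longer collapse in the graph, but each k-mer is sampled less often.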

Annotation of repetitive elements and noncoding RNAs

Two strategies were used to identify and annotate repetitive elements. First, de novo annotation was carried out with RepeatModeler v2.0 and LTR_FINDER v1.0.7 (Xu & Wang, 2007) to build a custom repeat library. Second, a homology-based approach was used to search for repetitive elements in the assembled genome using the repetitive element library of RepeatMasker v4.1.0 and RepeatProteinMask v4.1.0. The results of both strategies were combined into a non-redundant set of repetitive elements, and this combined library was then used to mask repetitive elements with RepeatMasker v4.1.0 (Tarailo-Graovac & Chen, 2009).
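Combining two repeat annotations into a non-redundant set amounts to merging overlapping intervals per scaffold, so that bases covered by both strategies are counted once. A minimal sketch, assuming 0-based half-open (BED-style) coordinates and toy hits; it does not reproduce the RepeatMasker or RepeatModeler output formats:

```python
from collections import defaultdict


def merge_intervals(intervals):
    """Merge overlapping or touching half-open [start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)  # extend the current block
        else:
            merged.append([start, end])  # start a new block
    return [tuple(iv) for iv in merged]


def nonredundant_repeat_bp(hits):
    """hits: iterable of (scaffold, start, end) from any annotation source."""
    by_scaffold = defaultdict(list)
    for scaffold, start, end in hits:
        by_scaffold[scaffold].append((start, end))
    return sum(end - start
               for ivs in by_scaffold.values()
               for start, end in merge_intervals(ivs))


# Hypothetical hits from the two strategies
de_novo = [("scf1", 100, 500), ("scf1", 900, 1200), ("scf2", 0, 300)]
homology = [("scf1", 400, 700), ("scf2", 250, 400)]
print(nonredundant_repeat_bp(de_novo + homology))  # 1300
```

The naive sum of the hit lengths here is 1,450 bp, whereas the merged, non-redundant coverage is 1,300 bp.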

Transfer RNAs (tRNAs) were predicted by tRNAscan-SE v1.31 (Lowe & Eddy, 1997) with default parameters. To predict other non-coding RNAs (ncRNAs), such as microRNAs (miRNAs), small nuclear RNAs (snRNAs) and ribosomal RNAs (rRNAs), the desert locust genome was screened against the RNA families (Rfam) v14.1 database (Griffiths-Jones et al., 2003) with the cmscan program of Infernal v1.1.2 (Nawrocki & Eddy, 2013). To supplement our miRNA predictions, miRNA sequences from the L. migratoria genome (Wang et al., 2015) were extracted and searched against the S. gregaria genome by BLASTN with the options “-task blastn-short -ungapped -penalty -1 -reward 1” (Camacho et al., 2008). The alignments were filtered using a mismatch cutoff of 3 bp. Next, the stem-loop structure of each potential miRNA was predicted by miRNAFold (Tav et al., 2016) from each aligned region extended by 110 bp of upstream and downstream flanking sequence. The RNAfold program of ViennaRNA v2.4.14 (Lorenz et al., 2011) was then used to calculate the minimum free energy (MFE) of each stem-loop structure. If a potential miRNA had several predicted stem-loop structures, the one with the lowest MFE was selected as representative. Putative miRNAs located within protein-coding sequences or repetitive elements were discarded. Finally, the results based on Rfam and on the migratory locust genome were combined into a non-redundant set of predicted miRNAs.
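The filtering steps for the homology-based miRNA candidates can be sketched as follows. This assumes BLAST+ tabular output (-outfmt 6, whose fifth column is the mismatch count), reads the “mismatch cutoff of 3 bp” as at most three mismatches, and treats MFE values as already computed by RNAfold. The query names, coordinates and MFE values are hypothetical:

```python
from typing import NamedTuple


class Hit(NamedTuple):
    query: str
    scaffold: str
    mismatches: int
    sstart: int
    send: int


def parse_blast_tab(lines):
    """Parse BLAST+ -outfmt 6 lines (12 default columns) into Hit records."""
    for line in lines:
        f = line.rstrip("\n").split("\t")
        yield Hit(f[0], f[1], int(f[4]), int(f[8]), int(f[9]))


def flanked_window(hit, scaffold_len, flank=110):
    """1-based inclusive window around the hit, extended by `flank` bp on each side."""
    lo, hi = sorted((hit.sstart, hit.send))  # minus-strand hits have sstart > send
    return max(1, lo - flank), min(scaffold_len, hi + flank)


def best_by_mfe(candidates):
    """candidates: list of (structure_id, mfe_kcal_per_mol); keep the lowest MFE."""
    return min(candidates, key=lambda c: c[1])


blast_lines = [
    "lmi-miR-1\tscf7\t100.00\t22\t0\t0\t1\t22\t5001\t5022\t1e-05\t44.1",
    "lmi-miR-2\tscf7\t81.82\t22\t4\t0\t1\t22\t900\t879\t0.5\t28.2",
]
scaffold_lengths = {"scf7": 8000}
kept = [h for h in parse_blast_tab(blast_lines) if h.mismatches <= 3]
for h in kept:
    print(h.query, flanked_window(h, scaffold_lengths[h.scaffold]))  # lmi-miR-1 (4891, 5132)
print(best_by_mfe([("hp1", -28.4), ("hp2", -35.1), ("hp3", -19.0)]))  # ('hp2', -35.1)
```

The second hit is dropped because it has four mismatches; the window is clipped to the scaffold ends so that candidates near a scaffold boundary still yield a valid sequence range.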

Gene prediction and functional annotation

Protein-coding genes in the desert locust genome were predicted using three approaches. (1) RNA-Seq reads (see Extended data, Supplementary Methods (Verlinden et al., 2020)) were mapped to the desert locust genome using HISAT2 v2.1.0 (Kim et al., 2015) with the parameter “--max-intronlen” set to 1,000,000 to increase the maximum allowed intron length during read mapping. StringTie v2.1.1 (Pertea et al., 2015) was then used to assemble potential transcripts from the RNA-Seq alignments. Subsequently, TransDecoder v5.0.2 was used to identify open reading frames (ORFs) within the assembled transcripts, which yielded 20,201 ORFs with start and/or stop codons. We also built de novo assembled transcripts from the pooled RNA-Seq reads of all samples with Trinity v2.8.4 (Grabherr et al., 2011; Haas et al., 2013) and obtained 285,499 transcripts (including isoforms), among which Trinotate v3.1.1 (Bryant et al., 2017) identified 57,870 putative protein-coding transcripts and 305 rRNA candidates. These were complemented with 34,974 desert locust ESTs from NCBI (Badisco et al., 2011). The assembled transcripts and ESTs were then aligned to the desert locust genome with the Program to Assemble Spliced Alignments (PASA v2.4.1) (Haas et al., 2003). (2) For ab initio gene prediction, we used a hard-masked genome in which genomic repetitive elements were replaced by ‘N’. To build a training set for the ab initio gene predictors, we extracted 498 complete genes, with both start and stop codons, from the 500 longest ORFs predicted by TransDecoder in the HISAT2/StringTie RNA-Seq analysis described above. Augustus v3.3.3 (Stanke et al., 2006), SNAP v2006-07-28 (Korf, 2004) and GlimmerHMM v3.0.4 (Majoros et al., 2004) were trained on this set and then used to predict potential gene models. In addition, BRAKER2 v2.1.5 (Hoff et al., 2019) was run with the RNA-Seq alignments and the Augustus training model described above to predict protein-coding genes. (3) The proteomes of the migratory locust, Locusta migratoria (Wang et al., 2014); the African malaria mosquito, Anopheles gambiae; the domestic silk moth, Bombyx mori; the fruit fly, Drosophila melanogaster; the kissing bug, Rhodnius prolixus; the red imported fire ant, Solenopsis invicta; the red flour beetle, Tribolium castaneum; and the Nevada dampwood termite, Zootermopsis nevadensis from Ensembl Metazoa (release-47), as well as the proteins in UniRef100 (release-2020_01) for the clade Polyneoptera (Taxonomy ID: 33341), were used to support gene prediction with homologous proteins. Exonerate v2.4.0 (Slater & Birney, 2005) was used to perform spliced alignments of these proteins, with the maximum intron length set to 1 Mb. To integrate the predictions from all three approaches, EvidenceModeler v1.1.1 (Haas et al., 2008) was used to produce a non-redundant gene set. Functional annotation of the predicted protein-coding genes was performed by running BlastP (Altschul et al., 1990) with an e-value cut-off of 1 × 10⁻⁵ against the public protein databases UniProt/SwissProt (Magrane, 2011; The UniProt Consortium, 2019) and NCBI NR (RefSeq non-redundant protein record). Protein family (Pfam) domain information and Gene Ontology (GO) terms were added using InterProScan (Mitchell et al., 2019).
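The training-set selection in step (2) can be expressed in a few lines. This sketch assumes each TransDecoder ORF has already been reduced to an identifier, a protein length and an ORF type (TransDecoder distinguishes complete ORFs from 5′-partial, 3′-partial and internal ones); the records and the small top_n in the example are hypothetical:

```python
import heapq
from typing import NamedTuple


class Orf(NamedTuple):
    orf_id: str
    length_aa: int
    orf_type: str  # e.g. "complete", "5prime_partial", "3prime_partial", "internal"


def training_set(orfs, top_n=500):
    """Take the top_n longest ORFs, then keep only those with both start and stop codons."""
    longest = heapq.nlargest(top_n, orfs, key=lambda o: o.length_aa)
    return [o for o in longest if o.orf_type == "complete"]


orfs = [Orf("g1.p1", 820, "complete"), Orf("g2.p1", 990, "5prime_partial"),
        Orf("g3.p1", 640, "complete"), Orf("g4.p1", 150, "complete")]
print([o.orf_id for o in training_set(orfs, top_n=3)])  # ['g1.p1', 'g3.p1']
```

Filtering after ranking (rather than before) mirrors the wording in the text, where 498 complete genes were taken from the 500 longest ORFs.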

Results and discussion

Genome size and assembly

Initial input data for the assembly comprised (i) 1,316 Gb of Illumina short-read data, of which 1,009 Gb remained after cleaning and trimming, and (ii) 112 Gb of PacBio long-read data. The initial contig assembly produced by the ABySS pipeline comprised 8.5 Gb in ~1.6 M contigs with an N50 of 12,027 bp. Scaffolding with the MP data in ABySS yielded 8.6 Gb in 1.2 M scaffolds with an N50 of 66,194 bp. Using the PacBio data as input for LINKS further improved this scaffolded assembly, more than doubling the N50 and the maximum scaffold length and reducing the number of sequences from ~1.2 M to 955,015. The final assembly consists of 8,817,834,205 bp organised in 955,015 scaffolds with an N50 of 157,705 bp (Table 1).
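As a rough consistency check, mean sequencing depth is total sequenced bases divided by genome size. The sketch below uses the final assembly size (8,817,834,205 bp) as the denominator; the resulting depths are derived here for illustration and are not reported in the original analysis:

```python
ASSEMBLY_BP = 8_817_834_205  # final scaffold assembly size (Table 1)


def depth(total_bases: float, genome_bp: float = ASSEMBLY_BP) -> float:
    """Mean sequencing depth = total sequenced bases / genome (here: assembly) size."""
    return total_bases / genome_bp


for label, gb in [("Illumina raw", 1316), ("Illumina cleaned", 1009), ("PacBio", 112)]:
    print(f"{label}: {depth(gb * 1e9):.1f}x")
```

This gives roughly 149× raw and 114× cleaned Illumina depth and about 13× PacBio depth; using the 8.55 Gb size estimate as the denominator instead would raise each value slightly.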

Table 1.

Results of the assembly for the desert locust genome.

	Number of sequences	Total size (bp)	N50 (bp)	N90 (bp)	Largest (bp)	Mean length (bp)
Contigs	1,648,200	8,561,922,307	12,027	5,375	202,979	5,194.71
Scaffolds (MP)	1,233,802	8,632,364,377	66,194	15,575	1,561,787	8,350.11
Scaffolds (PacBio)	955,015	8,817,834,205	157,705	29,453	3,339,430	9,233.20

Scaffolds (MP), scaffolds obtained with the Mate Pair data using the ABySS pipeline; Scaffolds (PacBio), scaffolds improved with the PacBio data as input for LINKS; N50, the length of the shortest contig/scaffold among the longest sequences that together cover 50% of the total assembly length; N90, the same measure at 90% of the total assembly length.
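The N50 and N90 values in Table 1 follow the usual Nx definition: sort sequences from longest to shortest and report the length at which the cumulative sum first reaches x% of all assembled bases. A minimal sketch with hypothetical lengths:

```python
def nx(lengths, fraction=0.5):
    """Nx: length of the shortest sequence among the longest sequences that together
    contain at least `fraction` of all assembled bases (N50 for 0.5, N90 for 0.9)."""
    if not lengths:
        raise ValueError("empty assembly")
    target = fraction * sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= target:
            return length


scaffolds = [100, 80, 60, 40, 20]  # hypothetical lengths, 300 bp in total
print(nx(scaffolds, 0.5), nx(scaffolds, 0.9))  # 80 40
```

Because N50 depends only on the largest sequences, it rises sharply when scaffolding joins long contigs, even if many short scaffolds remain, which is consistent with the pattern in Table 1.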

Repetitive elements and noncoding RNAs

In total, repetitive elements account for 62.55% of the desert locust genome (Table 2), more than the 58.86% in the published migratory locust genome (Wang et al., 2014). tRNAscan-SE predicted 121,581 tRNAs, and screening the desert locust genome against the Rfam v14.1 database identified 1,302 rRNAs, 121 miRNAs and 361 snRNAs (Extended data, Supplementary Table S3 (Verlinden et al., 2020)).

Table 2.

Repetitive elements in the genomes of the desert locust, Schistocerca gregaria, and the migratory locust, Locusta migratoria (Wang et al., 2014).

	Schistocerca gregaria		Locusta migratoria
Repeat Types	Length (bp)	P%	Length (bp)	P%
DNA	2,390,333,660	27.1	1,480,538,225	22.69
LINE	2,438,094,307	27.6	1,332,720,207	20.42
SINE	28,032,199	0.32	141,176,698	2.16
LTR	637,406,118	7.23	508,675,263	7.80
Other	165	0.00	32,017	0.00
Unknown	871,233,596	9.88	406,097,360	6.22
Total	5,515,243,572	62.55	3,840,808,141	58.86

DNA, DNA transposons; LINE, long interspersed nuclear element retrotransposons; SINE, short interspersed nuclear element retrotransposons; LTR, long terminal repeat retrotransposons; Other, repeats classified into types other than those listed above; Unknown, repeats that could not be classified; P%, percentage of the genome.
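The category rows of Table 2 add up to more than the Total row (about 6.37 Gb versus 5.52 Gb for S. gregaria). A likely explanation, which the text does not state and which we therefore present only as an assumption, is that annotations from different repeat-finding tools or categories overlap, and the total counts each base only once. A sketch of such a non-redundant total, merging overlapping intervals per scaffold before summing (the input format and function names are hypothetical):

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Interval = Tuple[str, int, int]  # (scaffold, start, end), 0-based half-open


def merged_coverage(intervals: Iterable[Interval]) -> int:
    """Number of bases covered by at least one interval (overlaps counted once)."""
    by_scaffold: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for scaffold, start, end in intervals:
        by_scaffold[scaffold].append((start, end))
    covered = 0
    for spans in by_scaffold.values():
        spans.sort()
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start <= cur_end:  # overlapping or adjacent: extend the current block
                cur_end = max(cur_end, end)
            else:  # gap: close the current block and start a new one
                covered += cur_end - cur_start
                cur_start, cur_end = start, end
        covered += cur_end - cur_start
    return covered


def percent_of_genome(bases: int, genome_size: int) -> float:
    """Repeat content as a percentage of the assembly size, to 2 decimals."""
    return round(100 * bases / genome_size, 2)
```

Applied to the Total row of Table 2, `percent_of_genome(5_515_243_572, 8_817_834_205)` returns 62.55, the reported repeat content.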

In addition to the 121 evolutionarily conserved miRNAs identified from Rfam, BLAST searches with miRNAs previously identified in the migratory locust (from small RNA sequencing-based and homology-based approaches; Wang et al., 2015) identified a further 686 miRNAs in the desert locust genome, giving a total of 807 miRNAs (Extended data, Supplementary Table S3 (Verlinden et al., 2020)). Of these 807 miRNAs, 676 are located on short scaffolds that carry no protein-coding gene. Of the 121 miRNAs identified from Rfam, 81 have no homologs in the migratory locust genome.
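Counting miRNAs that sit on scaffolds without any protein-coding gene reduces to a set lookup once the scaffolds carrying genes are known. The sketch below assumes a GFF3 gene annotation (scaffold ID in column 1, feature type in column 3) and a list of (miRNA ID, scaffold ID) pairs; both inputs and the function names are hypothetical. It does not check scaffold length, so the "short scaffold" part of the observation would need a separate filter.

```python
from typing import Iterable, List, Set, Tuple


def scaffolds_with_genes(gff_lines: Iterable[str]) -> Set[str]:
    """Collect scaffold IDs that carry at least one 'gene' feature in a GFF3 file."""
    scaffolds: Set[str] = set()
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, pragmas and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 3 and cols[2] == "gene":
            scaffolds.add(cols[0])
    return scaffolds


def mirnas_on_gene_free_scaffolds(
    mirna_locations: Iterable[Tuple[str, str]],  # (miRNA ID, scaffold ID)
    gene_scaffolds: Set[str],
) -> List[str]:
    """Return IDs of miRNAs whose scaffold carries no predicted protein-coding gene."""
    return [mirna for mirna, scaffold in mirna_locations if scaffold not in gene_scaffolds]
```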

Protein-coding genes

In total, 18,815 protein-encoding genes are predicted in the desert locust genome (Extended data, Supplementary Table S4 (Verlinden et al., 2020)). The average pre-mRNA length is 54,426 bp, with an average coding sequence (CDS) length of 1,137 bp and an average intron length of 12,522 bp, values comparable to those of the published migratory locust genome (Wang et al., 2014). Although both locust genomes have longer pre-mRNAs, longer introns and more exons per gene than the Drosophila melanogaster genome (Adams et al., 2000), their average CDS and exon lengths are shorter (Figure 2 and Table 3). The BUSCO assessment of the current gene set (protein mode) shows that 79.4% of the BUSCO genes in the insecta_odb10 dataset (Simão et al., 2015) are present as complete copies, closely matching the completeness of the genome assembly itself (genome mode; 80.9%) (Extended data, Supplementary Table S5 (Verlinden et al., 2020)). The higher completeness of the Trinity transcriptome assembly (91.2%) indicates that the present genome assembly still lacks genes that are present in the transcriptome data. The predicted desert locust gene set also contains fewer complete BUSCO genes than the gene sets of the published Locusta migratoria and Drosophila melanogaster genomes (Figure 2). Of the 18,815 predicted genes, 13,646 (72.53%) obtained at least one functional assignment based on similarity to known proteins in the databases. Pfam domain information could be added to 10,395 (55.25%) predicted genes, and 6,470 (34.39%) could be assigned a GO term (Extended data, Supplementary Table S6 (Verlinden et al., 2020)).
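The comparison between the Trinity assembly and the gene set can also be made at the level of individual BUSCO genes rather than summary percentages, which shows directly which genes are complete in the transcriptome but missing or fragmented in the gene set. The sketch below assumes BUSCO's `full_table.tsv` layout (tab-separated, comment lines starting with `#`, BUSCO ID in the first column and a status of Complete, Duplicated, Fragmented or Missing in the second); the function names are hypothetical.

```python
from typing import Dict, Iterable, Set

COMPLETE_STATUSES = {"Complete", "Duplicated"}  # complete single-copy or duplicated


def busco_status(lines: Iterable[str]) -> Dict[str, str]:
    """Map each BUSCO ID to its status; duplicated BUSCOs span several rows."""
    status: Dict[str, str] = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        status[fields[0]] = fields[1]
    return status


def complete_in_a_not_b(table_a: Iterable[str], table_b: Iterable[str]) -> Set[str]:
    """BUSCO IDs complete in run A (e.g. transcriptome) but not complete in run B (e.g. gene set)."""
    a = busco_status(table_a)
    b = busco_status(table_b)
    return {
        busco_id
        for busco_id, status in a.items()
        if status in COMPLETE_STATUSES and b.get(busco_id) not in COMPLETE_STATUSES
    }
```

In practice, the two arguments would be open file handles for the `full_table.tsv` files of the two BUSCO runs, at paths of your choosing.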


Figure 2.

Gene characteristics and BUSCO assessment in the genomes of the desert locust, Schistocerca gregaria, the migratory locust, Locusta migratoria (Wang et al., 2014) and the fruit fly, Drosophila melanogaster (Adams et al., 2000).

(a–e) Boxplots of (a) pre-mRNA lengths, (b) intron lengths, (c) exon numbers, (d) coding sequence (CDS) lengths and (e) exon lengths in the three genomes. (f) BUSCO assessments of the gene sets of the three genomes. The stacked bars indicate the percentages of BUSCO genes that are complete and single-copy (light blue), complete and duplicated (dark blue), fragmented (yellow) or missing (red).

Table 3.

Summary statistics on gene information for the desert locust, Schistocerca gregaria, and the migratory locust, Locusta migratoria (Wang et al., 2014).

	Schistocerca gregaria	Locusta migratoria
Genome
Size (bp)	8,817,834,205	6,524,990,357
Scaffold N50 (bp)	157,705	322,700
GC content	0.406	0.407
Gene
Total gene number	18,815	17,307
Average pre-mRNA length (bp)	54,426	54,341
Average CDS length (bp)	1,137	1,160
Average intron length (bp)	12,522	11,159
Average exon length (bp)	216	201
Average exon number per gene	5.26	5.77

Scaffold N50, the length L such that scaffolds of length L or longer together make up at least 50% of the total assembly length; CDS, coding sequence.
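The averages in Table 3 are roughly self-consistent: a gene with n exons has n − 1 introns, so the average pre-mRNA length should be close to n × (mean exon length) + (n − 1) × (mean intron length), and the average CDS close to n × (mean exon length) if exons are counted as coding only (an assumption). Because the average of a product is not in general the product of averages, agreement can only be approximate; for both species it holds to within about 0.1%. A small sketch of this sanity check (names are illustrative):

```python
from dataclasses import dataclass


@dataclass
class GeneStats:
    """Average gene-structure values as reported in Table 3."""
    pre_mrna: float        # average pre-mRNA length (bp)
    cds: float             # average CDS length (bp)
    intron: float          # average intron length (bp)
    exon: float            # average exon length (bp)
    exons_per_gene: float  # average exon number per gene


def approx_cds(s: GeneStats) -> float:
    # CDS ~ exons per gene x mean exon length (assumes exons contain no UTR sequence)
    return s.exons_per_gene * s.exon


def approx_pre_mrna(s: GeneStats) -> float:
    # pre-mRNA ~ all exons plus the (n - 1) introns between them
    return s.exons_per_gene * s.exon + (s.exons_per_gene - 1) * s.intron


def relative_error(estimate: float, reported: float) -> float:
    return abs(estimate - reported) / reported


sgr = GeneStats(pre_mrna=54_426, cds=1_137, intron=12_522, exon=216, exons_per_gene=5.26)
lmi = GeneStats(pre_mrna=54_341, cds=1_160, intron=11_159, exon=201, exons_per_gene=5.77)
```

For S. gregaria, `approx_pre_mrna(sgr)` is about 54,480 bp against the reported 54,426 bp.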

Conclusions

Here, we present the first draft genome sequence of the desert locust, Schistocerca gregaria, a swarming pest species with significant socio-economic and ecological impact. The current locust crisis makes clear that, despite ongoing monitoring and control operations, more locust research is urgently needed to develop effective management strategies. Sequencing and assembling the desert locust genome was both challenging and ground-breaking because of the enormous size of the genome and its very large proportion of repetitive elements. The desert locust genome is the largest insect genome sequenced and assembled to date. As in the second and third largest assembled insect genomes (Locusta migratoria, 6.5 Gb; Wang et al., 2014; Clitarchus hookeri, 4.2 Gb; Wu et al., 2017), the expanded genome size can be attributed to the accumulation of repetitive regions and to intron elongation. Sequencing the desert locust genome is an important step towards advancing our knowledge of these animals. It will enable future studies to examine the complex relationship between environmental cues and phenotypic plasticity, and in particular how this relationship is regulated at the molecular level. A better understanding of the desert locust's molecular biology will facilitate the development of novel, more sustainable strategies for controlling these pests.

Data availability

Underlying data

European Nucleotide Archive: First draft genome of Schistocerca gregaria, a swarm forming grasshopper species. Accession number PRJEB38779; https://identifiers.org/ena.embl:PRJEB38779.

This accession contains all genome and transcriptome data. The annotations are also available via the ORCAE platform ( https://bioinformatics.psb.ugent.be/orcae/overview/Schgr).
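For readers who want to list the raw sequencing runs programmatically, the sketch below builds a query URL for the ENA Portal API's filereport endpoint. The endpoint, the `result=read_run` value and the field names are assumptions based on ENA's public documentation and should be checked there before use; the helper name is hypothetical, and the code only builds the URL without downloading anything.

```python
from urllib.parse import urlencode

# Assumed ENA Portal API endpoint; verify against the current ENA documentation.
ENA_FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"


def ena_filereport_url(accession: str, fields=("run_accession", "fastq_ftp")) -> str:
    """Build a filereport query listing the sequencing runs of a study accession."""
    params = {
        "accession": accession,
        "result": "read_run",  # one row per sequencing run
        "fields": ",".join(fields),
        "format": "tsv",
    }
    return f"{ENA_FILEREPORT}?{urlencode(params)}"
```

The resulting URL, for example from `ena_filereport_url("PRJEB38779")`, can be opened in a browser or fetched with `urllib.request.urlopen` to obtain a tab-separated run list.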

Extended data

Figshare: First draft genome assembly of the desert locust, Schistocerca gregaria - extended data. https://doi.org/10.6084/m9.figshare.12654026.v2 (Verlinden et al., 2020).

This project contains the following extended data:

Supplementary Methods (DOCX). Contains details of animal material, genomic DNA extraction, library construction, sequencing for RNA-Seq and de novo transcriptome assembly.
Supplementary Table S1 (DOCX). Available Polyneopteran genomes (incl. Schistocerca gregaria for comparison).
Supplementary Table S2 (DOCX). Software parameter settings.
Supplementary Table S3 (DOCX). Transfer RNA (tRNA), microRNA (miRNA), small nuclear RNA (snRNA) and ribosomal RNA (rRNA) content of the desert locust genome.
Supplementary Table S4 (DOCX). Desert locust genome annotation details.
Supplementary Table S5 (DOCX). BUSCO assessments for the genomes of the desert locust, Schistocerca gregaria, and the migratory locust, Locusta migratoria (Wang et al., 2014).
Supplementary Table S6 (DOCX). Functional annotation of the proteome of the desert locust.

Extended data are available under the terms of the Creative Commons Attribution 4.0 International license (CC-BY 4.0).

Acknowledgements

We thank Evelien Herinckx (KU Leuven) for technical support in desert locust rearing; Evert Bruyninckx (KU Leuven) for optimizing genomic DNA extraction; Ellen De Meester and Sarah De Keulenaer from NxtGNT Belgium for their practical expertise and assistance in the Illumina sequencing experiments; and Wim Meert and Stephanie Deman (Genomics Core Leuven) for optimizing the PacBio sequencing.

Notes

[version 2; peer review: 3 approved]

Funding Statement

This work was supported by the Special Research Fund of KU Leuven (BOF grant C14/15/050 to JVdB and HV), the Research Foundation of Flanders (FWO grants: postdoctoral fellowship 64322 to HV, G0F2417N to JVdB, G090919N to JVdB and YVdP); the Special Research Fund of Ghent University (BOFPDO2018001701 to ZL), the Department of Research and Innovation of the University of Pretoria (grant A0C827 to AY); the U.S. National Science Foundation (IOS-1253493 and IOS-1636632 to HS), the U.S. Department of Agriculture (Hatch grant TEX0-1-6584 to HS) and the Biotechnology and Biological Sciences Research Council UK (BBSRC; research grant BB/L02389X/1 to SRO and TM).

The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

Adams MD, Celniker SE, Holt RA, et al.: The genome sequence of Drosophila melanogaster. Science. 2000;287(5461):2185–2195. 10.1126/science.287.5461.2185
Altschul SF, Gish W, Miller W, et al.: Basic local alignment search tool. J Mol Biol. 1990;215(3):403–410. 10.1016/S0022-2836(05)80360-2
Badisco L, Huybrechts J, Simonet G, et al.: Transcriptome analysis of the desert locust central nervous system: production and annotation of a Schistocerca gregaria EST database. PLoS One. 2011;6(3):e17274. 10.1371/journal.pone.0017274
Bryant DM, Johnson K, DiTommaso T, et al.: A tissue-mapped Axolotl de novo transcriptome enables identification of limb regeneration factors. Cell Rep. 2017;18(3):762–776. 10.1016/j.celrep.2016.12.063
Camacho C, Coulouris G, Avagyan V, et al.: BLAST+: architecture and applications. BMC Bioinformatics. 2009;10:421. 10.1186/1471-2105-10-421
Camacho JPM, Ruiz-Ruano FJ, Martín-Blázquez R, et al.: A step to the gigantic genome of the desert locust: chromosome sizes and repeated DNAs. Chromosoma. 2015;124(2):263–275. 10.1007/s00412-014-0499-0
Courcoux G: Invasions of locusts: a lasting impact. Scientific news of the Institut de Recherche pour le Développement. 2012;411.
Cressman K: Desert Locust. In: Shroder JF, Sivanpillai R (Eds.), Biological and Environmental Hazards, Risks, and Disasters. Elsevier, 2016;87–105.
Cullen DA, Cease AJ, Latchininsky AV, et al.: From molecules to management: Mechanisms and consequences of locust phase polyphenism. Adv Insect Physiol. 2017;53:167–285. 10.1016/bs.aiip.2017.06.002
de Vreyer P, Guilbert N, Mesple-Somps S: The 1987-89 locust plague in Mali: Evidences of the heterogeneous impact of income shocks on education outcomes. No DT/2012/05, Working Papers, DIAL (Développement, Institutions et Mondialisation). 2012;48.
Dominguez Del Angel V, Hjerde E, Sterck L, et al.: Ten steps to get started in Genome Assembly and Annotation [version 1; peer review: 2 approved]. F1000Res. 2018;7:ELIXIR-148. 10.12688/f1000research.13598.1
Fox DP: A non-doubling DNA series in somatic tissues of the locusts Schistocerca gregaria (Forskål) and Locusta migratoria (Linn.). Chromosoma. 1970;29(4):446–461. 10.1007/BF00281927
Grabherr MG, Haas BJ, Yassour M, et al.: Full-length transcriptome assembly from RNA-seq data without a reference genome. Nat Biotechnol. 2011;29(7):644–652. 10.1038/nbt.1883
Griffiths-Jones S, Bateman A, Marshall M, et al.: Rfam: an RNA family database. Nucleic Acids Res. 2003;31(1):439–441. 10.1093/nar/gkg006
Haas BJ, Delcher AL, Mount SM, et al.: Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Res. 2003;31(19):5654–5666. 10.1093/nar/gkg770
Haas BJ, Papanicolaou A, Yassour M, et al.: De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nat Protoc. 2013;8(8):1494. 10.1038/nprot.2013.084
Haas BJ, Salzberg SL, Zhu W, et al.: Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biol. 2008;9(1):R7. 10.1186/gb-2008-9-1-r7
Hoff KJ, Lomsadze A, Borodovsky M, et al.: Whole-genome annotation with BRAKER. Methods Mol Biol. 2019;1962:65–95. 10.1007/978-1-4939-9173-0_5
John B, Hewitt GM: Karyotype stability and DNA variability in the Acrididae. Chromosoma. 1966;20:155–172. 10.1007/BF00335205
Kim D, Langmead B, Salzberg SL: HISAT: a fast spliced aligner with low memory requirements. Nat Methods. 2015;12(4):357–360. 10.1038/nmeth.3317
Koren S, Walenz BP, Berlin K, et al.: Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res. 2017;27(5):722–736. 10.1101/gr.215087.116
Korf I: Gene finding in novel genomes. BMC Bioinformatics. 2004;5(1):59. 10.1186/1471-2105-5-59
Lorenz R, Bernhart SH, Zu Siederdissen CH, et al.: ViennaRNA Package 2.0. Algorithms Mol Biol. 2011;6(1):26. 10.1186/1748-7188-6-26
Lowe TM, Eddy SR: tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 1997;25(5):955–964. 10.1093/nar/25.5.955
Magrane M, UniProt Consortium: UniProt Knowledgebase: a hub of integrated protein data. Database (Oxford). 2011;2011:bar009. 10.1093/database/bar009
Majoros WH, Pertea M, Salzberg SL: TigrScan and GlimmerHMM: two open source ab initio eukaryotic gene-finders. Bioinformatics. 2004;20(16):2878–2879. 10.1093/bioinformatics/bth315
Meynard CN, Lecoq M, Chapuis MP, et al.: On the relative role of climate change and management in the current desert locust outbreak in East Africa. Glob Chang Biol. 2020;26(7):3753–3755. 10.1111/gcb.15137
Misof B, Liu S, Meusemann K, et al.: Phylogenomics resolves the timing and pattern of insect evolution. Science. 2014;346(6210):763–767. 10.1126/science.1257570
Mitchell AL, Attwood TK, Babbitt PC, et al.: InterPro in 2019: improving coverage, classification and access to protein sequence annotations. Nucleic Acids Res. 2019;47(D1):D351–D360. 10.1093/nar/gky1100
Nawrocki EP, Eddy SR: Infernal 1.1: 100-fold faster RNA homology searches. Bioinformatics. 2013;29(22):2933–2935. 10.1093/bioinformatics/btt509
O'Connell J, Schulz-Trieglaff O, Carlson E, et al.: NxTrim: Optimized trimming of Illumina mate pair reads. Bioinformatics. 2015;31(12):2035–2037. 10.1093/bioinformatics/btv057
Pener MP, Simpson SJ: Locust phase polyphenism: an update. Adv Insect Physiol. 2009;36:1–272. 10.1016/S0065-2806(08)36001-9
Pertea M, Pertea GM, Antonescu CM, et al.: StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat Biotechnol. 2015;33(3):290–295. 10.1038/nbt.3122
Schoofs L, Veelaert D, Vanden Broeck J, et al.: Peptides in the locusts, Locusta migratoria and Schistocerca gregaria. Peptides. 1997;18(1):145–156. 10.1016/s0196-9781(96)00236-7
Simão FA, Waterhouse RM, Ioannidis P, et al.: BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 2015;31(19):3210–3212. 10.1093/bioinformatics/btv351
Simpson JT, Wong K, Jackman SD, et al.: ABySS: a parallel assembler for short read sequence data. Genome Res. 2009;19(6):1117–1123. 10.1101/gr.089532.108
Slater GSC, Birney E: Automated generation of heuristics for biological sequence comparison. BMC Bioinformatics. 2005;6(1):31. 10.1186/1471-2105-6-31
Stanke M, Keller O, Gunduz I, et al.: AUGUSTUS: ab initio prediction of alternative transcripts. Nucleic Acids Res. 2006;34(Web Server issue):W435–W439. 10.1093/nar/gkl200
Symmons PM, Cressman K: Desert Locust Guidelines. Second edition. Food and Agriculture Organization of the United Nations (Rome). 2001.
Tarailo-Graovac M, Chen N: Using RepeatMasker to identify repetitive elements in genomic sequences. Curr Protoc Bioinformatics. 2009; Chapter 4: Unit 4.10. 10.1002/0471250953.bi0410s25
Tav C, Tempel S, Poligny L, et al.: miRNAFold: a web server for fast miRNA precursor prediction in genomes. Nucleic Acids Res. 2016;44(W1):W181–W184. 10.1093/nar/gkw459
Tørresen OK, Star B, Mier P, et al.: Tandem repeats lead to sequence assembly errors and impose multi-level challenges for genome and protein databases. Nucleic Acids Res. 2019;47(21):10994–11006. 10.1093/nar/gkz841
UniProt Consortium: UniProt: a worldwide hub of protein knowledge. Nucleic Acids Res. 2019;47(D1):D506–D515. 10.1093/nar/gky1049
van Huis A, Van Itterbeeck J, Klunder H, et al.: Edible insects: future prospects for food and feed security. FAO Forestry Paper. 2013;171:187.
Verlinden H, Badisco L, Marchal E, et al.: Endocrinology of reproduction and phase transition in locusts. Gen Comp Endocrinol. 2009;162(1):79–92. 10.1016/j.ygcen.2008.11.016
Verlinden H, Sterck L, Li J, et al.: First draft genome assembly of the desert locust, Schistocerca gregaria - extended data. figshare. Journal contribution. 2020. 10.6084/m9.figshare.12654026.v1
Wang X, Fang X, Yang P, et al.: The locust genome provides insight into swarm formation and long-distance flight. Nat Commun. 2014;5:2957. 10.1038/ncomms3957
Wang Y, Jiang F, Wang H, et al.: Evidence for the expression of abundant microRNAs in the locust genome. Sci Rep. 2015;5:13608. 10.1038/srep13608
Warren RL, Yang C, Vandervalk BP, et al.: LINKS: Scalable, alignment-free scaffolding of draft genomes with long reads. Gigascience. 2015;4(1):35. 10.1186/s13742-015-0076-3
Wilmore PJ, Brown AK: Molecular properties of orthopteran DNA. Chromosoma. 1975;51(4):337–345. 10.1007/BF00326320
Wipfler B, Letsch H, Frandsen PB, et al.: Evolutionary history of Polyneoptera and its implications for our understanding of early winged insects. Proc Natl Acad Sci USA. 2019;116(8):3024–3029. 10.1073/pnas.1817794116
Wu C, Twort VG, Crowhurst RN, et al.: Assembling large genomes: analysis of the stick insect (Clitarchus hookeri) genome reveals a high repeat content and sex-biased genes associated with reproduction. BMC Genomics. 2017;18(1):884. 10.1186/s12864-017-4245-x
Xu Z, Wang H: LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons. Nucleic Acids Res. 2007;35(Web Server issue):W265–W268. 10.1093/nar/gkm286

Approved

Surya Saha, Referee¹

¹Boyce Thompson Institute for Plant Research, Ithaca, NY, USA

Competing interests: No competing interests were disclosed.

Review date: 2021 Jun 14. Status: Approved. 10.5256/f1000research.56735.r85888

This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

The authors have responded satisfactorily to my comments. I look forward to future work from the authors that address the additional analysis I had mentioned in my first report.

Is the work clearly and accurately presented and does it cite the current literature?

Yes

If applicable, is the statistical analysis and its interpretation appropriate?

Not applicable

Are all the source data underlying the results available to ensure full reproducibility?

Yes

Is the study design appropriate and is the work technically sound?

Yes

Are the conclusions drawn adequately supported by the results?

Yes

Are sufficient details of methods and analysis provided to allow replication by others?

Yes

Reviewer Expertise:

Arthropod genomics and transcriptomics, Comparative genomics and Metagenomics

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Approved

Uwe Homberg, Referee¹

¹Animal Physiology, Department of Biology, Center for Mind, Brain and Behavior (CMBB), University of Marburg, Giessen, Germany

Competing interests: No competing interests were disclosed.

Review date: 2021 May 24. Status: Approved. 10.5256/f1000research.56735.r85889

This paper looks fine now.

Is the work clearly and accurately presented and does it cite the current literature?

Yes

If applicable, is the statistical analysis and its interpretation appropriate?

Yes

Are all the source data underlying the results available to ensure full reproducibility?

Yes

Is the study design appropriate and is the work technically sound?

Yes

Are the conclusions drawn adequately supported by the results?

Yes

Are sufficient details of methods and analysis provided to allow replication by others?

Yes

Reviewer Expertise:

neurobiology of the desert locust, neuropeptide research in desert locust, neuroanatomy of the locust brain

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Approved

Joshua B. Benoit, Referee¹

¹Department of Biological Sciences, University of Cincinnati, Cincinnati, OH, USA

Competing interests: No competing interests were disclosed.

Review date: 2020 Nov 2. Status: Approved. 10.5256/f1000research.27753.r72658

This is an extremely important agricultural pest, and having a genome for this species will allow for more future comparisons among locust species. This study represents a great deal of work, and the techniques used are appropriate and well described. There is some room for improvement, but this is a valuable contribution.

I would suggest adding a little more biological interpretation. Was anything of interest or unique identified, specifically anything related to the transition from solitary to swarming?
The assembly is of sufficient quality for some comparisons to other insects, but there might be issues with the low BUSCO score. This was similar to the Locusta genome. Please check the BUSCO score of the Trinity assembly to determine if the missing genes are present. If the missing genes are present in the de novo assembly, I would make sure to make the de novo assembly available until a higher quality genome is available.
Were any bacterial symbionts present, or was any microbial contamination detected? How were these accounted for in the assembly?
As a future goal, I would suggest adding techniques for chromosome scaffolding (e.g. Hi-C). This genome is fine as a draft, a higher quality version will be needed for future comparisons.

Is the work clearly and accurately presented and does it cite the current literature?

Yes

If applicable, is the statistical analysis and its interpretation appropriate?

Yes

Are all the source data underlying the results available to ensure full reproducibility?

Yes

Is the study design appropriate and is the work technically sound?

Yes

Are the conclusions drawn adequately supported by the results?

Yes

Are sufficient details of methods and analysis provided to allow replication by others?

Yes

Reviewer Expertise:

Insect physiology, molecular biology, and genomics

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Approved with Reservations

Surya Saha, Referee¹

¹Boyce Thompson Institute for Plant Research, Ithaca, NY, USA

Competing interests: No competing interests were disclosed.

Review date: 2020 Oct 27. Status: Approved with Reservations. 10.5256/f1000research.27753.r72656

This work is timely given the locust outbreaks in East Africa and, more recently, in parts of West Asia, with their devastating impact on the crops of smallholder farmers in these regions, besides secondary impacts on nutrition and human health. The large genome size and repetitive regions make this a challenging genome to assemble. The phase polyphenism of the gregarious and solitarious adults makes this a fascinating system to study for social behavior and physiology in arthropods. A high-quality chromosomal length genome assembly for S. gregaria will lay the foundation for genetics and phenotyping of this important insect pest. The methods for the genome assembly, protein coding and non-coding gene annotation are clearly described in the paper and in the extended data. Inclusion of the parameters used is helpful for the reproducibility of the genome assembly process. I commend the authors on a well written manuscript.

Although this is a valuable contribution to Polyneoptera genomics, it is possible to do a better job of utilizing the new genome and annotation for comparison with other sequenced relatives in Polyneoptera, especially the migratory locust. In my humble opinion, the manuscript can be improved a lot if these issues are addressed.

1. This manuscript can become a tour de force for locust genomics if additional analysis and discussion were to be included. Gene families related to energy consumption and detoxification already identified in the migratory locust are of particular interest. There are two other aspects that, if addressed, will be of value to the community.

1a. The authors mention a greater presence of ncRNA elements in the S. gregaria genome. The association of these potential regulatory elements with protein coding genes based on RNA data from this paper and other public data will be useful.

1b. The other point is about a more detailed characterization of the repeat elements that account for 62% of the genome. A GenomeScope or similar plot of the heterozygosity in the Illumina reads might be useful to understand the repetitive structure.

I know this adds additional burden on the authors but I hope they see my rationale.

2. Endosymbionts have been reported for other locust genomes (https://www.mdpi.com/2075-4450/11/10/655 ¹). These are typical byproducts of insect genome assembly. Were any microbial contigs found in the assembly for known endosymbionts?

I had a few minor points:

1. The introduction states that the potentially expanded non-coding portion of the genome in S. gregaria makes the assembly more challenging. Can the authors please expand on this argument?

2. Was any kind of filtering done to remove microbial contamination? The animal material protocol in the supplementary methods does not mention starving the insects before DNA extraction. Can this have introduced microbial contamination from the feed even though the leaves were washed?

3. This manuscript represents a mammoth amount of work that has gone into the genome assembly. The standard of quality for a genome assembly has increased vastly since the L. migratoria genome was published in 2014. Since sourcing high-quality DNA from the insects does not seem to be a major challenge as far as I know, were long-range scaffolding methods like Hi-C or BioNano explored for chromosomal scale scaffolding?

Is the work clearly and accurately presented and does it cite the current literature?

Yes

If applicable, is the statistical analysis and its interpretation appropriate?

Not applicable

Are all the source data underlying the results available to ensure full reproducibility?

Yes

Is the study design appropriate and is the work technically sound?

Yes

Are the conclusions drawn adequately supported by the results?

Yes

Are sufficient details of methods and analysis provided to allow replication by others?

Yes

Reviewer Expertise:

Arthropod genomics and transcriptomics, Comparative genomics and Metagenomics

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

References

1. Locust Bacterial Symbionts: An Update. Insects. 2020;11(10). 10.3390/insects11100655

Approved

Uwe Homberg, Referee¹

¹Animal Physiology, Department of Biology, Center for Mind, Brain and Behavior (CMBB), University of Marburg, Giessen, Germany

Competing interests: No competing interests were disclosed.

Review date: 2020 Oct 5. Status: Approved. 10.5256/f1000research.27753.r71637

This is a marvellous paper based on an enormous effort for genome assembly in this insect. The work is urgently needed in order to promote a large number of studies on the behavior and physiology of this insect. The data are highly likely to ultimately help us better understand migratory behavior in the desert locust as well as its phase polyphenism. I have no comments or suggestions for further improvements of this already excellent achievement.

Is the work clearly and accurately presented and does it cite the current literature?

Yes

If applicable, is the statistical analysis and its interpretation appropriate?

Yes

Are all the source data underlying the results available to ensure full reproducibility?

Yes

Is the study design appropriate and is the work technically sound?

Yes

Are the conclusions drawn adequately supported by the results?

Yes

Are sufficient details of methods and analysis provided to allow replication by others?

Yes

Reviewer Expertise:

neurobiology of the desert locust, neuropeptide research in desert locust, neuroanatomy of the locust brain

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Articles from F1000Research are provided here courtesy of F1000 Research Ltd


Data behind the article

BioStudies: supplemental material and supporting data

http://www.ebi.ac.uk/biostudies/studies/S-EPMC7607483?xr=true

BioProject: PRJEB38779

Funding

Biotechnology and Biological Sciences Research Council, Grant ID: BB/L02389X/1


First draft genome assembly of the desert locust, Schistocerca gregaria

Heleen Verlinden, Lieven Sterck, Jia Li, Zhen Li, Anna Yssel, Yannick Gansemans, Rik Verdonck, Michiel Holtof, Hojun Song, Spencer T. Behmer, Gregory A. Sword, Tom Matheson, Swidbert R. Ott, Dieter Deforce, Filip Van Nieuwerburgh, Yves Van de Peer, Jozef Vanden Broeck


