
cannabis seed identification guide

The Name of Cannabis: A Short Guide for Nonbotanists

The genus Cannabis (Family Cannabaceae) is probably indigenous to wet habitats of the Asiatic continent. The long coexistence between mankind and Cannabis led to an early domestication of the plant, which soon showed an amazing spectrum of possible uses, as a source of textile fibers as well as of narcotic and psychoactive compounds. Nowadays, the species belonging to the genus Cannabis are represented by myriads of cultivated varieties, often with unstable taxonomic foundations. The nomenclature of Cannabis has been the object of numerous taxonomic treatments. Linnaeus in Species Plantarum (1753) described a single species of hemp, Cannabis sativa, whereas Lamarck (1785) proposed two species of Cannabis: C. sativa, the species largely cultivated in the Western Continent, and Cannabis indica, a wild species growing in India and neighboring countries. The dilemma about whether C. indica is a species distinct from C. sativa continues to the present day. Because of their prevalent economic interest, the nomenclatural treatment is particularly important where it concerns the cultivated varieties of Cannabis. In this context, we propose to avoid the distinction between sativa and indica, suggesting a bimodal approach. When a cultivar has been correctly established, it is advisable to apply a nomenclature system based on the International Code of Nomenclature for Cultivated Plants (ICNCP): it is not necessary to use the species epithets, sativa or indica, and a combination of the genus name and a cultivar epithet, in any language and bounded by single quotation marks, defines an exclusive name for each Cannabis cultivar. In contrast, Cannabis varieties named with vernacular names by medical patients and recreational users, and lacking an adequate description as required by the ICNCP, should be named as Cannabis strain, followed by their popularized name and without single quotation marks, bearing in mind that their names have no taxonomical validity.


Depending on the taxonomical treatment adopted, 1 the genus Cannabis (Hemp, Family Cannabaceae) includes up to three species, each with a very long history of domestication. Plants belonging to this genus are probably indigenous to the Asiatic Continent, where they preferably grew in wet places and near water bodies. 2 This kind of environment was also frequently chosen as a temporary settlement by human nomadic groups, before the discovery and diffusion of agricultural techniques. 3 Cannabis species in the wild had a weedy habit, growing in soils with high concentrations of nitrogen released by animal dejections and human activities. 2 The long coexistence between mankind and hemp led to an early domestication of the plant, which soon showed an amazing spectrum of possible uses. Hemp has been used as a source of textiles, as an edible plant, 4 and as a medicinal and psychoactive plant 5 (resins produced by secretory glandular trichomes). In recent times, hemp fibers have been used to produce bioplastic and antibacterial agents; moreover, the trichomes are considered biofactories of phytochemicals with multiple biotechnological applications. 6 The domestication of Cannabis has been so persistent as to cause the disappearance of the wild species: nowadays, the species belonging to the genus Cannabis are represented by myriads of cultivated varieties, which occasionally escape cultivation and grow in the wild, giving rise to forms that lose some features typical of cultivated ones. For this reason, the nomenclature of Cannabis has unstable foundations and has been the object of numerous taxonomic treatments. To fully understand the difficulties in applying a shared nomenclature to Cannabis, a digression is necessary to describe what a species is ( Table 1 ) and what it means to give a name to a species.

Table 1.

What Is a Species: A Biological (and Nomenclatural) Dilemma

“There is no consensus on how to define a species, and likely never will be.” 19 Despite this discouraging preamble, we will try to present some basic information. The following definitions are among the most diffused in the species-definition debate over the last 50 years.
Biological species concept 30
 “Species are groups of interbreeding natural populations that are reproductively isolated from other such groups.”
Diagnostic concept 31
 A species can be defined as “the smallest aggregation of population (sexual) or lineages (asexual) diagnosable by a unique combination of character states in comparable individuals.” 31 According to this definition, a species is limited by a definite set of characters, which, traditionally, are morphological.
Genealogical species concept 32
 A species is represented by populations that constitute a single group, without any exclusive subgroup. All the members of the group share a common ancestor (monophyly).
Ecological species concept 33
 “A species is a lineage (or a closely related set of lineages), which occupies an adaptive zone minimally different from that of any other lineage in its range and which evolves separately from all lineages outside its range. A lineage is a clone or an ancestral-descendent sequence.”
 There are unsolved difficulties in applying any of the definitions listed above, whichever is adopted: “every group differs in the biological criteria impacting species divergence, setting up a sliding scale from well defined to problematic species.” 34

An Outline of Nomenclatural Rules in Botany

Natural sciences rely on shared nomenclatural rules. Although this statement sounds obvious now, it was not so for centuries, until at the beginning of the 18th century it became evident that efficient nomenclatural tools were needed for handling an increasing number of organisms. Naturalists during the 16th and 17th centuries applied to species names that were actually short descriptions (polynomial system). The tendency toward a simplification of nomenclature was already evident in the Pinax Theatri Botanici, written in 1623 by Caspar Bauhin, 7 but only in the mid-18th century did Carl Linnaeus provide a new framework for nomenclature, recommending in his Species Plantarum 8 that each species should be designated by a nomen triviale, formed by the union of the generic name with a single word (epithet). By the second half of the 18th century, this binomial nomenclature was adopted worldwide, and the need for a set of nomenclatural rules was already raised by JB Lamarck at the end of the same century. 9 The first formalized laws for the nomenclature of plant species were prepared by Alphonse De Candolle in 1867, 10 but only in the 20th century did both botanists and zoologists produce Codes of nomenclature accepted by the international community of scholars. As far as botany is concerned, the International Code of Botanical Nomenclature has set out the rules for naming plants starting from the International Botanical Congress held in Vienna in 1905. 11 However, the first Code accepted by the botanical community is the Cambridge International Code, 12 and only after the Second World War has the Code been regularly updated, every 6 years. Since its last edition, 13 the Code has been renamed the International Code of Nomenclature for Algae, Fungi, and Plants (ICN). The system proposed by the ICN is closed and hierarchically arranged ( Table 2 ).

Table 2.

A Simplified Summary of the Hierarchical Organization of the International Code of Nomenclature for Algae, Fungi, and Plants

Taxon: a taxonomic group of any rank
The taxa of one rank exclude each other
The name of a taxon is ruled by:
 1. Publication validity;
 2. Priority;
 3. Typification;
The species is the core taxon of the system
The rank below the species is the varietas
Varietates showing patterns of affinity are grouped into subspecies

A taxonomic group of any rank (generically called a taxon) can be considered valid if: (1) it has been regularly published; (2) it has not already been correctly named under a different name (priority); and (3) it has been typified. A type is the material on which the description of a taxon is based. In the case of a plant species, it is generally a herbarium specimen. The specimen on which the description is based is called the holotype ( Table 3 ).

Table 3.

Handling Nomenclature Principles

How to give a valid name to a species—some basic rules
 1. Check if your putative new species has been already described and correctly erected. If not:
 2. Write a protologue, which is a description of the morphological diagnostic features of the new species, and draw a sketch of the specimen (the iconotypus). Description and drawings should be carried out on an individual plant that represents the species: the holotypus.
 3. The holotypus should be preserved in an official repository (e.g., a herbarium).
When the name of a species needs to be reexamined
 1. If it is a nomen nudum (someone gave a name but did not write the protologue)
 2. If the same species has already been correctly named (priority)
 3. If it has not been typified
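
The three reexamination conditions listed in Table 3 can be read as a simple checklist. The following is a minimal sketch of that checklist in Python; the `NameRecord` class, its field names, and the example name are hypothetical and introduced only for illustration, and the code does not represent any official procedure of the Code.

```python
from dataclasses import dataclass


@dataclass
class NameRecord:
    """Hypothetical record of what is known about a proposed species name."""
    name: str
    has_protologue: bool      # a description was written for the name
    earlier_valid_name: bool  # the same species was already correctly named
    typified: bool            # a type specimen has been designated


def reasons_to_reexamine(record: NameRecord) -> list[str]:
    """Return the Table 3 conditions that apply to this name (empty if none)."""
    reasons = []
    if not record.has_protologue:
        reasons.append("nomen nudum: name given without a protologue")
    if record.earlier_valid_name:
        reasons.append("priority: species already correctly named")
    if not record.typified:
        reasons.append("not typified: no type specimen designated")
    return reasons


# Illustrative, made-up name: described and unique, but lacking a type.
example = NameRecord("Cannabis exemplaris", True, False, False)
print(reasons_to_reexamine(example))
```

An empty list means that none of the three conditions applies; it does not by itself establish that a name is valid under the Code.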

All nomenclatural rules included in the Code are based on the taxon system, and this architecture raises some important questions. The hierarchical system of taxa, although the Code is scientifically neutral and provides only a series of conventional rules, is deeply rooted into evolutionary theory. Most practitioners in nomenclature consider a taxon as a monophyletic entity 14 and arrange the nomenclature according to the current opinions on plant phylogeny. This tendency is particularly evident when new taxa are created or separated following molecular approaches, for example, DNA barcoding. 15 What is the position of cultivated plants like Cannabis in this framework? It is acknowledged that the botanical entities known as cultivated varieties are a product of human selection and cannot be assimilated to wild varietates. In contrast to these latter, cultivated varieties (cultivars) are a product of human activity and are not subjected to the selective pressure of the environment. 16 This argument has been largely debated, and the idea that cultivars should be considered as a separate matter is not new. Linnaeus was the first to place cultivated plants under a separated category, suggesting the adoption of different nomenclatural rules for them. 17 However, it was not until 1953 that the first edition of the International Code of Nomenclature for Cultivated Plants (ICNCP) was published. 18

The first and foremost principle of the ICNCP is that the names of cultivated plants cannot be handled using the system of taxon, which is replaced by the culton (a systematic group of cultivated plants). 19 The core entity of the nomenclatural system for cultivated plants is the cultivated variety or cultivar ( Table 4 ). Each cultivar is the product of human selection and is directed toward definite goals related to human activities. The cultivar can be reproduced and is not subjected to extinction. The nomenclatural system of cultivars is open: each name of a cultivar is not exclusive and the same cultivar could have different names, depending on the scope of the classification. 16 Cultivars are static units; they are defined by a set of characters and are linked to a standard, generally a specimen, or a document. 19

Table 4.

Some Basic Rules for the Nomenclature of Cultivated Plants

Culton: a systematic group of cultivated plants
Cultivar: a cultivated variety, uniform and stable in its characters
Group: an assemblage of similar cultivars on the basis of defined characters
The name of a cultivar or Group is the combination of the genus, or lower taxon to which it is assigned, with a cultivar or group epithet
The epithet can be a vernacular word of any language and should not be written in italics
The epithet is bounded by single quotation marks

The Classification of Cannabis

The distinction between cultivated and wild entities of hemp dates back to Dioscorides and, passing through the physicians and botanists of the Renaissance (the German botanist Leonhart Fuchs was the first to adopt the term sativa to indicate domesticated hemp 20 ), survived until the 18th century, when Linnaeus in Species Plantarum 8 described a single species of hemp, Cannabis sativa. Later, Jean-Baptiste Lamarck 9 proposed two species of Cannabis: C. sativa, the species largely cultivated in the western continents, and Cannabis indica, a wild species growing in India. 21 The taxonomic treatment of Lamarck was rejected about 50 years later by J. Lindley, 22 who restricted Cannabis to C. sativa, following Linnaeus’ classification, and the concept of Cannabis as a monospecific genus was confirmed in the following century. Only in the 1920s was a new species, Cannabis ruderalis, 23 erected, whereas the reinstatement of the species C. indica was more recently suggested by Schultes et al. 24 In more recent times, genomic DNA studies to classify C. sativa have been carried out using Cannabis varieties of different geographical origin. The results seem to suggest that a polytypic concept of Cannabis cannot be ruled out. 25 In addition, chemotaxonomical markers are a promising tool to identify different Cannabis accessions and to screen hybrids, taking into account that all Cannabis varieties intercross successfully and produce fertile hybrids. 26

A biphasic approach, combining morphological and chemical characters (fruit morphology and Δ9-tetrahydrocannabinol [THC] content), was adopted by Small and Cronquist, 1 who recognized the following four Cannabis taxa (all belonging to the single species C. sativa) that “coexist dynamically by means of natural and artificial selection”:

1. Cannabis sativa L. subsp. sativa var. sativa;

2. Cannabis sativa L. subsp. sativa var. spontanea Vavilov;

3. Cannabis sativa L. subsp. indica Small & Cronquist var. indica (Lam) Wehmer;

4. Cannabis sativa L. subsp. indica Small & Cronquist var. kafiristanica (Vavilov) Small & Cronquist.

According to the authors, both varietates belonging to the subspecies sativa are common in North America, Europe, and Asia and show a limited intoxicant potential. In contrast, the varietates of the subspecies indica have high intoxicant potential and grow mainly in the Asiatic Continent.

Recently, Small 2 has proposed two possible classifications of Cannabis, one based on the ICN, which confirms his previous taxonomical treatment, and a new classification system for domesticated Cannabis, which is based on the ICNCP and recognizes six groups of cultivars as follows:

1. Group of the non-narcotic plants, domesticated for stem fiber and/or oil seed in Western Asia and Europe. Low THC and high cannabidiol (CBD);

2. Group of the non-narcotic plants domesticated in East Asia, mainly China. Low to moderate THC, high CBD;

3. Group of the narcotic plants domesticated in South Central Asia. High cannabinoids, mostly THC;

4. Group of the narcotic plants domesticated in South Asia (Afghanistan and neighboring Countries), contains both THC and CBD.

In addition, there are also at least two stabilized hybrid groups with intermediate characters between the four groups ( Table 5 ).

Table 5.

Floral Characteristics of Cannabis

In 95% of Angiosperms (flowering plants), the flower contains both male and female reproductive structures, but in the remaining 5%, flowers bear either male or female reproductive structures. If the same individual bears both male and female flowers the plant is called monoecious, and if male and female flowers are produced by different individuals the plant is called dioecious.
Cannabis is a genus characterized by dioecy, with male individuals showing short life cycle, and higher and slimmer shoots compared to female ones, but cultivars that produce also hermaphrodite or monoecious flowers (bearing separate male and female flowers on the same individual) are well known. 35
Hybridization is the merging of differing gene pools to create offspring. Cannabis is wind pollinated; male plants produce vast amounts of pollen that can spread over large geographical areas, allowing the pollination of female flowers of plants growing very far from pollen-bearing flowers.
The extensive cultivation of Cannabis plants and the absence of barriers, which reduce or constrain interbreeding, lead to the production of numerous fertile hybrids that can maintain their characteristics over different generations. 1,24

This recent systematic treatment calls attention to the practical difficulties that still exist in applying the International Code of Nomenclature to the genus Cannabis. Small 2 is careful in the application of the code, and this cautious attitude is the consequence of the perplexity about considering Cannabis exclusively as a cultivated plant. Studies of the last two decades suggest that Cannabis, like other crops, exists in so-called crop–weed complexes, which are formed by cultivated forms and by weedy forms that have escaped from cultivation and grow in the wild. The latter can establish new characters and are again under the pressure of natural selection. Thus, it seems difficult to circumscribe Cannabis solely as a cultivated plant. In our opinion, an application of the taxon system to the genus Cannabis together with the sativa/indica distinction should be avoided, as recently suggested. 28 Because of the prevalent economic interest of the cultivated varieties of Cannabis, a simplified nomenclature system based on the ICNCP should be applied. According to the ICNCP, it is not mandatory to use the species epithets, sativa or indica, and a combination of the genus name and a cultivar epithet, in any language and bounded by single quotation marks (e.g., Cannabis ‘fibranova’, to cite a cultivar widely grown for fiber production), defines an exclusive name for each Cannabis cultivar.

However, because of the numerous medical and recreational uses of Cannabis, hundreds of cultivated varieties have been developed and named with vernacular names by medical patients and recreational users. A few of these can be treated as real Cannabis cultivars, having been regularly named and registered according to the ICNCP, but many others, particularly marijuana strains, lack an adequate description and a standard. For this reason, their names cannot be accepted as cultivar epithets. Any strain that has not been formally described as a cultivar, for example, the so-called Sour diesel or Granddaddy Purple, should be named as follows: Cannabis strain Sour diesel, or strain Granddaddy Purple, with the popularized name written without single quotation marks, bearing in mind that these names have no taxonomical validity.
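
The two naming conventions proposed here (cultivar epithet in single quotation marks for registered cultivars, "strain" plus the popular name otherwise) can be sketched as a small formatting helper. This is only an illustration: the function name and the boolean flag are assumptions made for the example, and straight quotes are used in place of typographic ones.

```python
def format_cannabis_name(epithet: str, is_registered_cultivar: bool) -> str:
    """Format a name following the convention proposed in the text.

    `is_registered_cultivar` is a hypothetical flag: True when the variety
    has been described and registered as a cultivar under the ICNCP.
    """
    if is_registered_cultivar:
        # Genus name + cultivar epithet bounded by single quotation marks.
        return f"Cannabis '{epithet}'"
    # Vernacular names: "strain" + popular name, no quotation marks.
    return f"Cannabis strain {epithet}"


print(format_cannabis_name("fibranova", True))
print(format_cannabis_name("Sour diesel", False))
```

Whether a given variety counts as a registered cultivar is a matter of checking the relevant registration records; the flag merely stands in for that decision.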

Potentials and Challenges of Genomics for Breeding Cannabis Cultivars

Cannabis (Cannabis sativa L.) is an influential yet controversial agricultural plant with a very long and prominent history of recreational, medicinal, and industrial uses. Given the importance of this species, we examined some of the main challenges, along with potential solutions, behind the breeding of new cannabis cultivars. One of the main issues that should be resolved before starting new breeding programs is the uncertain taxonomic classification of the two main taxa (i.e., indica and sativa) of the Cannabis genus. We therefore tried to examine this topic from a molecular perspective through the use of DNA barcoding. Our findings seem to support a unique species system (C. sativa) based on two subspecies: C. sativa subsp. sativa and C. sativa subsp. indica. The second key issue in a breeding program is related to the dioecy of this species and to the understanding of the molecular mechanisms underlying the development of the flower, the main cannabis product. Given the role of MADS box genes in flower identity, we analyzed and reorganized all the genomic and transcriptomic data available for homeotic genes, trying to assess the applicability of the ABCDE model in Cannabis. Finally, reviewing the limits of the conventional breeding methods traditionally applied for developing new varieties, we proposed a new breeding scheme for the constitution of F1 hybrids, without ignoring the indisputable contribution offered by genomics. In parallel, we summarized the main advances in the genomic field of this species and, having ascertained the lack of a robust set of SNP markers, provided a discriminant and polymorphic panel of SSR markers as a valuable tool for future marker-assisted breeding programs.

1. General Introduction to Cannabis spp.: Taxonomy and History of Cultivated Varieties

Cannabis sativa L. is an agricultural plant species that today enjoys great interest because of its multiple uses in the recreational, medicinal, and industrial areas (Kovalchuk et al., 2020). This plant can be cultivated for the production of fibers (used to make different textiles), seeds (rich in unsaturated fatty acids for edible oils), and drugs from its female inflorescences that contain cannabinoids (compounds with psychotropic or psychopharmaceutical effects). Among these latter, the principal psychoactive constituent of cannabis is THC (tetrahydrocannabinol), and the concentration of this metabolite is at the basis of the distinction between hemp and drug (marijuana) types, with hemp considered low in concentration, 0.3% or less THC content (non-psychoactive), and marijuana, on the other hand, containing up to 30% THC by dry weight. In the present review, we will mainly focus on drug type cannabis.
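
The hemp/drug-type distinction described above rests on a single threshold, which can be written out as follows. This is a minimal sketch that uses the 0.3% THC figure quoted in the text; the function and constant names are hypothetical, and legal thresholds differ between jurisdictions.

```python
# Threshold quoted in the text: hemp contains 0.3% THC or less.
HEMP_THC_LIMIT_PERCENT = 0.3


def classify_by_thc(thc_percent: float) -> str:
    """Classify a sample as 'hemp' or 'drug type' from its THC content (%)."""
    if thc_percent < 0:
        raise ValueError("THC content cannot be negative")
    if thc_percent <= HEMP_THC_LIMIT_PERCENT:
        return "hemp"
    return "drug type"


print(classify_by_thc(0.2))
print(classify_by_thc(18.0))
```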

The genus Cannabis belongs to the family Cannabaceae (order Rosales). Its botanical classification has had a troubled history since the time of Linnaeus, because it was not clear whether the genus was mono- or polytypic (Schultes, 1970; Small and Cronquist, 1976; Schultes and Hofmann, 1980). In 1597, John Gerarde (Gerarde, 1597) first defined the plant species as dioecious, but the question remained open because monoecious plants can occur and hermaphroditism is also possible, with plants that show both reproductive organs within the same flower (Small and Cronquist, 1976; Clarke, 1981; Ming et al., 2011). All these biological variants are known to be very frequent in fiber varieties (Small and Cronquist, 1976). Plants also manifest sexual dimorphism, with male individuals often characterized by a shorter crop cycle and a taller stature than female ones. Lamarck originally recognized two interfertile species, C. sativa (from Persia) and C. indica (from India) (Lamarck, 1785). Based on this old taxonomy, many varieties available on the market are still classified as C. sativa × C. indica hybrids. As a matter of fact, the reproductive system of cannabis plants is characterized by allogamy and anemophily, and therefore open pollination is necessarily responsible for a certain degree of hybridization between improved and wild populations. This is why, according to Schultes, true landraces of cannabis are likely to have disappeared several decades ago (Schultes, 1970). Later on, Small and Cronquist (Small and Cronquist, 1976) proposed a unique species system that is still widely accepted and that is based on two subspecies of C. sativa: C. sativa subsp. sativa and C. sativa subsp. indica.
Although several authors who support the one-species system for cannabis recommend classifying its varieties on the basis of their cannabinoid and terpenoid profiles (Hazekamp et al., 2016; Piomelli and Russo, 2016), a molecular system based on DNA barcoding could represent a cost- and time-effective technique of great help in clarifying some of the taxonomic issues related to the genus Cannabis. DNA barcoding could also play a crucial role in the identification and characterization of uncertified cannabis strains, which are mainly derived from the black market. Section 2 reviews the DNA barcoding data available for this genus and explores the potential use of this technique for taxonomic identity surveys.

According to Charlesworth et al. (2005), the dioecious species evolved from a common monoecious ancestor shared by Cannabis and Humulus (Kovalchuk et al., 2020), both characterized by having sex chromosomes (Renner, 2014). In particular, C. sativa possesses nine pairs of autosomes and a pair of X and Y sex chromosomes. The male sex is heterogametic (XY), while the female is homogametic (XX), and different authors reported distinct mechanisms involved in the determination of sex (Sakamoto et al., 1998; Faux et al., 2016). This uncertainty could derive from the fact that environmental conditions, and in particular abiotic stress factors, can influence the expression and the determination of sex (Vergara et al., 2016a). Although the structure of sex chromosomes is poorly understood in Cannabis spp., since it is not detectable with standard microscopic techniques (Sakamoto et al., 1998; Peil et al., 2003), the Y chromosome was shown to have larger dimensions than the X chromosome (Sakamoto et al., 1998; van Bakel et al., 2011). More recently, both male and female karyotypes of C. sativa L. were extensively characterized by DAPI banding procedures and FISH analyses using rDNA probes (Divashuk et al., 2014). Sex determination represents one of the main problems when breeding new cannabis varieties since it can only be assessed at the beginning of flowering, when male and female flowers are visible and distinguishable. The genetic control of dioecy seems to be determined by two specific genes at linked loci acting as sex determinants (Bergero and Charlesworth, 2008; Divashuk et al., 2014; Henry et al., 2018): male plants would require a dominant suppressor of female organs (Su^F) and a dominant activator of maleness (M), while female plants would share homozygosity for their recessive alleles at both loci (su^F su^F mm), as illustrated in Figure 1.
For breeding purposes, male and female plants can then be identified in the early stages of development through the use of Y-specific DNA markers (Mandolino et al., 1999; Törjék et al., 2002). Apart from that, the molecular mechanisms underlying dioecy are essentially unknown but, considering that this condition is fully reversible (e.g., through chemical products treatment), the hypothesis that those genic regions involved in both sexes development remain potentially functional throughout the entire life cycle cannot be excluded (Di Stilio et al., 2005; Khadka et al., 2019). Given the role of homeotic genes in flower whorls identity (including anthers, pistils, and ovary), the hypothesis for their involvement in sex determination (Pfent et al., 2005; Sather et al., 2010; LaRue et al., 2013) and the lack of any information on the ABCDE model in the Cannabis genus, we screened all cannabis genomic and transcriptomic data available for homeotic genes and summarized them in Section 3. Traditionally, hemp-type and drug-type varieties have been bred mainly through mass selection. This method has been effectively used for the selection of cannabis showing improved quality traits such as fiber, oil, and cannabinoid content (Hennink, 1994). Nevertheless, one of the main problems associated with the first attempts of cannabis genetic improvement was, on the one hand, the need to avoid hemp genotypes with high THC contents, on the other hand, the availability of uniform medical genotypes, which was often linked to clandestine growers. More recently, cannabis cultivars were obtained from controlled mating using selected individuals from different landraces and cultivars. Usually, several selected individuals were used for open-pollination so that each of the female plants could be fertilized by each of the male plants (i.e., intercrosses). 
Synthetic varieties were also obtained by open-pollination using many female and male plants vegetatively propagated via cuttings (i.e., polycrosses).

Figure 1. Information on sex determinants (A) and sex chromosomes (B) in cannabis [adapted from (Bergero and Charlesworth, 2008; Divashuk et al., 2014)].

Heterosis (or hybrid vigor) has been a driving factor for breeding programs aimed at the development of both modern fiber- and drug-type cultivars. The heterotic effect is usually manifested by highly heterozygous plants produced by crossing two different lineages and/or antagonist genotypes (i.e., using parental lines that show high homozygosity for antagonist gene forms across most of the loci). The first NLD/BLD (Narrow Leaflet Drug/Broad Leaflet Drug) hybrid was “Skunk No. 1” produced in the early 1970s ( Figure 2 ). To obtain this variety, plants of the F2 progeny were chosen to carry out nine repeated inbreeding cycles aimed at increasing their homozygosity, then ten female and ten male plants were selected and vegetatively propagated for use as parental lines in all possible pairwise cross-combinations. Such a breeding strategy is very effective for the development of highly heterozygous synthetic varieties, especially if supported by progeny tests to assess the general combining ability (GCA) of parental lines.

Figure 2. Method used for the development of “Skunk No. 1”, the first NLD/BLD hybrid, bred in the early 1970s. To obtain this variety, plants of the F2 progeny were chosen to carry out nine repeated inbreeding cycles aimed at increasing their homozygosity, then ten female and ten male plants were selected and vegetatively propagated to be used as parental lines in all possible pairwise cross-combinations.
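
The crossing design described for “Skunk No. 1” can be illustrated numerically. The sketch below enumerates all pairwise crosses between ten female and ten male parental clones, and estimates how fast heterozygosity would decline over nine inbreeding cycles. The second part rests on assumptions not stated in the source: it supposes that each inbreeding cycle is one generation of full-sib mating and uses the classical recurrence H[t] = H[t-1]/2 + H[t-2]/4 with H[0] = H[1] = 1. The plant labels and function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def pairwise_crosses(females, males):
    """Return every female x male combination as (female, male) tuples."""
    return list(product(females, males))


def relative_heterozygosity_full_sib(cycles: int) -> Fraction:
    """Expected heterozygosity, relative to the start, after `cycles`
    generations of full-sib mating (ASSUMED mating scheme)."""
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    previous, current = Fraction(1), Fraction(1)  # H[0], H[1]
    for _ in range(cycles - 1):
        # H[t] = H[t-1]/2 + H[t-2]/4
        previous, current = current, current / 2 + previous / 4
    return current


females = [f"female_{i}" for i in range(1, 11)]
males = [f"male_{i}" for i in range(1, 11)]
crosses = pairwise_crosses(females, males)
print(len(crosses))  # 10 x 10 = 100 cross-combinations

h9 = relative_heterozygosity_full_sib(9)
print(h9, float(h9))  # 89/512, roughly 0.17 of the initial heterozygosity
```

Under these assumptions, nine cycles would leave about one sixth of the initial heterozygosity, which is consistent with the stated aim of increasing homozygosity before selecting parental lines. A different mating scheme or a different convention for counting cycles would change the figure.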

More frequently, selected F1 plants have been used to generate large segregating F2 populations from which favorable individuals could be eventually cloned via cuttings or used in half- or full-sibling matings. Cultivated varieties, or cultivars, were mainly produced by crossing a single male of one genetically distinct landrace with a single female of another landrace to create a hybrid, heterozygous and vigorous offspring. In the subsequent F1 generation, selected male or female progenies were bred by following one of these basic strategies: 1) Plants were inbred with one or more siblings to establish a relatively heterozygous or highly heterogeneous F2 population to be used in subsequent mass selection cycles to increase homozygosity and uniformity by intercrossing selected plants; 2) plants were backcrossed with a parental line (the seed parent or the pollen donor) to recover and fix specific traits before establishing mass selection; or 3) plants were outcrossed with an unrelated line (a plant from a third landrace) to integrate new traits and create new recombinants. Each of these breeding strategies was efficiently used to develop new cultivars using experimental hybrid materials that stemmed from crosses between distinct landraces. However, true F1 hybrid varieties were never bred in the past since agronomically super-pure inbred lines to be used yearly as parental lines were difficult to implement. Only recently some professional seed companies have produced and multiplied true F1 hybrid varieties by preserving vegetatively parental clones of the male and female lines. Nevertheless, if the parental clones are not fully homozygous and so genetically unstable, their hybrid progeny is frequently inconsistent phenotypically because of the genetic segregation of maternal and/or paternal traits. 
As a matter of fact, most seed companies invest in breeding programs aimed at selecting superior female plants, while male plants are chosen on the basis of standard morphological analysis: an individual male is then used as a pollen donor in crosses performed with each of the female clones to produce commercial hybrid seed stocks. These seeds, which do not have the genetic constitution of F1 hybrids, are then widely distributed and grown to maturity so that female plants can be selected and multiplied by cuttings to achieve commercial sinsemilla production. In recent years, seeds of the so-called “all-female” cultivars have been largely produced by artificially promoting selfing: this is possible by applying hormones to some branches of female plants so that they also produce male flowers with viable, genetically female pollen. As a consequence, the offspring of female plants fertilized with female pollen from masculinized branches include only genetically female progeny. This is a very efficient strategy for commercial sinsemilla production, as all seeds generate useful female plants with no need to remove male plants, so it provides the benefits of asexual propagation (i.e., fixation of the female genotype) together with the advantages of sexual reproduction (i.e., reproduction via seeds in place of cuttings). However, female seeds can give rise to unstable populations characterized by some degree of genetic diversity, in contrast to clonal populations produced from female cuttings. In fact, under sexual reproduction, segregation and recombination can always occur unless the parental lines are highly homozygous inbred lines suitable for breeding true F1 hybrids. For this reason, Section 4 of this review offers new insights on next-generation methods for breeding new and true cannabis F1 hybrids.
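
The reason “all-female” seed lots contain only genetically female plants follows directly from the XX/XY system described earlier. The following minimal sketch enumerates the gamete combinations for a standard cross and for a cross using pollen from a masculinized female. It deliberately ignores the environmental effects on sex expression mentioned in the text; the function name is hypothetical.

```python
from collections import Counter
from itertools import product


def offspring_sex_ratio(seed_parent: str, pollen_parent: str) -> dict[str, float]:
    """Expected genotype proportions from two sex-chromosome genotypes.

    Each parent is given as a two-letter string such as "XX" or "XY";
    every chromosome is assumed to be transmitted with equal probability.
    """
    counts = Counter(
        "".join(sorted(pair)) for pair in product(seed_parent, pollen_parent)
    )
    total = sum(counts.values())
    return {genotype: n / total for genotype, n in counts.items()}


# Standard cross: female (XX) x male (XY).
print(offspring_sex_ratio("XX", "XY"))  # {'XX': 0.5, 'XY': 0.5}

# "All-female" seed: female (XX) x pollen from a masculinized female (XX).
print(offspring_sex_ratio("XX", "XX"))  # {'XX': 1.0}
```

The second cross yields only XX progeny, but the plants are not clones of one another: the autosomes still segregate and recombine, which is the source of the genetic diversity noted above for female seeds.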

Nevertheless, it cannot be overlooked that breeding methods conventionally used for the development of new varieties have been revolutionized since the advent of genomics applied to crop plant species. In fact, the examination of plant materials using molecular markers linked to single loci controlling specific traits of agronomic interest (i.e., marker-assisted selection, MAS) and the exploitation of multiple-locus genotyping with molecular markers scattered throughout the genome (i.e., marker-assisted breeding, MAB) provide the opportunity to boost gain from selection (Tuberosa, 2012). For this reason, Section 5 provides an analytical review of the main achievements reached by genomics applied to plant resources of the genus Cannabis. Lastly, owing to the lack of a robust panel of SNP markers based on a standardized set of genes and considering the urgent need to develop a reference method for genotyping plant varieties with easy-to-detect markers, as well as reliable and transferable protocols, a discriminant panel of SSR markers was selected from polymorphic microsatellite regions of Cannabis spp. Recent progress in the development of multiplex assays has been made in several crops (Palumbo et al., 2018; Patella et al., 2019a; Patella et al., 2019b), suggesting that these markers, especially when finely mapped and scattered throughout the genome, remain relevant and cost-effective molecular tools, at least for characterizing genetic resources and breeding new varieties. On the whole, this information is reported in Section 6.

2. Chloroplast DNA Barcodes and ITS Regions for Cannabis Species Authentication: What Is Available and Retrievable From Public Nucleotide Repositories

Currently, with the cannabis market showing increases in both demand and availability and cannabis seed companies arising wherever national law allows it, there is an urgent need for a reliable molecular-based taxonomic system for this species. Many cannabis cultivars are obtained by crossing plants from what are commonly considered subspecies. In general, lines belonging to the two main subspecies of C. sativa, subsp. sativa and subsp. indica (Small and Cronquist, 1976), are used to produce new varieties suitable for different uses, such as fiber, oil, medical drug, and recreational applications. These subspecies differ in phenotype and chemotype, and the main characteristics according to which they are commonly distinguished are size, leaf shape, terpene accumulation, the quantity and chemistry of cannabinoids produced and earliness of flowering. Breeders are greatly interested in determining the subspecies “composition” of the parental lines used in crosses and that of the obtained offspring. It is important to consider the origin and phylogeny of a line or cultivar to better plan breeding strategies and guarantee a higher level of traceability. Whether for medical or recreational use, customers are increasingly interested in tracing the origins of the products they use. Although much information about the phylogenetic taxonomy of this species is available, it is often controversial. McPartland (2018) highlighted the different nomenclatures applied to this plant over time, from Linnaeus and Lamarck in the 18th century to the most recent classification proposed by the Angiosperm Phylogeny Group in the 21st century (The Angiosperm Phylogeny Group, 2003).

The common molecular approach for the taxonomic determination of a species or subspecies is to apply DNA barcoding to the extra-nuclear genome. In animal species, the cytochrome c oxidase I (coxI) mitochondrial gene has been set by the “Consortium of Barcode of Life” as a standard DNA barcode for determining the phylogenetic relationships between organisms, and Hebert (Hebert et al., 2003) proposed a threshold of a genetic difference in the coxI region equal to 2.7% for the discrimination of animal species. Since the coxI gene is not suitable for discriminating different taxa due to a low mutation rate in the plant mitochondrial genome, in 2007, Kress and Erickson (2007) demonstrated the suitability of the Ribulose-1,5-bisphosphate carboxylase/oxygenase large subunit (rbcL) gene and trnH-psbA noncoding spacer region as DNA barcodes for plant classification. Later, the maturase K (matK) gene was included in the list of exploitable markers for DNA barcoding in land plants (Asahina et al., 2010; Dunning and Savolainen, 2010; de Vere et al., 2015). Moreover, as the classification efficacy of these barcodes has sometimes been demonstrated to not be sufficiently informative, the use of other regions, both plastidial and nuclear, such as rpoC1 and ycf5, and ITS1 and ITS2, respectively, has been proposed for this purpose (Chen et al., 2010; Wang et al., 2014).

Much conflicting information regarding the taxonomic classification of C. sativa is available in the scientific literature (Lightfoot et al., 2016; McPartland, 2018), and the debate regarding its possible subdivision into different subspecies is still open. Because of this, we reviewed the DNA barcoding data (i.e., ITS1, ITS2, matK and rbcL sequences) available for cannabis in the two main public repositories (BOLD and GenBank) (Ratnasingham and Hebert, 2013) through a keyword search and BLASTn analysis for “Cannabis” taxa (taxid: 3482).

A total of 112 sequences were collected, including 15 matK (only C. sativa), 59 rbcL (only C. sativa), 12 ITS1 (10 C. sativa and 2 C. sativa subsp. indica) and 26 ITS2 (23 C. sativa, 2 C. sativa subsp. indica and 1 Cannabis ruderalis) sequences, which were aligned for each gene using the Geneious software Clustal Omega plug-in (Sievers et al., 2011) to investigate the percentage of pairwise identity within and between the taxa for which multiple sequences were available (i.e., ITS1 and ITS2) ( Table 1 and details in Supplementary Table 1).

Table 1

Sequences retrieved from BOLD/NCBI databases of the chloroplast genes matK and rbcL and nuclear regions ITS1 and ITS2.

N. seqs | Taxa (cpDNA) | Pairwise identity (%) | Barcode gene
15 | Cannabis sativa | 99.7% | matK
59 | Cannabis sativa | 99.6% | rbcL

N. seqs | Taxa (ITS1) | Cannabis sativa | Cannabis sativa subsp. indica | Cannabis ruderalis
10 | Cannabis sativa | 99.9% | |
2 | Cannabis sativa subsp. indica | 99.9% | 100.0% |
0 | Cannabis ruderalis | N/A | N/A | N/A

N. seqs | Taxa (ITS2) | Cannabis sativa | Cannabis sativa subsp. indica | Cannabis ruderalis
23 | Cannabis sativa | 99.8% | |
2 | Cannabis sativa subsp. indica | 99.8% | 99.5% |
1 | Cannabis ruderalis | 99.8% | 99.7% | N/A

Pairwise identity percentages within and between taxa (where possible, in triangular matrixes) are also reported. N/A, number of sequences insufficient for the analysis.

Chloroplast genes were available only for the C. sativa taxa, and none were found for subsp. indica or ruderalis, making it impossible to compare them. Despite this, the calculated within-taxon (i.e., within C. sativa) percentages of identity were 99.7% and 99.6% for matK and rbcL, respectively.

On the other hand, the nuclear regions showed within-taxon identity levels of 100% (C. sativa subsp. indica) and 99.9% (C. sativa) for ITS1, and of 99.8% (C. sativa) and 99.5% (C. sativa subsp. indica) for ITS2. The sequence identity between C. sativa subsp. indica and C. sativa was 99.9% for ITS1 and 99.8% for ITS2 (Table 1).

The only sequence available for C. ruderalis (ITS2) was used for a comparison between taxa, with values of 99.8% (C. ruderalis vs C. sativa) and 99.7% (C. ruderalis vs C. sativa subsp. indica; Table 1 ).

3. Genomics of Flower Organ Identity in Cannabis: A Comprehensive In Silico Survey of the ABCDE Genes Encoding MADS-Box Transcription Factors

Although several monoecious varieties have been developed for agronomical purposes, in nature, C. sativa is a dioecious plant characterized by unisexual flowers confined to separate individuals (Chandra et al., 2017). The male flowers are pale green, carried on axillary branched cymose panicles. The panicle flowers are solitary or alternative and occur in clusters or three-flowered cymules. Each flower is composed of five tepals and as many stamens, and a thin pedicel. The tepals are ovate-oblong, 2–4 cm in length, yellowish or whitish-green, scattered, with tiny hairs. The stamens hang and consist of thin oblong and greenish filaments and anthers. The pollen grains are released through the terminal pores of the anthers (Chandra et al., 2017). Female flowers, which are dark green, subsessile and carried in pairs are closely aggregated at the apex of inflorescences, which are prevalently formed at the upper axes of branches. Every single flower is constituted of an ovary with a style that terminates in a pair of long, thin feathered stigmas at the apex, a membranous perianth surrounding the ovary and a bract. The perianth is transparent and can be smooth or partially frayed, and when mature, it covers approximately two-thirds of the ovary. The bracts are green and rough, with overlapping edges, which enclose the female flower (Chandra et al., 2017). In angiosperms, the determination of floral organs identity is regulated by a complex genetic network acting through a range of both synergistic and antagonistic interactions (Vasconcelos et al., 2009), which have been rationalized in the so-called “ABC model” (Coen and Meyerowitz, 1991). This model, described for the first time by Weigel and Meyerowitz (1994), correlates the expression of homeotic genes to specific flower structures corresponding to the four characteristic whorls of typical eudicots, the sepals (whorl 1), petals (whorl 2), stamens (whorl 3), and carpels (whorl 4). 
In particular, the differentiation of each flower whorl is the result of specific interactions of transcription factors (TFs) belonging to the MADS-box multigenic family, except for APETALA 2 (AP2), which is part of the AP2/EREBP family (Irish, 2017). In the first stage, the model exclusively included the homeotic genes of A, B, and C classes but later it was extended to include also genes belonging to D and E classes (Jordan, 2006). A-class genes, when expressed alone, are responsible for the identity of the sepals (first whorl), while in combination with the B-class genes, they control the development of the second whorl (petals) (Jack, 2004). Female reproductive tissue (carpel) identity is specified by C-class genes, while stamen differentiation is the result of the combined interactions between B- and C-class genes (Coen and Meyerowitz, 1991). Finally, B-sister genes, closely related to the B-class, along with D-class genes, are specifically involved in determining ovule identity (Carmona et al., 2008; Vasconcelos et al., 2009). More recently, some genes exhibiting genetic redundancy and overlapping functionality (E-class) were found to form complexes with A, B, C, and D TFs (Vandenbussche et al., 2003; Castillejo et al., 2005), playing a decisive role in whorl development. In the last 20 years, the ABCDE model and the molecular bases underlying floral development have been deeply investigated and reviewed in model species such as Arabidopsis thaliana (Robles and Pelaz, 2005), Antirrhinum majus (Mizzotti et al., 2014), Petunia hybrida (Colombo et al., 1997), and Vitis vinifera (Palumbo et al., 2019b). In contrast, the application of this model in C. sativa has never previously been evaluated. 
To take the first step in this direction, we started by selecting 21 amino acid sequences from the cannabis proteome (GCA_900626175.1) based on their putative orthology (BLASTp) with well-characterized ABCDE proteins belonging to Arabidopsis and grapevine (Supplementary Table 2). A similarity-based neighbor-joining analysis (Geneious software v7.1.5, Biomatters, Ltd., Auckland, New Zealand) was then performed using the amino acid sequences of the three species (cannabis, Arabidopsis, and grapevine; Figure 3A). The phylogenetic tree demonstrates that the ABCDE TFs selected from grapevine and Arabidopsis clustered together with the putative cannabis MADS-box protein orthologs. Moreover, the organization of the resulting dendrogram in six main clades was consistent with the gene classes represented by the model, reinforcing the correlation between sequence similarity and gene function. The putative ABCDE cannabis orthologs are reported in Table 2. Among the A-class genes, the two isoforms of Cs_CAULIFLOWER_A (XP_030481490 and XP_030481491) clustered together with VviAP1 and AtAP1/AGL7, while Cs_CAULIFLOWER_A-like_1 seemed to be the closest relative of VviFUL1 and At_FUL/AGL8. Among the B-class genes, our phylogenetic reconstruction showed that Cs_TM6 was homologous to VviAP3a and VviAP3b/VvTM6 in grapevine and At_AP3 in Arabidopsis and that Cs_MADS_2-like, Cs_MADS_2-like_X1 and Cs_MADS_2-like_X2 were all highly related to the PISTILLATA genes of A. thaliana and V. vinifera (AtPI and VviPI/VvMADS9, respectively). In our survey, Cs_FBP24-like and the two isoforms of Cs_FBP24 represent the best candidates for the B sister class due to their tight clustering with AtTT16/ABS and the three grapevine MADS-box proteins VviABS1, VviABS2, and VviABS3. 
The situation for classes C and D is far from clear. According to the BLASTp analysis (Supplementary Table 2) and the NJ dendrogram, Cs_AGAMOUS-like and Cs_MADS1 could represent orthologs of the C-class genes VviAG1/MADS1 and VviAG2 (in grapevine) and AtAG and AtAGL1/SHP1 (in Arabidopsis). However, the same two cannabis proteins also represent the two closest relatives of the class-D genes VviAG3/MADS5 and AtAGL11/STK (Supplementary Table 2), highlighting the need for further investigation. Another aspect that needs to be elucidated is the close phylogenetic relationship between a second clade of the C-class, namely, the AGL6-like/MADS3 genes, and the E-class genes. In fact, the NJ dendrogram shows that AtAGL6, VviAGL6a/MADS3 and VviAGL6b along with the putative cannabis orthologs Cs_MADS3_1 and Cs_MADS3_2 grouped together with the SEPALLATA clade (E-class). Although their capability to bind to AP1, B-class, D-class, and SEP-like MADS-box proteins was proven (Hsu et al., 2003; de Folter et al., 2005), it must be noted that the function of AGL6-like/MADS3 genes in flower development has not yet been fully elucidated (Ohmori et al., 2009; Schauer et al., 2009). However, based on their phylogenetic relationship with the SEPALLATA genes and their transcriptomic profiles recently described in grapevine flower development kinetics (Palumbo et al., 2019b), we cannot exclude that these genes belong to the E-class rather than the C-class. Finally, the last branch of the NJ tree included all the clustered SEPALLATA (SEP) genes, whose redundant involvement in petal, stamen, and carpel formation led to a revision of the first ABC model (Pelaz et al., 2000). In cannabis, based on the BLASTp alignment and the NJ tree, Cs_MADS4 and five different copies and isoforms of Cs_MADS2 form a subgroup closely linked to the SEP genes of grapevine and Arabidopsis. 
With the aim of gaining more evidence about the role of candidate homeotic genes identified in cannabis, we took advantage of a recent in silico analysis of 31 RNA-seq datasets derived from one hemp strain and two different psychoactive strains, Finola and Purple Kush (NCBI SRA accession numbers: SRP006678 and SRP008673), of C. sativa to investigate the behavior of floral identity MADS box genes identified in the Cannbio-2 genome. The analyzed tissues and organs included the shoots, roots, stem, young and mature leaves, and early, mid- and mature-stage flowers (Massimino, 2017). A principal component analysis based on the ln(x+1)-transformed reads per kilobase of transcript per million mapped reads (RPKM) values of all MADS box genes identified showed a clear separation of the samples related to the reproductive organs from those related to the vegetative organs, with PC1 explaining 86% of the variation between samples (Figure 3B), supporting the hypothesis that the MADS box genes identified through BLASTp analysis are indeed homeotic genes involved in the determination of flower identity in C. sativa. The heat map in Figure 3C shows the relative expression of each gene in the different tissues considered. Unsupervised hierarchical clustering of samples based on gene expression values revealed two clusters of samples with specific expression patterns for MADS box genes. Cluster 1 was almost exclusively composed of samples related to reproductive organs, including flower buds (stages 1–4), mature flowers (stages 1–4), and pre-, early-, and mid-stage flowers from the Purple Kush genotype. Cluster 2 was composed exclusively of vegetative organs and tissues, including the roots, leaves, stems and petioles. Only one gene (Cannbio_057002) showed a different behavior from what was expected, being highly expressed in root organs. 
The fact that this MADS box gene did not clearly cluster with a specific group of homeotic genes in the phylogenetic tree (Figure 3A) and was not expressed in reproductive tissues allowed us to exclude a possible role in flower determination. Unfortunately, the RNA-seq data were limited to the flower buds and whole flowers at different developmental stages, making it difficult to appreciate the variation in expression among genes belonging to different homeotic classes and, thus, expressed in different whorls.
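
As a reminder of how the expression values entering the principal component analysis are obtained, the short Python sketch below computes RPKM from raw read counts and then applies the ln(x+1) transformation. The gene names, read counts, transcript lengths, and library size are hypothetical placeholders, not values from the SRP006678 or SRP008673 datasets.

```python
import math


def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


def ln_x_plus_1(value):
    """ln(x+1) transformation; keeps zero expression at zero."""
    return math.log1p(value)


# Hypothetical sample: gene -> (read count, transcript length in bp).
sample_counts = {"gene_A": (500, 1000), "gene_B": (0, 1500), "gene_C": (1200, 800)}
total_mapped_reads = 10_000_000  # hypothetical library size

for gene, (count, length_bp) in sample_counts.items():
    value = rpkm(count, length_bp, total_mapped_reads)
    print(gene, round(value, 2), round(ln_x_plus_1(value), 3))
```

The transformation compresses the range of highly expressed genes and is defined for genes with no mapped reads, which a plain logarithm would not be.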

(A) Similarity-based neighbor-joining analysis performed using 21 amino acid sequences from the C. sativa (Cs) proteome (GCA_900626175.1) selected for their putative orthology ( Table 2 and, more specifically, Supplementary Table 2) with well-characterized ABCDE MADS box proteins belonging to Arabidopsis thaliana (At) and Vitis vinifera (Vvi). (B) Taking advantage of a recent in silico analysis of 31 RNA-seq datasets derived from different tissues of two different psychoactive strains (Finola and Purple Kush, NCBI SRA accession numbers: SRP006678 and SRP008673) of C. sativa (Massimino, 2017), a principal component analysis was performed using the expression values of the MADS box genes previously identified. The analysis is based on the ln(x+1) transformed (RPKM) values (reads per kilobase of transcript per million mapped reads) and showed a clear separation of samples related to reproductive organs from those related to vegetative organs. (C) Heat map showing the relative expression of each gene in the different tissues considered.

Table 2

Identification of ABCDE candidate genes in C. sativa.

Class from ABCDE model | Cannabis sativa (GCA_900626175.1) best hit against V. vinifera and A. thaliana MADS-box TFs (BLASTp) | Vitis vinifera (PN40024 v1 ID) | Arabidopsis thaliana (Araport11) | Transcripts (GIFP00000000.1) corresponding to the GCA_900626175.1 proteins | Correspondence between GIFP00000000.1 transcripts and SRP006678/SRP008673
Class A | Cs_CAULIFLOWER_A X1 (XP_030481490), Cs_CAULIFLOWER_A X2 (XP_030481491) | VviAP1 (VIT_01s0011g00100) | At_AP1/AGL7 (AT1G69120) | Cannbio_054734 | PK21815.1
Class A | Cs_CAULIFLOWER_A-like_1 (XP_030485608), Cs_CAULIFLOWER_A-like_2 (XP_030485101) | VviFUL1 (VIT_17s0000g04990) | At_FUL/AGL8 (AT5G60910) | Cannbio_008529, | PK19698.1, PK01844.1
Class B | Cs_TM6 (XP_030499268) | VviAP3a (VIT_18s0001g13460), VviAP3b/VvTM6 (VIT_04s0023g02820) | At_AP3 (AT3G54340) | Cannbio_014948 | n.a.
Class B | Cs_MADS_2-like (XP_030484132), Cs_MADS_2-like_X1 (XP_030482855), Cs_MADS_2-like_X2 (XP_030482856) | VviPI/VvMADS9 (VIT_18s0001g0176) | At_PI (AT5G20240) | Cannbio_009872 | PK22420.1
Class B-sister | Cs_FBP24_X1 (XP_030484437), Cs_FBP24_X2 (XP_030484436), Cs_FBP24-like_X1 (XP_030490979) | VviABS1 (VIT_10s0042g00820), VviABS2 (VIT_01s0011g01560), VviABS3 (VIT_02s0025g02350) | At_TT16/ABS (AT5G23260) | Cannbio_013942, |
Class C | Cs_AGAMOUS-like (XP_030480504), Cs_MADS1 (XP_030481705) | VviAG1/MADS1 (VIT_12s0142g00360), VviAG2 (VIT_10s0003g02070) | At_SHP1/AGL1 (AT3G58780), At_AG (AT4G18960) | | PK20142.1, PK03292.1
Class C | Cs_MADS3_1 (XP_030487367), Cs_MADS3_2 (XP_030500965) | VviAGL6a/MADS3 (VIT_15s0048g01270), VviAGL6b (VIT_16s0022g02330) | At_AGL6 (AT2G45650) | Cannbio_062689, | PK14825.1, PK13658.1
Class D | Cs_AGAMOUS-like (XP_030480504), Cs_MADS1 (XP_030481705) | VviAG3/VvMADS5 (VIT_18s0041g01880) | At_STK/AGL11 (AT4G09960) | Cannbio_055846, | PK20142.1, PK03292.1
Class E | Cs_MADS2_X1_1 (XP_030484352), Cs_MADS2_X1_2 (XP_030492901), Cs_MADS2_X2_1 (XP_030484353), Cs_MADS2_X2_2 (XP_030492902), Cs_MADS2_X2_3 (XP_030484350), Cs_MADS4 (XP_030496177) | VviSEP1/VvMADS2 (VIT_14s0083g01050), VviSEP2 (VIT_17s0000g05000), VviSEP3/VviMADS4 (VIT_01s0010g03900), VviSEP4 (VIT_01s0011g00110) | At_SEP1/AGL2 (AT5G15800), At_SEP2/AGL4 (AT3G02310), At_SEP3/AGL9 (AT1G24260), At_SEP4/AGL3 (AT2G03710) | | PK08909.1, PK19420.1

By means of a BLASTp alignment against the ABCDE proteins of V. vinifera and A. thaliana, the candidate ABCDE proteins of C. sativa were retrieved from the representative proteome (GCA_900626175.1). The corresponding transcripts were then searched through a tBLASTn approach, aligning the candidate ABCDE proteins against the cannabis transcriptome shotgun assembly (GIFP00000000.1). Finally, to evaluate the expression levels of the putative ABCDE proteins in different tissues of two different cannabis strains (Finola and Purple Kush, SRP006678 and SRP008673, respectively), a BLASTn approach was applied, aligning the GIFP00000000.1 transcripts to the abovementioned RNA-seq experiments (SRP006678 and SRP008673).
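
The best-hit step of this pipeline can be sketched in a few lines of Python. The example assumes that the BLAST results were saved in the standard 12-column tabular layout (query ID, subject ID, percent identity, alignment length, mismatches, gap openings, query start, query end, subject start, subject end, E-value, bit score) and that the best hit is the one with the highest bit score; the source does not state the ranking criterion, and the sequence identifiers and scores below are invented for illustration.

```python
def best_hits(tabular_lines):
    """Return {query: (subject, bit score)} keeping the top-scoring hit per query.

    Assumption: tab-separated BLAST tabular output with the bit score
    in the 12th column.
    """
    best = {}
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        bit_score = float(fields[11])
        if query not in best or bit_score > best[query][1]:
            best[query] = (subject, bit_score)
    return best


# Hypothetical hits; identifiers and numbers are placeholders.
rows = [
    "Cs_query_1\tAt_subject_A\t71.2\t240\t60\t3\t1\t238\t1\t240\t1e-90\t250.0",
    "Cs_query_1\tVvi_subject_B\t65.0\t235\t75\t4\t1\t233\t2\t236\t3e-80\t221.0",
    "Cs_query_2\tVvi_subject_C\t80.4\t210\t40\t1\t5\t214\t3\t212\t2e-99\t301.0",
]

for query, (subject, score) in best_hits(rows).items():
    print(query, "->", subject, score)
```

A reciprocal search (each best hit queried back against the cannabis proteome) is a common additional check before calling two proteins putative orthologs.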

4. An Overview of Conventional Schemes and a Glimpse Into Next-Generation Methods for Breeding Novel and Real F1 Hybrid Cannabis Cultivars

For many years, the development of new varieties of medical cannabis was not the exclusive preserve of breeders. Home growers who have acquired high-level skills and learned essential techniques of hybridization, selection, and cultivation have easily transitioned their activities from growing to breeding cannabis lineages. In recent decades, home growers have created most of the cannabis strains that have become popular in the market worldwide. Both medical (drug-type) and hemp (fiber-type) cultivars were traditionally developed for many years using mass selection. Cannabis varieties can then be easily preserved and multiplied via cuttings from individual plants that exhibit desirable traits matching a specific distinct phenotype. Propagation via cuttings is the main way to make prized varieties available as clones to maintain unaltered genotypes. When cannabis varieties are multiplied and commercialized through seeds, open-pollinated OP synthetics and F1 hybrids represent the only populations that can be reproduced sexually, giving rise to offspring characterized by morphological distinctiveness and uniformity, and genetic stability across generations. Cannabis is a dioecious (and anemophilous) species, with male and female plants exhibiting stamens and pistils in separate flowers. As a consequence, outcrossing through wind-mediated cross-pollination is the only natural reproduction system of Cannabis spp. The genetic structure of both natural populations and experimental breeds obtained via mass selection can usually be composed of a combination of highly heterozygous genotypes that share a common gene pool. Selfing is also possible and can be accomplished by artificially generating monoecious plants with unisexual flowers (i.e., reversing the sex of flowers from female to male on some branches) to induce self-pollination. 
Attempts were made to transform the reproductive organs of cannabis using irradiation (Nigam et al., 1981a) and streptovaricin (Nigam et al., 1981b), but the results were impractical. The successful use of other strategies, such as the feminization of male plants using ethephon (Mohan Ram and Sett, 1982b) and the masculinization of female plants with silver thiosulfate (Mohan Ram and Sett, 1982a), revolutionized cannabis breeding programs. This latter treatment, in particular, is still widely used, since silver thiosulfate inhibits the action of ethylene, a plant hormone that promotes the formation of female flowers. On the treated branches, the newly induced male flowers can develop anthers with viable pollen, while the other untreated branches of the plant will continue to grow female flowers. Female plants whose pistils are self-pollinated, so that their egg cells (X) are fertilized by genetically female pollen (X), will give rise to a completely female progeny (XX). This method, exploitable for the multiplication of female plants by seeds, can be commercially more convenient than the propagation of female plants by cuttings.
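
The sex-chromosome arithmetic behind all-female seed can be checked with a small Python sketch. It assumes a simple XY system in which each parent transmits one of its two sex chromosomes with equal probability; the function name is illustrative.

```python
from collections import Counter
from itertools import product


def offspring_sex_ratio(seed_parent, pollen_parent):
    """Expected genotype frequencies from all equally likely gamete pairs."""
    counts = Counter(
        "".join(sorted(egg + pollen))
        for egg, pollen in product(seed_parent, pollen_parent)
    )
    total = sum(counts.values())
    return {genotype: n / total for genotype, n in counts.items()}


# Conventional cross: female (XX) pollinated by a male (XY).
print(offspring_sex_ratio("XX", "XY"))
# Masculinized female (XX) used as pollen donor on a female (XX).
print(offspring_sex_ratio("XX", "XX"))
```

The first cross yields females and males in equal proportions, whereas the second can only produce XX progeny because no Y-bearing pollen exists.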

Nevertheless, sexual reproduction can give rise to segregating populations that are genetically unstable and phenotypically variable, drawbacks that clones do not show. The only way to propagate cannabis varieties successfully by seed is to develop true F1 hybrids by crossing genetically divergent but individually uniform parental inbreds.

In addition to this strategy for selfing, the production of highly homozygous genotypes can be achieved from full-sibling crosses performed by hand between sister-brother individuals that belong to the same progeny and share the same two parental lines.
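
The different pace at which the two mating schemes build up homozygosity can be illustrated with the classical recurrence relations for the inbreeding coefficient F: F(t) = (1 + F(t-1)) / 2 under repeated selfing and F(t) = (1 + 2 F(t-1) + F(t-2)) / 4 under repeated full-sibling mating. The Python sketch below assumes non-inbred, unrelated founders (F = 0 at the start); these are textbook expectations, not values taken from this review.

```python
def selfing_inbreeding(generations):
    """Inbreeding coefficient after each generation of selfing."""
    f, series = 0.0, []
    for _ in range(generations):
        f = (1.0 + f) / 2.0
        series.append(f)
    return series


def full_sib_inbreeding(generations):
    """Inbreeding coefficient after each generation of full-sibling mating."""
    f_two_back, f_one_back, series = 0.0, 0.0, []
    for _ in range(generations):
        f = (1.0 + 2.0 * f_one_back + f_two_back) / 4.0
        series.append(f)
        f_two_back, f_one_back = f_one_back, f
    return series


print("selfing: ", selfing_inbreeding(6))
print("full-sib:", full_sib_inbreeding(6))
```

Under these assumptions selfing reaches F = 0.875 after three generations, while full-sibling mating is only at F = 0.5, so more cycles are needed to obtain comparably uniform inbred lines.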

Cannabis (sinsemilla) varieties were largely developed by crossing single male and female individuals belonging to genetically distinct landraces to create a pseudo-F1 hybrid. The genetic stability and uniformity of any new cultivar bred in this way can only be preserved as an individual clone through vegetative propagation via cuttings. To breed true F1 hybrid varieties, inbred lines derived through several cycles of repeated selfing and/or full-sibling mating can be used as parental stocks for the production of highly heterozygous hybrids through two-way crossing to exploit the effects of heterosis (Figure 4). Heterosis refers to the phenomenon in which F1 progeny obtained by mating two genetically divergent and antagonist inbred lines exhibit greater biomass, rate of development, and fertility than the two homozygous parents. This biological phenomenon has been extensively exploited for the development of crop varieties in several species and has been important for the development of modern fiber (hemp) cultivars but is still largely unexplored or undocumented in recreational (drug) cultivars. Since heterosis often results from the complementation in the hybrid of different deleterious (recessive) alleles that were present in one parental genotype by superior (dominant) alleles from the opposite parental genotype, the development of F1 hybrids usually requires progeny tests for estimating the specific combining ability (SCA) of selected inbred lines in all possible pairwise cross-combinations (diallel design). This method not only requires the selection of individual breeding parents (single female and/or male plants) but also requires that some of the progeny plants are asexually propagated via cuttings to perform laboratory analyses and field trials. 
In particular, in each generation, the selection of the most appropriate plants from either selfing or full-siblings is based on agronomic, genomic, and metabolomic investigations to choose the best individuals in terms of agronomic performance, molecular genotypes, and biochemical profiles. Selected individuals should also be used to perform parallel progeny tests aimed at determining their SCA based on F1 hybrid evaluation. A key step for large-scale seed production is the use of an inbred female plant (XX) as the clonal seed parent line and another genetically divergent but complementary inbred female plant (XX) that has been masculinized as the clonal pollen parent. Thus, 100% of the F1 hybrid seeds will be female (XX): all-female seeds are produced by cross-pollination, but all-female plants are characterized by the same highly heterozygous and vigorous genotype. The same strategy can be exploited for breeding F1 varieties through two-way, three-way, or four-way hybrids using two, three, or four inbred lines derived from as many parental materials/landraces ( Figure 4 ) through intrasubspecific and intersubspecific hybridization. In fact, in addition to pure “indica” and “sativa” varieties, hybrid varieties with varying ratios of their genomes are common. For instance, among the most famous varieties worldwide, the “White Widow” exhibits approximately 60% “indica” and 40% “sativa” ancestry, and its plants exhibit traits from both parental biotypes. Nevertheless, the choice of the initial cross depends on the targeted cannabis market (fiber vs. drug utilization genotype and tetrahydrocannabinol/cannabidiol ratio), as some varieties are bred mostly as medicinal cannabis, and others are instead highly appreciated as recreational cannabis. Breeding for fiber production includes both monoecious and dioecious cultivars showing a high percentage of primary fibers, fast-retting phenotypes, and distinctive morphological descriptors in low-THC plants. 
Breeding for the production of cannabinoids comprises THC-predominant or cannabidiol (CBD)-predominant cultivars. It is worth mentioning that a limited number of cultivars have been specifically bred for seed production (Grassi and McPartland, 2017). Considering the relevance of genomics and metabolomics in the development of next-generation cannabis varieties, modern breeding methods must be based on the application of multidisciplinary skills and tools to assist professional agronomists in the evaluation, prediction, and early selection of plants with the highest potential in terms of molecular genotypes and biochemical profiles. Cannabinoids of breeding stocks can be assayed according to either quantity (i.e., percentage of cannabinoids in harvested material) or quality (i.e., THC/CBD ratio or chemotype). The quality of cannabinoids is strongly dependent on the genotype, whereas cannabinoid quantity is affected by agronomic practices, environmental conditions, and genotype × environment interactions.
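
The diallel design mentioned above can be made concrete with a small sketch. Assuming a half-diallel (each pair of inbred lines crossed once, without reciprocals or parents) and a table of hypothetical F1 means, the code below enumerates the crosses and splits each cross mean into a grand mean, the general combining ability (GCA) of each parent, and the SCA of the pair. The line names, the trait values, and the least-squares decomposition are illustrative assumptions, not the procedure used by the authors.

```python
from itertools import combinations


def half_diallel(lines):
    """All pairwise crosses among inbred lines (no selfs, no reciprocals)."""
    return list(combinations(lines, 2))


def combining_ability(cross_means):
    """Split half-diallel cross means into grand mean, GCA and SCA.

    cross_means maps a pair of line names, e.g. ("A", "B"), to the mean
    trait value of that F1. Every pairwise cross must be present once.
    """
    means = {frozenset(pair): value for pair, value in cross_means.items()}
    lines = sorted({line for pair in means for line in pair})
    n = len(lines)
    if n < 3:
        raise ValueError("at least three inbred lines are needed")
    expected = {frozenset(c) for c in combinations(lines, 2)}
    if len(means) != len(cross_means) or set(means) != expected:
        raise ValueError("every pairwise cross must be present exactly once")

    total = sum(means.values())
    mu = 2 * total / (n * (n - 1))  # grand mean over the n(n-1)/2 crosses
    # Sum of all crosses in which each line takes part
    line_sum = {l: sum(v for p, v in means.items() if l in p) for l in lines}
    gca = {l: (n * line_sum[l] - 2 * total) / (n * (n - 2)) for l in lines}
    # SCA is what remains after removing the grand mean and both GCAs
    sca = {(a, b): means[frozenset((a, b))] - mu - gca[a] - gca[b]
           for a, b in combinations(lines, 2)}
    return mu, gca, sca


# Hypothetical F1 means (arbitrary units) for four inbred lines
hypothetical_means = {
    ("A", "B"): 12.0, ("A", "C"): 15.0, ("A", "D"): 11.0,
    ("B", "C"): 14.0, ("B", "D"): 10.0, ("C", "D"): 16.0,
}
mu, gca, sca = combining_ability(hypothetical_means)
best_pair = max(sca, key=sca.get)  # cross with the largest positive SCA
```

A cross with a high SCA performs better than the average behavior of its two parents would predict, which is the signal a breeder looks for when choosing the parental pair of an F1 hybrid.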

Figure 4. Breeding methods for the development of commercial F1 hybrid cultivars: two-way (A), three-way (B), and four-way (C) F1 hybrids, with inbreeding progression in the case of selfing and full-sibling crosses (D) and large-scale hybridization and F1 female-seed production (E).
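
The inbreeding progression under selfing and full-sib mating can be illustrated with the standard recurrence relations for the inbreeding coefficient F. The sketch below assumes a non-inbred, unrelated starting generation (F = 0); the function names are hypothetical and the numbers are textbook expectations, not values read from the figure.

```python
def selfing_inbreeding(generations):
    """Expected inbreeding coefficient after each generation of selfing.

    Recurrence: F(t) = (1 + F(t-1)) / 2, starting from F(0) = 0.
    """
    f = [0.0]
    for _ in range(generations):
        f.append((1 + f[-1]) / 2)
    return f


def full_sib_inbreeding(generations):
    """Expected inbreeding coefficient after each generation of full-sib mating.

    Recurrence: F(t) = (1 + 2 F(t-1) + F(t-2)) / 4, with F(0) = F(-1) = 0.
    """
    f_prev2, f_prev1 = 0.0, 0.0
    f = [0.0]
    for _ in range(generations):
        f_next = (1 + 2 * f_prev1 + f_prev2) / 4
        f.append(f_next)
        f_prev2, f_prev1 = f_prev1, f_next
    return f


for t, (fs, fb) in enumerate(zip(selfing_inbreeding(6), full_sib_inbreeding(6))):
    print(f"generation {t}: selfing F = {fs:.3f}, full-sib F = {fb:.3f}")
```

Selfing halves the remaining heterozygosity every generation, whereas full-sib mating approaches homozygosity more slowly, so more cycles are needed to reach a comparable level of inbreeding.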

5. Advances in Cannabis Genomics

Since the advent of genomics applied to crop plant species, the breeding methods conventionally used for the development of new varieties have been rearranged and adapted, because selection for many traits can be assisted by molecular markers. In particular, both single- and multiple-locus genotyping approaches have proved useful for improving the overall genetic stability and uniformity of cultivated populations as well as for pyramiding specific genes that control resistance or tolerance to both biotic and abiotic stresses. In addition to large panels of molecular markers useful for genotyping purposes, several next-generation platforms for genome sequencing and new biotechnological techniques for gene editing are now available for many crop plant species. These molecular tools allow scientists to better characterize individual plants and populations and to estimate their breeding value through laboratory analyses; breeders then use the selected materials in field trials to identify superior phenotypes showing distinctiveness, uniformity, and stability.

The use of genomics in cannabis dates back around 25 years, to the use of dominant markers such as RFLP, RAPD, and AFLP markers (Gillan et al., 1995; Faeti et al., 1996; Jagadish et al., 1996; Forapani et al., 2001; Datwyler and Weiblen, 2006) to assess the genetic relatedness of species, varieties, and even individuals. Later on, microsatellite or SSR markers were shown to be more informative, reliable, and reproducible than dominant markers for cannabis genotyping (Alghanim and Almirall, 2003; Gilmore et al., 2003; Hsieh et al., 2003). Specific marker alleles/variants were also identified as predictive and capable of discriminating hemp from marijuana (Mendoza et al., 2009). Among the most relevant microsatellite-based studies conducted on cannabis, two relatively recent ones deserve mention. In the first, a panel of 13 SSR markers was used to test over 1,300 samples of fiber cannabis and marijuana, together with accessions from local police seizures (Dufresnes et al., 2017). In the same year, Soler et al. (2017) characterized the genetic structures of 154 individuals belonging to 20 cultivars of C. sativa subsp. indica and 2 cultivars of C. sativa subsp. sativa using a set of 6 SSR markers. However, despite the number of studies conducted using dominant markers and codominant microsatellites, only Soler et al. (2017) raised the concrete possibility of using these molecular tools for breeding goals, including the improvement and development of new varieties. Most of the studies focused instead on germplasm management, genetic discrimination of varieties, and forensic applications (e.g., identification of drug vs. non-drug types).

While marker-assisted breeding strategies in cannabis are still far from being fully explored, marker-assisted selection has already been used successfully. One of the achievements that contributed most to the shift from traditional to molecular breeding in cannabis is the release of the first two genomes of C. sativa in 2011 (van Bakel et al., 2011). Since then, many studies have focused on bioinformatic analyses of these genomes to mine molecular markers tightly linked to expressed genes (Gao et al., 2014) and hence useful for cannabis marker-assisted characterization and selection studies. The availability of sequenced genomes also allowed the identification and exploitation of thousands of SNP variants, which, together with Genotyping-by-Sequencing (GBS) approaches, enabled the analysis of the genetic diversity of several cannabis accessions belonging to hemp and medical/recreational varieties. The use of GBS in Cannabis spp. was recently described by Soorni et al. (2017), who analyzed 98 samples from two Iranian germplasm collections, obtaining over 24 thousand highly informative SNPs. In this case too, SNP markers proved useful not only to classify samples belonging to different cannabis varieties but also to identify polymorphisms associated with genes belonging to the cannabinoid pathway, such as THCAS and CBDAS (delta-9-tetrahydrocannabinolic acid synthase and cannabidiolic acid synthase, respectively) (van Bakel et al., 2011; Onofri et al., 2015; Weiblen et al., 2015; McKernan et al., 2020). These markers could be extremely useful in breeding programs aimed at developing new cannabis varieties for fiber production (drug-free) or medical/recreational use. Using this approach, Laverty et al. (2019) developed a physical and genetic map of C. sativa, focusing their attention on the genes involved in cannabinoid synthesis.
In particular, the authors combined the genomes of the Purple Kush and Finola varieties (van Bakel et al., 2011) with Pacific Biosciences (PacBio) long-read single-molecule real-time (SMRT) sequencing and Hi-C technology to generate a combined genetic and physical map of cannabis. This provided new insights into the chromosome arrangement and the cannabinoid biosynthetic genes. Another milestone of the Laverty et al. (2019) study is the identification of an important gene involved in the biosynthesis of cannabichromene, a cannabinoid with weak activity on the CB1 and CB2 receptors (involved in the neural and psychoactive effects of THC and CBD) that could possibly be used in medical therapies against pain and gastro-inflammatory diseases (Maione et al., 2011; Izzo et al., 2012; Shinjyo and Di Marzo, 2013).

More recently, building on the latest knowledge of cannabis genomics, Henry et al. (2020) described the efficiency of a screening method based on the KASP (Kompetitive Allele Specific PCR) technique for the identification of 22 highly informative SNPs involved in the biosynthetic pathways of cannabinoids and terpenes (important compounds for the recreational and medical cannabis industries).

It must be recognized that the increased knowledge of the most relevant cannabis biosynthetic pathways has been possible thanks to the continuous refinement of available genomes together with the public release of new ones. Recently, McKernan et al. (2020) sequenced and annotated 42 Cannabis genomes, identifying SNPs useful for molecular breeding related not only to cannabinoid synthesis but also to pathogen resistance. This could help in the production of medical/recreational cannabis without the risk of mildew contaminants that could be dangerous for consumers. In parallel, Gao et al. (2020) assembled a new genome of C. sativa derived from wild samples collected in Tibet using a combination of PacBio and Hi-C technologies. Despite all these efforts, an exhaustive meta-analysis of all the cannabis genomics data published so far (Kovalchuk et al., 2020) demonstrated that the currently available cannabis genome assemblies are: i) incomplete, with approximately 10% missing, 10–25% unmapped, and centromeres and satellite sequences unrepresented; ii) ordered at a low resolution and only partially annotated with regard to genes, partial genes, and pseudogenes. In summary, while the enormous interest raised by specific metabolic compounds (e.g., THC) has driven a high level of knowledge of specific biosynthetic pathways, the use of molecular markers for breeding new varieties is still in its embryonic phase and undoubtedly deserves further investigation to develop efficient tools transferable among laboratories. Considering the availability of a remarkable number of sequenced cannabis genomes, the starting point could be the development and implementation of an informative and representative panel of polymorphic SSR marker loci scattered throughout the genome for standardized multilocus genotyping purposes.

6. Characterization of Microsatellites in the Cannabis Genome and In Silico Construction of Multilocus Panels for Marker-Assisted Breeding

The Cannabis genome is diploid (2n = 2x = 20), and its haploid nuclear genome size is estimated to be 818 Mbp for females (karyotype XX) and 843 Mbp for males (karyotype XY) (Sakamoto et al., 1998). The C. sativa plastid and mitochondrial genomes are 153,871 bp (Vergara et al., 2016b) and 415,545 bp (White et al., 2016), respectively.

Among the 12 cannabis genomes available in GenBank, 5 were assembled at the chromosome level, while the remaining ones are considered drafts at the contig (6) or scaffold (1) assembly level. The C. sativa cs10 genome (BioProject ID: PRJNA560384), which is the most recent and best-assembled and is thus considered the representative genome of this species, was chosen for the search for microsatellites, or simple sequence repeats (SSRs), using MISA (MIcro SAtellites Identification Tool) (Thiel et al., 2003). The parameters were set as follows: a minimum of 15 repetitions for mononucleotide motifs, 8 for dinucleotides, 5 for trinucleotides, and 4 for tetra-, penta-, and hexanucleotides.
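
To show what these thresholds mean in practice, the following sketch scans a DNA string for perfect SSRs using the same minimum repeat numbers. It is a simplified regular-expression stand-in, not MISA itself: it ignores compound SSRs, handles adjacent or overlapping repeats only roughly, and the function names and the test sequence are hypothetical.

```python
import re

# Minimum number of repeats per motif length, as quoted in the text
MIN_REPEATS = {1: 15, 2: 8, 3: 5, 4: 4, 5: 4, 6: 4}


def is_reducible(motif):
    """True if the motif is itself a repeat of a shorter unit (e.g. AGAG = AG x 2)."""
    k = len(motif)
    return any(k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k))


def find_perfect_ssrs(sequence, min_repeats=MIN_REPEATS):
    """Return (start, motif, repeat_count) for each perfect SSR, sorted by start."""
    sequence = sequence.upper()
    hits = []
    for unit_len, min_rep in min_repeats.items():
        # A unit of unit_len bases followed by at least (min_rep - 1) copies of itself
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            if is_reducible(motif):
                continue  # already counted under the shorter motif length
            repeats = (match.end() - match.start()) // unit_len
            hits.append((match.start(), motif, repeats))
    return sorted(hits)


# Hypothetical test sequence: (AG)8 and (AAT)5 pass, the run of 14 A's does not
toy = "GATTACA" + "AG" * 8 + "CCGC" + "AAT" * 5 + "GC" + "A" * 14 + "C"
ssrs = find_perfect_ssrs(toy)
```

Passing a different dictionary as `min_repeats` is enough to rerun the scan with other thresholds.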

A total of 126,593 perfect and 12,017 compound SSR regions were identified, with a density equal to 148 SSRs/Mbp (0.34% of the total length of the genome). This value is slightly higher than, but still comparable with, those found by Portis et al. (2018) for 15 other plant genomes, including Solanum melongena, Capsicum annuum, Nicotiana tabacum, Petunia axillaris, and Coffea canephora, which ranged from 60 to 140 SSRs/Mbp under the same SSR search parameters.

Most of the SSR sequences detected in C. sativa exhibited a length between 15 and 19 nucleotides (60.1%), 26.5% of the sequences were 20–29 nucleotides long, 5.4% presented a length of 30–39 nucleotides and the remaining 8% were more than 40 nucleotides in length. The motif category responsible for the longest microsatellites was the dinucleotides, for which 16.7% of the sequences showed >20 repetitions and, hence, were more than 40 nucleotides long (Supplementary Figure 1).

A second and more stringent SSR analysis was performed to identify sites suitable for genotyping analysis; longer and, putatively, more polymorphic sites were searched by increasing the stringency of the parameters to a minimum of 20 repetitions for mononucleotides, 15 for dinucleotides, 10 for trinucleotides, and 7 for tetra-, penta-, and hexanucleotides (Supplementary Table 3). The resulting 23,900 sequences were scored with a density of 28.2 SSRs/Mbp, with a total length equal to 0.13% of the genomic sequence. The most abundant motifs identified were the dinucleotide and the trinucleotide motifs, accounting for 55.3 and 23.9% of the total length of the SSR sequences, respectively (Figure 5), followed by mononucleotide motifs (18.4%), while the remaining tetra-, penta-, and hexanucleotide motifs accounted for only 2.2% of the total length (with 0.6, 0.3, and 1.3% richness, respectively).
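
SSR density is simply the number of SSR loci divided by the assembly length in Mbp, so the reported counts and densities can be inverted to see roughly which assembly length they imply. The sketch below performs only this arithmetic on the figures quoted in the text; because the published densities are rounded, the implied lengths are approximate, and which SSR classes entered each density calculation is an assumption made here for illustration.

```python
def implied_length_mbp(ssr_count, density_per_mbp):
    """Assembly length (Mbp) implied by an SSR count and a density in SSRs/Mbp."""
    return ssr_count / density_per_mbp


# Figures quoted in the text
perfect_ssrs = 126_593
compound_ssrs = 12_017
first_density = 148.0        # SSRs/Mbp, first (relaxed) search
stringent_ssrs = 23_900
stringent_density = 28.2     # SSRs/Mbp, second (stringent) search

# Assumption A: the first density counts perfect SSRs only
length_perfect_only = implied_length_mbp(perfect_ssrs, first_density)
# Assumption B: the first density counts perfect plus compound SSRs
length_all = implied_length_mbp(perfect_ssrs + compound_ssrs, first_density)
# Second search
length_stringent = implied_length_mbp(stringent_ssrs, stringent_density)

print(f"perfect only:       {length_perfect_only:.1f} Mbp")
print(f"perfect + compound: {length_all:.1f} Mbp")
print(f"stringent search:   {length_stringent:.1f} Mbp")
```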

Figure 5. Information on SSR regions. (A) Abundance of the main repeat types (% base pairs among the total base pairs of the motifs) of SSR sites in the Cannabis cs10 genome. (B) Abundance of the motifs at the total SSR sites.

The most abundant type of SSR repeat was A/T for mononucleotides (the only type of this motif), AG/CT for dinucleotides (88.7% of the total length of this motif category), and AAT/ATT for trinucleotides (84.4%). Figure 5 illustrates the richness of all the main repeat types among the motifs (A) and the relative motif richness in the cannabis genome (B).

To develop a panel of SSR loci that are exploitable for marker-assisted breeding (MAB) purposes, several microsatellites were selected within each linkage group to cover the entire genome at a density equal to or greater than one SSR every 5 Mb. The selection was performed taking into consideration chromosomal position, nucleotide length, and repetitive motifs. SSR-specific primer pairs were designed using the Geneious plug-in Primer3 (Untergasser et al., 2012) following the same criteria described by Palumbo et al. (2019a) and using the same parameters for all genomic loci to make multiplex PCR assays possible.
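
The spacing criterion can be sketched as a simple greedy walk along a chromosome: starting from one end, always keep the farthest candidate SSR that is still within the maximum allowed gap of the previously kept one. This is only an illustration of the positional criterion under assumed inputs; the function names and coordinates are hypothetical, and the actual selection also weighed SSR length and motif, which are not modeled here.

```python
def select_spaced_markers(positions, chrom_length, max_gap=5_000_000):
    """Greedy choice of marker positions (bp) so that gaps stay <= max_gap when possible.

    If no candidate lies within max_gap of the last chosen marker, the next
    candidate is taken anyway, leaving an unavoidable larger gap.
    """
    candidates = sorted(set(positions))
    selected = []
    last = 0  # start of the chromosome
    i = 0
    while chrom_length - last > max_gap and i < len(candidates):
        j = i
        # Advance to the farthest candidate still within reach of `last`
        while j + 1 < len(candidates) and candidates[j + 1] - last <= max_gap:
            j += 1
        selected.append(candidates[j])
        last = candidates[j]
        i = j + 1
    return selected


def largest_gap(selected, chrom_length):
    """Largest distance between consecutive markers, chromosome ends included."""
    points = [0] + sorted(selected) + [chrom_length]
    return max(b - a for a, b in zip(points, points[1:]))


# Hypothetical candidate SSR coordinates on a 22 Mb chromosome
candidate_positions = [1_000_000, 3_000_000, 4_500_000, 6_000_000,
                       9_000_000, 12_000_000, 20_000_000]
panel = select_spaced_markers(candidate_positions, chrom_length=22_000_000)
worst = largest_gap(panel, chrom_length=22_000_000)
```

In this toy case the stretch between 12 Mb and 20 Mb contains no candidate, so `largest_gap` reports a gap above the 5 Mb target, flagging a region where additional SSRs would have to be searched with relaxed parameters.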

The panel of markers was also developed considering i) their presence in a single copy to avoid nonspecific PCR products and ii) their polymorphic nature through an in silico comparison of cs10 with two additional genomes (Finola SAMN02981385 and Purple Kush SAMN09375800). A total of 41 SSR primer pairs were designed, with an average of four per chromosome (see Figure 6, and Supplementary Table 4 for details on chromosome accessions). Further detailed information about the selected loci is reported in Supplementary Table 5.
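
The two in silico filters can be sketched as follows, assuming that for each candidate locus one already has the number of primer-pair hits and the SSR repeat count in each genome. The data structures, function names, and values are hypothetical and serve only to illustrate the logic of keeping single-copy loci and flagging those that differ among the compared genomes.

```python
def is_single_copy(primer_hits):
    """True if the primer pair matches exactly once in every genome examined."""
    return all(hits == 1 for hits in primer_hits.values())


def classify_locus(repeat_counts):
    """Classify a locus from its repeat count in each genome (None = not found)."""
    observed = [n for n in repeat_counts.values() if n is not None]
    if len(observed) < 2:
        return "not comparable"
    distinct = len(set(observed))
    if distinct == 1:
        return "monomorphic"
    if distinct == len(repeat_counts):
        return "polymorphic among all genomes"
    return "polymorphic"


# Hypothetical loci: primer hits and repeat counts per genome
loci = {
    "SSR_1": ({"cs10": 1, "Finola": 1, "Purple Kush": 1},
              {"cs10": 18, "Finola": 15, "Purple Kush": 21}),
    "SSR_2": ({"cs10": 1, "Finola": 1, "Purple Kush": 1},
              {"cs10": 16, "Finola": 16, "Purple Kush": 19}),
    "SSR_3": ({"cs10": 1, "Finola": 2, "Purple Kush": 1},
              {"cs10": 12, "Finola": 14, "Purple Kush": 12}),
    "SSR_4": ({"cs10": 1, "Finola": 1, "Purple Kush": 1},
              {"cs10": 20, "Finola": 20, "Purple Kush": 20}),
}

# Keep only single-copy loci that vary between at least two genomes
kept = {
    name: classify_locus(repeats)
    for name, (hits, repeats) in loci.items()
    if is_single_copy(hits) and classify_locus(repeats).startswith("polymorphic")
}
```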

Figure 6. Individual linkage groups in the Cannabis genome (n = x = 10) with the physical position and genetic information of the selected SSR markers. Basic information on intergene and intragene sites, including intron/exon positions of SSR markers, and their corresponding physically linked genes is also reported (marker loci found to be polymorphic among all three explored genomes are marked with an asterisk).

7. General Perspectives and Conclusions

The topic of cannabis has always aroused controversy in debates across different areas, from ideology and politics to the more scientific fields of pharmacology and applied therapeutics, and even botanical taxonomy (Russo, 2019). The taxonomic dispute about the speciation of cannabis, or lack thereof, is unlikely to be resolved because all cannabis types (whether they are considered species, subspecies, or botanical varieties) are capable of cross-hybridizing and producing fertile progeny. The problem is compounded by the increasing number of cannabis varieties sold through the black market, along with the parallel development of legal, registered, and patented materials. Therefore, considering that morphological traits such as leaflet width and plant height do not allow a clear-cut varietal classification, biochemical profiles remain, so far, the most reliable key to characterize cannabis cultivars. In other words, it is possible to identify cannabis types as chemical varieties (Russo, 2019). Nevertheless, these characteristics are not easy to assess analytically, nor are they stable across different environments and/or cultivation systems. Conversely, molecular markers are easy to detect and are not influenced by external factors, so they can be profitably adopted and exploited for the identification and/or authentication of Cannabis biotypes as molecular cultivars, including multilocus genotypes or fingerprints. Additionally, the classification of Cannabis through approaches involving both chloroplast DNA barcoding based on the standard genes matK and rbcL and nuclear DNA haplotyping based on the ITS1 and ITS2 regions makes the scenario as complicated as expected. As reported in this study (Table 1), the number of nuclear sequences attributed to the indica and ruderalis taxa is very low, and sequences for the chloroplast genes are lacking.
Moreover, the nucleotide variation found for nuclear ITS regions within each subspecies was lower than that calculated between taxonomic units, probably due to the continuous hybridization/introgression this species has undergone over time. Overall, our findings support the conclusions proposed by McPartland (2018), according to which the genus Cannabis should preferably be divided into botanical varieties rather than into subspecies. Additional investigations using chloroplast DNA barcodes are needed to verify whether it is possible to detect polymorphisms or haplotypes that are useful for the authentication of cannabis taxonomies for plant varieties and their derivatives.

After several years of accelerated clandestine cultivation improvements and home-developed breeding programs, modern lines and varieties now yield dried inflorescence material that contains over 30% THC acid (THCA) by dry weight (Swift et al., 2013; Lynch et al., 2017). However, tetrahydrocannabinol is not the only cannabinoid available in high concentrations. Cultivars with considerable amounts of cannabidiolic acid (CBDA) are frequently exploited in some hashish-based products (Rustichelli et al., 1996; Hanuš et al., 2016) and are currently in high demand for the treatment of spasms (Devinsky et al., 2014). However, CBD and THC display contrasting neurological effects (Lynch et al., 2017). Being a non-competitive CB1/CB2 receptor antagonist (Pertwee, 2008), CBD has no psychoactive effect, unlike THC, whose role as a partial agonist of the two abovementioned receptors is well known.