
breeding cannabis to find that 1 in a million seed

Genetic tools weed out misconceptions of strain reliability in Cannabis sativa: implications for a budding industry

Unlike other plants, Cannabis sativa is excluded from regulation by the United States Department of Agriculture (USDA). Distinctive Cannabis varieties are excluded from registration and are therefore nearly impossible to verify. As Cannabis has become legal for medical and recreational consumption in many states, consumers have been exposed to a wave of novel Cannabis products with many distinctive names. Despite more than 2000 named strains being available to consumers, questions about the consistency of commercially available strains have not been investigated through scientific methodologies. As Cannabis legalization and consumption increase, the need to provide consumers with consistent products becomes more pressing. In this research, we examined commercially available, drug-type Cannabis strains using genetic methods to determine whether the commonly referenced distinctions are supported and whether samples with the same strain name are consistent when obtained from different facilities.


We developed ten de novo microsatellite markers using the “Purple Kush” genome to investigate potential genetic variation within 30 strains obtained from dispensaries in three states. Samples were examined to determine whether any genetic distinction separates the commonly referenced Sativa, Indica and Hybrid types and whether strain accessions obtained from different facilities share a consistent genetic identity.


Although there was strong statistical support dividing the samples into two genetic groups, the groups did not correspond to commonly reported Sativa/Hybrid/Indica types. The analyses revealed genetic inconsistencies within strains, with most strains containing at least one genetic outlier. However, after the removal of obvious outliers, many strains showed considerable genetic stability.


We failed to find clear genetic support for commonly referenced Sativa, Indica and Hybrid types as described in online databases. Significant genetic differences within samples of the same strain were observed indicating that consumers could be provided inconsistent products. These differences have the potential to lead to phenotypic differences and unexpected effects, which could be surprising for the recreational user, but have more serious implications for patients relying on strains that alleviate specific medical symptoms.


Cultivation of Cannabis sativa L. dates back thousands of years (Abel 2013) but has been largely illegal worldwide for the better part of the last century. The U.S. Drug Enforcement Administration considers Cannabis a Schedule I drug with no “accepted medical use in treatment in the United States” (United States Congress n.d.), but laws allowing Cannabis for use as hemp, medicine, and some adult recreational use are emerging (ProCon 2018). Global restrictions have limited Cannabis-related research, and there are relatively few genetic studies focused on strains (Lynch et al. 2016; Soler et al. 2017), but studies with multiple accessions of a particular strain show variation (Lynch et al. 2016; Soler et al. 2017; Sawler et al. 2015).

Currently, the Cannabis industry has no way to verify strains. Consequently, suppliers are unable to provide confirmation of strains, and consumers have to trust the printed name on a label matches the product inside the package. Reports of inconsistencies, along with the history of underground trading and growing in the absence of a verification system, reinforce the likelihood that strain names may be unreliable identifiers for Cannabis products at the present time. Without verification systems in place, there is the potential for misidentification and mislabeling of plants, creating names for plants of unknown origin, and even re-naming or re-labeling plants with prominent names for better sale. Cannabis taxonomy is complex (Emboden 1974; Schultes et al. 1974; Hillig 2005; Russo 2007; Clarke and Merlin 2013; Clarke et al. 2015; Clarke and Merlin 2016; Small et al. 1976; Small 2015a), but given the success of using genetic markers, such as microsatellites, to determine varieties in other crops, we suggest that similar genetic based approaches should be used to identify Cannabis strains in medical and recreational marketplaces.

There are an estimated 3.5 million medical marijuana patients in the United States (U.S.) (Leafly 2018b), and various levels of recent legalization in many states have led to a surge of new strains (Leafly 2018a; Wikileaf 2018). Breeders are producing new Cannabis strains with novel chemical profiles resulting in various psychotropic effects and relief for an array of symptoms associated with medical conditions including (but not limited to): glaucoma (Tomida et al. 2004), Crohn’s disease (Naftali et al. 2013), epilepsy (U.S. Food and Drug Administration 2018; Maa and Figi 2014), chronic pain, depression, anxiety, PTSD, autism, and fibromyalgia (Naftali et al. 2013; Cousijn et al. 2018; Ogborne et al. 2000; Borgelt et al. 2013; ProCon 2016).

There are primarily two Cannabis usage groups, which are well supported by genetic analyses (Lynch et al. 2016; Soler et al. 2017; Sawler et al. 2015; Dufresnes et al. 2017): hemp, defined by a limit of < 0.3% Δ9-tetrahydrocannabinol (THC) in the U.S., and marijuana or drug-types with moderate to high THC concentrations (always > 0.3% THC). Within the two major groups, Cannabis has been further divided into strains (varietals) in the commercial marketplace. For the drug types in particular, strains are assigned to one of three categories: Sativa, which reportedly has uplifting and more psychotropic effects; Indica, which reportedly has more relaxing and sedative effects; and Hybrid, which is the result of breeding Sativa and Indica types, resulting in intermediate effects. The colloquial terms Sativa, Hybrid, and Indica are used throughout this document even though these terms do not align with the current formal botanical taxonomy for Cannabis sativa and proposed Cannabis indica (McPartland 2017; Piomelli and Russo 2016). We feel the colloquial terminology is necessary here as the approach for this study was from a consumer view, and these are the terms offered as common descriptors for the general public (Leafly 2018a; Wikileaf 2018; 2018; NCSM 2018; 2018; Seedfinder 2018). Genetic analyses have not provided a clear consensus for higher taxonomic distinction among these commonly described Cannabis types (Lynch et al. 2016; Sawler et al. 2015), and whether there is a verifiable difference between Sativa and Indica type strains is debated (McPartland 2017; Piomelli and Russo 2016; Erkelens and Hazekamp 2014). However, both the recreational and medical Cannabis communities claim there are distinct differences in effects between Sativa and Indica type strains (Leafly 2018a; Wikileaf 2018; 2018; NCSM 2018; 2018; Seedfinder 2018; Leaf Science 2016; Smith 2012).

Female Cannabis plants are selected based on desirable characters (mother plants) and are produced through cloning and, in some cases, self-fertilization to produce seeds (Green 2005). Cloning allows Cannabis growers to replicate plants, ideally producing consistent products. There are an overwhelming number of Cannabis strains that vary widely in appearance, taste, smell and psychotropic effects (Leafly 2018a; Wikileaf 2018; 2018; NCSM 2018; 2018; Seedfinder 2018). Online databases such as Leafly (2018a) and Wikileaf (2018), for example, provide consumers with information about strains but lack scientific merit for the Cannabis industry to regulate the consistency of strains. Other databases exist ( 2018; NCSM 2018; 2018; Seedfinder 2018), but the method of assignment to the three groups is often undisclosed, confounded, or mysterious. Wikileaf reports a numeric percentage of assignment to Sativa and/or Indica (Wikileaf 2018), which is why we chose it as our reference scale of ancestry, although there is some disagreement among online sources (Additional file 1: Table S1). To our knowledge, there have not been any published scientific studies specifically investigating the genetic consistency of strains at multiple points of sale for Cannabis consumers.

Breeders and growers choose Cannabis plants with desirable characters (phenotype) related to flowers, cannabinoid profile, and terpene production. Phenotype is a product of genotype and environment. Cannabis is considerably variable and extraordinarily plastic in response to varying environmental conditions (Onofri and Mandolino 2017). Therefore, determining sources of variation, at the most basic level, requires examining genetic differences. Strains propagated through cloning should have minimal genetic variation. Eight of the strains examined in this study are reportedly clone only strains indicating there should be little to no genetic variation within these strains. That being said, it is possible for mutations to accumulate over multiple generations of cloning (Gabriel et al. 1993; Hojsgaard and Horandl 2015), but these should not be widespread. Self-fertilization and subsequent seed production may also be used to grow a particular strain. With most commercial plant products growers go through multiple generations of self-fertilization and backcrossing to remove genetic variability within a strain and provide a consistent product (Riggs 1988). However, for many Cannabis strains, the extent of genetic variability stabilization is uncertain. It has been observed that novel Cannabis strains developed through crossing are often phenotypically variable (Green 2005), which could be the result of seed producers growing seeds that are not stabilized enough to produce a consistent phenotype. Soler et al. (2017) examined the genetic diversity and structure of Cannabis cultivars grown from seed and found considerable variation, suggesting that seed lots are not consistent. Given the uncertainties surrounding named Cannabis strains, genetic data provides an ideal path to examine how widespread genetic inconsistencies might be.

In the U.S., protection against commercial exploitation, trademarking, and recognition of intellectual property for developers of new plant cultivars is provided through the United States Department of Agriculture (USDA) and The Plant Variety Protection Act of 1970 (United States Department of Agriculture 1970). Traditionally, morphological characters were used to define new varieties in crops such as grapes (Vitis vinifera L.), olives (Olea europaea L.) and apples (Malus domestica Borkh.). With the rapid development of new varieties in these types of crops, morphological characters have become increasingly difficult to distinguish. Currently, quantitative and/or molecular characters are often used to demonstrate uniqueness among varieties. Microsatellite genotyping enables growers and breeders of new cultivars to demonstrate uniqueness through variable genetic profiles (Rongwen et al. 1995). Microsatellite genotyping has been used to distinguish cultivars and hybrid varieties of multiple crop varietals within species (Rongwen et al. 1995; Guilford et al. 1997; Hokanson et al. 1998; Cipriani et al. 2002; Belaj et al. 2004; Sarri et al. 2006; Baldoni et al. 2009; Stajner et al. 2011; Costantini et al. 2005; Pellerone et al. 2001; Poljuha et al. 2008; Muzzalupo et al. 2009). Generally, 3–12 microsatellite loci are sufficient to accurately identify varietals and detect misidentified individuals (Cipriani et al. 2002; Belaj et al. 2004; Sarri et al. 2006; Baldoni et al. 2009; Poljuha et al. 2008; Muzzalupo et al. 2009). Cannabis varieties, however, are not afforded any legal protections, as the USDA considers Cannabis an “ineligible commodity” (United States Department of Agriculture 2014), but genetic variety identification systems provide a model by which Cannabis strains could be developed, identified, registered, and protected.

We used a well-established genetic technique to compare commercially available C. sativa strains to determine whether products with the same name purchased from different sources have genetic congruence. This study is unique in that we approached sample acquisition as a common retail consumer by purchasing flower samples from dispensaries based on what was available at the time of purchase. All strains were purchased as-is, with no additional information provided by the facility other than the identifying label. This study aimed to determine whether: (1) any genetic distinction separates the common perception of Sativa, Indica and Hybrid types; (2) consistent genetic identity is found within a variety of different strain accessions obtained from different facilities; (3) there is evidence of misidentification or mislabeling.


Genetic material

Cannabis samples for 30 strains were acquired from 20 dispensaries or donors in three states (Table 1). All samples used in this study were obtained legally from retail dispensaries (Colorado and Washington), from medical dispensaries (California), or as a donation of legally obtained samples (Greeley 1). DNA was extracted using a modified CTAB extraction protocol (Doyle 1987) with 0.035–0.100 g of dried flower tissue per extraction. Several databases exist with various descriptive Sativa and Indica assignments for thousands of strains (Additional file 1: Table S1). For this study, proportions of Sativa and Indica phenotypes from Wikileaf (2018) were used. Analyses were performed on the full 122-sample dataset (Table 1). The 30 strains were assigned a proportion of Sativa according to online information (Table 2). Twelve of the 30 strains were designated as ‘popular’ due to higher availability among the dispensaries as well as online information reporting the most popular strains (Table 2) (Rahn 2015; Rahn 2016; Rahn et al. 2016; Escondido 2014). Results from popular strains are highlighted to show levels of variation in strains that are more widely available or that are in higher demand.

Microsatellite development

The Cannabis draft genome from “Purple Kush” (GenBank accession AGQN00000000.1) was scanned for microsatellite repeat regions using MSATCOMMANDER-1.0.8-beta (Faircloth 2008). Primers flanking microsatellites with 3–6 nucleotide repeat units were developed de novo (Additional file 1: Table S2). Seven of the microsatellites had trinucleotide motifs, two had hexanucleotide motifs, and one had a tetranucleotide motif (Additional file 1: Table S2). One primer in each pair was tagged with a 5′ universal sequence (M13 or T7) so that a matching sequence with a fluorochrome tag could be incorporated via PCR (Schwabe et al. 2015). Ten primer pairs produced consistent peaks within the predicted size range and were used for the genetic analyses herein (Additional file 1: Table S2).
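To make the scanning step concrete, the following is a minimal Python sketch of how perfect tandem repeats with 3–6 nucleotide units can be located in a sequence. It is not MSATCOMMANDER and does not reproduce its algorithm or settings; the function name `find_microsatellites` and the `min_repeats` threshold are assumptions introduced only for illustration, since the text does not state a minimum repeat count.

```python
import re

def find_microsatellites(sequence, min_unit=3, max_unit=6, min_repeats=4):
    """Return sorted (start, unit, repeat_count) tuples for perfect tandem repeats.

    min_repeats is an assumed threshold; the study does not report one.
    """
    sequence = sequence.upper()
    hits = []
    for unit_len in range(min_unit, max_unit + 1):
        # One unit of unit_len bases followed by at least
        # (min_repeats - 1) further copies of the same unit.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_repeats - 1))
        for match in pattern.finditer(sequence):
            unit = match.group(1)
            # Skip units that are just a shorter motif repeated (e.g. "AAA", "ATAT").
            is_compound = any(
                unit == unit[:k] * (unit_len // k)
                for k in range(1, unit_len)
                if unit_len % k == 0
            )
            if is_compound:
                continue
            hits.append((match.start(), unit, len(match.group(0)) // unit_len))
    return sorted(hits)

# Toy sequence (not from the Purple Kush genome): five ATG repeats flanked by CC.
print(find_microsatellites("CC" + "ATG" * 5 + "CC"))
```

In practice, primer design would also need enough flanking sequence on both sides of each hit, which this sketch does not check.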

PCR and data scoring

Microsatellite loci (Additional file 1: Table S2) were amplified in 12 μL reactions using 1.0 μL DNA (10–20 ng/μL), 0.6 μL fluorescent tag (5 μM; FAM, VIC, or PET), 0.6 μL non-tagged primer (5 μM), 0.6 μL tagged primer (0.5 μM), 0.7 μL dNTP mix (2.5 mM), 2.4 μL GoTaq Flexi Buffer (Promega, Madison, WI, USA), 0.06 μL GoFlexi taq polymerase (Promega), 0.06 μL BSA (Bovine Serum Albumin 100X), 0.5–6.0 μL MgCl2 or MgSO4, and 0.48–4.98 μL dH2O. An initial 5 min denaturing step was followed by 35 amplification cycles with 1 min denaturing at 95 °C, 1 min annealing at primer-specific temperatures and 1 min extension at 72 °C. Two multiplexes (Additional file 1: Table S2) based on fragment size and fluorescent tag were assembled, and 2 μL of each PCR product were combined into multiplexes up to a total volume of 10 μL. From the multiplexed product, 2 μL was added to Hi-Di formamide and LIZ 500 size standard (Applied Biosystems, Foster City, CA, USA) for electrophoresis on a 3730 Genetic Analyzer (Applied Biosystems) at the Arizona State University DNA Lab. Fragments were sized using GENEIOUS 8.1.8 (Biomatters Ltd).

Genetic statistical analyses

GENALEX ver. 6.4.1 (Peakall and Smouse 2006; Peakall and Smouse 2012) was used to calculate deviation from Hardy–Weinberg equilibrium (HWE) and number of alleles for each locus (Additional file 1: Table S2). Linkage disequilibrium was tested using GENEPOP ver. 4.0.10 (Raymond and Rousset 1995; Rousset 2008). Presence of null alleles was assessed using MICRO-CHECKER (Van Oosterhout et al. 2004). Genotypes were analyzed using the Bayesian cluster analysis program STRUCTURE ver. 2.4.2 (Pritchard et al. 2000). Burn-in and run-lengths of 50,000 generations were used with ten independent replicates for each STRUCTURE analysis. STRUCTURE HARVESTER (Earl and vonHoldt 2012) was used to determine the K value that best describes the likely number of genetic groups for the data set. GENALEX produced a Principal Coordinate Analysis (PCoA) to examine variation in the dataset. Lynch & Ritland (1999) mean pairwise relatedness (r) statistics were calculated between all 122 samples, resulting in 7381 pairwise r-values showing degrees of relatedness. For all strains, the r-mean and standard deviation (SD) were calculated by averaging among all samples. Obvious outliers were identified as the samples with the lowest r-mean and were removed iteratively to determine the relatedness among the remaining samples in the subset. A graph was generated for 12 popular strains (Table 2) to show how the r-mean value changes within a strain when outliers are removed.
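The pair count and the outlier-removal step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the relatedness values are invented, and "farthest outlier" is interpreted here as the sample with the lowest mean r to the other samples of the same strain. The r-values themselves would come from GENALEX; this sketch does not compute the Lynch & Ritland estimator.

```python
from itertools import combinations
from statistics import mean

def n_pairs(n_samples):
    """Number of unordered sample pairs, n(n - 1) / 2."""
    return n_samples * (n_samples - 1) // 2

def mean_r(samples, r):
    """Mean pairwise relatedness among the given samples.

    r maps frozenset({sample_a, sample_b}) to a pairwise r-value.
    """
    return mean(r[frozenset(pair)] for pair in combinations(samples, 2))

def drop_farthest_outlier(samples, r):
    """Remove the sample whose mean r to the other samples is lowest."""
    def mean_to_others(s):
        return mean(r[frozenset((s, t))] for t in samples if t != s)
    outlier = min(samples, key=mean_to_others)
    return [s for s in samples if s != outlier], outlier

# Invented r-values for four hypothetical samples of one strain.
pairs = {("A", "B"): 0.9, ("A", "C"): 0.9, ("B", "C"): 0.9,
         ("A", "D"): -0.1, ("B", "D"): -0.2, ("C", "D"): -0.3}
r = {frozenset(p): v for p, v in pairs.items()}

print(n_pairs(122))                      # 122 samples give 7381 pairs
kept, outlier = drop_farthest_outlier(["A", "B", "C", "D"], r)
print(outlier, mean_r(kept, r))
```

Calling `drop_farthest_outlier` a second time on `kept` corresponds to removing a second outlier.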


The microsatellite analyses show genetic inconsistencies in Cannabis strains acquired from different facilities. While popular strains were widely available, some strains were found only at two dispensaries (Table 1). Since the aim of the research was not to identify specific locations where strain inconsistencies were found, dispensaries are coded to protect the identity of businesses.

There was no evidence of linkage disequilibrium when all samples were treated as a single population. All loci deviated significantly from HWE, and all but one locus were monomorphic in at least two strains. All but one locus had excess homozygosity and therefore possibly null alleles. Given the inbred nature and extensive hybridization of Cannabis, deviations from neutral expectations are not surprising, and the lack of linkage disequilibrium indicates that the markers span multiple regions of the genome. The number of alleles ranged from 5 to 10 across the ten loci (Additional file 1: Table S2). There was no evidence of null alleles due to scoring errors.
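As an illustration of what "excess homozygosity" means, the sketch below compares observed heterozygosity at one locus with the heterozygosity expected under HWE, He = 1 − Σ p_i², and reports F = 1 − Ho/He (positive F indicates a homozygote excess). This is a simplified stand-in, not the test GENALEX or MICRO-CHECKER performs; the function name and the genotypes (allele sizes in base pairs) are invented for the example.

```python
from collections import Counter

def heterozygosity(genotypes):
    """genotypes: list of (allele_a, allele_b) tuples for one diploid locus.

    Returns (observed_het, expected_het, f) where f = 1 - Ho / He.
    """
    n = len(genotypes)
    observed = sum(1 for a, b in genotypes if a != b) / n
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total_alleles = 2 * n
    expected = 1.0 - sum((c / total_alleles) ** 2 for c in counts.values())
    # A monomorphic locus has He = 0, so F is undefined there.
    f = 1.0 - observed / expected if expected > 0 else float("nan")
    return observed, expected, f

# Invented data: mostly homozygotes for two alleles (150 bp and 156 bp).
locus = [(150, 150)] * 4 + [(156, 156)] * 4 + [(150, 156)] * 2
print(heterozygosity(locus))
```

With these made-up genotypes, only 20% of individuals are heterozygous although 50% would be expected from the allele frequencies, which is the kind of deficit the paragraph above describes.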

STRUCTURE HARVESTER calculated high support (∆K = 146.56) for two genetic groups, K = 2 (Additional file 2: Figure S1). STRUCTURE assignment is shown in Fig. 1 with the strains ordered by the purported proportions of Sativa phenotype (Wikileaf 2018). A clear genetic distinction between Sativa and Indica types would assign 100% Sativa strains (“Durban Poison”) to one genotype and assign 100% Indica strains (“Purple Kush”) to the other genotype (Table 2, Fig. 1, Additional file 3: Figure S2). Division into two genetic groups does not support the commonly described Sativa and Indica phenotypes. “Durban Poison” and “Purple Kush” follow what we would expect if there was support for the Sativa/Indica division. Seven of nine “Durban Poison” (100% Sativa) samples had 96% assignment to genotype 1, and three of four “Purple Kush” (100% Indica) had 89% assignment to genotype 2 (Fig. 1, Additional file 3: Figure S2). However, samples of “Hawaiian” (90% Sativa) and “Grape Ape” (100% Indica) do not show consistent patterns of predominant assignment to genotype 1 or 2. Interestingly, two predominantly Sativa strains “Durban Poison” (100% Sativa) and “Sour Diesel” (90% Sativa) have 86 and 14% average assignment to genotype 1, respectively. Hybrid strains such as “Blue Dream” and “Tahoe OG” (50% Sativa) should result in some proportion of shared ancestry, with assignment to both genotype 1 and 2. Eight of nine samples of “Blue Dream” show > 80% assignment to genotype 1, and three of four samples of “Tahoe OG” show < 7% assignment to genotype 1.
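The ∆K statistic reported above comes from the Evanno method, which STRUCTURE HARVESTER implements: for each K, the absolute second-order difference of the mean log-likelihood, |L(K+1) − 2L(K) + L(K−1)|, is divided by the standard deviation of L(K) across replicate runs. The sketch below shows that calculation using replicate means. The function name `delta_k` is hypothetical and the log-likelihood values are invented, so the output does not reproduce the study's ∆K = 146.56.

```python
from statistics import mean, stdev

def delta_k(lnp_by_k):
    """lnp_by_k maps K -> list of ln P(D|K) values from replicate runs.

    Returns {K: delta_K} for every K that has both neighbours K-1 and K+1.
    Requires at least two replicates per K with non-zero spread.
    """
    means = {k: mean(values) for k, values in lnp_by_k.items()}
    result = {}
    for k in sorted(lnp_by_k):
        if k - 1 in means and k + 1 in means:
            second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
            result[k] = second_diff / stdev(lnp_by_k[k])
    return result

# Invented replicate log-likelihoods for K = 1..4 (two replicates each).
runs = {
    1: [-1000.0, -1002.0],
    2: [-900.0, -902.0],
    3: [-890.0, -894.0],
    4: [-885.0, -895.0],
}
scores = delta_k(runs)
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

Note that ∆K cannot be computed for the smallest or largest K tested, so the method can never select K = 1.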

Fig. 1. Bar plot graphs generated from STRUCTURE analysis for 122 individuals from 30 strains dividing genotypes into two genetic groups, K = 2. Samples were arranged by purported proportions from 100% Sativa to 100% Indica (Wikileaf 2018) and then alphabetically within each strain by city. Each strain includes reported proportion of Sativa in parentheses (Wikileaf 2018) and each sample includes the coded location and city from where it was acquired. Each bar indicates proportion of assignment to genotype 1 (blue) and genotype 2 (yellow).

A Principal Coordinates Analysis (PCoA) was conducted using GENALEX (Fig. 2). The PCoA is organized by color from 100% Sativa types (red), through all levels of Hybrid types (green 50:50), to 100% Indica types (purple; Fig. 2). Strain types with the same reported proportions are the same color but have different symbols. The PCoA of all strains represents 14.90% of the variation in the data on coordinate axis 1, 9.56% on axis 2, and 7.07% on axis 3 (not shown).

Fig. 2. Principal Coordinates Analysis (PCoA) generated in GENALEX using Nei’s genetic distance matrix. Samples are a color-coded continuum by proportion of Sativa (Table 1) with the strain name given for each sample: Sativa type (red: 100% Sativa proportion), Hybrid type (dark green: 50% Sativa proportion), and Indica type (purple: 0% Sativa proportion). Different symbols are used to indicate different strains within reported phenotype. Coordinate axis 1 explains 14.29% of the variation, coordinate axis 2 explains 9.56% of the variation, and coordinate axis 3 (not shown) explains 7.07%.

Lynch & Ritland (1999) pairwise genetic relatedness (r) between all 122 samples was calculated in GENALEX. The resulting 7381 pairwise r-values were converted to a heat map using purple to indicate the lowest pairwise relatedness value (−1.09) and green to indicate the highest pairwise relatedness value (1.00; Additional file 4: Figure S3). Comparisons are detailed for six popular strains (Fig. 3) to illustrate the relationship of samples from different sources and the impact of outliers. Values close to 1.00 indicate a high degree of relatedness (Lynch and Ritland 1999), which could be indicative of clones or seeds from the same mother (Green 2005; SeedFinder 2018a). First order relatives (full siblings or mother-daughter) share 50% genetic identity (r-value = 0.50), second order relatives (half siblings or cousins) share 25% genetic identity (r-value = 0.25), and unrelated individuals are expected to have an r-value of 0.00 or lower. Negative values arise when individuals are less related than expected under normal panmictic conditions (Moura et al. 2013; Norman et al. 2017).

Fig. 3. Heat maps of six prominent strains (a–f) using Lynch & Ritland (1999) pairwise genetic relatedness (r) values: purple indicates no genetic relatedness (minimum value −1.09) and green indicates a high degree of relatedness (maximum value 1.0). Sample strain names and location of origin are indicated along the top and down the left side of the chart. Pairwise genetic relatedness (r) values are given in each cell and cell color reflects the degree to which two individuals are related.

Individual pairwise r-values were averaged within strains to calculate the overall r-mean as a measure of genetic similarity within strains, which ranged from −0.22 (“Tangerine”) to 0.68 (“Island Sweet Skunk”) (Table 3). Standard deviations ranged from 0.04 (“Jack Herer”) to 0.51 (“Bruce Banner”). Higher standard deviation values indicate a wide range of genetic relatedness within a strain, while low values indicate that samples within a strain share similar levels of genetic relatedness. In order to determine how outliers impact the overall relatedness in a strain, the farthest outlier (lowest pairwise r-mean value) was removed and the overall r-means and SD values within strains were recalculated (Table 3). In all strains, the overall r-means increased when outliers were removed. In strains with more than three samples, a second outlier was removed and the overall r-means and SD values were recalculated. Overall r-means were used to determine degree of relatedness as clonal (or from stable seed; overall r-means > 0.9), first or higher order relatives (overall r-means 0.46–0.89), second order relatives (overall r-means 0.26–0.45), low levels of relatedness (overall r-means 0.00–0.25), and not related (overall r-means < 0.00). Overall r-means are displayed for all 30 strains (Table 3), and graphically for 12 popular strains (Fig. 4). Initial overall r-means indicate only three strains are first or higher order relatives (Table 3). Removing first or second outliers, depending on sample size, revealed that the remaining samples for an additional ten strains are first or higher order relatives (0.46–1.00), three strains are second order relatives (r-means 0.26–0.45), ten strains show low levels of relatedness (r-means 0.00–0.25; Table 3), and five strains are not related (r-means < 0.00). The impact of outliers can be clearly seen in the heat map for “Durban Poison”, which shows the relatedness for 36 comparisons (Fig. 3a), six of which are nearly identical (r-value 0.90–1.0), while 13 are not related (r-value < 0.00). However, removal of two outliers, Denver 1 and Garden City 2, reduces the number of comparisons ranked as not related from 13 to zero.
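The relatedness bins above translate directly into a small lookup, sketched here in Python. The function name is hypothetical. Two assumptions are made because the published bins are stated to two decimals and leave small gaps: values are rounded to two decimals before binning, and a value of exactly 0.90 falls in the "first or higher order" bin because the clonal bin is given as > 0.9.

```python
def relatedness_class(r_mean):
    """Map an overall r-mean to the relatedness category used in the text.

    Assumption: r_mean is rounded to two decimals before binning.
    """
    r = round(r_mean, 2)
    if r > 0.90:
        return "clonal or stable seed"
    if r >= 0.46:
        return "first or higher order relatives"
    if r >= 0.26:
        return "second order relatives"
    if r >= 0.00:
        return "low relatedness"
    return "not related"

# The two extremes of the initial overall r-means reported above.
print(relatedness_class(-0.22))  # "Tangerine"
print(relatedness_class(0.68))   # "Island Sweet Skunk"
```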

Fig. 4. This graph indicates the mean pairwise genetic relatedness (r) initially (light purple), and after the removal of one (medium purple) or two (dark purple) outlying samples in 12 popular strains.


Cannabis is an increasingly common topic of discussion, so it is important that scientists and the public can discuss Cannabis in a similar manner. Currently, not only are Sativa and Indica types disputed (Emboden 1974; Hillig 2005; Russo 2007; Clarke and Merlin 2013; Clarke et al. 2015; Clarke and Merlin 2016; McPartland 2017; Piomelli and Russo 2016; Small 2015b; De Meijer and Keizer 1996), but experts are also at odds about nomenclature for Cannabis (Emboden 1974; Hillig 2005; Russo 2007; Clarke and Merlin 2013; Clarke et al. 2015; Clarke and Merlin 2016; McPartland 2017; Piomelli and Russo 2016; Small 2015b; De Meijer and Keizer 1996). We postulated that genetic profiles from samples with the same strain name should have identical, or at least highly similar, genotypes no matter the source of origin. The multiple genetic analyses used here address paramount questions for the medical Cannabis community and bring empirical evidence to support claims that inconsistent products are being distributed. An important element of this study is that samples were acquired from multiple locations to maximize the potential for variation among samples. Maintenance of genetic integrity through genotyping is possible only following evaluation of genetic consistency, and continuing to overlook this aspect will promote genetic variability and phenotypic variation within Cannabis. Addressing strain variability at the molecular level is of the utmost importance while the industry is still relatively new.

Genetic analyses have consistently found genetic distinction between hemp and marijuana, but no clear distinction has been shown between the common description of Sativa and Indica types (Lynch et al. 2016; Soler et al. 2017; Sawler et al. 2015; Dufresnes et al. 2017; De Meijer and Keizer 1996). We found high support for two genetic groups in the data (Fig. 1) but no discernable distinction or pattern between the described Sativa and Indica strains. The color-coding of strains in the PCoA for all 122 samples allows for visualization of clustering among similar phenotypes by color: Sativa (red/orange), Indica (blue/purple) and Hybrid (green) type strains (Fig. 2). If genetic differentiation of the commonly perceived Sativa and Indica types previously existed, it is no longer detectable in the neutral genetic markers used here. Extensive hybridization and selection have presumably created a homogenizing effect and erased evidence of potentially divergent historical genotypes.

Wikileaf maintains that the proportions of Sativa and Indica reported for strains are largely based on genetics and lineage (Nelson 2016), although online databases do not give scientific evidence for their categorization other than parentage information from breeders and expert opinions. This has seemingly become convoluted over time (Russo 2007; Clarke and Merlin 2013; Small 2015a; Small 2016). Our results show that commonly reported levels of Sativa, Indica and Hybrid type strains are often not reflected in the average genotype. For example, two described Sativa type strains “Durban Poison” and “Sour Diesel”, have contradicting genetic assignments (Fig. 1, Table 2). This analysis indicates strains with similar reported proportions of Sativa or Indica may have differing genetic assignments. Further illustrating this point is that “Bruce Banner”, “Flo”, “Jillybean”, “Pineapple Express”, “Purple Haze”, and “Tangerine” are all reported to be 60/40 Hybrid type strains, but they clearly have differing levels of admixture both within and among these reportedly similar strains (Table 2, Fig. 1). From these results, we can conclude that reported ratios or differences between Sativa and Indica phenotypes are not discernable using these genetic markers. Given the lack of genetic distinction between Indica and Sativa types, it is not surprising that reported ancestry proportions are also not supported.

To accurately address reported variation within strains, samples were purchased from various locations, as a customer, with no information about strains other than publicly available online information. Evidence for genetic inconsistencies is apparent within many strains and supported by multiple genetic analyses. Soler et al. (2017) found genetic variability among seeds from the same strain supplied from a single source, indicating genotypes within strains are variable. When examining the STRUCTURE genotype assignments, it is clear that many strains contained one or more divergent samples with a difference of > 0.10 genotype assignment (e.g. “Durban Poison” – Denver 1; Figs. 1, 3a). Of the 30 strains examined, only four strains had consistent STRUCTURE genotype assignment and admixture among all samples. The number of strains with consistent STRUCTURE assignments increased to 11 and 15 when one or two samples were ignored, respectively. These results indicate that half of the included strains showed relatively stable genetic identity among most samples. Six strains had only two samples, both of which were different (e.g., “Trainwreck” and “Headband”). The remaining nine strains in the analysis had more than one divergent sample (e.g., “Sour Diesel”) or had no consistent genetic pattern among the samples within the strain (e.g., “Girl Scout Cookies”; Table 3, Figs. 1, 2, Additional file 3: Figure S2). It is noteworthy that many of the strains used here fell into a range of genetic relatedness indicative of first order siblings (see Lynch & Ritland analysis below) when samples with high genetic divergence were removed from the data set (Table 3; Figs. 3, 4). Eight of the 30 strains examined are identified as clone only (Table 2). All eight of the strains described as clone only show differentiation of at least one sample within the strain (Fig. 1). For example, one sample of “Blue Dream” is clearly differentiated from the remaining eight, and “Girl Scout Cookies” has little genetic cohesiveness among the eight samples (Figs. 1, 2). Other genetic studies have similarly found genetic inconsistencies across samples within the same strain (Lynch et al. 2016; Soler et al. 2017; Sawler et al. 2015). These results lend support to the idea that unstable genetic lines are being used to produce seed.
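One way to operationalize "divergent samples with a difference of > 0.10 genotype assignment" is sketched below in Python. The text does not say what the difference is measured against, so this sketch assumes it is the within-strain median of the proportion assigned to genotype 1; the function name, sample labels, and values are all invented for illustration and are not taken from Fig. 1.

```python
from statistics import median

def divergent_samples(assignments, threshold=0.10):
    """Flag samples whose genotype 1 assignment differs from the strain
    median by more than threshold.

    assignments: {sample_label: proportion assigned to genotype 1}.
    Comparing against the median is an assumption, not the study's stated rule.
    """
    centre = median(assignments.values())
    return sorted(
        label for label, q in assignments.items()
        if abs(q - centre) > threshold
    )

# Invented assignments for four hypothetical samples of one strain.
strain = {"site A": 0.96, "site B": 0.95, "site C": 0.97, "site D": 0.12}
print(divergent_samples(strain))
```

A strain would count as consistent under this rule when the returned list is empty.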

A pairwise genetic heat map based on Lynch & Ritland (1999) pairwise genetic relatedness (r-values) was generated to visualize genetic relatedness throughout the data set (Additional file 4: Figure S3). Samples with values at or close to 1.00 are assumed to be clones or plants from self-fertilized seed. Six examples of within-strain pairwise comparison heat maps were examined to illustrate common patterns (Fig. 3). The heat map shows that many strains contain samples that are first order relatives or higher (r-value > 0.49). For example, “Sour Diesel” (Fig. 3) has 12 comparisons of first order or above, and six have low/no relationship. There are also values that could be indicative of clones or plants from a stable seed source such as “Blue Dream” (Fig. 3), which has 10 nearly identical comparisons (r-value 0.90–1.00), and no comparisons in “Blue Dream” have negative values. While “Blue Dream” has an initial overall r-mean indicating first order relatedness within the samples (Table 3, Fig. 4), it still contains more variation than would be expected from a clone-only strain (Clone Only Strains n.d.). Other clone-only strains (Clone Only Strains n.d.), e.g. “Girl Scout Cookies” (Table 3, Fig. 3) and “Golden Goat” (Table 3, Fig. 3), have a high degree of genetic variation resulting in low overall relatedness values. Outliers were calculated and removed iteratively to demonstrate how they affected the overall r-mean within the 12 popular strains (Table 3, Fig. 4). In all cases, removing outliers increased the mean r-value, as illustrated by “Bruce Banner”, which increased substantially, from 0.3 to 0.9, when samples with two outlying genotypes were removed. There are unexpected areas in the entire dataset heat map that indicate high degrees of relatedness between different strains (Additional file 4: Figure S3). For example, comparisons between “Golden Goat” and “Island Sweet Skunk” (overall r-mean 0.37) are higher than within samples of “Sour Diesel”. 
Interestingly, “Golden Goat” is reported to be a hybrid descendant of “Island Sweet Skunk” (Leafly 2018a; Wikileaf 2018; NCSM 2018; 2018; Seedfinder 2018), which could explain the high genetic relatedness between these strains. However, most of the between-strain overall r-means are negative (e.g., “Golden Goat” to “Durban Poison” -0.03 and “Chemdawg” to “Durban Poison” -0.22; Additional file 4: Figure S3), indicative of limited recent genetic relationship.
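The effect of outlier removal on a strain's mean r-value can be illustrated with a small Python sketch. The relatedness values below are hypothetical, chosen only so the arithmetic is easy to follow, and the rule used to pick an outlier (lowest average relatedness to the other samples) is an assumption; the text does not spell out how outliers were calculated.

```python
from itertools import combinations

def mean_r(samples, r):
    """Mean of all pairwise relatedness values among `samples`."""
    pairs = list(combinations(samples, 2))
    return sum(r[frozenset(p)] for p in pairs) / len(pairs)

def drop_outlier(samples, r):
    """Remove the sample with the lowest average relatedness to the others
    (an assumed rule, used here only for illustration)."""
    def avg_to_others(s):
        others = [o for o in samples if o != s]
        return sum(r[frozenset((s, o))] for o in others) / len(others)
    worst = min(samples, key=avg_to_others)
    return [s for s in samples if s != worst]

# Hypothetical r-values: A, B and C are near-clonal, D is unrelated to all.
samples = ["A", "B", "C", "D"]
r = {frozenset(p): 0.9 for p in combinations(["A", "B", "C"], 2)}
r.update({frozenset(("D", s)): -0.3 for s in ["A", "B", "C"]})

before = mean_r(samples, r)
kept = drop_outlier(samples, r)
after = mean_r(kept, r)
print(round(before, 2), kept, round(after, 2))
```

In this toy case a single unrelated sample pulls the strain mean down to roughly 0.3, and removing it restores a mean of roughly 0.9, which shows how one or two divergent samples can dominate the overall r-mean of an otherwise consistent strain.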

While collecting samples from various dispensaries, it was noted that strains of “Chemdawg” had various different spellings of the strain name, as well as numbers and/or letters attached to the name. Without knowledge of the history of “Chemdawg”, the assumption was that these were local variations. These were acquired to include in the study to determine if and how these variants were related. Upon investigation of possible origins of “Chemdawg”, an interesting history was uncovered, especially in light of the results. Legend has it that someone named “Chemdog” (a person) grew the variations (“Chem Dog”, “Chem Dog D”, “Chem Dog 4”) from seeds he found in a single bag of Cannabis purchased at a Grateful Dead concert (Danko 2016). However, sampling suggests dispensaries use variations of the name, and more often the “Chemdawg” form of the name is used, albeit incorrectly (Danko 2016). The STRUCTURE analysis indicates only one “Chemdawg” individual has > 0.10 genetic divergence compared to the other six samples (Fig. 1, Additional file 3: Figure S2). Five of seven “Chemdawg” samples cluster in the PCoA (Fig. 2), and six of seven “Chemdawg” samples are first order relatives (r-value > 0.50; Table 3, Fig. 3). The history of “Chem Dog” is currently unverifiable, but the analysis supports that these variations could be from seeds of the same plant. This illustrates how Cannabis strains may have come to market in a non-traditional manner. Genetic analyses can add scientific support to the stories behind vintage strains and possibly help clarify the history of specific strains.

Genetic inconsistencies may originate with both suppliers and growers of Cannabis clones and stable seed, because currently they can only assume the strains they possess are true to name. There is a chain of events from seed to sale that relies heavily on the supplier, grower, and dispensary to provide the correct product, but there is currently no reliable way to verify Cannabis strains. The possibility exists for errors in plant labeling, misplacement, misspelling (e.g. “Chem Dog” vs. “Chemdawg”), and/or relabeling along the entire chain of production. Although the expectation is that plants are labeled carefully and not re-labeled with a more desirable name for a quick sale, these possibilities must be considered. Identification by genetic markers has largely eliminated these types of mistakes in other widely cultivated crops such as grapes, olives and apples. Modern genetic applications can accurately identify varieties and can clarify ambiguity in closely related and hybrid species (Guilford et al. 1997; Hokanson et al. 1998; Sarri et al. 2006; Costantini et al. 2005; United States Department of Agriculture 2014).

Matching genotypes within the same strains were expected, but highly similar genotypes between samples of different strains could be the result of mislabeling or misidentification, especially when acquired from the same source. The pairwise genetic relatedness r-values were examined for incidence of possible mislabeling or re-labeling. There were instances in which different strains had r-values = 1.0 (Additional file 4: Figure S3), indicating clonal genetic relationships. Two samples with matching genotypes were obtained from the same location (“Larry OG” and “Tahoe OG” from San Luis Obispo 3). This could be evidence for mislabeling or misidentification because these two samples have similar names. It is unlikely that two genuinely different strains would have identical genotypes; it is more likely that these samples were mislabeled at some point. Misspelling may also be a source of error, especially when facilities are handwriting labels. An example of possible misspelling may have occurred in the sample labeled “Chemdog 1” from Garden City 1. “Chemdawg 1”, a described strain, could have easily been misspelled, but it is unclear whether this instance is evidence for mislabeling or renaming a local variant. Inadvertent mistakes may carry through to scientific investigation where strains are spelled or labeled incorrectly. For example, Vergara et al. (2016) report genome assemblies for “Chemdog” and “Chemdog 91” as they are reported in GenBank (GCA_001509995.1), but neither of these labels is a recognized strain name. “Chemdawg” and “Chemdawg 91” are recognized strains (Leafly 2018a; Wikileaf 2018; 2018; NCSM 2018; 2018; Seedfinder 2018). According to the original source, however, the strain name “Chemdawg” is incorrect and should be “Chem Dog” (Danko 2016); the name has clearly evolved among growers since it emerged in 1991 (Danko 2016). Another example that may lead to confusion is how information is reported in public databases. 
For example, data is available for the reported monoisolate of “Pineapple Banana Bubba Kush” in GenBank (SAMN06546749), and while “Pineapple Kush”, “Banana Kush” and “Bubba Kush” are known strains (Leafly 2018a; Wikileaf 2018; 2018; NCSM 2018; 2018; Seedfinder 2018), the only record we found of “Pineapple Banana Bubba Kush” is in GenBank. This study has highlighted several possible sources of error and how genotyping can serve to uncover sources of variation. Although this study was unable to confirm sources of error, it is important that producers, growers and consumers are aware that there are errors and they should be documented and corrected whenever possible.
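The screening step described above, looking for clonal r-values between samples sold under different names, could be sketched as follows. The sample identifiers, the second location, the non-clonal r-values and the 0.99 cutoff are all hypothetical; only the “Larry OG” / “Tahoe OG” match from San Luis Obispo 3 reflects a case reported in the text.

```python
def possible_mislabels(samples, r, cutoff=0.99):
    """Flag pairs of samples with different strain names but near-clonal
    relatedness. `samples` maps id -> (strain, location); `r` maps a
    frozenset of two ids -> r-value. The cutoff is an assumed value."""
    flagged = []
    for pair, value in r.items():
        a, b = sorted(pair)
        if samples[a][0] != samples[b][0] and value >= cutoff:
            flagged.append((a, b, value))
    return sorted(flagged)

samples = {
    "s1": ("Larry OG", "San Luis Obispo 3"),
    "s2": ("Tahoe OG", "San Luis Obispo 3"),
    "s3": ("Tahoe OG", "hypothetical site"),
}
r = {
    frozenset(("s1", "s2")): 1.0,   # matching genotypes, different names
    frozenset(("s1", "s3")): 0.20,  # hypothetical value
    frozenset(("s2", "s3")): 0.25,  # hypothetical value
}

print(possible_mislabels(samples, r))  # [('s1', 's2', 1.0)]
```

A flagged pair is only a candidate for follow-up: as the text notes, a genetic match alone cannot show where in the chain from seed to sale the label changed.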


Over the last decade, the legal status of Cannabis has shifted: it is now legal for medical use, and in some cases recreational adult use, in the majority of the United States, and several other countries have legalized or decriminalized it. The recent legal changes have led to an unprecedented increase in the number of strains available to consumers. There are currently no baseline genotypes for any strains, but steps should be taken to ensure products marketed as a particular strain are genetically congruent. Although the sampling in this study was not exhaustive, the results are clear: strain inconsistency is evident and is not limited to a single source, but rather exists among dispensaries across cities in multiple states. Various suggestions for naming the genetic variants do not seem to align with the current widespread definitions of Sativa, Indica, Hybrid, and Hemp (Hillig 2005; Clarke and Merlin 2013). As our Cannabis knowledge base grows, so does the communication gap between scientific researchers and the public. Currently, there is no way for Cannabis suppliers, growers or consumers to definitively verify strains. Exclusion from USDA protections due to the Federal status of Cannabis as a Schedule I drug has created avenues for error and inconsistencies. Presumably, the genetic inconsistencies will often manifest as differences in overall effects (Minkin 2014). Differences in characteristics within a named strain may be surprising for a recreational user, but differences may be more serious for a medical patient who relies on a particular strain for alleviation of specific symptoms.

This study shows that in neutral genetic markers, there is no consistent genetic differentiation between the widely held perceptions of Sativa and Indica Cannabis types. Moreover, the genetic analyses do not support the reported proportions of Sativa and Indica within each strain, which is expected given the lack of genetic distinction between Sativa and Indica. There may be land race strains that phenotypically and genetically separate as Sativa and Indica types; however, our sampling does not include an adequate number of these strains to define these as two potentially distinct genotypes. The recent and intense breeding efforts to create novel strains have likely merged the two types and blurred any previous separation between them. However, categorizing strains this way helps consumers communicate their preference for a spectrum of effects (e.g., Sativa-dominant Hybrid), and the vernacular usage will likely continue, despite a lack of evidence of genetic differentiation.

We found instances where samples within strains are not genetically similar, which is unexpected given the manner in which Cannabis plants are propagated. Although it is impossible to determine the source of these inconsistencies, as they can arise at multiple points throughout the chain of events from seed to sale, we theorize that misidentification, mislabeling, misplacement, misspelling, and/or relabeling are all possible. Especially where names are similar, there is the possibility for mislabeling, as was shown here. In many cases genetic inconsistencies within strains were limited to one or two samples. We feel that there is a reasonable amount of genetic similarity within many strains, but currently there is no way to verify the “true” genotype of any strain. Although the sampling here includes merely a fragment of the available Cannabis strains, our results give scientific merit to previously anecdotal claims that strains can be unpredictable.

The High Times Interview: Compound Genetics

Want to know some specifics about breeding cannabis? Know the difference between regular and feminized seeds? Ever heard of an S1, F1 or F2? Do you have any idea what any of it means? I got the chance to link up with Compound Genetics in Portland, Oregon a few months back, and I was lucky enough to sit down with them and talk a little bit about different breeding methods and some terms that are commonly used.

High Times: Can you please explain what an F1 and F2 are when it comes to breeding cannabis?

Compound Genetics: F1 is the first generation [of plants] from unrelated parents. You take a male and a female, you cross them and you create an F1. The F2 is when you work that first generation into the second generation. For example, you could take a Legend Orange Apricot F1, which would be an Orange Apricot male to a Legend OG female, which are completely unrelated parents. When you cross them, the seeds that come from that will be the Legend Orange Apricot F1 generation.

To create the F2, you would take a male from that first generation that you selected from seed and a female from that first generation that you want to breed into the next generation. You would pollinate the F1 female with the F1 male pollen, and the seeds that will be produced will be the Legend Orange Apricot F2.

If you work that into an F3 and F4, once you get past F4, you are in IBL [inbred line]. To do that successfully, you really need to make sure you don’t lock down too many of the same traits, and you’re not bottlenecking, because you can create a big mess if you lock down too many of the same traits. You want to keep the gene pool open.

Once you get past F1, it becomes really advanced breeding. F2’s can be a big mess if they aren’t done right. You can send it in the wrong direction by selecting bad parents. F2’s should be left to people who know what they are doing. Very few people work into the F2 and beyond anymore. More people are playing in the F1 generations.
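To make the F1/F2 distinction concrete, here is a minimal Python sketch of a single gene with two alleles, assuming each unrelated parent is homozygous for a different allele. That is a textbook simplification rather than anything stated in the interview; real parent plants carry many genes and are often heterozygous.

```python
from collections import Counter
from itertools import product

def cross(parent_a, parent_b):
    """Count the equally likely offspring genotypes at one locus.
    Each parent is a two-letter genotype such as 'AA' or 'Aa'."""
    return Counter("".join(sorted(pair)) for pair in product(parent_a, parent_b))

f1 = cross("AA", "aa")  # unrelated parents, each homozygous (assumed)
f2 = cross("Aa", "Aa")  # F1 female pollinated by F1 male

print(dict(f1))  # {'Aa': 4}
print(dict(f2))  # {'AA': 1, 'Aa': 2, 'aa': 1}
```

Under this simplified model every F1 plant has the same genotype, while the F2 splits into three genotypes in a 1:2:1 ratio. Multiply that across many genes and it becomes clear why F2 seed lots vary so much and why parent selection matters more from that generation on.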

HT: What can you tell us about S1’s and feminized seeds?

CG: It’s also called “selfing.” It’s a part of making feminized seeds in a sense. I think people confuse it with bag seeds sometimes. Bag seeds aren’t always S1’s. Bag seed can be an S1, or it can come from stray pollen. If pollen is on you and you brush up against a plant, you just pollinated a plant. S1’s could be made other ways too. It can be caused naturally, like stressing the plant so it forces itself to make S1’s, whether it’s light leaks, feeding or mistreating. S1’s could be created by a breeder, as part of a reversal where they actually spray a mixture of chemicals onto a plant, and it will turn a female plant into a male plant over time.

For example, if I took a Jet Fuel Gelato female and reversed it to force it to turn into a male, collected the pollen from that reversal, and hit it onto a Jet Fuel Gelato female, the seeds from that cross would be the Jet Fuel Gelato S1’s, which would be the feminized version of Jet Fuel Gelato.

S1’s are feminized; 99 percent of feminized seeds will be female. S1 and R1 are considered different. R1 is the equivalent of a feminized F1. F1 is the regular version. Regular is the industry term for a male seed. Feminized is the industry term for a reversed seed or feminized seed. You get feminized seeds from reversing, and you get regular seeds from using a male. The method you would do for S1’s would be the same thing you do to make feminized seeds. You spray it on the plant in certain periods of the early flower cycle, and it makes the plant switch sexes. You can take any female plant and spray over a two to three week period. It won’t grow any female flowers at all. It will actually start to grow male flowers. The whole plant will turn into a complete male. It will release pollen, but reversed plants never release the same amount of pollen as a male plant. Male pollinations will almost always have more seeds. That’s why many breeders prefer males because you can get so many more seeds, and you can do a lot more with males.
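Why reversed pollen gives feminized seed can be sketched with sex chromosomes, assuming the usual XY model for cannabis in which females are XX and males are XY. The function name below is hypothetical, and the model ignores hermaphroditism and environmental effects, so it gives 100 percent where the interview says 99.

```python
from collections import Counter
from itertools import product

def offspring_sexes(seed_parent, pollen_parent):
    """Fraction of each sex-chromosome genotype among the seeds,
    assuming each parent passes on one of its two chromosomes at random."""
    combos = Counter("".join(sorted(pair))
                     for pair in product(seed_parent, pollen_parent))
    total = sum(combos.values())
    return {genotype: count / total for genotype, count in combos.items()}

regular = offspring_sexes("XX", "XY")    # female pollinated by a true male
feminized = offspring_sexes("XX", "XX")  # female pollinated by a reversed female

print(regular)    # {'XX': 0.5, 'XY': 0.5}
print(feminized)  # {'XX': 1.0}
```

A reversed female still carries only X chromosomes, so her pollen cannot contribute a Y. That is the sense in which regular seed comes from using a male and feminized seed comes from a reversal.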

With reversals, it’s kinda hit or miss. You can get a small amount of pollen, or sometimes you can get almost no pollen. Some plants don’t want to spit pollen when you reverse them. If you are reversing a plant and you want to create a bunch of seeds, you’re going to need to reverse a lot more plants to collect enough pollen versus having a male around which dumps pollen. I would need two males to pollinate a grow room the size of this hotel room. It would take 20 reversed plants to do the work two males can do, and you would have to apply the pollen with direct contact, like brushing the pollen on the plants or hitting the females with a male branch. You have to actually work the plant to spread the pollen, because it’s not dumping pollen like the male plants would. There are a lot of different methods you can use to apply reversed pollen onto other plants.

Jet Fuel Gelato (Photo Courtesy of Compound Genetics)

Feminized seeds kinda have a stigma in the industry. People frown a little bit upon them, as if they are made by Monsanto: genetically modified, not natural. But they really have their purpose. You can run a whole garden of feminized seeds, and you don’t have to worry about weeding out or selecting males. For a beginner grower, it’s perfect. Someone who doesn’t know how to select males can just take feminized seeds, run them, and know they do not have to worry about a male slipping through the cracks, pollinating their crop and costing them the harvest.

HT: Can you tell me more about hermaphrodite plants?

CG: It definitely is an unstable trait, and it’s definitely more prevalent in untested and unworked gear that is bred from unstable plants. If you are going to create a feminized seed, one of the most important things about fems is that you should never reverse a plant that is unstable. If you reverse a plant that has unstable traits, it’s going to continue those traits onto the next generation.

HT: Can regular plants still throw out herms?

CG: Oh, yeah. Definitely. It can be from that same factor from breeding unstable traits to begin with or environmental issues. In general, it’s usually a genetic trait. Strong genetics that are worked well won’t hermaphrodite under any circumstances. Sometimes when you run seeds from the first generation, some will hermaphrodite, but if you clone them and run them again, that trait won’t come out once you don’t run them from seed. I don’t know exactly why that is true, but it definitely happens. I advise anybody running from seed that experiences hermaphrodites, if it’s a plant that looks real good late in flower, run it again from clone and see what happens. Sometimes, those traits don’t come out in clone. You really can’t do anything to stop them.

I don’t recommend people spraying “Switch” or anything like that. Same goes for any chemical product that’s going to revert your plant back. If your plant is showing that trait, you should either be prepared to live with that trait, eliminate that plant, try to breed it out by outcrossing or crossing it over a few generations, or reverse something onto it that’s going to help eliminate that trait.

HT: What would be your preferred method for collecting pollen, and around what time is best to harvest pollen from the males? Do different males throw out pollen at different times?

CG: Males throw out pollen usually between week 3.5 – 5. That’s usually when they are known to drop pollen. After week 6 or 7, they kind of get spent. If I was going to collect pollen, I would try to collect it around week 4.

There are various ways you can collect it. You can just go up to the male and get a bag, box or something like that and collect it over time by leaving the bag or box open under the plant, or you can take a whole branch of the plant and just literally put it in a bag, just shake it, and all the pollen will come off. Then, you will be left with the male parts of the plant. What you can do with that is take the male pollen sacs, sift or filter them out, and then you are left with this fine powder. Ultimately, you want to have just powder, and to store it clean and dry. The key to saving pollen for long periods of time is keeping it dry. Moisture is the killer for pollen. Light as well. Anything that is bad for dried cannabis will be bad for pollen.

Sour Gelato (Photo Courtesy of Compound Genetics)

HT: What’s the longest you can keep properly stored pollen viable before having to go through another round to collect pollen?

CG: If it’s stored properly, you can keep it for a long time, but generally pollen tends to fade after a year or two. It’s not something that should be your long-term [plan] for saving genetics. To save them long-term, you have to save them in seed form. The best method is using a male of course, but to get the true genetics into seed form, if you can self it, that’s how you can really save anything. If you want to save all your cultivars, S1 everything. You will have pure versions of those clones only in feminized seed form. When you hunt those seeds, you can find the same traits and exact same examples of the mother plants you’re reversing, plus you might find versions that are even better.

S1’s will have the same exact phenotypes that you reverse, plus they will have other unique versions. You can find some better versions of the clone-only mom in the S1’s. If you are trying to save your genetics for a long [time], S1’s might be the way to go.

Cannabis Breeding Banter with Connoisseur Genetics

Connoisseur Genetics is firmly established in the U.K. and Dutch scenes as a purveyor of the finest cannabis genetics on the planet. High Times sat down with OJ from the Amsterdam-based company, which produces close to one million feminized seeds per year, to talk about the breeding scene, his influences, genetics, how he crafts his seeds, and what his top tips are when it comes to making a consistent, reliable, and delicious end product.

High Times: How different is the U.K. breeding scene compared to America’s?

OJ: The American scene is very different to the UK scene as we are entirely illegal and still totally underground. We have no dispensaries so can’t trade or access clones as easily as the USA. It is tough to operate in the UK and stay under the radar. We do have one advantage, and that is cannabis seeds are legal, so we can trade in them but not produce them. This is the frustrating part of our challenging laws.

When creating a new strain, what are the desired traits you look for as a breeder?

When creating a new strain, I’m trying to merge some of the best qualities of each plant into each other, with aroma, taste, and effect as my priority. I also have clone-only plants that I’m just trying to reproduce in feminized form. This is done by selfing the clone to create an S1, so people who don’t have access to these special clones, yet want the genetics in seed form, now have an easy-to-work-with feminized seed. For male plants, I’m looking for the trait of the plant I plan to capture. I consider leaf shape to indicate pheno and structure of the plant. My number one goal is aroma, meaning I always select a stinky male that throws off that particular scent.

Can you explain what S1 means when breeding and how female pollen is produced?

An F1 or first generation is crossing two unrelated landraces together to create hybrid vigor. However, S1 is where you take a clone and reverse it using chemicals or Silver.

The process is logistically much harder than regular male breeding, because the reversed female produces a lot less viable pollen than a regular male plant. This can result in a much smaller number of final seeds, meaning a much lower conversion rate than with regular seeds.

Courtesy of Connoisseur Genetics

What are your breeding facilities like and how is everything quality controlled?

I have multiple breeding spots that are used to produce my seeds. Each place does one unique reversal or male project at a time, to guarantee there is no cross-contamination of pollen. For smaller testing spaces, I use males or reverse female clones that I have not used before. Once the choice clones have been selected from the smaller rooms, they are then used in my main breeding room. This ensures only the best of the best stock is used. The way I store pollen is in an airtight container in the fridge, but I would advise to use fresh pollen and only stored pollen if no other options are available. The viability rate of the pollen dies over time, which can be a total waste of time as a breeder.

What are your top tips for any up and coming breeders out there?

My top tips are: breed for the people and not for your own taste, unless you are home breeding for fun, in which case it is different. Try to be unique and not like every other company; come with some exclusive flavors and effects. This will allow you to stand out from the rest at a commercial level. Get out there and try what’s going in the top dispensaries, coffee shops, social clubs and cup events. Then you can know if your stuff is the best of the best and not just the best in your circle.

How long have you been breeding for, and how would you best describe your work?

I first started breeding just for fun around 2005, but began selling seeds around 2009 through which I have a forum there. I best describe my work as made by a connoisseur for connoisseurs focusing on taste and effect. This means growing with the best organic mediums and zero chemicals.

Who have been your biggest influences in breeding?

My biggest influences were Nevil of the seed bank Sensi Seeds, Greenhouse Seeds and Mr. Nice, as well as Soma of Soma Seeds. Nevil is a genius and responsible for over 90% of the top-end plants that floated around Europe between 1980 and 2005. He is the real Haze king, producing the best plants that I have ever seen or smoked. Soma used to have his buds for sale in Amsterdam coffeeshops years back, and they were another level, so tasty it was like you were eating fresh fruit. They were grown organic in a hydro-filled city, and they stood above everything. Soma’s New York City Diesel (Old Red Grapefruit cut) is in my top 3 ever and fighting for that 1st place as the most delicious organic cannabis around. Long live the red grapefruit cut and let it find its way to me one day!

When did your pursuit for the original Hazes begin and can you tell us more about the origin of Haze in Amsterdam?

My love for Haze started after smoking the Northern Lights #5 Haze from seed, then traveling to Amsterdam and smoking and growing all the best Haze I came across there. The origins of Haze started when Sam the Skunkman came to Holland from the USA with seeds in the late ’70s or early ’80s and shared some seeds with Nevil, who went to work like a DJ, cutting and mixing it up. He is best known for creating the world-famous NL #5 Haze cuts that are responsible for 90% of Haze today.

What are your top five strains you’ve made?

That is quite tricky, but I would say number one is Super Silver Sour Diesel Haze, an original release of SSSDH that I got from Reservoir Seeds and have made F2s of for Reservoir Seeds. I created the feminized line of the original SSSDH, which is sold through my seedbank. The second is Strawberry Cookies, which is a cross of Strawberry Cough x Girl Scout Cookies reversed. Third would be Hey Dave, which is a cross of OG Affie x OG Kush by Raskal Seeds x Casey Jones reversed. My fourth strain is the Diesel Dipped Cookies, which is a cross of the original Diesel x Girl Scout Cookies reversed. Finally, I would say Nevil The G, which is Nevil Mango by Mr. Nice Seeds x G13 Haze male (Soma’s cut).

Will Connoisseur Genetics be at any expos in 2019, and do you have any social media platforms our readers can follow you on?