IN PRACTICAL DEVELOPMENT OF GENOME-WIDE MARKERS FOR TIMBER ORIGIN IDENTIFICATION AND OTHER APPLICATIONS

Forest genetics, tree improvement and forest protection can benefit greatly from the complete genome sequence data recently made available for several major conifer species. These data make it possible to identify and annotate genes, other functional elements (sRNA, transcription factors, regulatory elements, etc.) and the genetic networks that control adaptation and disease resistance. They can be used to develop highly informative genetic markers for population genetic studies and to create databases of barcodes for individual populations in order to fight illegal timber harvest and trade. They are much needed for developing genome-wide genetic markers for association studies that link genetic variation (SNPs, alleles, haplotypes and genotypes) with environmental factors, adaptive traits and phenotypes, leading to a better understanding of the genetic control of agronomically and economically important traits. They can also be used to develop genome-wide genetic markers for genomic-assisted selection, to breed better adapted, stress-resistant and climate-change-resilient trees with desirable ecological and economic traits. Finally, whole genome sequences make it possible to integrate proteomics, transcriptomics and metabolomics, and they provide reference genomes for resequencing. In this brief summary we present one of many practical applications of genetics and genomics in forestry: the development of highly polymorphic and informative molecular genetic markers for three very important boreal forest species in Eurasia, Siberian larch (Larix sibirica Ledeb.), Siberian stone pine (Pinus sibirica Du Tour) and Scots pine (Pinus sylvestris L.), based on the whole genome data obtained in the "Genomics of the Key Boreal Forest Conifer Species and Their Major Phytopathogens in the Russian Federation" project funded by the Government of the Russian Federation (grant no. 14.Y26.31.0004).


Introduction
Whole genome sequence data are the foundation for subsequent studies of evolutionary, biochemical and physiological processes in the sequenced organisms. Detailed knowledge of genome structure, including the fine exon-intron structure of genes, repeated sequences and intergenic regions, helps us better understand the mechanisms of gene regulation and expression, as well as genome evolution. Whole genome data have recently become more available, including for conifer species, and are now widely used to develop new DNA markers, such as single nucleotide polymorphisms (SNPs) and microsatellite loci, also called simple sequence repeats (SSRs). These markers can be used in population genetic analysis and for solving practical forestry problems, for example identifying the origin of wood and planting material, and certifying and identifying clones.
The development of molecular genetic markers for the main forest-forming tree species is extremely important for solving problems of forestry, reforestation and afforestation. Solving these problems requires estimates of the level of genetic variability, data on population structure and differentiation, and effective methods for genetically identifying the origin of wood and plant material.
Among the available genetic markers, nuclear microsatellite loci most fully meet the requirements for reliable and convenient markers to address these problems. They are characterized by high specificity, reproducibility, codominance, multiple alleles and high heterozygosity and, moreover, do not require sophisticated equipment for analysis.
For example, although Siberian larch (Larix sibirica Ledeb.) is one of the main forest-forming conifer species in Siberia, such species-specific markers had not been developed for it until recently. Siberian larch grows in the forest zone of the east and northeast of the European part of Russia, the Urals, and Western and Eastern Siberia. Its range stretches from the tundra (71° N) in the north to the Altai and Sayan (46° N) in the south. In the Russian Federation, larch forests occupy 263 million hectares, about 40% of the forest area of the country (769.8 million hectares). Previously, markers based on nuclear microsatellite loci developed for other species of the genus were used to analyze population genetic variation in L. sibirica [1][2][3]. With these markers, genetic diversity and differentiation were studied in several populations of the species [4,5]. However, only a small number of markers could be used in these studies because of poor PCR amplification and the presence of a large number of "null alleles" at many of the non-species-specific markers.
Siberian stone pine (Pinus sibirica Du Tour) and Scots pine (Pinus sylvestris L.) are also among the most economically and environmentally important forest-forming conifer species in Eurasia. Studying these forests likewise requires a large number of highly polymorphic molecular genetic markers, such as microsatellite loci, which were unavailable for Siberian stone pine until recently.
Before the advent of high-throughput next generation sequencing (NGS) methods, the discovery of microsatellite loci and the development of microsatellite markers were very time-consuming and laborious. With the recently developed draft assemblies of the Siberian larch, Siberian stone pine and Scots pine genomes, sequenced using NGS methods in the Laboratory of Forest Genomics of the Siberian Federal University [6][7][8], it has become possible to develop species-specific microsatellite primers for these species.

Materials and methods
The draft genome assemblies presented in Table 1 allowed us to identify a large number of microsatellite loci in the Siberian larch and Siberian stone pine genomes and to develop species-specific PCR primers for their amplification and genotyping. The primers were designed using contigs containing short simple sequence tandem repeats.

The most promising markers were selected, and multiplex genotyping panels were designed for Siberian larch and tested for fragment analysis using the ABI 3130xl Genetic Analyzer with capillary electrophoresis [10].
The Siberian larch genome was sequenced at 93X coverage on the Illumina HiSeq 2000 platform. To select high-quality reads and remove adapter dimers, the raw reads were filtered using MUSKET [11] and Trimmomatic [12]. A draft assembly was generated using the CLC Assembly Cell assembler (https://www.qiagen-bioinformatics.com). The resulting assembly contained 12.4 million contigs with a total length of ~8 Gbp. This assembly was searched for contigs containing microsatellite loci using the GMATo program [13]. A preliminary analysis showed that microsatellite loci with tri-, tetra- and pentanucleotide motifs were much less variable in larch than loci with dinucleotide motifs. Therefore, of all the microsatellite loci found, only those with dinucleotide motifs repeated at least 20 times were selected for PCR primer design. Primers for the selected microsatellite loci were designed using the WebSat online service [14]. As a result, 59 primer pairs were designed and tested. Needle samples collected in 2014 from 100 individual Siberian larch trees in two populations (50 trees per population) in the Republic of Khakassia were used in this study [10]. One population is located in the Shirinsky District of Khakassia near the Shira-Berenjak highway (larch forest with pine on a gentle slope), the other near the village of Efremkino (larch on a steep slope and at its foot).
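The selection rule described above (dinucleotide motifs repeated at least 20 times) can be illustrated with a minimal Python sketch. This is not the GMATo implementation; the function name, the regular-expression approach and the toy sequence are assumptions made only for illustration.

```python
import re

def find_dinucleotide_ssrs(seq, min_repeats=20):
    """Return (start, motif, repeat_count) for each perfect dinucleotide
    repeat in seq with at least min_repeats tandem copies."""
    seq = seq.upper()
    # ([ACGT]{2}) captures the motif; \1{n,} requires n further tandem copies.
    pattern = re.compile(r"([ACGT]{2})\1{%d,}" % (min_repeats - 1))
    hits = []
    for match in pattern.finditer(seq):
        motif = match.group(1)
        if motif[0] == motif[1]:
            continue  # AA, CC, GG, TT runs are homopolymers, not dinucleotide SSRs
        hits.append((match.start(), motif, len(match.group(0)) // 2))
    return hits

# Toy contig (hypothetical): one (AG)22 repeat, a short (CT)5 run and a poly-A tail.
contig = "GGCC" + "AG" * 22 + "TTGA" + "CT" * 5 + "A" * 50
print(find_dinucleotide_ssrs(contig))
```

Only the (AG)22 repeat passes the filter in this toy contig: the (CT)5 run is too short and the poly-A tail is rejected as a homopolymer. Imperfect or compound repeats, which dedicated tools may also report, are ignored by this sketch.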
A similar search for microsatellite loci was done using the Siberian stone pine assembly with 32X genome coverage [9]. The designed primers were first tested on DNA samples from four P. sibirica trees to select primers that successfully generated an amplification product and to optimize the PCR conditions. The selected primers were then tested on eight specimens from the same population in order to detect polymorphisms. The variability of loci that were monomorphic in this sample was tested further in nine individuals from nine geographically distant populations representing different regions of the Siberian stone pine range. The final testing of the polymorphic loci was performed using 10-12 specimens from each of several populations.
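Screening a locus for polymorphism in a small test sample amounts to counting alleles and comparing observed and expected heterozygosity. The following Python sketch shows one way to do this for a codominant diploid locus; the function name and the allele sizes are hypothetical and are not data from this study.

```python
from collections import Counter

def locus_summary(genotypes):
    """Summarize one codominant locus.

    genotypes: list of (allele1, allele2) tuples, one per diploid individual.
    """
    n = len(genotypes)
    allele_counts = Counter(a for genotype in genotypes for a in genotype)
    total = 2 * n  # two allele copies per diploid individual
    freqs = {a: c / total for a, c in allele_counts.items()}
    # Observed heterozygosity: share of individuals carrying two different alleles.
    h_obs = sum(1 for a, b in genotypes if a != b) / n
    # Expected heterozygosity (gene diversity): 1 - sum of squared allele frequencies.
    h_exp = 1.0 - sum(p * p for p in freqs.values())
    return {
        "n_alleles": len(allele_counts),
        "h_obs": h_obs,
        "h_exp": h_exp,
        "polymorphic": len(allele_counts) > 1,
    }

# Hypothetical fragment sizes (bp) for four trees at a single locus.
sample = [(150, 150), (150, 152), (152, 154), (150, 154)]
print(locus_summary(sample))
```

A locus with a single allele in the first sample would be flagged as monomorphic and, following the staged design described above, retested on individuals from distant populations. No small-sample correction is applied to the expected heterozygosity here.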

Results and discussion
Larix sibirica SSRs. Among the 59 primer pairs, in the first test 20 produced no product, 12 showed non-specific amplification and 27 stably amplified what appeared to be a single-locus PCR product that could be genotyped well on gels. After this first selection, the forward primer in each of the 27 pairs was labelled with either a "blue" (FAM) or a "green" (HEX) fluorescent dye for further testing on the ABI PRISM 3730 sequencer. The labelled oligonucleotide primers were synthesized by Sigma (Germany). Trial PCR multiplexes consisting of two or three primer pairs were assembled taking into account the size of the PCR fragments. Multiplexing was done at the PCR stage by combining two or three different primer pairs in the same reaction and keeping the total volume constant by reducing the amount of water accordingly. The PCR product had to be diluted 50- to 100-fold before electrophoresis. The polymorphic loci were tested at this stage using 8-16 samples from each of the two populations. After this testing on a capillary sequencer, an additional 9 primer pairs had to be excluded because of poor or non-specific amplification and a presumably large number of null alleles.
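The idea of assembling multiplexes of two or three primer pairs according to fragment size and dye can be sketched as follows. This is only an illustration under stated assumptions: the greedy grouping rule, the locus names and the size ranges are hypothetical and are not the procedure or data of this study; only the FAM and HEX dye labels come from the text.

```python
def ranges_overlap(a, b):
    """True if two (min_bp, max_bp) fragment size ranges overlap."""
    return a[0] <= b[1] and b[0] <= a[1]

def compatible(locus, mix):
    """A locus fits a mix if no locus with the same dye overlaps it in size."""
    return all(
        other["dye"] != locus["dye"]
        or not ranges_overlap(other["size"], locus["size"])
        for other in mix
    )

def build_multiplexes(loci, max_per_mix=3):
    """Greedily assign loci (sorted by smallest fragment) to multiplex mixes."""
    mixes = []
    for locus in sorted(loci, key=lambda item: item["size"][0]):
        for mix in mixes:
            if len(mix) < max_per_mix and compatible(locus, mix):
                mix.append(locus)
                break
        else:
            mixes.append([locus])  # no existing mix can take this locus
    return mixes

# Hypothetical loci: name, dye and expected allele size range in bp.
loci = [
    {"name": "A", "dye": "FAM", "size": (100, 140)},
    {"name": "B", "dye": "FAM", "size": (130, 170)},
    {"name": "C", "dye": "HEX", "size": (120, 160)},
    {"name": "D", "dye": "FAM", "size": (180, 220)},
    {"name": "E", "dye": "HEX", "size": (150, 200)},
]
for mix in build_multiplexes(loci):
    print([locus["name"] for locus in mix])
```

In practice the choice of multiplexes also depends on primer compatibility and amplification efficiency, which this sketch does not model.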
Pinus sibirica SSRs. Based on the testing of primers for 70 microsatellite loci with tri-, tetra- or pentanucleotide repeats, the 18 most promising, reliable and polymorphic loci were selected for further use as molecular genetic markers in population genetic studies of Siberian stone pine [9].
Pinus sylvestris mitochondrial DNA markers. Five SNPs and a single minisatellite locus were identified [15]. Caucasian samples differed from the rest by three SNPs. Two SNPs were linked to a previously described marker in the first intron of the nad7 gene, and together they revealed three haplotypes in European populations. No variable SNPs were found in the Siberian and Mongolian populations. The minisatellite locus had 41 alleles across the European, Siberian and Mongolian populations, but it showed only weak population differentiation (FST = 0.058), probably because of its high mutation rate.
These new markers were then used in population and phylogeographic studies of Scots pine [16]. Three mitochondrial DNA markers were genotyped in 90 populations of Scots pine located from Eastern Europe to Eastern Siberia. The geographic distribution of seven mitotypes demonstrated a split between western and eastern populations approximately along the 38th meridian. Genetic diversity in the western part was significantly higher than in the eastern part. Five mitotypes were specific to the western part and one to the eastern part. One mitotype was common to both regions, but in the eastern part it occurred only in the South Urals and adjacent areas. The geographic structure of the mitotype distribution supports the hypothesis of post-glacial recolonization of the studied territory from European and Ural refugia.
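The differentiation index reported above for the minisatellite locus measures the share of total gene diversity that is due to differences between populations. The source does not state which estimator was used; the Python sketch below assumes the simple form FST = (HT - HS) / HT, with haploid haplotype data and unweighted population means. The function names and the example data are hypothetical.

```python
from collections import Counter

def gene_diversity(freqs):
    """Gene diversity: 1 minus the sum of squared haplotype frequencies."""
    return 1.0 - sum(p * p for p in freqs.values())

def fst(populations):
    """Fixation index from haploid data.

    populations: list of lists of haplotype labels, one list per population.
    Uses unweighted means across populations (an assumption of this sketch).
    """
    pop_freqs = []
    for pop in populations:
        counts = Counter(pop)
        pop_freqs.append({h: c / len(pop) for h, c in counts.items()})
    # HS: mean within-population gene diversity.
    h_s = sum(gene_diversity(f) for f in pop_freqs) / len(pop_freqs)
    # HT: gene diversity of the pooled (mean) haplotype frequencies.
    haplotypes = set().union(*pop_freqs)
    mean_freqs = {
        h: sum(f.get(h, 0.0) for f in pop_freqs) / len(pop_freqs)
        for h in haplotypes
    }
    h_t = gene_diversity(mean_freqs)
    return (h_t - h_s) / h_t if h_t > 0 else 0.0

# Hypothetical example: two populations with mirrored haplotype frequencies.
west = ["a"] * 8 + ["b"] * 2
east = ["a"] * 2 + ["b"] * 8
print(round(fst([west, east]), 3))
```

Because within-population diversity caps the value this index can reach, a locus with many alleles and a high mutation rate tends to yield low values even when populations share few alleles, which is consistent with the weak differentiation noted for the minisatellite.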

Conclusions
The whole genome sequencing data provided rich material for developing highly polymorphic molecular genetic markers that were efficiently used for genotyping natural and artificial populations of Siberian stone pine, Siberian larch and Scots pine. The newly developed markers will allow us to obtain reliable quantitative estimates of the parameters of their genetic structure, such as allelic and genetic diversity within and between populations, genetic subdivision and differentiation at different hierarchical levels, inbreeding, gene flow, etc.