Background:
Genome-wide association studies (GWAS) are a widely used study design for detecting genetic causes of complex diseases. Current studies provide good coverage of common causal SNPs, but not rare ones. A popular method to detect rare causal variants is haplotype testing. A disadvantage of this approach is that many parameters are estimated simultaneously, which can mean a loss of power and slower fitting to large datasets. Haplotype testing effectively tests both the allele frequencies and the linkage disequilibrium (LD) structure of the data. LD has previously been shown to be mostly attributable to LD between adjacent SNPs. We propose a generalised linear model (GLM) which models the effects of each SNP in a region as well as the statistical interactions between adjacent pairs. This is compared to two other commonly used multimarker GLMs: one with a main-effect parameter for each SNP, and one with a parameter for each haplotype.
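The difference between the main-effects model and a model with adjacent-pair interactions can be illustrated with a small sketch of the design matrix. This is only an illustration under stated assumptions: the additive 0/1/2 genotype coding, the product form of the interaction term, and the function name `adjacent_interaction_design` are all hypothetical choices that the abstract does not specify.

```python
def adjacent_interaction_design(genotypes):
    """Build one design-matrix row per individual.

    Each row holds an intercept, one main-effect column per SNP, and one
    interaction column per pair of adjacent SNPs.
    Assumption: genotypes are coded additively as 0, 1 or 2 copies of an
    allele, and an interaction is the product of the two adjacent codes.
    """
    rows = []
    for g in genotypes:
        main_effects = list(g)
        adjacent_pairs = [g[j] * g[j + 1] for j in range(len(g) - 1)]
        rows.append([1] + main_effects + adjacent_pairs)
    return rows


# Three individuals typed at three SNPs (made-up toy data).
genotypes = [[0, 1, 2], [1, 1, 0], [2, 0, 1]]
design = adjacent_interaction_design(genotypes)
for row in design:
    print(row)
```

With m SNPs this coding gives 2m columns (an intercept, m main effects and m - 1 adjacent interactions), compared with m + 1 columns for a main-effects model and up to 2^m possible haplotypes for a haplotype model, which is the parameter-count trade-off the paragraph above describes.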
Results:
We show the haplotype model has higher power for rare untyped causal SNPs, the main-effects model has higher power for common untyped causal SNPs, and the proposed model generally has power in between the two others. We show that the relative power of the three methods is dependent on the number of marker haplotypes the causal allele is present on, which depends on the age of the mutation. Except in the case of a common causal variant in high LD with markers, all three multimarker models are superior in power to single-SNP tests. Including the adjacent statistical interactions results in lower inflation in test statistics when realistic levels of population stratification are present in a dataset. Using the multimarker models, we analyse data from the Molecular Genetics of Schizophrenia study. The multimarker models find potential associations that are not found by single-SNP tests. However, multimarker models also require stricter control of data quality since biases can have a larger inflationary effect on multimarker test statistics than single-SNP test statistics.
Conclusions:
Analysing a GWAS with multimarker models can yield candidate regions which may contain rare untyped causal variants. This is useful for increasing prior odds of association in future whole-genome sequence analyses.
Background:
The genetic etiology of complex diseases in humans has commonly been viewed as a complex process involving both genetic and environmental factors functioning in a complicated manner. Quite often the interactions among genetic variants play major roles in determining the susceptibility of an individual to a particular disease. Statistical methods for modeling interactions underlying complex diseases between single genetic variants (e.g. single nucleotide polymorphisms or SNPs) have been extensively studied. Recently, haplotype-based analysis has gained popularity among genetic association studies. When multiple sequence or haplotype interactions are involved in determining an individual's susceptibility to a disease, modeling and testing the interaction effects presents daunting statistical challenges, largely due to the complicated higher-order epistatic complexity.
Results:
In this article, we propose a new strategy for modeling haplotype-haplotype interactions under the penalized logistic regression framework with an adaptive L1-penalty. We consider interactions of sequence variants between haplotype blocks. The adaptive L1-penalty allows simultaneous effect estimation and variable selection in a single model. We propose a new parameter estimation method which estimates and selects parameters by the modified Gauss-Seidel method nested within the EM algorithm. Simulation studies show that it has a low false positive rate and reasonable power in detecting haplotype interactions. The method is applied to test haplotype interactions between the maternal and offspring genomes in a data set of small for gestational age (SGA) neonates, and significant interactions between different genomes are detected.
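To make concrete how an adaptive L1-penalty can estimate and select at the same time, the following sketch shows the generic adaptive-lasso shrinkage step: each coefficient is soft-thresholded by an amount that is inversely related to the size of an initial estimate. It is not the authors' modified Gauss-Seidel/EM procedure; the function names, the `gamma` and `eps` parameters and the toy numbers are assumptions made for illustration.

```python
def soft_threshold(z, threshold):
    """Shrink z towards zero by `threshold`; values inside the band become 0."""
    if z > threshold:
        return z - threshold
    if z < -threshold:
        return z + threshold
    return 0.0


def adaptive_weights(initial_estimates, gamma=1.0, eps=1e-8):
    """Adaptive-lasso weights: small initial estimates get large penalties.

    `eps` is a hypothetical guard against division by zero.
    """
    return [1.0 / (abs(b) + eps) ** gamma for b in initial_estimates]


def adaptive_shrink(estimates, lam, gamma=1.0):
    """Apply one weighted soft-thresholding step to every coefficient."""
    weights = adaptive_weights(estimates, gamma)
    return [soft_threshold(b, lam * w) for b, w in zip(estimates, weights)]


# Toy initial estimates for three interaction effects (made-up values).
shrunk = adaptive_shrink([2.0, 0.1, -0.5], lam=0.2)
print(shrunk)
```

In this toy case the large effect is barely shrunk, the small one is set exactly to zero (it is dropped from the model), and the moderate one is shrunk strongly, which is the selection behaviour the adaptive penalty is meant to provide.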
Conclusions:
As demonstrated by the simulation studies and real data analysis, the approach developed provides an efficient tool for the modeling and testing of haplotype interactions. The implementation of the method in R code can be freely downloaded from http://www.stt.msu.edu/~cui/software.html.
Background:
In genome-wide association studies, thousands of individuals are genotyped in hundreds of thousands of single nucleotide polymorphisms (SNPs). Statistical power can be increased when haplotypes, rather than three-valued genotypes, are used in analysis, so the problem of haplotype phase inference (phasing) is particularly relevant. Several phasing algorithms have been developed for data from unrelated individuals, based on different models, some of which have been extended to father-mother-child "trio" data.
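Why trio data help phasing can be seen at a single SNP: Mendelian transmission often fixes which of the child's alleles came from which parent. The sketch below applies only that basic rule and is not the tree-based deterministic sampling method; the 0/1/2 genotype coding and the function name `phase_child_site` are assumptions for illustration.

```python
def phase_child_site(father, mother, child):
    """Phase one biallelic SNP in a father-mother-child trio.

    Genotypes are assumed to be coded as the number of copies of allele 1
    (0, 1 or 2). Returns (paternal_allele, maternal_allele) for the child,
    or None when the site is ambiguous (all three are heterozygous).
    Raises ValueError on a Mendelian inconsistency.
    """
    if child == 0:
        candidates = [(0, 0)]
    elif child == 2:
        candidates = [(1, 1)]
    else:
        candidates = [(0, 1), (1, 0)]

    def can_transmit(parent, allele):
        # A heterozygous parent can transmit either allele; a homozygous
        # parent can transmit only the allele it carries twice.
        return parent == 1 or parent == 2 * allele

    consistent = [
        (p, m) for p, m in candidates
        if can_transmit(father, p) and can_transmit(mother, m)
    ]
    if not consistent:
        raise ValueError("Mendelian inconsistency at this site")
    if len(consistent) == 1:
        return consistent[0]
    return None


print(phase_child_site(0, 1, 1))  # father homozygous 0, so allele 1 is maternal
print(phase_child_site(1, 1, 1))  # all heterozygous: cannot be resolved here
```

Sites that remain ambiguous under this rule are the ones for which a statistical phasing model is still needed.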
Results:
We introduce a technique for phasing trio datasets using a tree-based deterministic sampling scheme. We compared our method with the publicly available algorithms PHASE v2.1, BEAGLE v3.0.2 and 2SNP v1.7 on datasets with varying numbers of markers and trios. We found that the computational complexity of PHASE makes it prohibitive for routine use; 2SNP, on the other hand, though the fastest method for small datasets, was significantly inaccurate. Our method outperforms BEAGLE in both speed and accuracy for small to intermediate numbers of trios, for all marker sizes examined. Our method is implemented in the "Tree-Based Deterministic Sampling" (TDS) package, available for download at www.ee.columbia.edu/~anastas/tds.
Conclusions:
Using a tree-based deterministic sampling technique, we present an intuitive and conceptually simple phasing algorithm for trio data. The trade-off between speed and accuracy achieved by our algorithm makes it a strong candidate for routine use on trio datasets.
Background:
Domestication and breeding involve the selection of particular phenotypes, limiting the genomic diversity of the population and creating a bottleneck. These effects can be precisely estimated when the location of domestication is established. Few analyses have focused on understanding the genetic consequences of domestication and breeding in fruit trees. In this study, we aimed to analyse genetic structure and changes in diversity in sweet cherry (Prunus avium L.).
Results:
Three subgroups were detected in sweet cherry, with one group of landraces genetically very close to the analysed wild cherry population. A limited number of SSR markers displayed deviations from the frequencies expected under neutrality. After the removal of these markers from the analysis, a very limited bottleneck was detected between wild cherries and sweet cherry landraces, with a much more pronounced bottleneck between sweet cherry landraces and modern sweet cherry varieties. The loss of diversity between wild cherries and sweet cherry landraces at the S-locus was more significant than that for microsatellites. Particularly high levels of differentiation were observed for some S-alleles.
Conclusions:
Several domestication events may have happened in sweet cherry, and/or intense gene flow from local wild cherry was probably maintained throughout the evolutionary history of the species. A marked bottleneck due to breeding was detected, with all markers, in the modern sweet cherry gene pool. The microsatellites did not detect the bottleneck due to domestication in the analysed sample. The vegetative propagation specific to some fruit trees may account for the differences in diversity observed at the S-locus. Our study provides insights into domestication events of cherry, but these require confirmation with a larger sample of both sweet cherry landraces and wild cherry.
Background:
Identification of global livestock diversity hotspots and their importance in diversity maintenance is essential for making global conservation efforts. We screened 52 sheep breeds from the Eurasian subcontinent with 20 microsatellite markers. By estimating within- and between-breed genetic variation and weighting the two components differently, we aimed to identify genetic diversity hotspots and to prioritize the importance of each breed for conservation, respectively. In addition, we estimated how important within-species diversity hotspots are in livestock conservation.
Results:
Bayesian clustering analysis revealed three genetic clusters, termed Nordic, Composite and Fat-tailed. Southern breeds from close to the region of sheep domestication were more variable, but less genetically differentiated compared with more northern populations. Decreasing weight for within-breed diversity component led to very high representation of genetic clusters or regions containing more diverged breeds, but did not increase phenotypic diversity among the high ranked breeds. Sampling populations throughout 14 regional groups was suggested for maximized total genetic diversity.
Conclusions:
During the initial steps of establishing a livestock conservation program, populations from the diversity hotspot area are the most important ones, but for the full design our results suggested that approximately equal population representation across environments should be considered. Even in this case, higher per-population emphasis in areas of high diversity is appropriate. The analysis was based on neutral data, but we have no reason to think the general trend is limited to this type of data. However, a comprehensive valuation of populations should balance production systems, phenotypic traits and available genetic information, and include consideration of probability of success.
Background:
It has been shown that integron-associated gene cassettes exist largely in tandem arrays of variable size, ranging from antibiotic resistance arrays of three to five cassettes up to arrays of more than 100 cassettes associated with the vibrios. Further, the ecology of the integron/gene cassette system has been investigated by showing that very many different cassettes are present in even small environmental samples. In this study, we seek to extend the ecological perspective on the integron/gene cassette system by investigating the way in which this diverse cassette metagenome is apportioned amongst prokaryote lineages in a natural environment.
Results:
We used a combination of PCR-based techniques applied to environmental DNA samples and ecological analytical techniques to establish co-assortment within cassette populations, and then established the relationship between this co-assortment and genomic structures. We then assessed the distribution of gene cassettes within the environment and found that the majority of gene cassettes existed in large co-assorting groups.
Conclusions:
Our results suggested that the gene cassette diversity of a relatively pristine sampling environment was structured into co-assorting groups, predominantly containing large numbers of cassettes per group. These co-assorting groups consisted of different gene cassettes in stoichiometric relationship. Conservatively, we then attributed co-assorting cassettes to the gene cassette complements of single prokaryote lineages and by implication, to large integron-associated arrays. The prevalence of large arrays in the environment raises new questions about the assembly, maintenance and utility of large cassette arrays in prokaryote populations.
Background:
Stress, elicited for example by aggressive interactions, has negative effects on various biological functions including immune defence, reproduction, growth, and, in livestock, on product quality. Stress response and aggressiveness are mutually interrelated and show large interindividual variation, partly attributable to genetic factors. In the pig little is known about the molecular-genetic background of the variation in stress responsiveness and aggressiveness. To identify candidate genes we analyzed association of DNA markers in each of ten genes (CRH g.233C>T, CRHR1 c.*866_867insA, CRHBP c.51G>A, POMC c.293_298del, MC2R c.306T>G, NR3C1 c.*2122A>G, AVP c.207A>G, AVPR1B c.1084A>G, UCN g.1329T>C, CRHR2 c.*13T>C) related to the hypothalamic-pituitary-adrenocortical (HPA) axis, one of the main stress-response systems, with various stress- and aggression-related parameters at slaughter. These parameters were: physiological measures of the stress response (plasma concentrations of cortisol, creatine kinase, glucose, and lactate), adrenal weight (which is a parameter reflecting activity of the central branch of the HPA axis over time) and aggressive behaviour (measured by means of lesion scoring) in the context of psychosocial stress of mixing individuals with different aggressive temperament.
Results:
The SNP NR3C1 c.*2122A>G showed association with cortisol concentration (p = 0.024), adrenal weight (p = 0.003) and aggressive behaviour (front lesion score, p = 0.012; total lesion score, p = 0.045). The SNP AVPR1B c.1084A>G showed a highly significant association with aggressive behaviour (middle lesion score, p = 0.007; total lesion score, p = 0.003). The SNP UCN g.1329T>C showed association with adrenal weight (p = 0.019) and aggressive behaviour (front lesion score, p = 0.029). The SNP CRH g.233C>T showed a significant association with glucose concentration (p = 0.002), and the polymorphisms POMC c.293_298del and MC2R c.306T>G with adrenal weight (p = 0.027 and p < 0.0001, respectively).
Conclusions:
The multiple and consistent associations shown by SNPs in NR3C1 and AVPR1B provide convincing evidence for genuine effects of their DNA sequence variation on stress responsiveness and aggressive behaviour. Identification of the causal functional molecular polymorphisms would not only provide markers useful for pig breeding but also insight into the molecular bases of the stress response and aggressive behaviour in general.
Background:
Leptin modulates appetite, energy expenditure and the reproductive axis by signalling via its receptor the status of body energy stores to the brain. The present study aimed to quantify the associations between 10 novel and known single nucleotide polymorphisms in genes coding for leptin and leptin receptor with performance traits in 848 Holstein-Friesian sires, estimated from performance of up to 43,117 daughter-parity records per sire.
Results:
All single nucleotide polymorphisms were segregating in this sample population and none deviated (P > 0.05) from Hardy-Weinberg equilibrium. Complete linkage disequilibrium existed between the novel polymorphism LEP-1609 and the previously identified polymorphisms LEP-1457 and LEP-580. LEP-2470 associated (P < 0.05) with milk protein concentration and calf perinatal mortality. It had a tendency to associate with milk yield (P < 0.1). The G allele of LEP-1238 was associated (P < 0.05) with reduced milk fat concentration, reduced milk protein concentration and longer gestation length, and tended to associate (P < 0.1) with an increase in calving difficulty, calf perinatal mortality and somatic cells in the milk. LEP-963 exhibited an association (P < 0.05) with milk fat concentration, milk protein concentration, calving difficulty and gestation length. It also tended to associate with milk yield (P < 0.1). The R25C SNP associated (P < 0.05) with milk fat concentration, milk protein concentration, calving difficulty and length of gestation. The T allele of the Y7F SNP significantly associated with reduced angularity (P < 0.01) and reduced milk protein yield (P < 0.05). There was also a tendency (P < 0.1) for Y7F to associate with increased body condition score, reduced milk yield and shorter gestation. A80V associated with reduced survival in the herd (P < 0.05).
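A check for deviation from Hardy-Weinberg equilibrium of the kind reported above can be sketched with the classical chi-square goodness-of-fit test on genotype counts. The abstract does not say which test was used, so the choice of test, the function name `hwe_chi_square` and the genotype counts below are all illustrative assumptions.

```python
def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square statistic for Hardy-Weinberg equilibrium at a biallelic SNP.

    Arguments are observed counts of the three genotypes (AA, AB, BB).
    The statistic has one degree of freedom, so values above about 3.84
    correspond to P < 0.05.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p                        # frequency of allele B
    observed = [n_aa, n_ab, n_bb]
    expected = [n * p * p, 2 * n * p * q, n * q * q]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


# Made-up counts: the first set matches HWE exactly, the second lacks
# heterozygotes entirely.
in_equilibrium = hwe_chi_square(25, 50, 25)
no_heterozygotes = hwe_chi_square(50, 0, 50)
print(in_equilibrium, no_heterozygotes)
```

For small samples or rare alleles, where expected counts are low, an exact test is usually preferred over this chi-square approximation.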
Conclusions:
Several leptin polymorphisms (LEP-2470, LEP-1238, LEP-963, Y7F and R25C) associated with the energetically expensive process of lactogenesis. Only SNP Y7F associated with energy storage. Associations were also observed between leptin polymorphisms and calving difficulty, gestation length and calf perinatal mortality. The lack of an association between the leptin variants investigated with calving interval in this large data set would question the potential importance of these leptin variants, or indeed leptin, in selection for improved fertility in the Holstein-Friesian dairy cow.
Background:
Gene flow maintains genetic diversity within a species and is influenced by individual behavior and the geographical features of the species' habitat. Here, we have characterized the geographical distribution of genetic patterns in giant pandas (Ailuropoda melanoleuca) living in four isolated patches of the Xiaoxiangling and Daxiangling Mountains. Three geographic distance definitions were used with the "isolation by distance theory": Euclidean distance (EUD), least-cost path distance (LCD) defined by food resources, and LCD defined by habitat suitability.
Results:
A total of 136 genotypes were obtained from 192 fecal samples and one blood sample, corresponding to 53 unique genotypes. Geographical maps plotted at high resolution using smaller neighborhood radius definitions produced large cost distances, because smaller radii include a finer level of detail in considering each pixel. Mantel tests showed that most correlation indices, particularly bamboo resources defined for different sizes of raster cell, were slightly larger than the correlations calculated for the Euclidean distance, with the exception of Patch C. We found that natural barriers might have decreased gene flow between the Xiaoxiangling and Daxiangling regions.
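The Mantel tests mentioned above correlate a genetic distance matrix with a geographic (or cost) distance matrix and assess significance by permuting individuals. The sketch below is a minimal one-sided version using only the standard library; the number of permutations, the seed, the function names and the toy distances are assumptions, not details taken from the study.

```python
import random


def upper_triangle(matrix, order=None):
    """Flatten the upper triangle, optionally after relabelling individuals."""
    n = len(matrix)
    if order is None:
        order = list(range(n))
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation of two equal-length lists (non-constant inputs)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def mantel_test(genetic, geographic, permutations=999, seed=0):
    """Return (observed correlation, one-sided permutation p-value)."""
    rng = random.Random(seed)
    geo = upper_triangle(geographic)
    observed = pearson(upper_triangle(genetic), geo)
    n = len(genetic)
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        order = list(range(n))
        rng.shuffle(order)
        if pearson(upper_triangle(genetic, order), geo) >= observed:
            hits += 1
    return observed, hits / (permutations + 1)


# Toy data: four individuals on a line; genetic distance grows with distance.
positions = [0, 1, 3, 7]
geographic = [[abs(a - b) for b in positions] for a in positions]
genetic = [[2 * d for d in row] for row in geographic]
r, p_value = mantel_test(genetic, geographic)
print(r, p_value)
```

Swapping the Euclidean matrix for a least-cost path matrix and comparing the resulting correlations is the kind of comparison the Results describe; computing least-cost distances themselves is outside the scope of this sketch.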
Conclusions:
Landscape features were found to partially influence gene flow in the giant panda population. This result is closely linked to the biological character and behavior of giant pandas because, as bamboo feeders, individuals spend most of their lives eating bamboo or moving within the bamboo forest. Landscape-based genetic analysis suggests that gene flow will be enhanced if the connectivity between currently fragmented bamboo forests is increased.
Background:
The Alaskan sled dog offers a rare opportunity to investigate the development of a dog breed based solely on performance, rather than appearance, thus setting the breed apart from most others. Several established breeds, many of which are recognized by the American Kennel Club (AKC), have been introduced into the sled dog population to enhance racing performance. We have used molecular methods to ascertain the constitutive breeds used to develop successful sled dog lines, and in doing so, determined the breed origins of specific performance-related behaviors. One hundred and ninety-nine Alaskan sled dogs were genotyped using 96 microsatellite markers that span the canine genome. These data were compared with those from 141 similarly genotyped purebred dog breeds. Sled dogs were evaluated for breed composition based on a variety of performance phenotypes including speed, endurance and work ethic, and the data stratified based on population structure.
Results:
We observe that the Alaskan sled dog has a unique molecular signature and that the genetic profile is sufficient for identifying dogs bred for sprint versus distance. When evaluating contributions of existing breeds, we find that the Alaskan Malamute and Siberian Husky contributions are associated with enhanced endurance; Pointer and Saluki are associated with enhanced speed; and the Anatolian Shepherd demonstrates a positive influence on work ethic.
Conclusion:
We have established a genetic breed profile for the Alaskan sled dog, identified profile variance between sprint and distance dogs, and established breeds associated with enhanced performance attributes. These data set the stage for mapping studies aimed at finding genes that are associated with athletic attributes integral to the high performing Alaskan sled dog.