Ancestry analysis

Ancestry modules

Below are the ancestry modules currently available on Genostruct. This page is the main listing for ancestry workflows and will be updated as modules are added.

Modules

Available Ancestry Modules

Works with your raw DNA export

23andMe, AncestryDNA, FTDNA, MyHeritage

Upload a .txt, .csv, .zip, or .gz raw data export from any of these providers.

Method

How Genostruct Models Ancestry

This framework is designed for model-based ancestry inference, using real ancient DNA and population-genetic statistics rather than nearest-point matching on a compressed map.

01

Methodology

Our ancestry framework is built on statistical inference, not nearest-point matching. It compares your DNA against real ancient DNA and other reference populations, then tests which combinations of references best explain your profile.

Instead of reducing your genome to a small set of visual coordinates and asking which sample is closest, the method evaluates ancestry as a fitted population model. It searches across many possible source combinations, estimates best-fit mixture proportions, and measures how well each model explains the observed data. The result is a structured ancestry fit, not a geometric approximation.
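To make the idea of a fitted population model concrete, here is a minimal sketch of the search described above. It is not Genostruct's implementation. It assumes, purely for illustration, that your profile and each reference can be summarized as a short vector of statistics, that only two-source models are tried, and that fit is scored by a plain least-squares residual. The names `fit_two_way`, `rank_models`, `RefA`, `RefB`, and `RefC` are hypothetical.

```python
from itertools import combinations


def fit_two_way(target, src_a, src_b):
    """Fit target ~ w * src_a + (1 - w) * src_b by least squares.

    The weight w is clipped to [0, 1] so both proportions stay non-negative.
    Returns (w, residual), where residual is the Euclidean misfit.
    """
    diff = [a - b for a, b in zip(src_a, src_b)]
    denom = sum(d * d for d in diff)
    if denom == 0.0:
        w = 0.5  # identical sources: the weight cannot be identified
    else:
        num = sum((t - b) * d for t, b, d in zip(target, src_b, diff))
        w = min(1.0, max(0.0, num / denom))
    fitted = [w * a + (1.0 - w) * b for a, b in zip(src_a, src_b)]
    residual = sum((t - f) ** 2 for t, f in zip(target, fitted)) ** 0.5
    return w, residual


def rank_models(target, references):
    """Try every pair of references and sort the fits, best (lowest residual) first."""
    results = []
    for name_a, name_b in combinations(sorted(references), 2):
        w, residual = fit_two_way(target, references[name_a], references[name_b])
        results.append((residual, {name_a: w, name_b: 1.0 - w}))
    results.sort(key=lambda item: item[0])
    return results


# Toy data (made up): the target is built as 25% RefB + 75% RefC.
references = {
    "RefA": [0.0, 0.0, 0.0],
    "RefB": [1.0, 0.0, 0.0],
    "RefC": [0.0, 1.0, 0.0],
}
target = [0.25, 0.75, 0.0]

ranked = rank_models(target, references)
best_residual, best_mix = ranked[0]
print(best_mix, best_residual)
```

On this toy input the RefB + RefC model ranks first because it reproduces the target with no misfit, while the other pairs leave a residual. A production method would also account for statistical uncertainty when comparing models, which this sketch leaves out.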

This approach is designed to capture ancestry as composition: which combination of reference populations most plausibly accounts for your genetic profile. That is fundamentally different from asking who lies nearest on a PCA map.

02

Why It's Better Than PCA

PCA is useful for visualization, but Euclidean distance in PCA space is not the same thing as ancestry modeling. A PCA plot is a projection of very high-dimensional genetic data onto a small number of axes chosen to capture major variance. That makes it useful for seeing broad structure, but it does not directly encode admixture proportions or full genetic composition.

In other words, PCA distance tells you how close two projected points are after compression. It does not formally test whether one individual can be explained as a mixture of multiple reference populations, and it does not use a population-genetic fitting framework to estimate that mixture.
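The effect of compression can be shown with a small sketch. This is not a real PCA: it simply drops the third axis of hand-made three-dimensional points to imitate projecting onto fewer axes. The coordinates and the names `RefX`, `RefY`, and `nearest` are made up for illustration.

```python
import math


def nearest(target, references, dims):
    """Return the reference name closest to target using only the first `dims` axes."""
    return min(
        references,
        key=lambda name: math.dist(target[:dims], references[name][:dims]),
    )


# Toy coordinates: RefX looks close on the first two axes
# but differs strongly on the third axis, which the projection discards.
references = {
    "RefX": (0.10, 0.00, 0.90),
    "RefY": (0.30, 0.00, 0.00),
}
target = (0.00, 0.00, 0.00)

nearest_projected = nearest(target, references, dims=2)  # compressed view
nearest_full = nearest(target, references, dims=3)       # all axes kept
print(nearest_projected, nearest_full)
```

In the two-axis view RefX is the nearest point, yet with all three axes RefY is closer. The nearest point in a projection depends on which variation the projection happened to keep.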

Our method is stronger because it models ancestry directly. Rather than relying on geometric proximity in a reduced space, it evaluates how well different reference combinations explain genome-wide patterns of shared genetic drift. That gives you a best-fit ancestry model with interpretable proportions, instead of a nearest-cluster approximation.

So while both approaches involve fit in a broad sense, they optimize different things. Nearest-point matching minimizes the distance between points in a PCA projection. This method minimizes the mismatch between a population-genetic model and the observed data, which bears directly on the question of ancestry composition.

03

How To Read Your Results

Your results should be read as model-based ancestry estimates. The reported components are the reference populations that best explain your DNA within this statistical framework.

They are not claims that these exact labeled groups are your literal direct ancestors. They are genetic proxies: reference populations whose shared genetic drift patterns provide the strongest explanation for your profile.

The larger components usually represent the clearest signals. Smaller components should be interpreted more carefully, especially when several high-quality models produce similar overall fits. The most useful way to read the result is as a best-supported ancestry composition, not as a simple nearest match.

04

Scientific Basis

The method rests on population-genetic statistical inference: it measures genome-wide patterns of shared genetic drift using f4 statistics. These statistics are widely used in ancient DNA research because they measure how populations relate to one another through shared ancestry and drift, rather than through simple visual similarity.
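For readers who want to see what an f4 statistic is, the standard definition for four populations A, B, C, and D is the average, across genetic markers, of (a - b) * (c - d), where a, b, c, and d are the allele frequencies of each population at a marker. The sketch below computes that average on made-up frequencies. It illustrates the definition only and is not Genostruct's pipeline, which is not described here; real analyses use very large numbers of markers and also estimate standard errors.

```python
def f4(freq_a, freq_b, freq_c, freq_d):
    """f4(A, B; C, D): mean over markers of (a - b) * (c - d)."""
    n = len(freq_a)
    if n == 0 or not (len(freq_b) == len(freq_c) == len(freq_d) == n):
        raise ValueError("all four populations need the same, non-zero number of markers")
    total = sum(
        (a - b) * (c - d)
        for a, b, c, d in zip(freq_a, freq_b, freq_c, freq_d)
    )
    return total / n


# Toy allele frequencies at three markers (made up for illustration).
pop_a = [0.1, 0.8, 0.5]
pop_b = [0.3, 0.6, 0.5]
pop_c = [0.2, 0.9, 0.4]
pop_d = [0.4, 0.5, 0.4]

shared = f4(pop_a, pop_b, pop_c, pop_d)   # frequency differences are correlated
no_diff = f4(pop_a, pop_a, pop_c, pop_d)  # A compared with itself gives zero
print(shared, no_diff)
```

A value near zero is what you expect when the difference between A and B is unrelated to the difference between C and D. A value clearly away from zero indicates that the two differences are correlated, which is the signal of shared drift that the fitting step builds on.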

By working with genetic drift relationships, the model can test whether your DNA is better explained by one reference population or by a mixture of several. It then compares alternative models, estimates best-fit proportions, and ranks the models by how well they explain the data.

That is why this approach is more scientifically informative than PCA distance alone: it is not just measuring closeness after dimensionality reduction, but formally testing ancestry composition using real ancient DNA, reference populations, and population-genetic statistics.