Link to legacy Y-DNA Comparison Utility: ySearch Compatible
What's New (March 18, 2014):
- Fix output to be W3C compliant (Thanks to Colin Ferguson!)
- Support links to FTDNA's new genetic distance relatedness faq items
- Fix modal determination for partial data sets
- Fix haplotype display for partial data sets
- Updated description for mutation rates
What's New (July 15, 2012):
- Fix display of the modal when no data exists in a column, and omit the modal when fewer than
three cells have values (Thanks to Colin Ferguson!)
- Fix GD and GMRCA calculations when multi-copy markers are disabled (Thanks to Colin Ferguson!)
- Fix broken links for FTDNA's Interpreting Genetic Distance for various markers.
What's New (May 1, 2011):
This version of the Y-DNA Comparison Utility has several new features:
- 111 Marker support: the new FTDNA results of 111 markers are now supported. These are
supported through 102 cells, some of which have multi-copy markers
- Multi-copy markers are supported: multiple values in one cell, separated by a hyphen, "-", are
supported.
- All multi-copy markers use the infinite allele method of calculating genetic distance regardless
of the setup setting. All multi-copy markers use the same mutation rate within each multi-copy group
- All single-copy markers use either the stepwise or infinite allele method as specified in the setup
- A new output box is supported to show the setup values of all user-modifiable checkboxes, text boxes, and radio buttons. This is enabled by the "Show Setup Data" checkbox.
- A new command button is supported to set the setup values of one or more user-modifiable
checkboxes, text boxes and radio buttons. Use the same box for haplotype data, but paste the
saved setup data. This allows you to save and use various configurations for specialized analysis.
- Individual mutation rates can be set using the new setup function. These values are effective
when the new "Custom" mutation rate is selected
- A new CSV (comma-separated value) output format is supported, so you can paste the output of the
comparison table into a .csv file and load it into a spreadsheet, including older versions
of Excel, with acceptable handling of multi-copy markers. This is enabled by the new
"Show Data in CSV Format" checkbox.
This utility is free for your use.
You are welcome to use this utility and its output as you see fit.
Instructions
- Processing large datasets
When using Y-Utility with large datasets, you may get the following message repeatedly in Internet Explorer:
A script on this page is causing Internet Explorer to run slowly. If it continues to run, your computer may become unresponsive. Do you want to abort the script?
If you frequently process large datasets, you may want to edit the
Windows registry to increase the processing time allowed for scripts.
By default Windows supports 5,000,000 script instruction executions
before posting the warning. If you are experienced in performing registry
edits, you may be interested in the Microsoft Knowledge Base article
How To Set Timeout Period for Script.
This procedure is not for novices. A mistake in performing registry
edits can make your computer unusable.
Should you decide to perform the edit, a value of 100,000,000 would give
twenty times more script execution time than the default. There may be similar
procedures for other browsers.
- Copying haplotype data
This utility is designed for copy-and-paste input of haplotype data
from web pages, especially from the new format FTDNA public project pages,
or Excel tables. The haplotype data must be in
the normal order provided by FTDNA.
No non-numeric characters other than whitespace and hyphens ("-", for multi-copy markers)
are allowed after the first
column of allele data.
Normally generated tables of data will copy correctly, but if you are copying
from a manually created table where there may be extra whitespace between
allele values this utility may not parse the data correctly. You can either
edit the data so that no more than one blank space exists between allele values (or three blank spaces for a missing value), or you can copy the data first
to an Excel spreadsheet then copy from Excel and paste into this utility.
Data copied from an Excel spreadsheet is tab-delimited, and this utility should
be able to parse it without problem.
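The input rules above can be illustrated with a minimal Python sketch. It is not the utility's own parser: the function names are hypothetical, and it handles only the allele part of a row (the text after the ID). It assumes tab-delimited cells for Excel data, a single space between values otherwise, and three spaces for one missing value.

```python
import re

def split_cells(text):
    """Split the allele part of one row into raw cell strings ('' = missing)."""
    if "\t" in text:
        # Excel paste: one tab per cell, an empty field is a missing value.
        return [cell.strip() for cell in text.split("\t")]
    cells, blanks = [], 0
    for token in text.split(" "):
        if token == "":
            blanks += 1          # each extra space yields one empty token
        else:
            # Three spaces produce two empty tokens, i.e. one missing value.
            cells.extend([""] * (blanks // 2))
            blanks = 0
            cells.append(token)
    return cells

def parse_alleles(text):
    """Turn cells into tuples of ints; multi-copy cells such as 11-14 keep all copies."""
    parsed = []
    for cell in split_cells(text):
        if cell == "":
            parsed.append(None)
        elif re.fullmatch(r"\d+(-\d+)*", cell):
            parsed.append(tuple(int(v) for v in cell.split("-")))
        else:
            # Only digits, whitespace and hyphens are allowed in allele data.
            raise ValueError(f"unexpected characters in allele cell: {cell!r}")
    return parsed

print(parse_alleles("13 24   11 11-14"))
```

Rejecting any other character with an error, rather than skipping it, mirrors the rule that nothing but digits, whitespace and hyphens may follow the first allele column.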
- Y-DNA Allele setup, "Exists" checkboxes
The default setup of this utility should handle most cases.
If the allele exists
in your data then the corresponding checkbox must be checked. If the allele
does not exist in your data then the corresponding checkbox must not be
checked.
- "Enabled" checkboxes
You can enable or disable comparison of any allele. If the "Enabled" checkbox
is not checked then the data for that allele will not be used in genetic
distance or MRCA calculations. Also, it will not be included in SMGF links
and color highlighting will not be performed for FTDNA and SMGF tables.
- ID Column
By default all characters before the first data column for each haplotype
will be used as identification info. You can select a particular column to
use for identification.
- 1st Data Column
By default this utility will try to identify the first column of allele data
for each haplotype. It does this by looking for the first three consecutive
numeric tokens that are within the limits for the first three alleles.
Normally this works without issue, but you can construct a table of data where
this technique fails, in which case you can specify the starting column of the
allele data. Even this may fail in some cases, in which event, you should
copy the data first to an Excel spreadsheet then paste from Excel into this
utility.
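The detection idea described above can be sketched in Python as follows. The limits in LIMITS are illustrative assumptions only (this page does not list the utility's actual limits), and find_first_data_column is a hypothetical name.

```python
# Hypothetical plausibility ranges for the first three alleles in FTDNA order.
# These numbers are assumptions for illustration, not the utility's real limits.
LIMITS = [(9, 17), (17, 28), (10, 19)]

def find_first_data_column(tokens, limits=LIMITS):
    """Return the index of the first token that starts a run of numeric
    tokens, each inside its corresponding limit, or None if there is none."""
    width = len(limits)
    for start in range(len(tokens) - width + 1):
        window = tokens[start:start + width]
        if all(tok.isdigit() and low <= int(tok) <= high
               for tok, (low, high) in zip(window, limits)):
            return start
    return None

tokens = "12345 McGee Ireland 13 24 14 11".split()
first = find_first_data_column(tokens)
print(first, " ".join(tokens[:first]))   # data start index and the ID text
```

An ID that itself contains three in-range numbers in a row would be mistaken for allele data, which is the kind of constructed table where specifying the starting column by hand is needed.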
- FTDNA order haplotype comparison
If this table is enabled, then a listing of all the haplotype data is
generated in the FTDNA order. Copying and pasting directly from public FTDNA surname project
tables should work without issue.
If the modal haplotype is enabled then a modal haplotype will be
determined and listed as the first row. If difference highlighting is
enabled, then the difference between the reference haplotype and every other
haplotype for each allele will be determined. The background color will
be set according to the following:
- 0 : none
- 1 : green
- 2 : yellow
- 3+ : pink
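The color rule in the list above amounts to a small lookup. The following Python sketch uses hypothetical function names and assumes single-valued alleles, with None standing for a missing value.

```python
def difference_color(reference, other):
    """Background color for one allele, following the list above."""
    if reference is None or other is None:
        return None                      # nothing to compare
    diff = abs(reference - other)
    if diff == 0:
        return None                      # 0 : none
    if diff == 1:
        return "green"
    if diff == 2:
        return "yellow"
    return "pink"                        # 3 or more

def highlight_row(reference_row, other_row):
    """Colors for a whole haplotype compared with the reference haplotype."""
    return [difference_color(r, o) for r, o in zip(reference_row, other_row)]

print(highlight_row([13, 24, 14, 11], [13, 25, 16, 14]))
```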
- Fluxus Phylogenetic Analysis Software
The .ych data is one of the formats supported by the Phylogenetic Network
software available from
Fluxus Technology from their
Network Software
page. You need to make sure that the haplotype data for all sets of results
have the same number of markers. You may want to edit the ID field either in
the source file or in the .ych data. The Network software supports five
character IDs. For duplicated haplotype data, the ID of the first set is used.
For an example of the output, please refer to:
McGee Network Phylogenetic Diagrams
There is a 2MB Flash movie of the phylogenetic network creation process:
McGee Network Creation Demonstration
- Genetic Distance
If this table is enabled then a table of genetic distances between every pair
of haplotypes is created. The diagonal elements of the table indicate the
number of alleles with data for that haplotype.
The calculations can be of two types:
- Hybrid Mutation Model
The target of this model is to match that used by Ysearch and FTDNA. It uses
the stepwise mutation model for all alleles except DYS464 and YCA which use
the infinite allele model. The stepwise model says that each mutation is
allowed to change the allele value by exactly one, so a difference of two
means that two mutations occurred and a difference of three means that three
mutations occurred. The infinite allele model says that the entire difference
between allele values, no matter how large, is the result of one mutation.
Please notify me if you find
any inconsistencies between the genetic distances calculated here and those
provided by Ysearch or FTDNA.
- Infinite Allele Mutation Model
The infinite allele model says that the entire difference between allele
values, no matter how large, is the result of one mutation.
The close distances are highlighted for easier visualization of possible
family relationships by the following:
- Related : green
- Probably Related : yellow
- Possibly Related : pink
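As a rough Python sketch of the two mutation models described above: single-valued markers are stored as integers and multi-copy cells as tuples. Counting a multi-copy cell as one mutation whenever the two cells differ at all is an assumed simplification, and the function names are hypothetical; this is not claimed to reproduce the Ysearch or FTDNA calculation.

```python
# Markers named in the text as using the infinite allele model in hybrid mode.
INFINITE_IN_HYBRID = {"DYS464", "YCA"}

def marker_distance(a, b, infinite):
    """Distance contributed by one marker."""
    if a is None or b is None or a == b:
        return 0                          # missing data or identical values
    if infinite or isinstance(a, tuple) or isinstance(b, tuple):
        return 1                          # any difference is one mutation
    return abs(a - b)                     # stepwise: one mutation per step

def genetic_distance(h1, h2, model="hybrid"):
    """Sum the per-marker distances over markers present in both haplotypes."""
    total = 0
    for marker in h1.keys() & h2.keys():
        infinite = model == "infinite" or marker in INFINITE_IN_HYBRID
        total += marker_distance(h1[marker], h2[marker], infinite)
    return total

a = {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS464": (15, 15, 17, 17)}
b = {"DYS393": 13, "DYS390": 26, "DYS19": 15, "DYS464": (15, 16, 17, 17)}
print(genetic_distance(a, b, "hybrid"), genetic_distance(a, b, "infinite"))
```

In this example the two-step difference at DYS390 counts as two mutations under the hybrid model but only one under the infinite allele model, which is why the two totals differ.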
- TMRCA (Time to the Most Recent Common Ancestor)
If this table is enabled then a table of generations or years to the
most recent
common ancestor for every pair of haplotypes is created. The number listed
is the number of years or generations within which the MRCA lived,
at either 50% or 95% probability. The algorithm is taken from Bruce Walsh's paper,
Estimating the Time to the Most Recent Common Ancestor for the
Y chromosome or Mitochondrial DNA for a Pair of Individuals.
Other utilities related to MRCA calculations can be found
on the Y-DNA tools page of the ISOGG, International Society of Genetic Genealogy, website.
The TMRCA calculations use the average mutation rate for
all the markers common between the pair of haplotypes being compared.
There are four selections for how the mutation rate is determined.
-
The first option for mutation rate calculation uses a constant mutation
rate for all markers. This constant mutation rate can be changed in the box provided.
The default mutation rate is 0.0024
mutations/allele/generation which
represents the 60 total mutations during 24870 total allele meioses as
given in
Y-Chromosomal Microsatellite Mutation Rates: Differences
in Mutation Rate Between and Within Loci
by B. Myhre Dupuy, M. Stenersen,
T. Egeland, and B. Olaisen; Human Mutation 23:117-124 (2004). Note that
this rate does not include rates of some of the newer fast-mutating
alleles that are currently being used.
-
The second mutation rate selection uses the FTDNA derived mutation rates.
This includes a rate of 0.00399 for
the first 12 markers, 0.00481 for markers 13 through 25, and 0.00748 for the
markers 26 through 37. For markers outside the 37 used by FTDNA the value in the
constant textbox (default = 0.0024) is used.
-
The third mutation rate selection uses
the values determined by Doug McDonald as derived from the Sorenson database.
Markers not included in the Sorenson database are derived by Doug
through other means. Values not included in the McDonald rates use the value in the constant textbox
(default = 0.0024).
-
The last mutation rate selection, "Custom", is the default selection. It starts from the McDonald rates, which can be modified
by using the setup data as described above.
The close distances are highlighted for easier visualization of possible
family relationships by the following:
- 0-9 Generations: green
- 10-19 Generations: yellow
- 20-29 Generations: pink
- 30-39 Generations: blue
Note that if units of years are chosen, the number of generations is multiplied by the specified rate of years per generation.
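The arithmetic around the TMRCA table (not the Walsh estimate itself) can be sketched in Python. The rates and color bands come from the text above. The function names, the 1-based marker positions and the 25 years per generation in the example are assumptions for illustration.

```python
# Default constant rate: 60 mutations in 24870 allele meioses, about 0.0024.
DEFAULT_RATE = 60 / 24870

def ftdna_rate(position, constant=0.0024):
    """FTDNA-derived rate for a marker at a 1-based position in FTDNA order."""
    if position < 1:
        raise ValueError("marker positions start at 1")
    if position <= 12:
        return 0.00399
    if position <= 25:
        return 0.00481
    if position <= 37:
        return 0.00748
    return constant                      # outside the 37 markers: constant box

def average_rate(common_positions, constant=0.0024):
    """Average rate over the markers common to the pair being compared."""
    rates = [ftdna_rate(p, constant) for p in common_positions]
    return sum(rates) / len(rates)

def generation_color(generations):
    """Highlight color for a TMRCA value expressed in generations."""
    for limit, color in ((10, "green"), (20, "yellow"), (30, "pink"), (40, "blue")):
        if generations < limit:
            return color
    return None

def to_years(generations, years_per_generation):
    return generations * years_per_generation

print(round(DEFAULT_RATE, 4), round(average_rate(range(1, 26)), 7))
print(generation_color(12), to_years(12, 25))
```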
- PHYLIP Data
If this item is checked then a textbox is generated that includes the TMRCA
output in a format that can be used by Kitsch.exe and other programs which are
part of the PHYLIP package for inferring phylogenetic relationships.
Please refer to The Hamm Surname DNA
Project and the
L. David Roper PHYLIP and TreeView instructions for additional insight
on using the PHYLIP software.
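For orientation, PHYLIP distance programs such as Kitsch read a square distance matrix: a first line with the number of entries, then one line per entry with a name padded to ten characters followed by its distances. The Python sketch below writes that general layout. The function name is hypothetical, and the exact text this utility generates may differ in spacing or precision.

```python
def phylip_distance_matrix(names, matrix):
    """Format a square matrix of TMRCA values as PHYLIP distance-matrix text."""
    if len(names) != len(matrix) or any(len(row) != len(names) for row in matrix):
        raise ValueError("matrix must be square and match the number of names")
    lines = [f"{len(names):5d}"]
    for name, row in zip(names, matrix):
        label = name[:10].ljust(10)      # PHYLIP names occupy ten characters
        lines.append(label + " ".join(f"{value:8.4f}" for value in row))
    return "\n".join(lines) + "\n"

print(phylip_distance_matrix(["McGee1", "McGee2", "McGee3"],
                             [[0, 4, 9], [4, 0, 7], [9, 7, 0]]))
```

Because names are cut to ten characters, IDs that share their first ten characters would collide, so it may help to shorten IDs in the source data first.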
- Modal Haplotype
If this item is enabled then a modal haplotype will be created that is the
haplotype consisting of the allele values with the largest number of
occurrences in the haplotypes provided. In the event of a tie, the larger
allele value is used.
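The modal rule above, including the tie-break toward the larger value, can be sketched in Python. The function names are hypothetical. The min_cells threshold reflects the July 15, 2012 note that the modal is omitted when fewer than three cells have values, and the sketch assumes single-valued alleles with None for missing data.

```python
from collections import Counter

def modal_value(values, min_cells=3):
    """Most frequent allele value in one column; ties go to the larger value."""
    present = [v for v in values if v is not None]
    if len(present) < min_cells:
        return None                      # too few cells with values
    counts = Counter(present)
    best = max(counts.values())
    return max(v for v, count in counts.items() if count == best)

def modal_haplotype(haplotypes, min_cells=3):
    """Column-wise modal over haplotypes given as equal-length lists."""
    return [modal_value(column, min_cells) for column in zip(*haplotypes)]

print(modal_haplotype([[13, 24, 14], [13, 25, 15], [14, 25, 14], [14, 24, 15]]))
```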
- Show Legends
This item causes a color legend to be associated with each chart for
haplotype, genetic distance or generations to most recent common ancestor.
- Show status
This item causes a status box to be displayed at the top of the output window.
Large numbers of haplotypes may take some time to process and this status box
gives an indication of what items this utility is currently processing.
- Show ToolTip names
This item enables the marker name to be displayed by hovering the mouse over
the value in a results table.
- Show HTML Source
This item causes the HTML source code for each table to be output to a
text box for ease of cut-and-paste to surname web pages.
- Highlight Reference
If this item is enabled then the specified haplotype will be used as a
reference for the haplotype comparison table and will appear with white
background and blue font color in all the tables.
- Execute Button
Pressing this button will cause a new browser window to open containing the
specified tables of data.
- Clear Button
Pressing this button will clear the input data textboxes but retain all
other configuration information.
- Example Data Button
Pressing this button will initialize the haplotype box with several sets of
McGee project example haplotype data. This may be useful in understanding how the
data should be formatted in the event that you have difficulty in using this
utility.
- Choose Colors Button
Pressing this will bring up a window that will allow you to change the colors used for
various aspects of this utility.
- Execute Setup Button
Pressing this button will treat the data input textbox as setup data. This uses the same
format that is included in the output "Setup Data" textbox if the "Show Setup Data" checkbox
is checked. This allows you
to set the setup values of one or more user-modifiable checkboxes, text boxes and radio
buttons. Use the same box for haplotype data, but paste the saved setup data.
This allows you to save and use various configurations for specialized analysis. This
provides a way to use custom mutation rates.
- Debug box
This textbox contains the processed data and includes line numbers. It may
be useful to use this to identify where the input data is incompatible with
the parsing algorithm used by this utility or to identify the row number
of the desired haplotype for Highlight Reference selection.
contact:
Dean McGee
Please notify me if some portion of this utility is not operating correctly
McGee Surname DNA Project
McGee Surname DNA Project Results
Last Modified: March 21, 2014