Y-Utility: Y-DNA Comparison Utility 111 Marker

Link to legacy Y-DNA Comparison Utility: ySearch Compatible

Generate Tables

Max alleles per row

Generate Fluxus phylogenetic network .ych data

Genetic Distance

Hybrid mutation model
Infinite allele mutation model

TMRCA (infinite allele model)

Generate PHYLIP data

Probability Units

50%
95%
Other %

Mutation Rate Constant:
FTDNA (0.004..0.0075)
McDonald (0.0005..0.01)
Custom
Generations
Years
years/generation

General Setup
Show Line Numbers
Create modal haplotype
Show Legends
Show Status
Show ToolTip names
Show Diagonal Count
Show HTML Source
Show Data in CSV format
Show Mutation Rates
Show Setup Data
Highlight Reference
Modal Reference
Row Reference
None

ID Column 1
2
3
4
All non-Data

1st Data Column 2
3
4
5
6
Automatic

Quick Start: To quickly see what this utility does, press "Example Data" button then press the "Execute" button. Sample output will be generated in a new window.

instructions & info Public FTDNA McGee Results

Paste haplotype or setup data here:

Debug box:

What's New (March 18, 2014):

Fix output to be W3C compliant (Thanks to Colin Ferguson!)
Support links to FTDNA's new genetic distance relatedness faq items
Fix modal determination for partial data sets
Fix haplotype display for partial data sets
Updated description for mutation rates

What's New (July 15, 2012):

Fix display of modal when no data exists in a column and omits modal when fewer than three cells with values (Thanks to Colin Ferguson!)
Fix GD and GMRCA calculations when multi-copy markers are disabled (Thanks to Colin Ferguson!)
Fix broken links for FTDNA's Interpreting Genetic Distance for various markers.

What's New (May 1, 2011):
This version of the Y-DNA Comparison Utility has several new features:

111 Marker support: the new FTDNA results of 111 markers are now supported. They are entered through 102 cells, some of which hold multi-copy markers.
Multi-copy markers are supported: multiple values in one cell, separated by a hyphen, "-", are supported.
All multi-copy markers use the infinite allele method of calculating genetic distance regardless of the setup setting. All markers within a multi-copy group use the same mutation rate.
All single-copy markers use either the stepwise or infinite allele method as specified in the setup.
A new output box is supported to show the setup values of all user-modifiable checkboxes, text boxes, and radio buttons. This is enabled by the "Show Setup Data" checkbox.
A new command button is supported to set the setup values of one or more user-modifiable checkboxes, text boxes and radio buttons. Use the same box for haplotype data, but paste the saved setup data. This allows you to save and use various configurations for specialized analysis.
Individual mutation rates can be set using the new setup function. These values take effect when the new "Custom" mutation rate is selected.
A new CSV (comma-separated value) output format is supported, so you can paste the comparison table output into a .csv file and load it into a spreadsheet, including older versions of Excel, with acceptable handling of multi-copy markers. This is enabled by the new "Show Data in CSV format" checkbox.

This utility is free for your use. You are welcome to use this utility and its output as you see fit.

Instructions

Processing large datasets
When using Y-Utility with large datasets, you may get the following message repeatedly in Internet Explorer:
A script on this page is causing Internet Explorer to run slowly. If it continues to run, your computer may become unresponsive. Do you want to abort the script?
If you frequently process large datasets, you may want to edit the Windows registry to increase the processing time allowed for scripts. By default Windows supports 5,000,000 script instruction executions before posting the warning. If you are experienced in performing registry edits, you may be interested in the Microsoft Knowledge Base article How To Set Timeout Period for Script. This procedure is not for novices. A mistake in performing registry edits can make your computer unusable. Should you decide to perform the edit, a value of 100,000,000 would give twenty times more script execution time than the default. There may be similar procedures for other browsers.
Copying haplotype data
This utility is designed for copy-and-paste input of haplotype data from web pages, especially from the new format FTDNA public project pages, or Excel tables. The haplotype data must be in the normal order provided by FTDNA.
No non-numeric characters other than whitespace and hyphens ("-", for multi-copy markers) are allowed after the first column of allele data.
Normally generated tables of data will copy correctly, but if you are copying from a manually created table where there may be extra whitespace between allele values this utility may not parse the data correctly. You can either edit the data so that no more than one blank space exists between allele values (or three blank spaces for a missing value), or you can copy the data first to an Excel spreadsheet then copy from Excel and paste into this utility. Data copied from an Excel spreadsheet is tab-delimited, and this utility should be able to parse it without problem.
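The input rules above can be illustrated with a minimal sketch. It assumes tab-delimited input (as produced by copying from Excel), treats an empty cell as a missing value, and accepts only digits and hyphens in each cell. The function name `parse_alleles` is hypothetical and this is not the utility's own parser.

```python
import re

# A cell is one number, or several numbers joined by hyphens (multi-copy marker).
CELL_PATTERN = re.compile(r"\d+(-\d+)*")

def parse_alleles(row):
    """Parse one tab-delimited row of allele cells.

    Returns a list with one entry per cell: a list of ints for a value
    (more than one int for a multi-copy marker) or None for an empty cell.
    Raises ValueError for any cell containing other characters.
    """
    parsed = []
    for cell in row.rstrip("\n").split("\t"):
        cell = cell.strip()
        if cell == "":
            parsed.append(None)  # missing value
        elif CELL_PATTERN.fullmatch(cell):
            parsed.append([int(value) for value in cell.split("-")])
        else:
            raise ValueError("non-numeric allele cell: %r" % cell)
    return parsed
```

For example, `parse_alleles("13\t24\t\t11-14")` gives one single-copy value per cell, `None` for the empty third cell, and two values for the hyphenated cell.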
Y-DNA Allele setup, "Exists" checkboxes
The default setup of this utility should handle most cases. If the allele exists in your data then the corresponding checkbox must be checked. If the allele does not exist in your data then the corresponding checkbox must not be checked.
"Enabled" checkboxes
You can enable or disable comparison of any allele. If the "Enabled" checkbox is not checked then the data for that allele will not be used in genetic distance or MRCA calculations. Also, it will not be included in SMGF links and color highlighting will not be performed for FTDNA and SMGF tables.
ID Column
By default all characters before the first data column for each haplotype will be used as identification info. You can select a particular column to use for identification.
1st Data Column
By default this utility will try to identify the first column of allele data for each haplotype. It does this by looking for the first three consecutive numeric tokens that are within the limits for the first three alleles. Normally this works without issue, but you can construct a table of data where this technique fails, in which case you can specify the starting column of the allele data. Even this may fail in some cases, in which event, you should copy the data first to an Excel spreadsheet then paste from Excel into this utility.
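The detection technique described above can be sketched as a sliding window over the tokens of one row. The allele limits below are hypothetical placeholders (the utility's actual limits are not stated here), and `find_first_data_column` is an illustrative name only.

```python
# Hypothetical (low, high) limits for the first three alleles in FTDNA order.
# These numbers are assumptions for illustration, not the utility's limits.
FIRST_THREE_LIMITS = [(9, 17), (17, 28), (10, 19)]

def find_first_data_column(tokens):
    """Return the 0-based index of the first token that starts a run of
    three consecutive numeric tokens within the limits, or None."""
    for start in range(len(tokens) - 2):
        window = tokens[start:start + 3]
        if all(
            token.isdigit() and low <= int(token) <= high
            for token, (low, high) in zip(window, FIRST_THREE_LIMITS)
        ):
            return start
    return None
```

With the tokens `["12345", "McGee", "R1b", "13", "24", "14", "11"]` the sketch returns index 3. It also shows how the technique can fail: if the identification columns happen to contain three in-range numbers, they are matched first, which is the case where specifying the starting column by hand helps.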
FTDNA order haplotype comparison
If this table is enabled, then a listing of all the haplotype data is generated in the FTDNA order. Copying and pasting directly from public FTDNA surname project tables should work without issue. If the modal haplotype is enabled then a modal haplotype will be determined and listed as the first row. If difference highlighting is enabled, then the difference between the reference haplotype and every other haplotype is determined for each allele. The background color is set according to the size of that difference:
- 0 : none
- 1 : green
- 2 : yellow
- 3+ : pink
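The color rule above amounts to a small lookup. A minimal sketch, assuming the difference is the absolute difference between the reference allele value and the compared allele value (the function name is hypothetical):

```python
def difference_color(reference_value, other_value):
    """Map the allele difference to the background color listed above.

    Returns None when the values match (no highlighting).
    """
    difference = abs(reference_value - other_value)
    if difference == 0:
        return None
    if difference == 1:
        return "green"
    if difference == 2:
        return "yellow"
    return "pink"  # three or more
```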
Fluxus Phylogenetic Analysis Software
The .ych data is one of the formats supported by the Phylogenetic Network software available from Fluxus Technology from their Network Software page. You need to make sure that the haplotype data for all sets of results have the same number of markers. You may want to edit the ID field either in the source file or in the .ych data. The Network software supports five character IDs. For duplicated haplotype data, the ID of the first set is used.
For an example of the output, please refer to: McGee Network Phylogenetic Diagrams
There is a 2MB Flash movie of the phylogenetic network creation process: McGee Network Creation Demonstration
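The constraints mentioned for the Network software (equal marker counts, five character IDs, first ID kept for duplicated haplotypes) can be checked before generating any data. The sketch below is a hypothetical pre-processing helper under those stated constraints. It does not write the .ych format itself, and truncating IDs is an assumption about how one might shorten them.

```python
def prepare_for_network(samples, id_length=5):
    """samples: list of (id, allele_list) pairs.

    Checks that every haplotype has the same number of markers, collapses
    duplicated haplotypes onto the first ID seen, and shortens IDs.
    """
    marker_counts = {len(alleles) for _, alleles in samples}
    if len(marker_counts) > 1:
        raise ValueError("haplotypes have different numbers of markers")

    first_id_for = {}  # haplotype tuple -> ID of its first occurrence
    for sample_id, alleles in samples:
        key = tuple(alleles)
        if key not in first_id_for:
            first_id_for[key] = sample_id[:id_length]
    return [(sample_id, list(key)) for key, sample_id in first_id_for.items()]
```

Note that shortening can make two different IDs identical, so the shortened IDs may still need manual editing.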
Genetic Distance
If this table is enabled then a table of genetic distances between every pair of haplotypes is created. The diagonal elements of the table indicate the number of allele values present for that haplotype. The calculations can be of two types:
1. Hybrid Mutation Model
  The target of this model is to match that used by Ysearch and FTDNA. It uses the stepwise mutation model for all alleles except DYS464 and YCA which use the infinite allele model. The stepwise model says that each mutation is allowed to change the allele value by exactly one, so a difference of two means that two mutations occurred and a difference of three means that three mutations occurred. The infinite allele model says that the entire difference between allele values, no matter how large, is the result of one mutation.
  Please notify me if you find any inconsistencies between the genetic distances calculated here and those provided by Ysearch or FTDNA.
2. Infinite Allele Mutation Model
  The infinite allele model says that the entire difference between allele values, no matter how large, is the result of one mutation.
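The two models can be compared with a minimal sketch. It assumes each haplotype is a dictionary from marker name to a string value, that only markers present in both haplotypes are compared, and that a differing multi-copy cell (a value containing a hyphen) counts as a single mutation. That last point is a simplification, and the code is not the utility's own implementation.

```python
# Markers that always use the infinite allele model in the hybrid model,
# as named in the text above.
ALWAYS_INFINITE = {"DYS464", "YCA"}

def genetic_distance(haplotype_a, haplotype_b, model="hybrid"):
    """Sum per-marker distances over the markers both haplotypes share.

    model is "hybrid" (stepwise except for ALWAYS_INFINITE and multi-copy
    cells) or "infinite" (every differing marker counts as one mutation).
    """
    total = 0
    for name in haplotype_a.keys() & haplotype_b.keys():
        a, b = haplotype_a[name], haplotype_b[name]
        multi_copy = "-" in a or "-" in b
        if model == "infinite" or name in ALWAYS_INFINITE or multi_copy:
            total += 0 if a == b else 1   # any difference is one mutation
        else:
            total += abs(int(a) - int(b))  # one mutation per step
    return total
```

For two haplotypes that differ by two steps at one single-copy marker and in one multi-copy cell, the hybrid distance is 3 while the infinite allele distance is 2.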
The close distances are highlighted for easier visualization of possible family relationships by the following:
- Related : green
- Probably Related : yellow
- Possibly Related : pink
TMRCA (Time to the Most Recent Common Ancestor)
If this table is enabled then a table of generations or years to the most recent common ancestor for every pair of haplotypes is created. The number listed is the number of years or generations within which the MRCA lived at the selected probability (for example 50% or 95%). The algorithm is taken from Bruce Walsh's paper, Estimating the Time to the Most Recent Common Ancestor for the Y chromosome or Mitochondrial DNA for a Pair of Individuals. Other utilities related to MRCA calculations can be found on the Y-DNA tools page of the ISOGG (International Society of Genetic Genealogy) website.
The TMRCA calculations use the average mutation rate for all the markers common between the pair of haplotypes being compared.
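To make the idea of a 50% or 95% figure concrete, here is a rough numerical sketch of one common reading of the infinite allele model. It assumes that a marker still matches after t generations with probability exp(-2·μ·t), that markers are independent, and that the prior on t is flat. These are assumptions made for illustration, and the utility's actual calculation may differ in its details. The function name and the `step` and `t_max` parameters are hypothetical.

```python
import math

def tmrca_generations(n_markers, n_mismatch, mu, prob=0.5,
                      step=0.05, t_max=2000.0):
    """Smallest t (in generations) such that the posterior probability
    that the MRCA lived within t generations reaches prob."""
    def density(t):
        # Unnormalized posterior: matches contribute exp(-2*mu*t) each,
        # mismatches contribute (1 - exp(-2*mu*t)) each.
        p_match = math.exp(-2.0 * mu * t)
        return (p_match ** (n_markers - n_mismatch)
                * (1.0 - p_match) ** n_mismatch)

    # Trapezoid integration on a fixed grid from 0 to t_max.
    n_steps = int(round(t_max / step))
    values = [density(i * step) for i in range(n_steps + 1)]
    areas = [(values[i] + values[i + 1]) * step / 2.0 for i in range(n_steps)]
    total = sum(areas)

    running = 0.0
    for i, area in enumerate(areas):
        running += area
        if running >= prob * total:
            return (i + 1) * step
    return t_max
```

With no mismatches the result can be checked against the closed form ln(1/(1 - prob)) / (2·μ·n). The grid is truncated at `t_max`, so the sketch is unreliable when very few markers match.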
There are four selections for how the mutation rate is determined.
- The first option for mutation rate calculation uses a constant mutation rate for all markers. This constant mutation rate can be changed in the box provided. The default mutation rate is 0.0024 mutations/allele/generation which represents the 60 total mutations during 24870 total allele meioses as given in Y-Chromosomal Microsatellite Mutation Rates: Differences in Mutation Rate Between and Within Loci by B.Myhre Dupuy, M.Stenerson, T.Egeland, and B.Olaisen; Human Mutation 23:117-124 (2004). Note that this rate does not include rates of some of the newer fast-mutating alleles that are currently being used.
- The second mutation rate selection uses the FTDNA derived mutation rates. This includes a rate of 0.00399 for the first 12 markers, 0.00481 for markers 13 through 25, and 0.00748 for the markers 26 through 37. For markers outside the 37 used by FTDNA the value in the constant textbox (default = 0.0024) is used.
- The third mutation rate selection uses the values determined by Doug McDonald as derived from the Sorenson database. Markers not included in the Sorenson database are derived by Doug through other means. Values not included in the McDonald rates use the value in the constant textbox (default = 0.0024).
- The last mutation rate selection, "Custom", is the default. It starts from the McDonald rates, and individual rates can be modified by using the setup data as described above.
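The FTDNA option and the averaging step can be sketched as follows. The sketch assumes markers are identified by their 1-based position in FTDNA order and that the average is a plain arithmetic mean over the shared markers. Both are assumptions, and the function names are hypothetical.

```python
# (last marker position in the panel, rate) taken from the FTDNA option above.
FTDNA_PANEL_RATES = [(12, 0.00399), (25, 0.00481), (37, 0.00748)]

def ftdna_rate(position, constant=0.0024):
    """Rate for a marker by 1-based FTDNA position; markers beyond the
    first 37 fall back to the constant rate."""
    for last_position, rate in FTDNA_PANEL_RATES:
        if position <= last_position:
            return rate
    return constant

def average_rate(shared_positions, constant=0.0024):
    """Mean rate over the markers common to a pair of haplotypes."""
    rates = [ftdna_rate(position, constant) for position in shared_positions]
    return sum(rates) / len(rates)

# The default constant comes from 60 mutations in 24870 allele meioses.
default_constant = round(60 / 24870, 4)
```

For a pair sharing only the first 12 markers the average is 0.00399, while a pair sharing all 37 gets a higher average because the later panels mutate faster.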
The close distances are highlighted for easier visualization of possible family relationships by the following:
- 0-9 Generations: green
- 10-19 Generations: yellow
- 20-29 Generations: pink
- 30-39 Generations: blue
Note that if units of years are chosen, the number of generations is multiplied by the specified rate of years per generation.
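The highlighting bands and the unit conversion can be written as two small functions. This sketch assumes the bands are applied to the value in generations and that values of 40 or more get no highlight. The names are hypothetical.

```python
# (exclusive upper bound in generations, color) from the list above.
TMRCA_BANDS = [(10, "green"), (20, "yellow"), (30, "pink"), (40, "blue")]

def tmrca_color(generations):
    """Color for a TMRCA value in generations, or None outside the bands."""
    for upper_bound, color in TMRCA_BANDS:
        if generations < upper_bound:
            return color
    return None

def generations_to_years(generations, years_per_generation):
    """Convert generations to years using the user-specified rate."""
    return generations * years_per_generation
```

For example, 12 generations falls in the yellow band, and at an illustrative 25 years per generation it corresponds to 300 years.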
PHYLIP Data
If this item is checked then a textbox is generated that includes the TMRCA output in a format that can be used by Kitsch.exe and other programs which are part of the PHYLIP package for inferring phylogenetic relationships. Please refer to The Hamm Surname DNA Project and the L. David Roper PHYLIP and TreeView instructions for additional insight on using the PHYLIP software.
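For orientation, PHYLIP distance-matrix programs read a file whose first line gives the number of entries, followed by one line per entry with a name padded to ten characters and then the distances. The sketch below writes a square matrix in that general layout. The exact text this utility emits may differ, and the function name is hypothetical.

```python
def phylip_distance_matrix(names, matrix):
    """Format a square distance matrix in the general PHYLIP layout."""
    lines = ["%5d" % len(names)]
    for name, row in zip(names, matrix):
        label = name[:10].ljust(10)  # names occupy exactly ten characters
        lines.append(label + " ".join("%8.4f" % distance for distance in row))
    return "\n".join(lines)
```

Because names are cut to ten characters, IDs that differ only after the tenth character would need to be edited beforehand.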
Modal Haplotype
If this item is enabled then a modal haplotype will be created that is the haplotype consisting of the allele values with the largest number of occurrences in the haplotypes provided. In the event of a tie, the larger allele value is used.
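The modal rule can be sketched column by column. The sketch assumes each haplotype is a list of integer allele values with None for missing data. The `min_values` threshold of three follows the July 15, 2012 note about omitting the modal when fewer than three cells have values, and the function name is hypothetical.

```python
from collections import Counter

def modal_haplotype(haplotypes, min_values=3):
    """Most frequent allele value per column; ties go to the larger value.

    Columns with fewer than min_values non-missing cells get None.
    """
    n_columns = max(len(haplotype) for haplotype in haplotypes)
    modal = []
    for column in range(n_columns):
        values = [
            haplotype[column]
            for haplotype in haplotypes
            if column < len(haplotype) and haplotype[column] is not None
        ]
        if len(values) < min_values:
            modal.append(None)
            continue
        counts = Counter(values)
        # Sort key: highest count first, then the larger allele value.
        value, _ = max(counts.items(), key=lambda item: (item[1], item[0]))
        modal.append(value)
    return modal
```

In a column holding 13, 13, 14, 14 the counts tie, so the sketch reports 14.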
Show Legends
This item causes a color legend to be shown with each chart for haplotype, genetic distance, or generations to most recent common ancestor.
Show Status
This item causes a status box to be displayed at the top of the output window. Large numbers of haplotypes may take some time to process and this status box gives an indication of what items this utility is currently processing.
Show ToolTip names
This item enables the marker name to be displayed by hovering the mouse over the value in a results table.
Show HTML Source
This item causes the HTML source code for each table to be output to a text box for ease of cut-and-paste to surname web pages.
Highlight Reference
If this item is enabled then the specified haplotype will be used as a reference for the haplotype comparison table and will appear with a white background and blue font color in all the tables.
Execute Button
Pressing this button will cause a new browser window to open containing the specified tables of data.
Clear Button
Pressing this button will clear the input data textboxes but retain all other configuration information.
Example Data Button
Pressing this button will initialize the haplotype box with several sets of McGee project example haplotype data. This may be useful in understanding how the data should be formatted in the event that you have difficulty in using this utility.
Choose Colors Button
Pressing this will bring up a window that will allow you to change the colors used for various aspects of this utility.
Execute Setup Button
Pressing this button will treat the data input textbox as setup data. This uses the same format that is included in the output "Setup Data" textbox if the "Show Setup Data" checkbox is checked. This allows you to set the setup values of one or more user-modifiable checkboxes, text boxes and radio buttons. Use the same box as for haplotype data, but paste the saved setup data. This allows you to save and use various configurations for specialized analysis. It also provides a way to use custom mutation rates.
Debug box
This textbox contains the processed data and includes line numbers. It may be useful to use this to identify where the input data is incompatible with the parsing algorithm used by this utility or to identify the row number of the desired haplotype for Highlight Reference selection.

contact:
Dean McGee
Please notify me if some portion of this utility is not operating correctly

McGee Surname DNA Project
McGee Surname DNA Project Results

Last Modified: March 21, 2014