Mice data exercise

1. Find out how many individuals and how many SNPs there are in the mice data. Which chromosomes are considered in the data? Is it possible to check it for sex discrepancy?

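There are 1940 individuals and 2984 SNPs. The data only contain SNPs from chromosomes 1-4. Because there are no X-chromosome SNPs, a sex-discrepancy check is not possible: it relies on X-chromosome inbreeding (heterozygosity) estimates.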
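A quick way to confirm that no X-chromosome SNPs are present is to count the .bim rows whose chromosome code (column 1) matches a code PLINK may use for X. The function count_snps_on_chr is a hypothetical helper written for this sketch, and the codes passed to it are an assumption: PLINK's human default codes X as 23, but the numeric code depends on the species setting, so the literal X is checked too.

```shell
# Hypothetical helper: count SNPs in a PLINK .bim file whose chromosome
# code (column 1) equals any of the codes given after the file name.
count_snps_on_chr() {
  bim=$1
  shift
  awk -v codes="$*" '
    BEGIN { n = split(codes, c, " "); for (i = 1; i <= n; i++) want[c[i]] = 1 }
    ($1 in want) { k++ }
    END { print k + 0 }' "$bim"
}
```

Running count_snps_on_chr Data/mice.bim X 23 should print 0 here, since only chromosomes 1-4 appear in the chromosome counts.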

2. Are there any parents in the fam file? Hint: the fam file uses a single space as separator and is not tab-separated, so the cut command needs the option -d' '.

No. The father (column 3) and mother (column 4) IDs are 0 for all 1940 individuals, so there is no parental information.
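Instead of inspecting the two parent columns separately with cut, awk can count the individuals with at least one known parent in a single pass. awk splits fields on any run of spaces or tabs, so no -d option is needed. The function count_with_parents is a hypothetical helper, and it assumes PLINK's convention that 0 means an unknown parent.

```shell
# Hypothetical helper: count individuals in a .fam file whose father ID
# (column 3) or mother ID (column 4) is not the PLINK missing code 0.
count_with_parents() {
  awk '$3 != "0" || $4 != "0" { k++ } END { print k + 0 }' "$1"
}
```

For these data, count_with_parents Data/mice.fam should print 0.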

4. Is there any data for sex and phenotype in the .fam file? Is this data from a case-control study or a sample-based study?

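No information in the fam file: the phenotype column (column 6) is 0.000000 for every individual, and no sex is recorded. The phenotype is stored separately in mice.pheno, and its values are continuous (from -4.12 to 3.59), so the trait is quantitative rather than case-control.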
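The two helpers below are a sketch of how to answer this question from the command line. sex_summary tabulates the .fam sex column using PLINK's codes (1 = male, 2 = female, 0 = unknown). pheno_kind is a rough heuristic, not a PLINK feature: it treats a column as binary only if every value is 1, 2, 0 or -9 (PLINK's case/control/missing codes). Both function names are hypothetical.

```shell
# Hypothetical helper: print "code count" for each value of the sex column
# (column 5) of a .fam file, sorted by code.
sex_summary() {
  awk '{ n[$5]++ } END { for (s in n) print s, n[s] }' "$1" | sort
}

# Hypothetical heuristic: classify column $2 of file $1 as "binary"
# (only 1/2 plus missing codes 0/-9), "quantitative" (any other value)
# or "missing" (only 0/-9). Values are compared numerically, so
# 0.000000 counts as 0.
pheno_kind() {
  awk -v c="$2" '
    { v = $c + 0 }
    v != 0 && v != -9 && v != 1 && v != 2 { quant = 1 }
    v == 1 || v == 2 { coded = 1 }
    END {
      if (quant) print "quantitative"
      else if (coded) print "binary"
      else print "missing"
    }' "$1"
}
```

Here pheno_kind Data/mice.pheno 3 prints quantitative (the values include -4.124257), and pheno_kind Data/mice.fam 6 prints missing. If, as stated above, no sex is recorded, sex_summary Data/mice.fam reports only code 0. Note that a quantitative trait taking only the values 1 and 2 would be misclassified as binary.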

6. How can you find the minimum and maximum value of the phenotype (last column) using only the command line?

The minimum value is -4.12 and the maximum is 3.59 (exactly -4.124257 and 3.594109).
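The sort-based commands read the file twice and sort it each time. A single awk pass can track both extremes instead; this is a sketch assuming, as those commands do, that mice.pheno is whitespace-separated with the phenotype in column 3 and no header line. pheno_range is a hypothetical helper name.

```shell
# Hypothetical helper: print the minimum and maximum of column $2 of file $1.
# Comparisons are numeric; the original text of each extreme is kept
# so it is printed exactly as it appears in the file.
pheno_range() {
  awk -v c="$2" '
    NR == 1 || $c + 0 < lo + 0 { lo = $c }
    NR == 1 || $c + 0 > hi + 0 { hi = $c }
    END { print lo, hi }' "$1"
}
```

pheno_range Data/mice.pheno 3 should report the same extremes as the sort commands. awk also understands exponent notation such as 1e-3, which sort -n does not (GNU sort needs -g for that). A PLINK missing code of -9 would be reported as the minimum, which is not the case here.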


Commands for the solutions:

# Link the shared Data directory into the working directory
ln -sf ../../Data
# Number of individuals 
wc -l Data/mice.fam
1940 Data/mice.fam
# Number of variants
wc -l Data/mice.bim
2984 Data/mice.bim
# Chromosomes info
cut -f1 Data/mice.bim | sort | uniq -c 
    870 1
    757 2
    684 3
    673 4
# Father (column 3) and mother (column 4) IDs
cut -f3 -d" " Data/mice.fam | sort | uniq -c
   1940 0
cut -f4 -d" " Data/mice.fam | sort | uniq -c
   1940 0
# View .fam file 
head -n 5 Data/mice.fam 
A048005080 A048005080 0 0 0 0.000000
A048006063 A048006063 0 0 0 0.000000
A048006555 A048006555 0 0 0 0.000000
A048007096 A048007096 0 0 0 0.000000
A048010273 A048010273 0 0 0 0.000000
# sex (column 5)
cut -f5 -d" " Data/mice.fam | sort | uniq -c
# phenotype (column 6)
cut -f6 -d" " Data/mice.fam | sort | uniq -c 
   1940 0.000000
# phenotype lowest value
cut -f3 -d" " Data/mice.pheno | sort -n | head -n1
-4.124257
# phenotype highest value
cut -f3 -d" " Data/mice.pheno | sort -n | tail -n1
3.594109