
The first step is some data wrangling and clean-up to prepare the data for analyses. Download the count table generated by featureCounts into the `data/` subdirectory and name it **gene_counts_original.tsv**.

    download_data("data/gene_counts_original.tsv")
    download_data("data/metadata_original.csv")
    co 3), ylab = "Number of detected genes", las = 2)

On average, about `r sprintf("%0.5f",mean(colSums(cr>3)))` genes are detected. All samples are more or less close to the average. None of the samples look bad enough to be removed.

We can create a similar plot for the detection rate across genes. What does `cr>3` do? Why did we use 3? Is it better than using `cr>0`?

There are a lot of genes that are not expressed (i.e. zero on the x-axis) in any sample. We are mostly interested in the peak on the right. These are genes that are expressed in all samples.
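As a hedged illustration of what a comparison such as `cr>3` computes, here is a minimal sketch on a made-up matrix. It assumes that `cr` is a numeric matrix with genes in rows and samples in columns; the object `cr_toy` and its values are hypothetical and not part of the workshop data.

```r
# Toy expression matrix: 4 genes (rows) x 3 samples (columns). Values are made up.
cr_toy <- matrix(
  c(0,  0,  0,
    5,  2,  8,
    10, 12, 9,
    4,  0,  1),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("sample", 1:3))
)

# Comparing a matrix with a number returns a logical matrix of the same shape.
detected <- cr_toy > 3

# TRUE counts as 1 when summed.
genes_per_sample <- colSums(detected)  # number of detected genes in each sample
samples_per_gene <- rowSums(detected)  # number of samples in which each gene is detected

# A threshold of 0 also counts very low values as detected.
genes_per_sample_any <- colSums(cr_toy > 0)

print(genes_per_sample)
print(samples_per_gene)
print(genes_per_sample_any)
```

In this toy case the stricter threshold ignores values of 1 and 2 that a threshold of 0 would count as detected. Which cut-off is appropriate depends on the scale of `cr` and on how much low-level noise you want to tolerate.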

    library(rafalib) # nice plot arrangement
    rafalib::mypar(mar = c(6, 2.5, 2.5, 1)) # sets nice arrangement for the whole document

Data preprocessing is done in R. Create a directory named `data` in your current working directory for input and output files. Load the necessary R packages and source the download function.
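The `data` directory can be created from within R. The following is a minimal sketch using base R only; the helper name `ensure_dir` is hypothetical and not part of the workshop material.

```r
# Hypothetical helper: create a directory only if it does not exist yet.
ensure_dir <- function(path) {
  if (!dir.exists(path)) {
    dir.create(path, recursive = TRUE)
  }
  # Return (invisibly) whether the directory now exists.
  invisible(dir.exists(path))
}

# Create ./data relative to the current working directory.
ensure_dir("data")
```

Calling the helper a second time is harmless, which is convenient when the document is re-run.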
