data:image/s3,"s3://crabby-images/25e93/25e93b0d625dae36911be7e60ac57a79bbb33350" alt="Subtitle workshop 2.5.1"
These are genes that are expressed in all samples. We are mostly interested in the peak on the right. There are a lot of genes that are not expressed (ie zero on x-axis) in any sample. What does `cr>3` do? Why did we use 3? Is it better than using `cr>0`?Īnd we can create a similar plot for detection rate across genes. None of the samples look bad enough to be removed. All samples are more or less close to the average. On average, about `r sprintf("%0.5f",mean(colSums(cr>3)))` genes are detected. Download the count table generated by featureCounts into the `data/` subdirectory and name it **gene_counts_original.tsv**.ĭownload_data( "data/gene_counts_original.tsv ")ĭownload_data( "data/metadata_original.csv ")Ĭo 3), ylab = "Number of detected genes ", las = 2) The first step is some data wrangling and clean-up to prepare the data for analyses.
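To make the `cr>3` questions concrete, here is a minimal sketch on a small made-up count matrix. The matrix `cr_toy` and its values are hypothetical and stand in for the real count table `cr`. `cr_toy > 3` returns a logical matrix that is `TRUE` wherever a gene has more than 3 counts in a sample; `colSums()` of that matrix gives detected genes per sample, and `rowSums()` gives, for each gene, the number of samples in which it is detected.

```r
# Hypothetical toy count matrix (not workshop data): 5 genes (rows) x 3 samples (columns)
cr_toy <- matrix(
  c( 0,  0, 0,
     1,  2, 0,
     5,  4, 9,
    10, 12, 8,
     3,  7, 0),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("gene", 1:5), paste0("sample", 1:3))
)

# Logical matrix: TRUE where a gene has more than 3 counts in a sample
detected <- cr_toy > 3

# Number of detected genes in each sample (per-sample detection)
genes_per_sample <- colSums(detected)

# Number of samples in which each gene is detected (detection across genes)
samples_per_gene <- rowSums(detected)

# Looser threshold: any non-zero count counts as detected,
# so genes with only 1-3 reads are now included
genes_per_sample_any <- colSums(cr_toy > 0)

print(genes_per_sample)
print(samples_per_gene)
print(genes_per_sample_any)
```

With `cr > 0`, `gene2` (1 and 2 reads) and the 3-read entry of `gene5` are counted as detected, so samples 1 and 2 each gain detected genes. A threshold above zero is one way to avoid treating a handful of stray reads as expression; the exact cutoff is a judgment call. Passing `samples_per_gene` to `hist()` gives the per-gene detection plot described above.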
data:image/s3,"s3://crabby-images/61980/619806b31e34e21f032f4e926ac9a7edb3aa29f6" alt="subtitle workshop 2.5.1 subtitle workshop 2.5.1"
Rafalib ::mypar( mar =c( 6, 2.5, 2.5, 1)) #sets nice arrangement for the whole document Library( rafalib) # nice plot arrangement
data:image/s3,"s3://crabby-images/035e6/035e6c17be53f45534bf2c3641ff37efa6bdde8d" alt="subtitle workshop 2.5.1 subtitle workshop 2.5.1"
Load the necessary R packages and source the download function. Create a directory named `data` in your current working directory for input and output files.ĭata preprocessing is done in R.
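One way to create the `data` directory from within R is base R's `dir.create()`. This is a minimal sketch only; the workshop materials may create the folder differently, and the download function itself is not reproduced here.

```r
# Create data/ in the current working directory if it does not exist yet
if (!dir.exists("data")) {
  dir.create("data")
}

# Confirm the directory is now available for input and output files
print(dir.exists("data"))
```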
data:image/s3,"s3://crabby-images/25e93/25e93b0d625dae36911be7e60ac57a79bbb33350" alt="Subtitle workshop 2.5.1"