Last updated: 2020-03-10
Knit directory: apaQTL/analysis/
This reproducible R Markdown analysis was created with workflowr (version 1.6.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.
The command set.seed(20190411) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility. The version displayed above was the version of the Git repository at the time these results were generated.
Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on.
These are the previous versions of the R Markdown and HTML files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view them.
File | Version | Author | Date | Message |
---|---|---|---|---|
Rmd | f1bc9fb | brimittleman | 2020-03-10 | TPM |
html | 98842ca | brimittleman | 2020-02-03 | Build site. |
Rmd | 147dc38 | brimittleman | 2020-02-03 | add tpm cutoffs |
html | ddfe841 | brimittleman | 2020-01-31 | Build site. |
Rmd | a1607df | brimittleman | 2020-01-31 | look at tissue specifcity |
library(workflowr)
This is workflowr version 1.6.0
Run ?workflowr for help getting started
library(ggpubr)
Loading required package: ggplot2
Loading required package: magrittr
library(tidyverse)
── Attaching packages ────────────────────────────────────────────────────────────────── tidyverse 1.2.1 ──
✔ tibble 2.1.1 ✔ purrr 0.3.2
✔ tidyr 0.8.3 ✔ dplyr 0.8.0.1
✔ readr 1.3.1 ✔ stringr 1.3.1
✔ tibble 2.1.1 ✔ forcats 0.3.0
── Conflicts ───────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
✖ tidyr::extract() masks magrittr::extract()
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
✖ purrr::set_names() masks magrittr::set_names()
In this analysis I will answer the reviewer questions related to the number of PAS per gene.
First I want to get the number of PAS used at 5% per gene. I am doing this with the nuclear results.
PAS=read.table("../data/PAS/APApeak_Peaks_GeneLocAnno.Nuclear.5perc.sort.bed",col.names = c("chr","start","end","name","score","strand")) %>% separate(name,into=c("pas", 'gene','loc'), sep=":") %>% group_by(gene) %>% summarise(nPAS=n())
ggplot(PAS,aes(x=nPAS)) + geom_bar()
Version | Author | Date |
---|---|---|
ddfe841 | brimittleman | 2020-01-31 |
The data goes from 1-10 PAS per gene.
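A minimal base R sketch of the counting step above, on toy data. The peak names and gene symbols are hypothetical; the only assumption carried over from the analysis is that the BED name field has the layout pas:gene:loc.

```r
# Toy BED-like name field in the assumed "pas:gene:loc" layout (hypothetical values).
peaks <- data.frame(
  name = c("peak1:GENEA:utr3", "peak2:GENEA:intron", "peak3:GENEB:utr3"),
  stringsAsFactors = FALSE
)

# Split on ":" and keep the second piece, the gene symbol.
parts <- strsplit(peaks$name, ":", fixed = TRUE)
peaks$gene <- vapply(parts, function(p) p[2], character(1))

# Count rows per gene; same idea as group_by(gene) %>% summarise(nPAS = n()).
counts <- table(peaks$gene)
pas_counts <- data.frame(
  gene = names(counts),
  nPAS = as.integer(counts),
  stringsAsFactors = FALSE
)
print(pas_counts)
```

Each row of the input is one PAS, so the per-gene row count is the number of PAS for that gene.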
I will start with AllTranscriptsbyName.Grouped.bed. For this I will use the longest annotated transcript per gene, measured from transcription start to transcription end.
genes=read.table("../../genome_anotation_data/RefSeq_annotations/Hg19_refseq_genes.txt",header = T,stringsAsFactors = F) %>%
mutate(Genelength=txEnd-txStart) %>%
group_by(name2) %>%
arrange(desc(Genelength)) %>%
dplyr::slice(1)%>%
dplyr::select(name2, Genelength) %>%
dplyr::rename("gene"= name2)
PAS_wLength= PAS %>% inner_join(genes, by="gene")
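The "longest transcript per gene" selection can be sketched in base R as follows. The toy table reuses the column names name2, txStart and txEnd from the code above; the values are made up.

```r
# Toy transcript table (hypothetical coordinates).
tx <- data.frame(
  name2   = c("GENEA", "GENEA", "GENEB"),
  txStart = c(100, 150, 1000),
  txEnd   = c(900, 400, 1200),
  stringsAsFactors = FALSE
)

# Transcript span from transcription start to end.
tx$Genelength <- tx$txEnd - tx$txStart

# Sort by gene, then by decreasing length, and keep the first row per gene.
tx <- tx[order(tx$name2, -tx$Genelength), ]
longest <- tx[!duplicated(tx$name2), c("name2", "Genelength")]
names(longest)[1] <- "gene"
print(longest)
```

Note that this span includes introns, so it is a genomic length rather than a mature transcript length.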
Check for correlation
cor.test(PAS_wLength$Genelength, PAS_wLength$nPAS)
Pearson's product-moment correlation
data: PAS_wLength$Genelength and PAS_wLength$nPAS
t = 22.118, df = 15041, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.1619636 0.1929179
sample estimates:
cor
0.1774846
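To put the test output in context, the t statistic follows directly from the correlation and the degrees of freedom, and the squared correlation gives the share of variance explained. A short check using only the numbers printed above:

```r
# Values copied from the cor.test output above.
r  <- 0.1774846
df <- 15041

# t statistic for a Pearson correlation: t = r * sqrt(df) / sqrt(1 - r^2).
t_stat <- r * sqrt(df) / sqrt(1 - r^2)

# Share of variance in one variable linearly explained by the other.
r_squared <- r^2

print(round(t_stat, 3))
print(round(r_squared, 4))
```

With about 15,000 genes even a modest correlation yields a very small p-value, so the effect size (roughly 3% of variance explained) is the more informative number here.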
PAS_wLength$nPAS=as.factor(PAS_wLength$nPAS)
ggplot(PAS_wLength, aes(x=nPAS,y=log10(Genelength), fill=nPAS)) + geom_boxplot() + labs(x="Number of PAS", y="log10(Length of Gene)", title="Relationship between gene length and Number of PAS")
Separate genes with only 1 PAS from genes with multiple PAS.
PAS_wLength$nPAS=as.numeric(as.character(PAS_wLength$nPAS))
PAS_wLength_apa= PAS_wLength %>% mutate(APA=ifelse(nPAS>1,"Yes","No"))
ggplot(PAS_wLength_apa,aes(x=APA, y=log10(Genelength))) + geom_boxplot() + stat_compare_means()
PAS_wLength_apa %>% group_by(APA) %>% summarise(meanL=median(Genelength)) # despite its name, meanL holds the median gene length
# A tibble: 2 x 2
APA meanL
<chr> <dbl>
1 No 11312
2 Yes 34413
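A small base R sketch of the two-group comparison, on made-up gene lengths. It assumes that stat_compare_means() is being used with its default for two groups, which to my knowledge is a Wilcoxon rank-sum test; the sketch calls wilcox.test() directly.

```r
# Toy data: number of PAS and gene length (hypothetical values).
toy <- data.frame(
  nPAS       = c(1, 1, 1, 2, 3, 4),
  Genelength = c(5000, 8000, 12000, 30000, 45000, 60000)
)

# Flag genes with more than one PAS as having APA.
toy$APA <- ifelse(toy$nPAS > 1, "Yes", "No")

# Median gene length per group.
medians <- tapply(toy$Genelength, toy$APA, median)
print(medians)

# Rank-based two-group test, which does not assume normally distributed lengths.
wt <- wilcox.test(Genelength ~ APA, data = toy)
print(wt$p.value)
```

Because the test works on ranks, it gives the same result on raw and log10-transformed lengths.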
I will also look at the length of the longest annotated 3’ UTR:
UTR=read.table("../../genome_anotation_data/RefSeq_annotations/ncbiRefSeq_UTR3.sort.bed",col.names = c('chr','start','end','utr','gene', 'score','strand'),stringsAsFactors = F) %>%
mutate(UTRlength=end-start) %>%
group_by(gene) %>%
arrange(desc(UTRlength)) %>%
dplyr::slice(1) %>%
select(gene, UTRlength)
PAS_wUTRLength= PAS %>% inner_join(UTR, by="gene")
Check for correlation
cor.test(PAS_wUTRLength$UTRlength, PAS_wUTRLength$nPAS)
PAS_wUTRLength$nPAS=as.factor(PAS_wUTRLength$nPAS)
ggplot(PAS_wUTRLength, aes(x=nPAS,y=log10(UTRlength), fill=nPAS)) + geom_boxplot() + labs(x="Number of PAS", y="log10(Length of UTR)", title="Relationship between UTR length and Number of PAS")
Separate genes with only 1 PAS from genes with multiple PAS.
PAS_wUTRLength$nPAS=as.numeric(as.character(PAS_wUTRLength$nPAS))
PAS_wUTRLength_apa= PAS_wUTRLength %>% mutate(APA=ifelse(nPAS>1,"Yes","No"))
ggplot(PAS_wUTRLength_apa,aes(x=APA, y=log10(UTRlength))) + geom_boxplot() + stat_compare_means()
ggplot(PAS_wUTRLength_apa,aes(by=APA, fill=APA, x=log10(UTRlength))) + geom_density(alpha=.5)
Only the PAS located in the 3’ UTR:
PAS_Utr= read.table("../data/PAS/APApeak_Peaks_GeneLocAnno.Nuclear.5perc.sort.bed",col.names = c("chr","start","end","name","score","strand")) %>%
separate(name,into=c("pas", 'gene','loc'), sep=":") %>%
filter(loc=="utr3") %>%
group_by(gene) %>%
summarise(nUTRPAS=n())
UTRPAS_wUTRLength= PAS_Utr %>% inner_join(UTR, by="gene")
Check for correlation
cor.test(UTRPAS_wUTRLength$UTRlength, UTRPAS_wUTRLength$nUTRPAS)
Pearson's product-moment correlation
data: UTRPAS_wUTRLength$UTRlength and UTRPAS_wUTRLength$nUTRPAS
t = 52.33, df = 12406, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.4107098 0.4395393
sample estimates:
cor
0.4252324
summary(lm(log10(UTRPAS_wUTRLength$UTRlength) ~ UTRPAS_wUTRLength$nUTRPAS))
Call:
lm(formula = log10(UTRPAS_wUTRLength$UTRlength) ~ UTRPAS_wUTRLength$nUTRPAS)
Residuals:
Min 1Q Median 3Q Max
-1.80819 -0.28373 0.03181 0.31534 1.40264
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.579931 0.008362 308.54 <2e-16 ***
UTRPAS_wUTRLength$nUTRPAS 0.269648 0.004808 56.09 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.4431 on 12406 degrees of freedom
Multiple R-squared: 0.2023, Adjusted R-squared: 0.2022
F-statistic: 3146 on 1 and 12406 DF, p-value: < 2.2e-16
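Because the response in this model is log10(UTR length), the slope is easier to read after back-transforming. A short sketch using only the coefficients printed above:

```r
# Coefficients copied from the lm summary above (response is log10(UTRlength)).
intercept <- 2.579931
slope     <- 0.269648

# Each additional 3' UTR PAS multiplies the predicted UTR length by 10^slope.
fold_per_pas <- 10^slope

# Predicted UTR length in bp for genes with 1, 2 and 3 UTR PAS.
predicted_bp <- 10^(intercept + slope * 1:3)

print(round(fold_per_pas, 2))
print(round(predicted_bp))
```

So the fit corresponds to a roughly 1.86-fold longer annotated 3' UTR per additional UTR PAS, under the assumption that the relationship is log-linear over the observed range.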
UTRPAS_wUTRLength$nUTRPAS=as.factor(UTRPAS_wUTRLength$nUTRPAS)
ggplot(UTRPAS_wUTRLength, aes(x=nUTRPAS,y=log10(UTRlength), fill=nUTRPAS)) + geom_boxplot() + labs(x="Number of 3' UTR PAS", y="log10(Length of UTR)", title="Relationship between UTR length and Number of PAS")
Expression level by number of PAS
Calculate mean normalized gene expression values per gene.
geneNames=read.table("../../genome_anotation_data/ensemble_to_genename.txt", sep="\t", col.names = c('gene_id', 'gene', 'source' ),stringsAsFactors = F, header = T) %>% select(gene_id, gene)
Rnames=colnames(read.table("../data/molPhenos/RNAhead.txt", header = T))
Expression=read.table("../data/molPhenos/fastqtl_qqnorm_RNAseq_phase2.fixed.noChr.txt.gz",col.names = Rnames) %>%
separate(ID,into=c("gene_id","extra"), sep="\\.") %>%
inner_join(geneNames,by = "gene_id") %>%
select(-Chr,-start,-end,-gene_id, -extra) %>%
gather("ind", "exp", -gene) %>%
group_by(gene) %>%
summarise(MeanExp=mean(exp))
PAS_wExp= PAS %>% inner_join(Expression, by="gene")
cor.test(PAS_wExp$MeanExp, PAS_wExp$nPAS)
Pearson's product-moment correlation
data: PAS_wExp$MeanExp and PAS_wExp$nPAS
t = 5.7763, df = 10337, p-value = 7.859e-09
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.03748661 0.07591472
sample estimates:
cor
0.05672167
summary(lm(PAS_wExp$MeanExp ~ PAS_wExp$nPAS))
Call:
lm(formula = PAS_wExp$MeanExp ~ PAS_wExp$nPAS)
Residuals:
Min 1Q Median 3Q Max
-0.15218 -0.03221 0.00099 0.03222 0.15212
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.0032918 0.0008791 -3.744 0.000182 ***
PAS_wExp$nPAS 0.0015122 0.0002618 5.776 7.86e-09 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.04714 on 10337 degrees of freedom
Multiple R-squared: 0.003217, Adjusted R-squared: 0.003121
F-statistic: 33.37 on 1 and 10337 DF, p-value: 7.859e-09
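For a regression with a single predictor, the multiple R-squared equals the squared Pearson correlation, which makes the effect size here easy to check from the numbers above:

```r
# Correlation copied from the cor.test output for MeanExp vs nPAS.
r <- 0.05672167

# With one predictor, R-squared from lm() is r^2.
r_squared <- r^2

# Expressed as a percentage of variance explained.
pct_explained <- 100 * r_squared

print(signif(r_squared, 4))
print(round(pct_explained, 2))
```

The association is statistically significant but explains only about 0.3% of the variance in mean normalized expression.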
PAS_wExp$nPAS=as.factor(PAS_wExp$nPAS)
ggplot(PAS_wExp, aes(x=nPAS,y=MeanExp, fill=nPAS)) + geom_boxplot() + labs(x="Number of PAS", y="Mean normalized expression", title="Relationship between expression and Number of PAS")
No apparent difference here. I will remove the genes with the most PAS (nPAS of 10 or more) and test the correlation again.
PAS_wExpFilt= PAS %>% inner_join(Expression, by="gene") %>% filter(nPAS <10)
cor.test(PAS_wExpFilt$MeanExp, PAS_wExpFilt$nPAS)
Pearson's product-moment correlation
data: PAS_wExpFilt$MeanExp and PAS_wExpFilt$nPAS
t = 5.4617, df = 10320, p-value = 4.825e-08
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.03442997 0.07290264
sample estimates:
cor
0.05368623
summary(lm(PAS_wExpFilt$MeanExp ~ PAS_wExpFilt$nPAS))
Call:
lm(formula = PAS_wExpFilt$MeanExp ~ PAS_wExpFilt$nPAS)
Residuals:
Min 1Q Median 3Q Max
-0.151852 -0.032263 0.000969 0.032277 0.152032
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.0031448 0.0008866 -3.547 0.000391 ***
PAS_wExpFilt$nPAS 0.0014524 0.0002659 5.462 4.82e-08 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.04715 on 10320 degrees of freedom
Multiple R-squared: 0.002882, Adjusted R-squared: 0.002786
F-statistic: 29.83 on 1 and 10320 DF, p-value: 4.825e-08
PAS_wExpFilt$nPAS=as.numeric(as.character(PAS_wExpFilt$nPAS))
PAS_wExpFilt_apa= PAS_wExpFilt %>% mutate(APA=ifelse(nPAS>1,"Yes","No"))
ggplot(PAS_wExpFilt_apa,aes(x=APA, y=MeanExp)) + geom_boxplot() + stat_compare_means()
ggplot(PAS_wExpFilt_apa,aes(by=APA, fill=APA, x=MeanExp)) + geom_density(alpha=.5)
PAS_wExpFilt_apa %>% group_by(APA) %>% summarize(Mean=mean(MeanExp))
# A tibble: 2 x 2
APA Mean
<chr> <dbl>
1 No -0.00390
2 Yes 0.00292
There appears to be a significant difference between genes with APA and those without, but visually the number of PAS does not look like it is driven by expression.
Redo this analysis with Geuvadis TPM (easier to interpret)
MetaExp=read.table("../data/Geuvadis/metadata.txt", stringsAsFactors = F, header = T) %>% filter(Characteristics.population.=="YRI")
Gevadis=read.table("../data/Geuvadis/kallisto.txt.gz",header = T,stringsAsFactors = F) %>% select(gene,MetaExp$ENA_RUN) %>% separate(gene, into=c("base", 'transcript'), sep="-") %>% select(-transcript)
Warning: Expected 2 pieces. Additional pieces discarded in 54 rows [16,
829, 2467, 4560, 5461, 6191, 6391, 9123, 9561, 10748, 11817, 13305, 13536,
14616, 16893, 16960, 16967, 20631, 21427, 22233, ...].
Warning: Expected 2 pieces. Missing pieces filled with `NA` in 30924 rows
[7, 11, 13, 15, 21, 22, 24, 25, 27, 29, 40, 43, 50, 51, 53, 55, 56, 57, 58,
60, ...].
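The two warnings come from splitting identifiers on "-" into exactly two pieces. A base R sketch of what happens to different identifier shapes is below; the identifiers are hypothetical, since the actual naming in kallisto.txt.gz is not shown here.

```r
# Hypothetical identifiers: one hyphen, no hyphen, and a hyphenated gene symbol.
ids <- c("GENEA-201", "GENEB", "HLA-DRB1-202")

# Split on every "-" and count the pieces per identifier.
pieces   <- strsplit(ids, "-", fixed = TRUE)
n_pieces <- lengths(pieces)

# Keeping only the first piece mirrors what ends up in the `base` column.
base <- vapply(pieces, function(p) p[1], character(1))

print(data.frame(id = ids, n_pieces = n_pieces, base = base))
```

Identifiers with one piece correspond to the "filled with NA" warning and are harmless for the `base` column. Identifiers with more than two pieces correspond to the "additional pieces discarded" warning, and in that case a gene symbol that itself contains a hyphen would be truncated, so those rows may be worth inspecting.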
Gevadis_gather=Gevadis %>% gather('indiv', 'TPM',-base) %>% group_by(base) %>% summarise(MeanTranscriptTPM=mean(TPM)) %>% rename('gene'=base)
Gevadis_gatherFilt=Gevadis %>% gather('indiv', 'TPM',-base) %>% group_by(base) %>% summarise(MeanTranscriptTPM=mean(TPM)) %>% filter(MeanTranscriptTPM >=1)%>% rename('gene'=base)
PAS_wTPM= PAS %>% inner_join(Gevadis_gather, by="gene")
PAS_wTPM$nPAS=as.factor(PAS_wTPM$nPAS)
ggplot(PAS_wTPM, aes(x=nPAS,y=log10(MeanTranscriptTPM), fill=nPAS)) + geom_boxplot() + labs(x="Number of PAS", y="log10(Mean TPM)", title="Relationship between expression and Number of PAS")
Warning: Removed 136 rows containing non-finite values (stat_boxplot).
PAS_wTPM$nPAS=as.numeric(as.character(PAS_wTPM$nPAS))
PAS_wTPM_apa= PAS_wTPM %>% mutate(APA=ifelse(nPAS>1,"Yes","No"))
ggplot(PAS_wTPM_apa,aes(x=APA, y=log10(MeanTranscriptTPM))) + geom_boxplot() + stat_compare_means()
Warning: Removed 136 rows containing non-finite values (stat_boxplot).
Warning: Removed 136 rows containing non-finite values
(stat_compare_means).
PAS_wTPM_apa %>% group_by(APA) %>% summarize(Mean=mean(MeanTranscriptTPM))
# A tibble: 2 x 2
APA Mean
<chr> <dbl>
1 No 87.6
2 Yes 38.0
Filter to genes with a mean TPM of at least 1:
PAS_wTPMFilt= PAS %>% inner_join(Gevadis_gatherFilt, by="gene")
PAS_wTPMFilt_apa= PAS_wTPMFilt %>% mutate(APA=ifelse(nPAS>1,"Yes","No"))
ggplot(PAS_wTPMFilt_apa,aes(x=APA, y=log10(MeanTranscriptTPM))) + geom_boxplot() + stat_compare_means()
PAS_wTPMFilt_apa %>% group_by(APA) %>% summarize(Mean=mean(MeanTranscriptTPM))
# A tibble: 2 x 2
APA Mean
<chr> <dbl>
1 No 133.
2 Yes 41.0
cor.test(PAS_wTPMFilt_apa$MeanTranscriptTPM, PAS_wTPMFilt_apa$nPAS)
Pearson's product-moment correlation
data: PAS_wTPMFilt_apa$MeanTranscriptTPM and PAS_wTPMFilt_apa$nPAS
t = -10.549, df = 10991, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.11858725 -0.08157417
sample estimates:
cor
-0.1001153
PAS_wTPMFilt_apa$nPAS=as.factor(PAS_wTPMFilt_apa$nPAS)
ggplot(PAS_wTPMFilt_apa, aes(x=nPAS,y=log10(MeanTranscriptTPM), fill=nPAS)) + geom_boxplot() + labs(x="Number of PAS", y="log10(Mean TPM)", title="Relationship between expression and Number of PAS \n Genes with >=1 TPM ")
I will write out separate lists for genes with 1 PAS and those with more than one. I will use GOrilla to test for gene set enrichment.
PAS_noapa= PAS %>% filter(nPAS==1) %>% select(gene)
PAS_apa= PAS %>% filter(nPAS>1)%>%arrange(desc(nPAS)) %>% select(gene)
I will use the genes with 1 PAS as the background and the genes with APA as the target set.
mkdir ../data/nPAS/
write.table(PAS_noapa,"../data/nPAS/GenesNoAPA.txt", col.names = F, row.names = F, quote = F)
write.table(PAS_apa,"../data/nPAS/GenesAPA.txt", col.names = F, row.names = F, quote = F)
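Target-versus-background enrichment tools of this kind are generally built on a hypergeometric test. The sketch below is a generic illustration with made-up counts, not GOrilla's implementation, and every number in it is hypothetical.

```r
# Hypothetical counts for a single GO term.
N <- 1000  # genes in the combined background
K <- 100   # background genes annotated with the term
n <- 200   # genes in the target set
k <- 40    # target genes annotated with the term

# Number of annotated target genes expected by chance.
expected <- n * K / N

# P(X >= k) under the hypergeometric distribution.
p_enrich <- phyper(k - 1, K, N - K, n, lower.tail = FALSE)

print(expected)
print(p_enrich)
```

In a real analysis this test is repeated for every term, which is why the results are reported as FDR q-values rather than raw p-values.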
Significant processes (FDR q < 10^-9):
regulation of nucleobase-containing compound metabolic process
regulation of cellular macromolecule biosynthetic process
nucleic acid metabolic process
regulation of macromolecule biosynthetic process
regulation of cellular biosynthetic process
regulation of nucleic acid-templated transcription
regulation of RNA biosynthetic process
regulation of transcription, DNA-templated
regulation of biosynthetic process
regulation of nitrogen compound metabolic process
regulation of primary metabolic process
regulation of cellular metabolic process
RNA processing
Significant functions (FDR q < 10^-9):
heterocyclic compound binding
organic cyclic compound binding
nucleic acid binding
DNA binding
Significant components (FDR q < 10^-9):
intracellular part
nucleoplasm
nuclear part
intracellular organelle
nucleus
intracellular membrane-bounded organelle
intracellular organelle part
nucleoplasm part
organelle part
organelle
I am not really sure what to do with this, since I don’t have an expectation for it. These are key cellular processes, functions, and regions. Most genes in this analysis have APA.
Median gene-level TPM by tissue. Median expression was calculated from the file GTEx_Analysis_2017-06-05_v8_RNASeQCv1.1.9_gene_tpm.gct.gz.
I will download expression data from GTEx. I can then set a TPM cutoff and count, for each gene, the number of tissues in which it is expressed.
GTEX_test<-read.table("../data/nPAS/GTEx_Analysis_2017-06-05_v8_RNASeQCv1.1.9_gene_median_tpm.gct", header = T, skip=2, sep = '\t') %>%
separate(Name,into=c("gene_id","extra"), sep="\\.") %>%
inner_join(geneNames, by="gene_id") %>%
select(-gene_id,-Description,-extra) %>%
gather("tissue", "TPM",-gene)
ggplot(GTEX_test,aes(y=log10(TPM), by=tissue, fill=tissue)) + geom_boxplot()+theme(legend.position = "none")
Warning: Removed 1456429 rows containing non-finite values (stat_boxplot).
Try a log10(TPM) cutoff of 2 (TPM >= 100).
Filter out genes that come up in more than 54 tissues due to gene name issues.
GTEX=read.table("../data/nPAS/GTEx_Analysis_2017-06-05_v8_RNASeQCv1.1.9_gene_median_tpm.gct", header = T, skip=2, sep = '\t') %>%
separate(Name,into=c("gene_id","extra"), sep="\\.") %>%
inner_join(geneNames, by="gene_id") %>%
select(-gene_id,-Description,-extra) %>%
gather("tissue", "TPM",-gene) %>%
filter(TPM >=100 )%>%
group_by(gene) %>%
summarise(nTissue=n()) %>%
filter(nTissue<=54)
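The tissue-counting step can be sketched in base R on a small gene-by-tissue matrix. The gene names, tissue names and TPM values below are hypothetical; only the cutoff of 100 comes from the code above.

```r
# Toy matrix of median TPM per gene (rows) and tissue (columns).
tpm <- matrix(
  c(150,   5, 300,
     20, 120,   0,
      3,   8,  40),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("GENEA", "GENEB", "GENEC"), c("Liver", "Lung", "Brain"))
)

cutoff <- 100

# Number of tissues in which each gene reaches the cutoff.
nTissue <- rowSums(tpm >= cutoff)

# Filtering before counting drops genes that never reach the cutoff,
# so they are absent from the result rather than reported as 0.
nTissue <- nTissue[nTissue > 0]
print(nTissue)
```

One consequence of this design is that the set of genes entering the correlation changes with the cutoff, so results at different cutoffs are computed on different gene sets.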
Join this with the PAS info:
PAS_tissue=PAS %>% inner_join(GTEX,by="gene")
cor.test(PAS_tissue$nPAS, PAS_tissue$nTissue)
Pearson's product-moment correlation
data: PAS_tissue$nPAS and PAS_tissue$nTissue
t = -6.8873, df = 3589, p-value = 6.685e-12
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.14637420 -0.08180869
sample estimates:
cor
-0.114212
PAS_tissue$nPAS= as.factor(PAS_tissue$nPAS)
ggplot(PAS_tissue, aes(x=nPAS,y=nTissue, fill=nPAS)) + geom_boxplot() + labs(x="Number of PAS", y="Number of tissues with median TPM>=100", title="Relationship between tissue specificity and Number of PAS")
With and without APA
PAS_tissue$nPAS=as.numeric(as.character(PAS_tissue$nPAS))
PAS_tissue_apa= PAS_tissue %>% mutate(APA=ifelse(nPAS>1,"Yes","No"))
ggplot(PAS_tissue_apa,aes(by=APA, x=nTissue,fill=APA)) + geom_density(alpha=.4)
Version | Author | Date |
---|---|---|
98842ca | brimittleman | 2020-02-03 |
ggplot(PAS_tissue_apa,aes(by=APA, x=APA,y=nTissue)) + geom_boxplot() + stat_compare_means()
It looks like genes with APA are a bit more tissue-specific.
Try log10(TPM) >= 1 (TPM >= 10).
GTEX_10=read.table("../data/nPAS/GTEx_Analysis_2017-06-05_v8_RNASeQCv1.1.9_gene_median_tpm.gct", header = T, skip=2, sep = '\t') %>%
separate(Name,into=c("gene_id","extra"), sep="\\.") %>%
inner_join(geneNames, by="gene_id") %>%
select(-gene_id,-Description,-extra) %>%
gather("tissue", "TPM",-gene) %>%
filter(TPM >=10 )%>%
group_by(gene) %>%
summarise(nTissue=n()) %>%
filter(nTissue<=54)
Join this with the PAS info:
PAS_tissue10=PAS %>% inner_join(GTEX_10,by="gene")
cor.test(PAS_tissue10$nPAS, PAS_tissue10$nTissue)
Pearson's product-moment correlation
data: PAS_tissue10$nPAS and PAS_tissue10$nTissue
t = -8.7503, df = 11916, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.09771705 -0.06203879
sample estimates:
cor
-0.07990351
PAS_tissue10$nPAS= as.factor(PAS_tissue10$nPAS)
ggplot(PAS_tissue10, aes(x=nPAS,y=nTissue, fill=nPAS)) + geom_boxplot() + labs(x="Number of PAS", y="Number of tissues with median TPM>=10", title="Relationship between tissue specificity and Number of PAS")
With and without APA
PAS_tissue10$nPAS=as.numeric(as.character(PAS_tissue10$nPAS))
PAS_tissue10_apa= PAS_tissue10 %>% mutate(APA=ifelse(nPAS>1,"Yes","No"))
ggplot(PAS_tissue10_apa,aes(by=APA, x=nTissue,fill=APA)) + geom_density(alpha=.4)
Try log10(TPM) >= 3 (TPM >= 1000).
GTEX_1000=read.table("../data/nPAS/GTEx_Analysis_2017-06-05_v8_RNASeQCv1.1.9_gene_median_tpm.gct", header = T, skip=2, sep = '\t') %>%
separate(Name,into=c("gene_id","extra"), sep="\\.") %>%
inner_join(geneNames, by="gene_id") %>%
select(-gene_id,-Description,-extra) %>%
gather("tissue", "TPM",-gene) %>%
filter(TPM >=1000 )%>%
group_by(gene) %>%
summarise(nTissue=n()) %>%
filter(nTissue<=54)
Join this with the PAS info:
PAS_tissue1000=PAS %>% inner_join(GTEX_1000,by="gene")
cor.test(PAS_tissue1000$nPAS, PAS_tissue1000$nTissue)
Pearson's product-moment correlation
data: PAS_tissue1000$nPAS and PAS_tissue1000$nTissue
t = -2.2996, df = 278, p-value = 0.02221
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.24984963 -0.01972422
sample estimates:
cor
-0.1366298
PAS_tissue1000$nPAS= as.factor(PAS_tissue1000$nPAS)
ggplot(PAS_tissue1000, aes(x=nPAS,y=nTissue, fill=nPAS)) + geom_boxplot() + labs(x="Number of PAS", y="Number of tissues with median TPM>=1000", title="Relationship between tissue specificity and Number of PAS")
With and without APA
PAS_tissue1000$nPAS=as.numeric(as.character(PAS_tissue1000$nPAS))
PAS_tissue1000_apa= PAS_tissue1000 %>% mutate(APA=ifelse(nPAS>1,"Yes","No"))
ggplot(PAS_tissue1000_apa,aes(by=APA, x=nTissue,fill=APA)) + geom_density(alpha=.4)
sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Scientific Linux 7.4 (Nitrogen)
Matrix products: default
BLAS/LAPACK: /software/openblas-0.2.19-el7-x86_64/lib/libopenblas_haswellp-r0.2.19.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] forcats_0.3.0 stringr_1.3.1 dplyr_0.8.0.1 purrr_0.3.2
[5] readr_1.3.1 tidyr_0.8.3 tibble_2.1.1 tidyverse_1.2.1
[9] ggpubr_0.2 magrittr_1.5 ggplot2_3.1.1 workflowr_1.6.0
loaded via a namespace (and not attached):
[1] tidyselect_0.2.5 haven_1.1.2 lattice_0.20-38 colorspace_1.3-2
[5] generics_0.0.2 htmltools_0.3.6 yaml_2.2.0 utf8_1.1.4
[9] rlang_0.4.0 later_0.7.5 pillar_1.3.1 glue_1.3.0
[13] withr_2.1.2 modelr_0.1.2 readxl_1.1.0 plyr_1.8.4
[17] munsell_0.5.0 gtable_0.2.0 cellranger_1.1.0 rvest_0.3.2
[21] evaluate_0.12 labeling_0.3 knitr_1.20 httpuv_1.4.5
[25] fansi_0.4.0 broom_0.5.1 Rcpp_1.0.2 promises_1.0.1
[29] scales_1.0.0 backports_1.1.2 jsonlite_1.6 fs_1.3.1
[33] hms_0.4.2 digest_0.6.18 stringi_1.2.4 grid_3.5.1
[37] rprojroot_1.3-2 cli_1.1.0 tools_3.5.1 lazyeval_0.2.1
[41] crayon_1.3.4 whisker_0.3-2 pkgconfig_2.0.2 xml2_1.2.0
[45] lubridate_1.7.4 assertthat_0.2.0 rmarkdown_1.10 httr_1.3.1
[49] rstudioapi_0.10 R6_2.3.0 nlme_3.1-137 git2r_0.26.1
[53] compiler_3.5.1