Skip to content
process_some_data_test.Rout 10.7 KiB
Newer Older

R version 4.0.4 (2021-02-15) -- "Lost Library Book"
Copyright (C) 2021 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> 
> # Author: Almut Luetge, Anthony Sonrel
> 
> ## Instructions: 
> # Modify here the steps of processing that will be applied to the raw data
> 
> ### ---------- Normalization and preprocessing ----------- #### 
>   
> args <- (commandArgs(trailingOnly = TRUE))
> for (i in seq_len(length(args))) {
+     eval(parse(text = args[[i]]))
+ }
> 
> set.seed(1234)
> 
> ## Input Arguments
> # counts in mtx format
> print(count_file)
[1] "data/some_data_test/counts_some_data_test.mtx.gz"
> # cells metadata in json format
> print(meta_file)
[1] "data/some_data_test/meta_some_data_test.json"
> # dataset name used to naming the outputs
> print(dataset_name)
[1] "some_data_test"
> # output path
> print(out_path)
[1] "data/some_data_test_process"
> 
> ## Libraries
> library(utils)
> library(SingleCellExperiment)
Loading required package: SummarizedExperiment
Loading required package: MatrixGenerics
Loading required package: matrixStats

Attaching package: ‘MatrixGenerics’

The following objects are masked from ‘package:matrixStats’:

    colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
    colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
    colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
    colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
    colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
    colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
    colWeightedMeans, colWeightedMedians, colWeightedSds,
    colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
    rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
    rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
    rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
    rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
    rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
    rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
    rowWeightedSds, rowWeightedVars

Loading required package: GenomicRanges
Loading required package: stats4
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:parallel’:

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from ‘package:stats’:

    IQR, mad, sd, var, xtabs

The following objects are masked from ‘package:base’:

    anyDuplicated, append, as.data.frame, basename, cbind, colnames,
    dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
    grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,
    order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
    rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
    union, unique, unsplit, which.max, which.min

Loading required package: S4Vectors

Attaching package: ‘S4Vectors’

The following object is masked from ‘package:base’:

    expand.grid

Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.


Attaching package: ‘Biobase’

The following object is masked from ‘package:MatrixGenerics’:

    rowMedians

The following objects are masked from ‘package:matrixStats’:

    anyMissing, rowMedians

> library(scater)
Loading required package: ggplot2
> library(scran)
> library(R.utils)
Loading required package: R.oo
Loading required package: R.methodsS3
R.methodsS3 v1.8.1 (2020-08-26 16:20:06 UTC) successfully loaded. See ?R.methodsS3 for help.
R.oo v1.24.0 (2020-08-26 16:11:58 UTC) successfully loaded. See ?R.oo for help.

Attaching package: ‘R.oo’

The following object is masked from ‘package:R.methodsS3’:

    throw

The following object is masked from ‘package:SummarizedExperiment’:

    trim

The following object is masked from ‘package:GenomicRanges’:

    trim

The following object is masked from ‘package:IRanges’:

    trim

The following objects are masked from ‘package:methods’:

    getClasses, getMethods

The following objects are masked from ‘package:base’:

    attach, detach, load, save

R.utils v2.10.1 (2020-08-26 22:50:31 UTC) successfully loaded. See ?R.utils for help.

Attaching package: ‘R.utils’

The following object is masked from ‘package:utils’:

    timestamp

The following objects are masked from ‘package:base’:

    cat, commandArgs, getOption, inherits, isOpen, nullfile, parse,
    warnings

> library(Matrix)

Attaching package: ‘Matrix’

The following object is masked from ‘package:S4Vectors’:

    expand

> library(jsonlite)

Attaching package: ‘jsonlite’

The following object is masked from ‘package:R.utils’:

    validate

> library(uwot)
> 
> ## Load data
> counts <- readMM(count_file)
> meta <- fromJSON(meta_file)
> colnames(counts) <- meta$cell_id 
> 
> ## Generate sce
> sce <- SingleCellExperiment(list(counts = counts),
+                                  colData = DataFrame(meta))
> 
> 
> ## Normalize 
> clusters <- quickCluster(sce, use.ranks=FALSE)
Warning in (function (A, nv = 5, nu = nv, maxit = 1000, work = nv + 7, reorth = TRUE,  :
  You're computing too large a percentage of total singular values, use a standard svd instead.
Warning message:
In check_numbers(k = k, nu = nu, nv = nv, limit = min(dim(x)) -  :
  more singular values/vectors requested than available
> table(clusters)
clusters
  1 
200 
> sce <- computeSumFactors(sce, min.mean=0.1, cluster=clusters)
> sce <-  logNormCounts(sce)
> 
> 
> ## Select highly variable genes
> dec <- modelGeneVar(sce)
> dec <- dec[order(dec$bio, decreasing = TRUE),] 
> hvg_sig <- getTopHVGs(dec, fdr.threshold=0.05)
> hvg <- getTopHVGs(dec, var.threshold = 0)
> length(hvg)
[1] 45
> length(hvg_sig)
[1] 7
> hvg_tab <- data.frame("all" = hvg,
+                       "sig" = hvg %in% hvg_sig)
> 
> 
> ## Run dimensional reduction
> sce <- runPCA(sce, ncomponents = 20, ntop = length(hvg))
> sce <- runUMAP(sce)
> 
> 
> ## Save normalized counts as gziped mtx file
> mtx <- as.matrix(logcounts(sce))
> mtx <- Matrix(mtx)
> mtx <- as(mtx, "dgTMatrix")
> matrix_out <- paste0(out_path, "/norm_counts_", dataset_name, ".mtx")
> writeMM(obj = mtx, matrix_out)
NULL
> gzip(matrix_out, overwrite=TRUE)
> 
> 
> 
> ## Save reduced Dimensions as gziped mtx
> colnames(reducedDims(sce)[["UMAP"]]) <- c("UMAP1", "UMAP2")
> red_tab <- cbind(reducedDims(sce)[["PCA"]], reducedDims(sce)[["UMAP"]])
> red_mtx <- as.matrix(red_tab)
> red_mtx <- Matrix(red_mtx)
> red_mtx <- as(red_mtx, "dgTMatrix")
> red_out <- paste0(out_path, "/dim_red_", dataset_name, ".mtx")
> writeMM(obj = red_mtx, red_out)
NULL
> gzip(red_out, overwrite=TRUE)
> 
> 
> ## Save highly variable genes as json
> jsonlite::write_json(hvg_tab, paste0(out_path, "/hvg_", dataset_name, ".json"), 
+                      matrix = "columnmajor")
> 
> sessionInfo()
R version 4.0.4 (2021-02-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.2 LTS

Matrix products: default
BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.8.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=C             
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] uwot_0.1.10                 jsonlite_1.7.2             
 [3] Matrix_1.3-2                R.utils_2.10.1             
 [5] R.oo_1.24.0                 R.methodsS3_1.8.1          
 [7] scran_1.18.7                scater_1.18.6              
 [9] ggplot2_3.3.3               SingleCellExperiment_1.12.0
[11] SummarizedExperiment_1.20.0 Biobase_2.50.0             
[13] GenomicRanges_1.42.0        GenomeInfoDb_1.26.7        
[15] IRanges_2.24.1              S4Vectors_0.28.1           
[17] BiocGenerics_0.36.1         MatrixGenerics_1.2.1       
[19] matrixStats_0.58.0         

loaded via a namespace (and not attached):
 [1] locfit_1.5-9.4            Rcpp_1.0.6               
 [3] rsvd_1.0.3                lattice_0.20-41          
 [5] FNN_1.1.3                 assertthat_0.2.1         
 [7] utf8_1.2.1                RSpectra_0.16-0          
 [9] R6_2.5.0                  bluster_1.0.0            
[11] pillar_1.5.1              sparseMatrixStats_1.2.1  
[13] zlibbioc_1.36.0           rlang_0.4.10             
[15] irlba_2.3.3               BiocNeighbors_1.8.2      
[17] statmod_1.4.35            BiocParallel_1.24.1      
[19] igraph_1.2.6              RCurl_1.98-1.3           
[21] munsell_0.5.0             beachmat_2.6.4           
[23] DelayedArray_0.16.3       compiler_4.0.4           
[25] vipor_0.4.5               BiocSingular_1.6.0       
[27] pkgconfig_2.0.3           ggbeeswarm_0.6.0         
[29] tidyselect_1.1.0          tibble_3.1.0             
[31] gridExtra_2.3             GenomeInfoDbData_1.2.4   
[33] edgeR_3.32.1              fansi_0.4.2              
[35] viridisLite_0.3.0         crayon_1.4.1             
[37] dplyr_1.0.5               withr_2.4.1              
[39] bitops_1.0-6              grid_4.0.4               
[41] gtable_0.3.0              lifecycle_1.0.0          
[43] DBI_1.1.1                 magrittr_2.0.1           
[45] scales_1.1.1              dqrng_0.2.1              
[47] scuttle_1.0.4             XVector_0.30.0           
[49] viridis_0.5.1             limma_3.46.0             
[51] DelayedMatrixStats_1.12.3 ellipsis_0.3.1           
[53] generics_0.1.0            vctrs_0.3.7              
[55] tools_4.0.4               glue_1.4.2               
[57] beeswarm_0.3.1            purrr_0.3.4              
[59] colorspace_2.0-0         
> 
> proc.time()
   user  system elapsed 
 18.214   5.202  22.415