\name{read10xCounts}
\alias{read10xCounts}

\title{Load data from a 10X experiment}

\description{
Creates a SingleCellExperiment from the CellRanger output directories for 10X Genomics data.
}

\usage{
read10xCounts(samples, col.names=FALSE, type=c("auto", "sparse", "HDF5"), 
    group=NULL) 
}

\arguments{
\item{samples}{A character vector containing one or more directory names, each corresponding to a 10X sample.
Each directory should contain the \code{"matrix.mtx"}, \code{"genes.tsv"} and \code{"barcodes.tsv"} files generated by CellRanger.
Alternatively, an entry may be the path to a file in the 10X sparse HDF5 format (ending in \code{".h5"}), see Details.
}
\item{col.names}{A logical scalar indicating whether the columns of the output object should be named with the cell barcodes.}
\item{type}{String specifying the format of the 10X data to read, one of \code{"auto"}, \code{"sparse"} or \code{"HDF5"}, see Details.}
\item{group}{String specifying the group name to read from when \code{type="HDF5"}.
If \code{NULL}, the function attempts to determine the group automatically.}
}

\value{
A SingleCellExperiment object containing count data for each gene (row) and cell (column) across all \code{samples}.
\itemize{
    \item Row metadata will contain the fields \code{"ID"} and \code{"Symbol"}.
        The former is the gene identifier (usually Ensembl), while the latter is the gene name.
    \item Column metadata will contain the fields \code{"Sample"} and \code{"Barcode"}.
        The former contains the value of \code{samples} from which each column was obtained.
        The latter refers to the cell barcode sequence and GEM group for each library. 
    \item Rows are named with the gene identifier.
    Columns are named with the cell barcode in certain settings, see Details.
}
}

\details{
This function was originally adapted from the \code{Read10X} function in the \pkg{Seurat} package.
The current version was then taken from the \code{read10xResults} implementation in the \pkg{scater} package.

If \code{type="auto"}, the format is automatically detected for each entry of \code{samples} based on whether it ends with \code{".h5"}.
If so, \code{type} is set to \code{"HDF5"}; otherwise it is set to \code{"sparse"}.
\itemize{
    \item If \code{type="sparse"}, count data are loaded as a \linkS4class{dgCMatrix} object.
        This is a conventional compressed sparse column format, constructed from the \code{"matrix.mtx"} file produced by the CellRanger pipeline.
    \item If \code{type="HDF5"}, count data are assumed to follow the 10X sparse HDF5 format for large data sets.
        It is loaded as a \linkS4class{TENxMatrix} object, which is a stub object that refers back to the file in \code{samples}.
        Users may need to set \code{group} if it cannot be automatically determined.
}

Matrices are combined by column if multiple \code{samples} were specified.
This will throw an error if the gene information is not consistent across \code{samples}.

If \code{col.names=TRUE} and \code{length(samples)==1}, each column is named by the cell barcode.
For multiple samples, the columns are unnamed to avoid problems with non-unique barcodes across samples.

Note that user-level manipulation of sparse matrices requires loading of the \pkg{Matrix} package.
Otherwise, calculation of \code{rowSums}, \code{colSums}, etc. will result in errors.
}

\author{
Davis McCarthy, with modifications from Aaron Lun
}

\seealso{
\code{\link{write10xCounts}}
}

\examples{
# Mocking up some 10X Genomics output.
example(write10xCounts)

# Reading it in.
sce10x <- read10xCounts(tmpdir)
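
# A sketch of inspecting the metadata fields described in Value.
# The SummarizedExperiment:: prefix is an assumption about how the accessors
# are reached here; it is redundant if that package is already attached.
SummarizedExperiment::rowData(sce10x)
SummarizedExperiment::colData(sce10x)

# Naming columns by cell barcode, which applies to a single sample only.
# 'sce10x.named' is an illustrative variable name.
sce10x.named <- read10xCounts(tmpdir, col.names=TRUE)
head(colnames(sce10x.named))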

# Column names are dropped with multiple 'samples'.
sce10x2 <- read10xCounts(c(tmpdir, tmpdir))
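
# Checking that the combined object has no column names
# and holds the columns of both samples.
is.null(colnames(sce10x2))
ncol(sce10x2) == 2 * ncol(sce10x)

# A sketch of summarizing the sparse counts, following the note in Details.
# Calling Matrix::rowSums directly is one way to reach the sparse methods
# without attaching Matrix; library(Matrix) should work equally well.
# 'counts10x' is an illustrative variable name.
counts10x <- SummarizedExperiment::assay(sce10x)
head(Matrix::rowSums(counts10x))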
}

\references{
Zheng GX, Terry JM, Belgrader P, and others (2017).
Massively parallel digital transcriptional profiling of single cells. 
\emph{Nat Commun} 8:14049.

10X Genomics (2017).
Gene-Barcode Matrices.
\url{https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/output/matrices}

10X Genomics (2018).
HDF5 Gene-Barcode Matrix Format.
\url{https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/advanced/h5_matrices}
}