{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Wed May 27 01:01:23 UTC 2020\n"
     ]
    }
   ],
   "source": [
    "!date"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Introduction\n",
    "\n",
    "In this part of the course, we will discuss how to quantify droplet-based single-cell RNA-seq (dscRNA-seq) data using _alevin_. We will cover the command-line flags that the [_alevin_](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1670-y) tool uses in its indexing and quantification stages, and quantify a small subset of the data from the experiment by [Hermann et al.](https://pubmed.ncbi.nlm.nih.gov/30404016/)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Reference Transcriptome\n",
    "\n",
    "Alevin aligns dscRNA-seq reads to the transcriptome rather than to the genome.\n",
    "Under the hood, alevin uses [Salmon's](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5600148/) [selective-alignment](https://www.biorxiv.org/content/10.1101/657874v2) infrastructure to generate these alignments, so the first step is to _index_ the reference transcriptome.\n",
    "In this tutorial we will use a small reference transcriptome, generated by selecting all transcripts from chromosomes 18 and 19\n",
    "of the mouse transcriptome; it has already been copied into your environment.  \n",
    "**NOTE**: The full transcriptome can be downloaded from https://www.gencodegenes.org/.\n",
    "\n",
    "Let's start by checking that we can access salmon and the required data in our environment.  \n",
    "**NOTE**: `!` runs a single line as a shell command in the IPython notebook.  \n",
    "**NOTE**: `%%bash` runs the whole cell as a bash script in the IPython notebook."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "salmon v1.2.1\n",
      "\n",
      "Usage:  salmon -h|--help or \n",
      "        salmon -v|--version or \n",
      "        salmon -c|--cite or \n",
      "        salmon [--no-version-check] <COMMAND> [-h | options]\n",
      "\n",
      "Commands:\n",
      "     index Create a salmon index\n",
      "     quant Quantify a sample\n",
      "     alevin single cell analysis\n",
      "     swim  Perform super-secret operation\n",
      "     quantmerge Merge multiple quantifications into a single file\n"
     ]
    }
   ],
   "source": [
    "!salmon --help"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "AdultMouseRep3sub1M_S1_L001_R1_001.fastq.gz\n",
      "AdultMouseRep3sub1M_S1_L001_R2_001.fastq.gz\n",
      "GRCm38.gencode.vM21.chr18.chr19.genome.fa\n",
      "GRCm38.gencode.vM21.chr18.chr19.gtf\n",
      "GRCm38.gencode.vM21.chr18.chr19.tgMap.txt\n",
      "GRCm38.gencode.vM21.chr18.chr19.txome.fa\n",
      ">ENSMUST00000234132.1|ENSMUSG00000117547.1|OTTMUSG00000072753.1|OTTMUST00000176063.1|AC125218.3-201|AC125218.3|252|processed_pseudogene|\n",
      "CCTTAACCATAGGTACAGGTAATCAACTCAGAATGAAAAGCCAGTAGCTATGAACAAGGCGGAGGTGCCACTGCTAACCC\n",
      "TGTGGCCACAGCACCCTTACCGCAGCTCTCAAGTGAGATTGAACGCCTCATGAGTCAGGGTTATTACTACCAGGACATTC\n",
      "AGAAATCTCTGGTCATTGCCCAAAACAACATTGAGATTGCTAAAAACATCCTCCAGGAATTTGTTTCTATTTCTTCTCCT\n"
     ]
    }
   ],
   "source": [
    "%%bash\n",
    "ls data/spermatogenesis_subset\n",
    "head -4 data/spermatogenesis_subset/GRCm38.gencode.vM21.chr18.chr19.txome.fa"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Salmon Indexing\n",
    "\n",
    "Indexing is the process by which salmon preprocesses the reference sequences and stores them in an efficient data structure designed to optimize alignment speed and accuracy. Salmon follows a k-mer-based indexing approach (more discussion to follow), which is provided by the `salmon index` command. Understanding a tool's command-line flags is important for tuning its efficiency and adapting it to your use case. Let's look in detail at some of the frequently used flags and index the subsampled transcriptome."
   ]
  },
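  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The k-mer idea mentioned above can be illustrated with a minimal Python sketch. It only lists the overlapping substrings of length `k` in a sequence; it is not how salmon builds its index, and the helper name `kmers` is hypothetical. A toy `k = 5` is used for readability, whereas the index built below uses `-k 31`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def kmers(seq, k):\n",
    "    # Return every overlapping length-k substring (k-mer) of seq, in order.\n",
    "    return [seq[i:i + k] for i in range(len(seq) - k + 1)]\n",
    "\n",
    "# First 20 bases of the transcript printed earlier in this notebook.\n",
    "seq = 'CCTTAACCATAGGTACAGGT'\n",
    "\n",
    "# Toy value of k, chosen only for readability (assumption of this sketch).\n",
    "all_kmers = kmers(seq, 5)\n",
    "print(len(all_kmers), 'k-mers; the first four are:')\n",
    "for kmer in all_kmers[:4]:\n",
    "    print(kmer)"
   ]
  },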
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Version Info: This is the most recent version of salmon.\n",
      "\n",
      "Index\n",
      "==========\n",
      "Creates a salmon index.\n",
      "\n",
      "Command Line Options:\n",
      "  -v [ --version ]              print version string\n",
      "  -h [ --help ]                 produce help message\n",
      "  -t [ --transcripts ] arg      Transcript fasta file.\n",
      "  -k [ --kmerLen ] arg (=31)    The size of k-mers that should be used for the \n",
      "                                quasi index.\n",
      "  -i [ --index ] arg            salmon index.\n",
      "  --gencode                     This flag will expect the input transcript \n",
      "                                fasta to be in GENCODE format, and will split \n",
      "                                the transcript name at the first '|' character.\n",
      "                                These reduced names will be used in the output \n",
      "                                and when looking for these transcripts in a \n",
      "                                gene to transcript GTF.\n",
      "  --features                    This flag will expect the input reference to be\n",
      "                                in the tsv file format, and will split the \n",
      "                                feature name at the first 'tab' character. \n",
      "                                These reduced names will be used in the output \n",
      "                                and when looking for the sequence of the \n",
      "                                features.GTF.\n",
      "  --keepDuplicates              This flag will disable the default indexing \n",
      "                                behavior of discarding sequence-identical \n",
      "                                duplicate transcripts.  If this flag is passed,\n",
      "                                then duplicate transcripts that appear in the \n",
      "                                input will be retained and quantified \n",
      "                                separately.\n",
      "  -p [ --threads ] arg (=2)     Number of threads to use during indexing.\n",
      "  --keepFixedFasta              Retain the fixed fasta file (without short \n",
      "                                transcripts and duplicates, clipped, etc.) \n",
      "                                generated during indexing\n",
      "  -f [ --filterSize ] arg (=-1) The size of the Bloom filter that will be used \n",
      "                                by TwoPaCo during indexing. The filter will be \n",
      "                                of size 2^{filterSize}. The default value of -1\n",
      "                                means that the filter size will be \n",
      "                                automatically set based on the number of \n",
      "                                distinct k-mers in the input, as estimated by \n",
      "                                nthll.\n",
      "  --tmpdir arg                  The directory location that will be used for \n",
      "                                TwoPaCo temporary files; it will be created if \n",
      "                                need be and be removed prior to indexing \n",
      "                                completion. The default value will cause a \n",
      "                                (temporary) subdirectory of the salmon index \n",
      "                                directory to be used for this purpose.\n",
      "  --sparse                      Build the index using a sparse sampling of \n",
      "                                k-mer positions This will require less memory \n",
      "                                (especially during quantification), but will \n",
      "                                take longer to construct and can slow down \n",
      "                                mapping / alignment\n",
      "  -d [ --decoys ] arg           Treat these sequences ids from the reference as\n",
      "                                the decoys that may have sequence homologous to\n",
      "                                some known transcript. for example in case of \n",
      "                                the genome, provide a list of chromosome name \n",
      "                                --- one per line\n",
      "  --type arg (=puff)            The type of index to build; the only option is \n",
      "                                \"puff\" in this version of salmon.\n",
      "\n"
     ]
    }
   ],
   "source": [
    "!salmon index --help"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Papers on indexing reference sequences\n",
    "* RapMap paper: https://academic.oup.com/bioinformatics/article/32/12/i192/2288985\n",
    "* Pufferfish paper: https://academic.oup.com/bioinformatics/article/34/13/i169/5045749\n",
    "* Selective-alignment paper: https://www.biorxiv.org/content/10.1101/138800v2\n",
    "\n",
    "### A brief introduction to k-mers and reference indexing\n",
    "* Slides: https://github.com/fmicompbio/adv_scrnaseq_2020/blob/master/scrna-seq-quantification/exercise.pdf"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Version Info: This is the most recent version of salmon.\n",
      "index [\"data/spermatogenesis_subset/salmon_index\"] did not previously exist  . . . creating it\n",
      "[2020-05-27 14:36:24.134] [jLog] [warning] The salmon index is being built without any decoy sequences.  It is recommended that decoy sequence (either computed auxiliary decoy sequence or the genome of the organism) be provided during indexing. Further details can be found at https://salmon.readthedocs.io/en/latest/salmon.html#preparing-transcriptome-indices-mapping-based-mode.\n",
      "[2020-05-27 14:36:24.135] [jLog] [info] building index\n",
      "out : data/spermatogenesis_subset/salmon_index\n",
      "\u001b[00m[2020-05-27 14:36:24.135] [puff::index::jointLog] [info] Running fixFasta\n",
      "\u001b[00m\n",
      "[Step 1 of 4] : counting k-mers\n",
      "\n",
      "\u001b[33m\u001b[1m[2020-05-27 14:36:24.876] [puff::index::jointLog] [warning] Removed 16 transcripts that were sequence duplicates of indexed transcripts.\n",
      "\u001b[00m\u001b[33m\u001b[1m[2020-05-27 14:36:24.876] [puff::index::jointLog] [warning] If you wish to retain duplicate transcripts, please use the `--keepDuplicates` flag\n",
      "\u001b[00m\u001b[00m[2020-05-27 14:36:24.877] [puff::index::jointLog] [info] Replaced 0 non-ATCG nucleotides\n",
      "\u001b[00m\u001b[00m[2020-05-27 14:36:24.877] [puff::index::jointLog] [info] Clipped poly-A tails from 67 transcripts\n",
      "\u001b[00mwrote 7879 cleaned references\n",
      "\u001b[00m[2020-05-27 14:36:25.036] [puff::index::jointLog] [info] Filter size not provided; estimating from number of distinct k-mers\n",
      "\u001b[00m\u001b[00m[2020-05-27 14:36:25.216] [puff::index::jointLog] [info] ntHll estimated 6976516 distinct k-mers, setting filter size to 2^27\n",
      "\u001b[00mThreads = 2\n",
      "Vertex length = 31\n",
      "Hash functions = 5\n",
      "Filter size = 134217728\n",
      "Capacity = 2\n",
      "Files: \n",
      "data/spermatogenesis_subset/salmon_index/ref_k31_fixed.fa\n",
      "--------------------------------------------------------------------------------\n",
      "Round 0, 0:134217728\n",
      "Pass\tFilling\tFiltering\n",
      "1\t2\t5\t\n",
      "2\t0\t0\n",
      "True junctions count = 23456\n",
      "False junctions count = 42601\n",
      "Hash table size = 66057\n",
      "Candidate marks count = 190374\n",
      "--------------------------------------------------------------------------------\n",
      "Reallocating bifurcations time: 0\n",
      "True marks count: 121832\n",
      "Edges construction time: 1\n",
      "--------------------------------------------------------------------------------\n",
      "Distinct junctions = 23456\n",
      "\n",
      "allowedIn: 21\n",
      "Max Junction ID: 30505\n",
      "seen.size():244049 kmerInfo.size():30506\n",
      "approximateContigTotalLength: 5297227\n",
      "counters for complex kmers:\n",
      "(prec>1 & succ>1)=569 | (succ>1 & isStart)=20 | (prec>1 & isEnd)=14 | (isStart & isEnd)=2\n",
      "contig count: 35630 element count: 8051951 complex nodes: 605\n",
      "# of ones in rank vector: 35629\n",
      "\u001b[00m[2020-05-27 14:36:34.483] [puff::index::jointLog] [info] Starting the Pufferfish indexing by reading the GFA binary file.\n",
      "\u001b[00m\u001b[00m[2020-05-27 14:36:34.483] [puff::index::jointLog] [info] Setting the index/BinaryGfa directory data/spermatogenesis_subset/salmon_index\n",
      "\u001b[00msize = 8051951\n",
      "-----------------------------------------\n",
      "| Loading contigs | Time = 905.94 us\n",
      "-----------------------------------------\n",
      "size = 8051951\n",
      "-----------------------------------------\n",
      "| Loading contig boundaries | Time = 423.51 us\n",
      "-----------------------------------------\n",
      "Number of ones: 35636\n",
      "Number of ones per inventory item: 512\n",
      "Inventory entries filled: 70\n",
      "35629\n",
      "\u001b[00m[2020-05-27 14:36:34.506] [puff::index::jointLog] [info] Done wrapping the rank vector with a rank9sel structure.\n",
      "\u001b[00m\u001b[00m[2020-05-27 14:36:34.506] [puff::index::jointLog] [info] contig count for validation: 35,629\n",
      "\u001b[00m\u001b[00m[2020-05-27 14:36:34.520] [puff::index::jointLog] [info] Total # of Contigs : 35,629\n",
      "\u001b[00m\u001b[00m[2020-05-27 14:36:34.520] [puff::index::jointLog] [info] Total # of numerical Contigs : 35,629\n",
      "\u001b[00m\u001b[00m[2020-05-27 14:36:34.521] [puff::index::jointLog] [info] Total # of contig vec entries: 115,983\n",
      "\u001b[00m\u001b[00m[2020-05-27 14:36:34.521] [puff::index::jointLog] [info] bits per offset entry 17\n",
      "\u001b[00m\u001b[00m[2020-05-27 14:36:34.524] [puff::index::jointLog] [info] Done constructing the contig vector. 35630\n",
      "\u001b[00m\u001b[00m[2020-05-27 14:36:34.537] [puff::index::jointLog] [info] # segments = 35,629\n",
      "\u001b[00m\u001b[00m[2020-05-27 14:36:34.537] [puff::index::jointLog] [info] total length = 8,051,951\n",
      "\u001b[00m\u001b[00m[2020-05-27 14:36:34.542] [puff::index::jointLog] [info] Reading the reference files ...\n",
      "\u001b[00m\u001b[00m[2020-05-27 14:36:34.618] [puff::index::jointLog] [info] positional integer width = 23\n",
      "\u001b[00m\u001b[00m[2020-05-27 14:36:34.618] [puff::index::jointLog] [info] seqSize = 8,051,951\n",
      "\u001b[00m\u001b[00m[2020-05-27 14:36:34.618] [puff::index::jointLog] [info] rankSize = 8,051,951\n",
      "\u001b[00m\u001b[00m[2020-05-27 14:36:34.618] [puff::index::jointLog] [info] edgeVecSize = 0\n",
      "\u001b[00m\u001b[00m[2020-05-27 14:36:34.618] [puff::index::jointLog] [info] num keys = 6,983,081\n",
      "\u001b[00mfor info, total work write each  : 2.331    total work inram from level 3 : 4.322  total work raw : 25.000 \n",
      "[Building BooPHF]  100  %   elapsed:   0 min 1  sec   remaining:   0 min 0  sec\n",
      "Bitarray        36594880  bits (100.00 %)   (array + ranks )\n",
      "final hash             0  bits (0.00 %) (nb in final hash 0)\n",
      "\u001b[00m[2020-05-27 14:36:35.274] [puff::index::jointLog] [info] mphf size = 4.36245 MB\n",
      "\u001b[00m\u001b[00m[2020-05-27 14:36:35.274] [puff::index::jointLog] [info] chunk size = 4,025,976\n",
      "\u001b[00m\u001b[00m[2020-05-27 14:36:35.274] [puff::index::jointLog] [info] chunk 0 = [0, 4,025,976)\n",
      "\u001b[00m\u001b[00m[2020-05-27 14:36:35.274] [puff::index::jointLog] [info] chunk 1 = [4,025,976, 8,051,921)\n",
      "\u001b[00m\u001b[00m[2020-05-27 14:36:36.332] [puff::index::jointLog] [info] finished populating pos vector\n",
      "\u001b[00m\u001b[00m[2020-05-27 14:36:36.332] [puff::index::jointLog] [info] writing index components\n",
      "\u001b[00m\u001b[00m[2020-05-27 14:36:36.413] [puff::index::jointLog] [info] finished writing dense pufferfish index\n",
      "\u001b[00m[2020-05-27 14:36:36.418] [jLog] [info] done building index\n"
     ]
    }
   ],
   "source": [
    "! salmon index -t data/spermatogenesis_subset/GRCm38.gencode.vM21.chr18.chr19.txome.fa -k 31 -i data/spermatogenesis_subset/salmon_index --gencode -p 2 "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "complete_ref_lens.bin\tinfo.json\t  rank.bin\t       refseq.bin\n",
      "ctable.bin\t\tmphf.bin\t  refAccumLengths.bin  seq.bin\n",
      "ctg_offsets.bin\t\tpos.bin\t\t  ref_indexing.log     versionInfo.json\n",
      "duplicate_clusters.tsv\tpre_indexing.log  reflengths.bin\n"
     ]
    }
   ],
   "source": [
    "! ls data/spermatogenesis_subset/salmon_index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "RetainedRef\tDuplicateRef\n",
      "ENSMUST00000198203.1\tENSMUST00000199618.1\n",
      "ENSMUST00000235145.1\tENSMUST00000237994.1\n",
      "ENSMUST00000236485.1\tENSMUST00000237580.1\n"
     ]
    }
   ],
   "source": [
    "! head -4 data/spermatogenesis_subset/salmon_index/duplicate_clusters.tsv"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Understanding the Input data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Droplet-based single-cell sequencing experiments, such as Drop-seq and 10x Chromium, typically generate a pair of paired-end (PE) FASTQ files. Each protocol builds its library with a fixed Cellular Barcode (CB) length and UMI length: typically 16 & 10 for 10x V2, 16 & 12 for 10x V3, and 12 & 8 for Drop-seq.  \n",
    "The two PE FASTQ files are usually recognized by the `R1` and `R2` tags in their names. The `R1` file contains the concatenated CB and UMI sequence, while the `R2` file contains the read sequence derived from the transcript. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "@J00167:56:HK2GNBBXX:6:1227:13352:9684 1:N:0:0\n",
      "TTGACTTGTGAGGGAGTGCCCTGCTG\n",
      "+\n",
      "AAFFFJJJJJJJJJJJJJJJJJJJJJ\n",
      "\n",
      "gzip: stdout: Broken pipe\n"
     ]
    }
   ],
   "source": [
    "!zcat data/spermatogenesis_subset/AdultMouseRep3sub1M_S1_L001_R1_001.fastq.gz | head -4 "
   ]
  },
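  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The R1 read printed above is 26 bases long, which is consistent with a 16-base CB followed by a 10-base UMI. The cell below is a minimal sketch of that split, assuming the 10x V2 layout. The variable names are illustrative, and alevin performs this step internally."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Assumed 10x V2 layout: 16-base cellular barcode, then 10-base UMI.\n",
    "CB_LEN, UMI_LEN = 16, 10\n",
    "\n",
    "# Sequence line of the first R1 record shown above.\n",
    "r1 = 'TTGACTTGTGAGGGAGTGCCCTGCTG'\n",
    "\n",
    "# The read length should equal CB length + UMI length under this assumption.\n",
    "assert len(r1) == CB_LEN + UMI_LEN\n",
    "\n",
    "cb = r1[:CB_LEN]\n",
    "umi = r1[CB_LEN:CB_LEN + UMI_LEN]\n",
    "print('CB :', cb)\n",
    "print('UMI:', umi)"
   ]
  },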
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "@J00167:56:HK2GNBBXX:6:1227:13352:9684 3:N:0:0\n",
      "AGAAGAGCCTGGACAGATGTTATACAGACACTAAGAGAACACAAATTCCAGCCCAGGCTACTATACCCAGCCAACTCTCAATTACCATAGATGGAGAAAC\n",
      "+\n",
      "AAFFFJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJ<JJJJJJJJJJJJJJJJJJJJJJJFJJFFJJFFJJJJJJJJJJJ\n",
      "\n",
      "gzip: stdout: Broken pipe\n"
     ]
    }
   ],
   "source": [
    "!zcat data/spermatogenesis_subset/AdultMouseRep3sub1M_S1_L001_R2_001.fastq.gz | head -4 "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "ENSMUST00000234132.1\tAC125218.3\n",
      "ENSMUST00000176956.1\tVmn1r-ps151\n",
      "ENSMUST00000176452.1\tVmn1r-ps152\n",
      "ENSMUST00000234774.1\tAC125218.2\n"
     ]
    }
   ],
   "source": [
    "! head -4 data/spermatogenesis_subset/GRCm38.gencode.vM21.chr18.chr19.tgMap.txt"
   ]
  },
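  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The first line of the transcript-to-gene map above pairs a transcript ID with a gene name. As a hedged illustration, the same pair can be read from the GENCODE FASTA header shown earlier by splitting it on `|`. The field positions (0 for the transcript ID, 5 for the gene name) are taken from that one header, and this is not necessarily how the provided tgMap file was produced. Field 0 is also the reduced name that the `--gencode` flag keeps, according to the `salmon index` help text."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Header of the first record in the subsampled transcriptome FASTA.\n",
    "header = '>ENSMUST00000234132.1|ENSMUSG00000117547.1|OTTMUSG00000072753.1|OTTMUST00000176063.1|AC125218.3-201|AC125218.3|252|processed_pseudogene|'\n",
    "\n",
    "# Drop the leading '>' and split the header into its '|'-separated fields.\n",
    "fields = header.lstrip('>').split('|')\n",
    "\n",
    "# Positions assumed from this header: 0 = transcript ID, 5 = gene name.\n",
    "transcript_id, gene_name = fields[0], fields[5]\n",
    "print(transcript_id, gene_name)"
   ]
  },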
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## dscRNA-seq Quantification with alevin\n",
    "\n",
    "Now that we have a basic understanding of the inputs alevin requires to quantify dscRNA-seq data, let's take a closer look at some of the frequently used command-line flags (options) of the `salmon alevin` command."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Some useful links\n",
    "* libtype: https://salmon.readthedocs.io/en/latest/salmon.html#what-s-this-libtype\n",
    "* single-cell protocol type: https://github.com/COMBINE-lab/salmon/blob/master/include/SingleCellProtocols.hpp#L28-L84"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Version Info: This is the most recent version of salmon.\n",
      "\n",
      "alevin\n",
      "==========\n",
      "salmon-based processing of single-cell RNA-seq data.\n",
      "\n",
      "alevin options:\n",
      "\n",
      "\n",
      "mapping input options:\n",
      "  -l [ --libType ] arg                  Format string describing the library \n",
      "                                        type\n",
      "  -i [ --index ] arg                    salmon index\n",
      "  -r [ --unmatedReads ] arg             List of files containing unmated reads \n",
      "                                        of (e.g. single-end reads)\n",
      "  -1 [ --mates1 ] arg                   File containing the #1 mates\n",
      "  -2 [ --mates2 ] arg                   File containing the #2 mates\n",
      "\n",
      "\n",
      "alevin-specific Options:\n",
      "  -v [ --version ]                      print version string\n",
      "  -h [ --help ]                         produce help message\n",
      "  -o [ --output ] arg                   Output quantification directory.\n",
      "  -p [ --threads ] arg (=2)             The number of threads to use \n",
      "                                        concurrently.\n",
      "  --tgMap arg                           transcript to gene map tsv file\n",
      "  --hash arg                            Secondary input point for Alevin using \n",
      "                                        Big freaking Hash (bfh.txt) file. Works\n",
      "                                        Only with --chromium\n",
      "  --dropseq                             Use DropSeq Single Cell protocol for \n",
      "                                        the library\n",
      "  --chromiumV3                          Use 10x chromium v3 Single Cell \n",
      "                                        protocol for the library.\n",
      "  --chromium                            Use 10x chromium v2 Single Cell \n",
      "                                        protocol for the library.\n",
      "  --gemcode                             Use 10x gemcode v1 Single Cell protocol\n",
      "                                        for the library.\n",
      "  --citeseq                             Use CITESeq Single Cell protocol for \n",
      "                                        the library, 16 CB, 12 UMI and \n",
      "                                        features.\n",
      "  --celseq                              Use CEL-Seq Single Cell protocol for \n",
      "                                        the library.\n",
      "  --celseq2                             Use CEL-Seq2 Single Cell protocol for \n",
      "                                        the library.\n",
      "  --quartzseq2                          Use Quartz-Seq2 v3.2 Single Cell \n",
      "                                        protocol for the library assumes 15 \n",
      "                                        length barcode and 8 length UMI.\n",
      "  --whitelist arg                       File containing white-list barcodes\n",
      "  --featureStart arg                    This flag should be used with citeseq \n",
      "                                        and specifies the starting index of the\n",
      "                                        feature barcode on Read2.\n",
      "  --featureLength arg                   This flag should be used with citeseq \n",
      "                                        and specifies the length of the feature\n",
      "                                        barcode.\n",
      "  --noQuant                             Don't run downstream barcode-salmon \n",
      "                                        model.\n",
      "  --numCellBootstraps arg (=0)          Generate mean and variance for cell x \n",
      "                                        gene matrix quantification estimates.\n",
      "  --forceCells arg (=0)                 Explicitly specify the number of cells.\n",
      "  --expectCells arg (=0)                define a close upper bound on expected \n",
      "                                        number of cells\n",
      "  --mrna arg                            path to a file containing mito-RNA \n",
      "                                        gene, one per line\n",
      "  --rrna arg                            path to a file containing ribosomal \n",
      "                                        RNA, one per line\n",
      "  --keepCBFraction arg (=0)             fraction of CB to keep, value must be \n",
      "                                        in range (0,1], use 1 to quantify all \n",
      "                                        CB.\n",
      "  --end arg                             Cell-Barcodes end (5 or 3) location in \n",
      "                                        the read sequence from where barcode \n",
      "                                        has tobe extracted. (end, umiLength, \n",
      "                                        barcodeLength) should all be provided \n",
      "                                        if using this option\n",
      "  --umiLength arg                       umi length Parameter for unknown \n",
      "                                        protocol. (end, umiLength, \n",
      "                                        barcodeLength) should all be provided \n",
      "                                        if using this option\n",
      "  --barcodeLength arg                   umi length Parameter for unknown \n",
      "                                        protocol. (end, umiLength, \n",
      "                                        barcodeLength) should all be provided \n",
      "                                        if using this option\n",
      "  --noem                                do not run em\n",
      "  --freqThreshold arg (=10)             threshold for the frequency of the \n",
      "                                        barcodes\n",
      "  --umiEditDistance arg (=1)            Maximum allowble edit distance to \n",
      "                                        collapse UMIs, Expect delay in running \n",
      "                                        time if != 1\n",
      "  --dumpfq                              Dump barcode modified fastq file for \n",
      "                                        downstream analysis by using coin toss \n",
      "                                        for multi-mapping.\n",
      "  --dumpBfh                             dump the big hash with all the barcodes\n",
      "                                        and the UMI sequence.\n",
      "  --dumpArborescences                   dump the gene-v-cell matrix for the \n",
      "                                        total number of fragments used in the \n",
      "                                        UMI deduplication.\n",
      "  --dumpUmiGraph                        dump the per cell level Umi Graph.\n",
      "  --dumpFeatures                        Dump features for whitelist and \n",
      "                                        downstream analysis.\n",
      "  --dumpMtx                             Dump cell v transcripts count matrix in\n",
      "                                        sparse mtx format.\n",
      "  --lowRegionMinNumBarcodes arg (=200)  Minimum Number of CB to use for \n",
      "                                        learning Low confidence region \n",
      "                                        (Default: 200).\n",
      "  --maxNumBarcodes arg (=100000)        Maximum allowable limit to process the \n",
      "                                        cell barcodes. (Default: 100000)\n",
      "\n"
     ]
    }
   ],
   "source": [
    "! salmon alevin --help"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Version Info: This is the most recent version of salmon.\n",
      "Logs will be written to data/spermatogenesis_subset/alevin_output/logs\n",
      "\u001b[00m[2020-05-27 15:04:51.366] [jointLog] [info] setting maxHashResizeThreads to 2\n",
      "\u001b[00m\u001b[00m[2020-05-27 15:04:51.366] [jointLog] [info] Fragment incompatibility prior below threshold.  Incompatible fragments will be ignored.\n",
      "\u001b[00m\u001b[00m[2020-05-27 15:04:51.366] [jointLog] [info] The --mimicBT2, --mimicStrictBT2 and --hardFilter flags imply mapping validation (--validateMappings). Enabling mapping validation.\n",
      "\u001b[00m\u001b[00m[2020-05-27 15:04:51.366] [jointLog] [info] Usage of --validateMappings implies use of minScoreFraction. Since not explicitly specified, it is being set to 0.65\n",
      "\u001b[00m\u001b[00m[2020-05-27 15:04:51.366] [jointLog] [info] The use of range-factorized equivalence classes does not make sense in conjunction with --hardFilter.  Disabling range-factorized equivalence classes. \n",
      "\u001b[00m\u001b[00m[2020-05-27 15:04:51.366] [jointLog] [info] Usage of --validateMappings implies a default consensus slack of 0.2. Setting consensusSlack to 0.35.\n",
      "\u001b[00m\u001b[00m[2020-05-27 15:04:51.366] [jointLog] [info] Using default value of 0.87 for minScoreFraction in Alevin\n",
      "Using default value of 0.6 for consensusSlack in Alevin\n",
      "\u001b[00m\u001b[00m[2020-05-27 15:04:51.384] [alevinLog] [info] Found 7879 transcripts(+0 decoys, +0 short and +0 duplicate names in the index)\n",
      "\u001b[00m### alevin (dscRNA-seq quantification) v1.2.1\n",
      "### [ program ] => salmon \n",
      "### [ command ] => alevin \n",
      "### [ libType ] => { ISR }\n",
      "### [ mates1 ] => { data/spermatogenesis_subset/AdultMouseRep3sub1M_S1_L001_R1_001.fastq.gz }\n",
      "### [ mates2 ] => { data/spermatogenesis_subset/AdultMouseRep3sub1M_S1_L001_R2_001.fastq.gz }\n",
      "### [ chromium ] => { }\n",
      "### [ index ] => { data/spermatogenesis_subset/salmon_index }\n",
      "### [ threads ] => { 2 }\n",
      "### [ output ] => { data/spermatogenesis_subset/alevin_output }\n",
      "### [ tgMap ] => { data/spermatogenesis_subset/GRCm38.gencode.vM21.chr18.chr19.tgMap.txt }\n",
      "### [ expectCells ] => { 1000 }\n",
      "\n",
      "\n",
      "\u001b[00m[2020-05-27 15:04:51.444] [alevinLog] [info] Filled with 7879 txp to gene entries \n",
      "\u001b[00m\u001b[00m[2020-05-27 15:04:51.445] [alevinLog] [info] Found all transcripts to gene mappings\n",
      "\u001b[00m\u001b[00m[2020-05-27 15:04:51.561] [alevinLog] [info] Processing barcodes files (if Present) \n",
      "\n",
      " \n",
      "\u001b[32mprocessed\u001b[31m 1 Million \u001b[32mbarcodes\u001b[0m\n",
      "\n",
      "\u001b[00m[2020-05-27 15:04:53.462] [alevinLog] [info] Done barcode density calculation.\n",
      "\u001b[00m\u001b[00m[2020-05-27 15:04:53.462] [alevinLog] [info] # Barcodes Used: \u001b[32m1000000\u001b[0m / \u001b[31m1000000\u001b[0m.\n",
      "\u001b[00m\u001b[00m[2020-05-27 15:04:53.561] [alevinLog] [info] Total \u001b[32m2103\u001b[0m(has \u001b[32m699\u001b[0m low confidence) barcodes\n",
      "\u001b[00m\u001b[00m[2020-05-27 15:04:53.691] [alevinLog] [info] Done True Barcode Sampling\n",
      "\u001b[00m\u001b[00m[2020-05-27 15:04:53.715] [alevinLog] [info] Total 41.4697% reads will be thrown away because of noisy Cellular barcodes.\n",
      "\u001b[00m\u001b[00m[2020-05-27 15:04:53.834] [alevinLog] [info] Done populating Z matrix\n",
      "\u001b[00m\u001b[00m[2020-05-27 15:04:53.834] [alevinLog] [info] Total 86 CB got sequence corrected\n",
      "\u001b[00m\u001b[00m[2020-05-27 15:04:53.834] [alevinLog] [info] Done indexing Barcodes\n",
      "\u001b[00m\u001b[00m[2020-05-27 15:04:53.834] [alevinLog] [info] Total Unique barcodes found: 73076\n",
      "\u001b[00m\u001b[00m[2020-05-27 15:04:53.834] [alevinLog] [info] Used Barcodes except Whitelist: 86\n",
      "\u001b[00m\u001b[00m[2020-05-27 15:04:53.856] [alevinLog] [info] Done with Barcode Processing; Moving to Quantify\n",
      "\n",
      "\u001b[00m\u001b[00m[2020-05-27 15:04:53.856] [alevinLog] [info] parsing read library format\n",
      "\u001b[00m\u001b[00m[2020-05-27 15:04:53.856] [jointLog] [info] There is 1 library.\n",
      "\u001b[00m-----------------------------------------\n",
      "| Loading contig table | Time = 106.78 ms\n",
      "-----------------------------------------\n",
      "size = 35630\n",
      "-----------------------------------------\n",
      "| Loading contig offsets | Time = 25.398 ms\n",
      "-----------------------------------------\n",
      "-----------------------------------------\n",
      "| Loading reference lengths | Time = 8.8671 ms\n",
      "-----------------------------------------\n",
      "-----------------------------------------\n",
      "| Loading mphf table | Time = 119.65 ms\n",
      "-----------------------------------------\n",
      "\u001b[00m[2020-05-27 15:04:54.966] [jointLog] [info] Loading pufferfish index\n",
      "\u001b[00m\u001b[00m[2020-05-27 15:04:54.977] [jointLog] [info] Loading dense pufferfish index.\n",
      "\u001b[00msize = 8051951\n",
      "Number of ones: 35636\n",
      "Number of ones per inventory item: 512\n",
      "Inventory entries filled: 70\n",
      "-----------------------------------------\n",
      "| Loading contig boundaries | Time = 54.41 ms\n",
      "-----------------------------------------\n",
      "size = 8051951\n",
      "-----------------------------------------\n",
      "| Loading sequence | Time = 66.411 ms\n",
      "-----------------------------------------\n",
      "size = 6983081\n",
      "-----------------------------------------\n",
      "| Loading positions | Time = 337.98 ms\n",
      "-----------------------------------------\n",
      "size = 13593212\n",
      "-----------------------------------------\n",
      "| Loading reference sequence | Time = 124.57 ms\n",
      "-----------------------------------------\n",
      "-----------------------------------------\n",
      "| Loading reference accumulative lengths | Time = 19.051 ms\n",
      "-----------------------------------------\n",
      "\u001b[00m[2020-05-27 15:04:55.862] [jointLog] [info] done\n",
      "\u001b[00m\u001b[00m[2020-05-27 15:04:55.862] [jointLog] [info] Index contained 7,879 targets\n",
      "\u001b[00m\u001b[00m[2020-05-27 15:04:55.893] [jointLog] [info] Number of decoys : 0\n",
      "\u001b[00m\n",
      "\n",
      "\n",
      "\n",
      "\u001b[32mprocessed\u001b[31m 0 Million \u001b[32mfragments\u001b[0m\n",
      "\u001b[32mprocessed\u001b[31m 1 Million \u001b[32mfragments\u001b[0m\n",
      "hits: 1724, hits per frag:  0.00173266\n",
      "\n",
      "\n",
      "\n",
      "\u001b[00m[2020-05-27 15:05:03.681] [jointLog] [info] Computed 75 rich equivalence classes for further processing\n",
      "\u001b[00m\u001b[00m[2020-05-27 15:05:03.681] [jointLog] [info] Counted 551 total reads in the equivalence classes \n",
      "\u001b[00m\u001b[00m[2020-05-27 15:05:03.681] [jointLog] [info] Number of fragments discarded because they are best-mapped to decoys : 0\n",
      "\u001b[00m\u001b[00m[2020-05-27 15:05:03.682] [jointLog] [info] Mapping rate = 0.0551%\n",
      "\n",
      "\u001b[00m\u001b[00m[2020-05-27 15:05:03.682] [jointLog] [info] finished quantifyLibrary()\n",
      "\u001b[00m\u001b[00m[2020-05-27 15:05:03.696] [alevinLog] [info] Starting optimizer\n",
      "\n",
      "\n",
      "\u001b[32mAnalyzed 1 cells (\u001b[31m0%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 2 cells (\u001b[31m0%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 3 cells (\u001b[31m0%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 5 cells (\u001b[31m0%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 6 cells (\u001b[31m0%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 9 cells (\u001b[31m0%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 11 cells (\u001b[31m1%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 13 cells (\u001b[31m1%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 15 cells (\u001b[31m1%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 20 cells (\u001b[31m1%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 21 cells (\u001b[31m1%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 24 cells (\u001b[31m1%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 28 cells (\u001b[31m1%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 29 cells (\u001b[31m1%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 31 cells (\u001b[31m1%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 33 cells (\u001b[31m2%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 37 cells (\u001b[31m2%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 38 cells (\u001b[31m2%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 40 cells (\u001b[31m2%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 41 cells (\u001b[31m2%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 43 cells (\u001b[31m2%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 46 cells (\u001b[31m2%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 47 cells (\u001b[31m2%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 49 cells (\u001b[31m2%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 54 cells (\u001b[31m3%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 56 cells (\u001b[31m3%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 58 cells (\u001b[31m3%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 60 cells (\u001b[31m3%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 64 cells (\u001b[31m3%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 65 cells (\u001b[31m3%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 67 cells (\u001b[31m3%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 70 cells (\u001b[31m3%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 71 cells (\u001b[31m3%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 73 cells (\u001b[31m3%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 76 cells (\u001b[31m4%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 78 cells (\u001b[31m4%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 80 cells (\u001b[31m4%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 81 cells (\u001b[31m4%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 86 cells (\u001b[31m4%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 90 cells (\u001b[31m4%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 91 cells (\u001b[31m4%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 93 cells (\u001b[31m4%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 96 cells (\u001b[31m5%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 102 cells (\u001b[31m5%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 104 cells (\u001b[31m5%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 106 cells (\u001b[31m5%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 109 cells (\u001b[31m5%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 113 cells (\u001b[31m5%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 114 cells (\u001b[31m5%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 116 cells (\u001b[31m6%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 117 cells (\u001b[31m6%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 119 cells (\u001b[31m6%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 121 cells (\u001b[31m6%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 129 cells (\u001b[31m6%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 134 cells (\u001b[31m6%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 135 cells (\u001b[31m6%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 136 cells (\u001b[31m6%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 141 cells (\u001b[31m7%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 146 cells (\u001b[31m7%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 150 cells (\u001b[31m7%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 151 cells (\u001b[31m7%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 152 cells (\u001b[31m7%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 155 cells (\u001b[31m7%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 156 cells (\u001b[31m7%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 157 cells (\u001b[31m7%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 160 cells (\u001b[31m8%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 164 cells (\u001b[31m8%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 167 cells (\u001b[31m8%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 169 cells (\u001b[31m8%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 170 cells (\u001b[31m8%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 171 cells (\u001b[31m8%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 176 cells (\u001b[31m8%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 182 cells (\u001b[31m9%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 184 cells (\u001b[31m9%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 186 cells (\u001b[31m9%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 189 cells (\u001b[31m9%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 190 cells (\u001b[31m9%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 193 cells (\u001b[31m9%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 195 cells (\u001b[31m9%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 197 cells (\u001b[31m9%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 199 cells (\u001b[31m9%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 200 cells (\u001b[31m10%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 204 cells (\u001b[31m10%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 207 cells (\u001b[31m10%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 208 cells (\u001b[31m10%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 209 cells (\u001b[31m10%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 213 cells (\u001b[31m10%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 216 cells (\u001b[31m10%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 218 cells (\u001b[31m10%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 224 cells (\u001b[31m11%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 229 cells (\u001b[31m11%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 230 cells (\u001b[31m11%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 232 cells (\u001b[31m11%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 238 cells (\u001b[31m11%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 239 cells (\u001b[31m11%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 241 cells (\u001b[31m11%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 242 cells (\u001b[31m12%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 248 cells (\u001b[31m12%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 249 cells (\u001b[31m12%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 250 cells (\u001b[31m12%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 254 cells (\u001b[31m12%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 264 cells (\u001b[31m13%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 266 cells (\u001b[31m13%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 272 cells (\u001b[31m13%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 280 cells (\u001b[31m13%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 300 cells (\u001b[31m14%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 301 cells (\u001b[31m14%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 304 cells (\u001b[31m14%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 305 cells (\u001b[31m15%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 306 cells (\u001b[31m15%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 307 cells (\u001b[31m15%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 308 cells (\u001b[31m15%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 310 cells (\u001b[31m15%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 314 cells (\u001b[31m15%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 317 cells (\u001b[31m15%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 319 cells (\u001b[31m15%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 320 cells (\u001b[31m15%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 324 cells (\u001b[31m15%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 327 cells (\u001b[31m16%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 328 cells (\u001b[31m16%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 332 cells (\u001b[31m16%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 333 cells (\u001b[31m16%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 338 cells (\u001b[31m16%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 340 cells (\u001b[31m16%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 343 cells (\u001b[31m16%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 347 cells (\u001b[31m17%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 349 cells (\u001b[31m17%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 350 cells (\u001b[31m17%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 359 cells (\u001b[31m17%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 363 cells (\u001b[31m17%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 367 cells (\u001b[31m17%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 373 cells (\u001b[31m18%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 374 cells (\u001b[31m18%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 376 cells (\u001b[31m18%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 380 cells (\u001b[31m18%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 383 cells (\u001b[31m18%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 393 cells (\u001b[31m19%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 395 cells (\u001b[31m19%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 398 cells (\u001b[31m19%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 400 cells (\u001b[31m19%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 401 cells (\u001b[31m19%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 405 cells (\u001b[31m19%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 406 cells (\u001b[31m19%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 407 cells (\u001b[31m19%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 420 cells (\u001b[31m20%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 421 cells (\u001b[31m20%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 427 cells (\u001b[31m20%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 430 cells (\u001b[31m20%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 432 cells (\u001b[31m21%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 435 cells (\u001b[31m21%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 441 cells (\u001b[31m21%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 443 cells (\u001b[31m21%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 444 cells (\u001b[31m21%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 445 cells (\u001b[31m21%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 450 cells (\u001b[31m21%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 451 cells (\u001b[31m21%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 467 cells (\u001b[31m22%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 470 cells (\u001b[31m22%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 472 cells (\u001b[31m22%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 473 cells (\u001b[31m23%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 480 cells (\u001b[31m23%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 481 cells (\u001b[31m23%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 483 cells (\u001b[31m23%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 484 cells (\u001b[31m23%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 492 cells (\u001b[31m23%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 494 cells (\u001b[31m24%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 505 cells (\u001b[31m24%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 506 cells (\u001b[31m24%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 512 cells (\u001b[31m24%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 515 cells (\u001b[31m25%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 523 cells (\u001b[31m25%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 525 cells (\u001b[31m25%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 531 cells (\u001b[31m25%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 534 cells (\u001b[31m25%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 542 cells (\u001b[31m26%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 545 cells (\u001b[31m26%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 547 cells (\u001b[31m26%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 552 cells (\u001b[31m26%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 560 cells (\u001b[31m27%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 563 cells (\u001b[31m27%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 566 cells (\u001b[31m27%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 569 cells (\u001b[31m27%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 573 cells (\u001b[31m27%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 575 cells (\u001b[31m27%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 586 cells (\u001b[31m28%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 591 cells (\u001b[31m28%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 597 cells (\u001b[31m28%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 598 cells (\u001b[31m28%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 601 cells (\u001b[31m29%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 603 cells (\u001b[31m29%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 608 cells (\u001b[31m29%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 609 cells (\u001b[31m29%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 610 cells (\u001b[31m29%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 612 cells (\u001b[31m29%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 613 cells (\u001b[31m29%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 614 cells (\u001b[31m29%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 615 cells (\u001b[31m29%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 616 cells (\u001b[31m29%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 618 cells (\u001b[31m29%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 633 cells (\u001b[31m30%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 659 cells (\u001b[31m31%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 660 cells (\u001b[31m31%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 662 cells (\u001b[31m31%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 663 cells (\u001b[31m32%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 664 cells (\u001b[31m32%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 668 cells (\u001b[31m32%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 671 cells (\u001b[31m32%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 673 cells (\u001b[31m32%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 674 cells (\u001b[31m32%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 695 cells (\u001b[31m33%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 702 cells (\u001b[31m33%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 704 cells (\u001b[31m33%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 710 cells (\u001b[31m34%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 711 cells (\u001b[31m34%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 719 cells (\u001b[31m34%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 722 cells (\u001b[31m34%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 730 cells (\u001b[31m35%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 734 cells (\u001b[31m35%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 739 cells (\u001b[31m35%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 741 cells (\u001b[31m35%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 743 cells (\u001b[31m35%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 748 cells (\u001b[31m36%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 752 cells (\u001b[31m36%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 756 cells (\u001b[31m36%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 758 cells (\u001b[31m36%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 759 cells (\u001b[31m36%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 760 cells (\u001b[31m36%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 771 cells (\u001b[31m37%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 780 cells (\u001b[31m37%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 781 cells (\u001b[31m37%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 784 cells (\u001b[31m37%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 787 cells (\u001b[31m37%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 797 cells (\u001b[31m38%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 798 cells (\u001b[31m38%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 807 cells (\u001b[31m38%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 825 cells (\u001b[31m39%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 833 cells (\u001b[31m40%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 835 cells (\u001b[31m40%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 836 cells (\u001b[31m40%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 837 cells (\u001b[31m40%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 840 cells (\u001b[31m40%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 842 cells (\u001b[31m40%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 845 cells (\u001b[31m40%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 849 cells (\u001b[31m40%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 850 cells (\u001b[31m40%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 858 cells (\u001b[31m41%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 868 cells (\u001b[31m41%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 869 cells (\u001b[31m41%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 871 cells (\u001b[31m41%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 881 cells (\u001b[31m42%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 884 cells (\u001b[31m42%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 890 cells (\u001b[31m42%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 893 cells (\u001b[31m42%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 903 cells (\u001b[31m43%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 905 cells (\u001b[31m43%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 909 cells (\u001b[31m43%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 914 cells (\u001b[31m43%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 915 cells (\u001b[31m44%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 917 cells (\u001b[31m44%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 921 cells (\u001b[31m44%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 941 cells (\u001b[31m45%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 953 cells (\u001b[31m45%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 969 cells (\u001b[31m46%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 975 cells (\u001b[31m46%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 991 cells (\u001b[31m47%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 992 cells (\u001b[31m47%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 1001 cells (\u001b[31m48%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 1004 cells (\u001b[31m48%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 1009 cells (\u001b[31m48%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 1015 cells (\u001b[31m48%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 1019 cells (\u001b[31m48%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 1038 cells (\u001b[31m49%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 1043 cells (\u001b[31m50%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 1050 cells (\u001b[31m50%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 1084 cells (\u001b[31m52%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 1090 cells (\u001b[31m52%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 1097 cells (\u001b[31m52%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 1106 cells (\u001b[31m53%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 1111 cells (\u001b[31m53%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 1133 cells (\u001b[31m54%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 1135 cells (\u001b[31m54%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 1136 cells (\u001b[31m54%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 1146 cells (\u001b[31m55%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 1162 cells (\u001b[31m55%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 1208 cells (\u001b[31m57%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 1210 cells (\u001b[31m58%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 1215 cells (\u001b[31m58%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 1224 cells (\u001b[31m58%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 1228 cells (\u001b[31m58%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 1249 cells (\u001b[31m59%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 1276 cells (\u001b[31m61%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 1282 cells (\u001b[31m61%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 1315 cells (\u001b[31m63%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 1391 cells (\u001b[31m66%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 1404 cells (\u001b[31m67%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 1417 cells (\u001b[31m67%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 1418 cells (\u001b[31m67%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 1427 cells (\u001b[31m68%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 1428 cells (\u001b[31m68%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 1430 cells (\u001b[31m68%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 1442 cells (\u001b[31m69%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 1447 cells (\u001b[31m69%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 1472 cells (\u001b[31m70%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 1474 cells (\u001b[31m70%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 1475 cells (\u001b[31m70%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 1482 cells (\u001b[31m71%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 1485 cells (\u001b[31m71%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 1502 cells (\u001b[31m71%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 1511 cells (\u001b[31m72%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 1519 cells (\u001b[31m72%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 1520 cells (\u001b[31m72%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 1522 cells (\u001b[31m72%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 1524 cells (\u001b[31m73%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 1546 cells (\u001b[31m74%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 1550 cells (\u001b[31m74%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 1561 cells (\u001b[31m74%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 1577 cells (\u001b[31m75%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 1584 cells (\u001b[31m75%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 1613 cells (\u001b[31m77%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 1633 cells (\u001b[31m78%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 1666 cells (\u001b[31m79%\u001b[32m of all).\u001b[0m\n",
      "\u001b[32mAnalyzed 1711 cells (\u001b[31m81%\u001b[32m of all).\u001b[0m\n",