ax.scatter(x = group["Sepal.Length"], y = group["Petal.Length"], label=name)
ax.legend()
ax.set_xlabel('Sepal length (mm)', fontsize=12)
ax.set_ylabel('Petal length (mm)', fontsize=12)
```
## Arrays
Here is a quick summary of how R and Python differ in the way they represent arrays.
For a more detailed discussion see this [link](https://rstudio.github.io/reticulate/articles/arrays.html).
Dense data (for example a matrix or 2-dimensional array) are stored contiguously in memory,
addressed by a single index (the memory address). Array memory ordering schemes translate that
single index into multiple indices corresponding to the array coordinates. For example, matrices
have two indices: rows and columns.
R and Python differ in their memory ordering schemes: R is so-called "column-major", meaning
that data is laid out in memory such that the first coordinate is the one changing fastest.
The matrix element `x[i, j]` (1-based indices), for example, can be found at index
`a = i + (j - 1) * nrow(x)`.
Python/NumPy can store arrays in both "column-major" and "row-major" form, but
it defaults to the "row-major" format.
In a row-major layout, the above example element `x[i, j]` would be found at
index `a = j + (i - 1) * ncol(x)`.
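The two formulas can be checked by hand on a small matrix. The following is a minimal sketch in plain Python; the helper functions `column_major_offset()` and `row_major_offset()` are hypothetical names introduced for this illustration and are not part of R, NumPy or reticulate. They use 1-based indices, as in the formulas above.

```python
# Hypothetical helpers that mirror the two formulas above,
# using 1-based indices i, j and a 1-based offset a.

def column_major_offset(i, j, nrow):
    """Offset of x[i, j] when columns are stored one after another (as in R)."""
    return i + (j - 1) * nrow


def row_major_offset(i, j, ncol):
    """Offset of x[i, j] when rows are stored one after another (NumPy default)."""
    return j + (i - 1) * ncol


nrow, ncol = 2, 3  # a 2 x 3 matrix

# print (i, j), the column-major offset and the row-major offset
for i in range(1, nrow + 1):
    for j in range(1, ncol + 1):
        print((i, j), column_major_offset(i, j, nrow), row_major_offset(i, j, ncol))
```

Only the first and the last element end up at the same offset in both layouts; all other elements are stored at different positions.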
If you are using/addressing arrays both in Python and R, it is good to know:
- Dense R arrays are presented to Python/NumPy as column-major NumPy arrays.
- All NumPy arrays (column-major, row-major, otherwise) are presented to R as column-major arrays, because that is the only kind of dense array that R understands.
- R and Python print arrays differently.
The different order of printing may be especially confusing, as illustrated in the code below.
In this example, we use reticulate's `import()` function to access a Python module and
create a NumPy array, and `py_to_r()` to explicitly convert it to an R array:
```{r, class.source="rchunk"}
# create a numpy array
np <- import("numpy", convert=FALSE)
aP <- np$arange(1, 9)$reshape(2L, 2L, 2L)
class(aP)
# print it
aP
# convert it to R
aR <- py_to_r(aP)
class(aR)
# print it
aR
```
In Python, the `.flags` attribute of an array tells us whether the storage layout is row-major
(`C_CONTIGUOUS`, from C-style) or column-major (`F_CONTIGUOUS`, from Fortran-style):
```{python, class.source="pythonchunk"}
np.arange(1, 9).reshape(2, 2, 2).flags
```
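To see that the layout only affects the order in memory and not the values at each index, a row-major array can be compared with a column-major copy of itself. This is a minimal sketch that assumes NumPy is installed; the variable names `a_c` and `a_f` are chosen for illustration only.

```python
import numpy as np

a_c = np.arange(1, 9).reshape(2, 2, 2)  # NumPy default: row-major (C-style)
a_f = np.asfortranarray(a_c)            # same values, column-major (Fortran-style)

# each array reports its own storage layout
print(a_c.flags["C_CONTIGUOUS"], a_c.flags["F_CONTIGUOUS"])
print(a_f.flags["C_CONTIGUOUS"], a_f.flags["F_CONTIGUOUS"])

# indexing gives identical results for both layouts
print(np.array_equal(a_c, a_f))

# flattening with the last index changing fastest (row-major order) ...
print(a_c.ravel(order="C"))
# ... and with the first index changing fastest (column-major order, as in R)
print(a_c.ravel(order="F"))
```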
While the two arrays `aP` and `aR` may look different at first, they are indeed the same.
For illustration, let's pick out the values along just the first "row", that is, the values with
a first index of 1 (R) or 0 (Python):
```{r, class.source="rchunk"}
# compare the sub-arrays at the first index of the first dimension
aR[1, , ]
aP[0]
```
As mentioned, more details are discussed [here](https://rstudio.github.io/reticulate/articles/arrays.html).
## Sparse matrices
If [scipy](https://scipy.org/) is installed, [reticulate](https://rstudio.github.io/reticulate/) will
automatically convert R `dgCMatrix` sparse matrix objects to [SciPy CSC
matrix](https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.csc_matrix.html) objects. Without a working [scipy](https://scipy.org/) installation, the code below will
throw a "cannot convert object" error.
```{r, class.source="rchunk"}
# load Matrix package and create sparse matrix A
library(Matrix)
i <- c(1,3:8)
j <- c(2,9,6:10)
x <- 7 * (1:7)
A <- sparseMatrix(i, j, x = x)
A
```
```{python, class.source="pythonchunk"}
# automatic conversion to SciPy CSC matrix (requires scipy)
r.A
r.A.shape
```
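For comparison, a matrix with the same non-zero entries can also be built directly in Python. This is a minimal sketch that assumes NumPy and SciPy are installed; the names `rows`, `cols` and `vals` are introduced for illustration and hold the 0-based equivalents of the R vectors `i`, `j` and `x` from above.

```python
import numpy as np
from scipy.sparse import csc_matrix

# 0-based equivalents of the 1-based R vectors i, j and x
rows = np.array([0, 2, 3, 4, 5, 6, 7])
cols = np.array([1, 8, 5, 6, 7, 8, 9])
vals = 7 * np.arange(1, 8)

# build the CSC matrix from (value, (row, column)) triplets
A = csc_matrix((vals, (rows, cols)), shape=(8, 10))

print(A.shape)   # dimensions of the matrix
print(A.nnz)     # number of stored (non-zero) entries
print(A[0, 1])   # corresponds to A[1, 2] in R
```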
# Final remarks
- When sharing objects between Python and R, the object is often copied. In specific cases (e.g., NumPy arrays), the object may not be copied (R and Python then point to the same object in memory).
- R and Python use different default numeric types: a plain number such as `3` is a double in R. If Python expects an integer, be sure to append `L` in R (e.g. `3L`).
- Keep in mind that Python uses 0-based indices, while R uses 1-based.
- Dots ('.') are not allowed in object names in Python. To avoid any issues, it is best to use names for your R objects without any dots.
# Get help on Python functions
```{python}
import os  # make sure the module is available in this chunk
help(os.listdir)
```
# Session info
It's good practice to include information on all software used and its versions
at the end of the document. For R, we can use the built-in `sessionInfo()` function;
for Python, we can for example use the `sinfo` package: