Commit a479be9f authored by Pauline Maury Laribière

adding bash file

parent 31ae70e8
......@@ -19,9 +19,11 @@ import fso_metadata
```
## Functionalities
Based on the metadata that you want, you will call certain functions and parameters. We describe here the list of possibilities:
Based on the metadata you want, you will call different functions with different parameters.
The first part describes the APIs available from everywhere; the second part describes the APIs available only from within the confederation network.
### Codelists
### Available everywhere with the interoperability platform (i14y)
#### Codelists
1. Export a codelist based on an identifier
```
response = get_codelist(identifier, export_format="SDMX-ML", version_format=2.1, annotations=False)
......@@ -29,15 +31,17 @@ response = get_codelist(identifier, export_format="SDMX-ML", version_format=2.1,
Parameters:
- identifier (str): the codelist's identifier
- export_format (str, default="SDMX-ML"): the export's format. Available are CSV, XLSX, SDMX-ML or SDMX-JSON.
- version_format (float, default=2.1): the export format's version (2.0 or 2.1 when format is SDMX-ML).
- export_format (str, default="SDMX-ML"): the export's format.
Available are CSV, XLSX, SDMX-ML or SDMX-JSON.
- version_format (float, default=2.1): the export format's version
(2.0 or 2.1 when format is SDMX-ML).
- annotations (bool, default=False): flag to include annotations
Returns:
- response (pd.DataFrame or dict) based on the export format
- a pd.DataFrame if export_format was CSV or XLSX
- a dictionary if export_format was SDMX-ML or SDMX-JSON.
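As a quick illustration, here is a minimal usage sketch; it assumes the package is installed and uses the `CL_HGDE_KT` codelist from the example notebook, which must exist on the i14y platform:
```
from fso_metadata import get_codelist

# CSV export: per the description above, the result is a pandas object
codelist_df = get_codelist('CL_HGDE_KT', export_format="CSV")
print(codelist_df.head())

# SDMX-JSON export: the result is a dictionary
codelist_json = get_codelist('CL_HGDE_KT', export_format="SDMX-JSON")
print(type(codelist_json))
```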
### ContentConfigurations
#### ContentConfigurations
1. Return the display information for the available configured content
```
......@@ -56,7 +60,47 @@ response = get_identifier_content(identifier)
Returns:
- response (dict): the nomenclature's information
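A short usage sketch combining both ContentConfigurations calls (the identifier is the one used in the example notebook):
```
from fso_metadata import get_content_configuration, get_identifier_content

# List the configured contents, then look one of them up by identifier
contents = get_content_configuration()
print(contents)

identifier_content = get_identifier_content(identifier='HCL_CH_ISCO_19_PROF')
print(identifier_content['exportLevels'])
```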
### Nomenclatures
#### Datasets
1. Get the dataset description
```
response = get_dataset_description(identifier, language='fr')
```
Parameters:
- identifier (str): the nomenclature's identifier
- language (str, default='fr'): the language of the response data.
Available are 'fr', 'de', 'it', 'en'.
Returns:
- response: description's dictionary
2. Get the dataset information
```
response = get_dataset_information(identifier, language='fr')
```
Parameters:
- identifier (str): the nomenclature's identifier
- language (str, default='fr'): the language of the response data.
Available are 'fr', 'de', 'it', 'en'.
Returns:
- response: information's dictionary
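A minimal sketch covering both dataset calls, with identifiers taken from the example notebook:
```
from fso_metadata import get_dataset_description, get_dataset_information

description = get_dataset_description(identifier='HCL_NOGA', language='de')
print(description['contactPoint'][0])

information = get_dataset_information(identifier='HCL_CH_ISCO_19_PROF', language='fr')
print(information[0]['accessUrl'])
```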
#### Data Structures
1. Get the data structure
```
response = get_data_structure(identifier, language='fr')
```
Parameters:
- identifier (str): the nomenclature's identifier
- language (str, default='fr'): the language of the response data.
Available are 'fr', 'de', 'it', 'en'.
Returns:
- response: data structure's dictionary
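For example (note that not every identifier has a data structure; as the example notebook shows, the API then returns a problem document such as a 404 instead):
```
from fso_metadata import get_data_structure

data_structure = get_data_structure(identifier='HCL_CH_ISCO_19_PROF', language='it')
# Either the data structure or an error description (e.g. status 404)
print(data_structure)
```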
#### Nomenclatures
1. Get the nodes of a path within a nomenclature
```
......@@ -67,7 +111,8 @@ response = get_nomenclature_path_nodes(identifier, path, filters={}, language='f
- identifier (str): the nomenclature's identifier
- path (str): the path leading to the nodes
- filters (dict, default={}): the filters to apply
- language (str, default='fr'): the language of the response data. Available are 'fr', 'de', 'it', 'en'.
- language (str, default='fr'): the language of the response data.
Available are 'fr', 'de', 'it', 'en'.
Returns:
- response: dictionary of the nodes
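A minimal sketch mirroring the call in the example notebook ('.' stands for the root of the nomenclature):
```
from fso_metadata import get_nomenclature_path_nodes

path_nodes = get_nomenclature_path_nodes(
    identifier='HCL_CH_ISCO_19_PROF',
    path='.',
    filters={'code': '1'},
    language='fr'
)
print(path_nodes[0]['name']['text'])
```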
......@@ -80,10 +125,12 @@ response = get_nomenclature_one_level(identifier, level_number, filters={}, lang
- identifier (str): nomenclature's identifier
- level_number (int): level to export
- filters (dict, default={}): additional filters
- language (str, default='fr'): response data's language Available are 'fr', 'de', 'it', 'en'.
- language (str, default='fr'): response data's language
Available are 'fr', 'de', 'it', 'en'.
- annotations (bool, default=False): flag to include annotations
Returns:
- response (pd.DataFrame): dataframe with 3 columns (Code, Parent and Name in the selected language)
- response (pd.DataFrame): dataframe with 3 columns
(Code, Parent and Name in the selected language)
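A hedged sketch (level 1 is assumed to be a valid level for this nomenclature, which advertises levels 1 to 6 in its content configuration):
```
from fso_metadata import get_nomenclature_one_level

level_1 = get_nomenclature_one_level(identifier='HCL_CH_ISCO_19_PROF', level_number=1)
# Expect the three columns described above: Code, Parent and Name
print(level_1.columns)
print(level_1.head())
```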
3. Export multiple levels of a nomenclature (from `level_from` to `level_to`)
......@@ -96,10 +143,11 @@ response = get_nomenclature_multiple_levels(identifier, level_from, level_to, fi
- level_from (int): the 1st level to include
- level_to (int): the last level to include
- filters (dict, default={}): additional filters
- language (str, default='fr'): response data's language Available are 'fr', 'de', 'it', 'en'.
- language (str, default='fr'): response data's language
Available are 'fr', 'de', 'it', 'en'.
- annotations (bool, default=False): flag to include annotations
Returns:
- response (pd.DataFrame): dataframe columns from `level_from` to `level_to` codes and additionnal data
- response (pd.DataFrame): dataframe with code columns from `level_from` to `level_to`
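A similar sketch for a multi-level export (again assuming levels 1 and 2 exist for the chosen nomenclature):
```
from fso_metadata import get_nomenclature_multiple_levels

levels_df = get_nomenclature_multiple_levels(
    identifier='HCL_CH_ISCO_19_PROF',
    level_from=1,
    level_to=2
)
print(levels_df.head())
```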
4. Search query within a nomenclature
......@@ -113,16 +161,20 @@ response = query_nomenclature(identifier, query, page_number, page_size, fiters=
- page_number (int): the number of the result page to return
- page_size (int): the size of each page result
- filters (dict, default={}): additional filters
- language (str, default='fr'): response data's language Available are 'fr', 'de', 'it', 'en'.
- language (str, default='fr'): response data's language
Available are 'fr', 'de', 'it', 'en'.
Returns:
- response (dict): the query result
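A minimal sketch (the search term is only a placeholder, and pages are assumed to be numbered from 1):
```
from fso_metadata import query_nomenclature

result = query_nomenclature(
    identifier='HCL_CH_ISCO_19_PROF',
    query='professions',  # placeholder search term
    page_number=1,         # assumption: the first page is 1
    page_size=10
)
print(result)
```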
### Available only internally (within the confederation network or via VPN)
All these functions start with `dcat_`.
### Agents
#### Agents
1. List all agents
```
response = list_all_agents()
response = dcat_list_all_agents()
```
Returns:
......@@ -130,7 +182,7 @@ response = list_all_agents()
2. Get the agent with the corresponding agent id
```
response = get_agent_from_id(agent_id)
response = dcat_get_agent_from_id(agent_id)
```
Parameters:
......@@ -138,10 +190,10 @@ response = get_agent_from_id(agent_id)
Returns:
- response (dict): agent with this id
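A small sketch chaining the two agent calls; it only works from within the confederation network or over VPN, and the `'id'` key used below is an assumption about the shape of the list response:
```
from fso_metadata import dcat_list_all_agents, dcat_get_agent_from_id

agents = dcat_list_all_agents()
# Hypothetical key: adapt to the actual field name returned by the API
first_agent_id = agents[0]['id']
agent = dcat_get_agent_from_id(first_agent_id)
print(agent)
```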
### Datasets
#### Datasets
1. List all datasets
```
response = list_all_datasets()
response = dcat_list_all_datasets()
```
Returns:
......@@ -150,7 +202,7 @@ response = list_all_datasets()
2. Get all distributions for the dataset with the corresponding dataset id
```
response = get_distributions_from_dataset_id(dataset_id)
response = dcat_get_distributions_from_dataset_id(dataset_id)
```
Parameters:
......@@ -160,7 +212,7 @@ response = get_distributions_from_dataset_id(dataset_id)
3. Get the dataset with the corresponding id
```
response = get_dataset_from_id(dataset_id)
response = dcat_get_dataset_from_id(dataset_id)
```
Parameters:
......@@ -171,7 +223,7 @@ response = get_dataset_from_id(dataset_id)
4. Get the dataset with the corresponding identifier
```
response = get_dataset_from_identifier(identifier: str)
response = dcat_get_dataset_from_identifier(identifier: str)
```
Parameters:
......@@ -181,18 +233,18 @@ response = get_dataset_from_identifier(identifier: str)
5. Get all distributions for the dataset with the corresponding dataset identifier.
```
response = get_distributions_from_dataset_identifier(identifier)
response = dcat_get_distributions_from_dataset_identifier(identifier)
```
Parameters:
- identifier (str): dataset's identifier
Returns:
- response (dict): all distributions for the dataset with the corresponding dataset identifier
- response (dict): all distributions for the dataset with the corresponding identifier
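A hedged sketch combining the dataset calls; 'HCL_NOGA' is used only as an example identifier and the shape of the list response is not guaranteed:
```
from fso_metadata import (
    dcat_list_all_datasets,
    dcat_get_dataset_from_identifier,
    dcat_get_distributions_from_dataset_identifier,
)

datasets = dcat_list_all_datasets()
print(len(datasets))

dataset = dcat_get_dataset_from_identifier('HCL_NOGA')
distributions = dcat_get_distributions_from_dataset_identifier('HCL_NOGA')
print(distributions)
```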
### Distribution
#### Distribution
1. List all distributions
```
response = list_all_distributions()
response = dcat_list_all_distributions()
```
Returns:
......@@ -200,7 +252,7 @@ response = list_all_distributions()
2. Get the distribution with the corresponding id
```
response = get_distribution_from_id(distribution_id)
response = dcat_get_distribution_from_id(distribution_id)
```
Parameters:
......@@ -219,4 +271,4 @@ All the APIs made available in this library are also documented in Swagger UI sh
## Example
Examples for each API are provided in the notebook [examples.ipynb](https://renkulab.io/gitlab/pauline.maury-laribiere/meatadata-auto/-/blob/class_apis/fso_metadata/examples.ipynb).
Examples for each API are provided in the notebook [examples.ipynb](https://renkulab.io/gitlab/pauline.maury-laribiere/meatadata-auto/-/blob/class_apis/examples.ipynb).
\ No newline at end of file
from fso_metadata.api_call import (
get_agent_from_id,
dcat_get_agent_from_id,
dcat_get_dataset_from_id,
dcat_get_dataset_from_identifier,
dcat_get_distributions_from_dataset_id,
dcat_get_distributions_from_dataset_identifier,
dcat_get_distribution_from_id,
dcat_list_all_agents,
dcat_list_all_datasets,
dcat_list_all_distributions,
get_codelist,
get_content_creation,
get_dataset_from_id,
get_dataset_from_identifier,
get_distributions_from_dataset_id,
get_distributions_from_dataset_identifier,
get_distribution_from_id,
get_content_configuration,
get_data_structure,
get_dataset_description,
get_dataset_information,
get_identifier_content,
get_nomenclature_path_nodes,
get_nomenclature_one_level,
get_nomenclature_multiple_levels,
list_all_agents,
list_all_datasets,
list_all_distributions,
query_nomenclature
)
......@@ -75,7 +75,11 @@ def get_dataset_description(identifier: str, language: str = 'fr') -> dict:
Returns:
- response: description's dictionary
'''
api = Api(api_type = 'dcat_dataset_description', _id = identifier, language = language)
api = Api(
api_type = 'dcat_dataset_description',
_id = identifier,
language = language
)
return api.api_call()
......@@ -89,7 +93,11 @@ def get_dataset_information(identifier: str, language: str = 'fr') -> dict:
Returns:
- response: information's dictionary
'''
api = Api(api_type = 'dcat_dataset_information', _id = identifier, language = language)
api = Api(
api_type = 'dcat_dataset_information',
_id = identifier,
language = language
)
return api.api_call()
......@@ -151,7 +159,7 @@ def get_nomenclature_one_level(
- annotations (bool, default=False): flag to include annotations
Returns:
- response (pd.DataFrame): dataframe with 3 columns
(Code, Parent and Name in the selected language)
(Code, Parent and Name in the selected language)
'''
parameters = (
f'language={language}'
......@@ -187,7 +195,8 @@ def get_nomenclature_multiple_levels(
Available are 'fr', 'de', 'it', 'en'.
- annotations (bool, default=False): flag to include annotations
Returns:
- response (pd.DataFrame): dataframe columns from `level_from` to `level_to` codes
- response (pd.DataFrame): dataframe with code columns
            from `level_from` to `level_to`
'''
# Call api with appropriate url and parameters
parameters = (
......@@ -205,7 +214,8 @@ def get_nomenclature_multiple_levels(
)
df = api.api_call()
# Post-processing: fill sub groups rows with parent group's values (instead of NaN)
# Post-processing:
# fill sub-group rows with the parent group's values (instead of NaN)
group_columns = [*itertools.takewhile(lambda col: col != 'Code', df.columns)]
df[group_columns[0]] = df[group_columns[0]].fillna(method="ffill")
df[group_columns[1:]] = (
......@@ -297,7 +307,11 @@ def dcat_get_distributions_from_dataset_id(dataset_id: str):
Returns:
- response (dict): distributions for the dataset with dataset's id
'''
api = Api(api_type = 'dataset_id_distributions', _id = dataset_id, root_url = DCAT_URL)
api = Api(
api_type = 'dataset_id_distributions',
_id = dataset_id,
root_url = DCAT_URL
)
return api.api_call()
......@@ -331,9 +345,14 @@ def dcat_get_distributions_from_dataset_identifier(identifier: str):
Parameters:
- identifier (str): dataset's identifier
Returns:
- response (dict): all distributions for the dataset with the corresponding dataset identifier
- response (dict): all distributions for the dataset with
the corresponding dataset identifier
'''
api = Api(api_type = 'dataset_identifier_distributions', _id = identifier, root_url = DCAT_URL)
api = Api(
api_type = 'dataset_identifier_distributions',
_id = identifier,
root_url = DCAT_URL
)
return api.api_call()
......@@ -355,5 +374,9 @@ def dcat_get_distribution_from_id(distribution_id: str):
Returns:
- response (dict): the distribution
'''
api = Api(api_type = 'distribution_id', _id = 'distribution_id', root_url = DCAT_URL)
api = Api(
    api_type = 'distribution_id',
    _id = distribution_id,
    root_url = DCAT_URL
)
return api.api_call()
from abc import ABC, abstractmethod
from typing import Type, Union
from typing import Union
import pandas as pd
......@@ -29,38 +28,60 @@ class Api():
self.parameters = parameters
self._id = _id
self.version_format = version_format
self.version = version_format
self.language = language
self.path = path
self.api_url = get_url(api_type, self)
def api_call(self) -> Union[dict, pd.DataFrame]:
api_function = OUTPUT_FUNCTION_MAPPING[self.export_format]
return api_function(f'{self.root_url}/api/{self.api_url}', self.parameters)
return api_function(
f'{self.root_url}/api/{self.api_url}',
self.parameters
)
def get_url(_type, self):
url_mapping = {
'codelist': f'CodeLists/{self._id}/exports/{self.export_format}/{self.version_format}',
'content_configuration': 'ContentConfigurations',
'content_configuration_identifier': f'ContentConfigurations/{self._id}',
'dcat_dataset_description': f'Datasets/{self._id}/{self.language}/description',
'dcat_dataset_information': f'Datasets/{self._id}/{self.language}/distributions',
'data_structure': f'DataStructures/{self._id}/{self.language}',
'nomenclature_path_nodes': f'Nomenclatures/Childnodes/{self._id}/{self.language}/{self.path}',
'nomenclature_one_level': f'Nomenclatures/{self._id}/levelexport/CSV',
'nomenclature_multiple_levels': f'Nomenclatures/{self._id}/multiplelevels/CSV',
'nomenclature_search': f'Nomenclatures/{self._id}/search',
'agents_list': 'Agent',
'agent_id': f'Agent/{self._id}',
'dataset_list': 'Dataset',
'dataset_id_distributions': f'Dataset/{self._id}/distributions',
'dataset_id': f'Dataset/{self._id}',
'dataset_identifier': f'Datataset/identifier/{self._id}',
'dataset_identifier_distributions': f'Datataset/identifier/{self._id}/distributions',
'distributions_list': f'Distribution',
'distribution_id': f'Distribution/{self._id}'
'codelist':
f'CodeLists/{self._id}/exports/{self.export_format}/{self.version}',
'content_configuration':
'ContentConfigurations',
'content_configuration_identifier':
f'ContentConfigurations/{self._id}',
'dcat_dataset_description':
f'Datasets/{self._id}/{self.language}/description',
'dcat_dataset_information':
f'Datasets/{self._id}/{self.language}/distributions',
'data_structure':
f'DataStructures/{self._id}/{self.language}',
'nomenclature_path_nodes':
f'Nomenclatures/Childnodes/{self._id}/{self.language}/{self.path}',
'nomenclature_one_level':
f'Nomenclatures/{self._id}/levelexport/CSV',
'nomenclature_multiple_levels':
f'Nomenclatures/{self._id}/multiplelevels/CSV',
'nomenclature_search':
f'Nomenclatures/{self._id}/search',
'agents_list':
'Agent',
'agent_id':
f'Agent/{self._id}',
'dataset_list':
'Dataset',
'dataset_id_distributions':
f'Dataset/{self._id}/distributions',
'dataset_id':
f'Dataset/{self._id}',
'dataset_identifier':
f'Datataset/identifier/{self._id}',
'dataset_identifier_distributions':
f'Datataset/identifier/{self._id}/distributions',
'distributions_list':
'Distribution',
'distribution_id':
f'Distribution/{self._id}'
}
return url_mapping[_type]
%% Cell type:markdown id:fcad4b17 tags:
# Example notebook
In this notebook, we show one example per possible API call.
%% Cell type:code id:a529fab5-af2f-4439-b98e-13d814b00a94 tags:
``` python
#import fso_metadata
```
%% Cell type:code id:486bc684-f4a1-4b26-80ff-c156c51fdb97 tags:
``` python
from api_call import (
#from fso_metadata import (
dcat_get_agent_from_id,
dcat_get_dataset_from_id,
dcat_get_dataset_from_identifier,
dcat_get_distributions_from_dataset_id,
dcat_get_distributions_from_dataset_identifier,
dcat_get_distribution_from_id,
dcat_list_all_agents,
dcat_list_all_datasets,
dcat_list_all_distributions,
get_codelist,
get_content_configuration,
get_data_structure,
get_dataset_description,
get_dataset_information,
get_identifier_content,
get_nomenclature_path_nodes,
get_nomenclature_one_level,
get_nomenclature_multiple_levels,
query_nomenclature
)
```
%%%% Output: stream
/opt/conda/lib/python3.9/site-packages/pandasdmx/remote.py:11: RuntimeWarning: optional dependency requests_cache is not installed; cache options to Session() have no effect
warn(
%% Cell type:markdown id:94312182-0616-4938-8d82-d666611bf64d tags:
## Available everywhere with the interoperability platform (i14y)
%% Cell type:markdown id:bdd766a5-c013-449c-9fd4-7356835396af tags:
[i14y Swagger UI](https://www.i14y.admin.ch/api/index.html)
%% Cell type:markdown id:446b07a4 tags:
### Code List
%% Cell type:code id:317c3e55 tags:
``` python
# Get a codelist pd.Series based on an identifier
codelist = get_codelist(identifier='CL_HGDE_KT', export_format="SDMX-ML", version_format=2.1, annotations=False)
codelist.head(5)
```
%%%% Output: stream
https://www.i14y.admin.ch/api/CodeLists/CL_HGDE_KT/exports/SDMX-ML/2.1?annotations=false
%%%% Output: execute_result
CL_HGDE_KT
1 Zürich
10 Fribourg / Freiburg
11 Solothurn
12 Basel-Stadt
13 Basel-Landschaft
Name: Canton (2021-07-01), dtype: object
%% Cell type:markdown id:b0029468 tags:
### Content Configuration
%% Cell type:code id:5bf187c8 tags:
``` python
# Get the display information for the available configured content
content = get_content_configuration()
content
```
%%%% Output: execute_result
[{'default': True,
'identifier': 'HCL_CH_ISCO_19_PROF',
'items': [],
'label': 'CH-ISCO-19',
'skipRoute': False}]
%% Cell type:code id:88581bea tags:
``` python
# Get a nomenclature information based on its identifier
identifier_content = get_identifier_content(identifier='HCL_CH_ISCO_19_PROF')
identifier_content
```
%%%% Output: execute_result
{'agencyIdentifier': 'FSO',
'controllerName': 'Nomenclatures',
'descriptionIdentifier': 'HCL_CH_ISCO_19_PROF',
'exportFormats': ['CSV', 'XLSX'],
'exportLanguages': ['de', 'fr', 'it', 'en', 'rm'],
'exportLevels': ['1', '2', '3', '4', '5', '6'],
'exportTypes': {'Single': 'levelexport', 'Multi': 'multilevels'},
'filters': [{'identifier': 'AF_ACTIVE', 'values': ['0', '1']},
{'identifier': 'AFC_ISCO_REDUCED_LIST', 'values': ['1']},
{'identifier': 'AFC_ISCO_DUPLICATE', 'values': ['0']},
{'identifier': 'AF_LEARNED_OR_PRACTICED', 'values': ['1', '2']},
{'identifier': 'AF_AVAM', 'values': ['1']}],
'hasAnnotations': True,
'identifier': 'HCL_CH_ISCO_19_PROF',
'type': 'Nomenclature'}
%% Cell type:markdown id:42d2d24c-3e0b-40b6-a4b3-5e409345b449 tags:
### Datasets
%% Cell type:code id:b1d312b6-4b66-41aa-8d2e-4441e803ba07 tags:
``` python
# Get the dcat dataset description
dataset_description = get_dataset_description(identifier='HCL_NOGA', language='de')
dataset_description['contactPoint'][0]
```
%%%% Output: execute_result
{'adrWork': {'cultureCode': 'de',
'text': "Unternehmensregisterdaten URD\nEspace de l'Europe 10\nCH-2010 Neuchâtel\nSchweiz"},
'child': 0,
'emailInternet': 'noga@bfs.admin.ch',
'fn': {'cultureCode': 'de', 'text': 'Bundesamt für Statistik'},
'note': {'cultureCode': 'de',
'text': 'Von Montag bis Freitag\n8.30–11.30 Uhr und 14.00–16.00 Uhr'},
'org': {'cultureCode': None, 'text': None},
'telWorkVoice': '+41 58 463 65 23'}
%% Cell type:code id:36e4a8c2-097f-4c1d-9dc3-b1c86f855ffa tags:
``` python
# Get the dcat dataset information
dataset_information = get_dataset_information(identifier='HCL_CH_ISCO_19_PROF', language='fr')
dataset_information[0]['accessUrl']
```
%%%% Output: execute_result
[{'href': 'https://www.i14y.admin.ch/api/nomenclatures/HCL_CH_ISCO_19_PROF/levelexport/XLSX?level=6&annotations=true',
'label': 'https://www.i14y.admin.ch/api/nomenclatures/HCL_CH_ISCO_19_PROF/levelexport/XLSX?level=6&annotations=true'}]
%% Cell type:markdown id:c945eee9-8908-4012-b022-af419d5999b9 tags:
### Data Structures
%% Cell type:code id:56e92700-881f-48af-81d4-1ed622b87400 tags:
``` python
# Get the data structure
data_structure = get_data_structure(identifier='HCL_CH_ISCO_19_PROF', language='it')
data_structure
```
%%%% Output: execute_result
{'type': 'https://httpstatuses.com/404',
'title': 'Not Found',
'status': 404,
'detail': 'DataStructure with type Nomenclature and identifiers HCL_CH_ISCO_19_PROF/HR_CH_ISCO_19_PROF is not supported.',
'traceId': '|f51d7ab8-4dd7c2b42061dbb5.'}
'traceId': '|f51d7ad7-4dd7c2b42061dbb5.'}
%% Cell type:markdown id:99f3ee98 tags:
### Nomenclature
%% Cell type:code id:ddd5eb9f tags:
``` python
# Get the nodes of a path within a nomenclature, add filters to get more specific results
filters = {'code': '1'} # TODO: ask what filters are and how they work
path_nodes = get_nomenclature_path_nodes(identifier='HCL_CH_ISCO_19_PROF', path='.', filters=filters, language='fr')
path_nodes[0]
```
%%%% Output: execute_result
{'annotations': [],
'code': '0',
'hasChilds': True,
'name': {'cultureCode': 'fr', 'text': 'Professions militaires'}}