%% Cell type:markdown id:1d566082-9af5-487f-ae8c-fcb74f922e1c tags:
## Make a single deposit on DLCM
This notebook showcases how to create a deposit on a DLCM instance (such as Yareta or OLOS) using the official [DLCM API Python package](https://gitlab.unige.ch/dlcm/community/dlcm-python-api) to interact with a DLCM server.
You can install it using the command indicated at the top of the [package's PyPI page](https://pypi.org/project/dlcm-api/). If you're using a JupyterHub instance administered by UNIGE, run the cell below: if no error message appears, the package is already installed.
%% Cell type:code id:ab50530a-91cd-4dc4-b74e-2fad47d71b8b tags:
``` python
import dlcmapi_client
```
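If you would rather check for the package without triggering an `ImportError`, a minimal sketch using only the standard library could look like the following. The helper name `is_installed` is ours and is not part of the DLCM package.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if the import system can locate `module_name` (hypothetical helper)."""
    return importlib.util.find_spec(module_name) is not None


if is_installed("dlcmapi_client"):
    print("dlcmapi_client is available")
else:
    print("dlcmapi_client is missing, install it with the command from its PyPI page")
```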
%% Cell type:markdown id:317a6f86-667b-4a66-a57e-34bf03158931 tags:
A deposit on DLCM requires, at a minimum, several pre-existing pieces of information before it can be created:
* An `Organizational Unit ID` (the organizational unit comes with default values for the two policies below)
* A `Submission Policy ID`
* A `Preservation Policy ID`
* One or several `Contributor IDs`
Although not mandatory, it is also recommended to set the following fields on your deposit:
* A `License ID`
* A `Language ID`
Those are all identifiers of pre-existing entities that need to be retrieved from your DLCM instance's server. So, before delving into the actual deposit creation process, we are going to show you how to use the DLCM Python client to fetch all those IDs.
### Preparation: retrieve your access token
Some of the interactions we are going to have with DLCM's backend require authentication, which the Python client handles through an _access token_.
To retrieve this token, go to your DLCM instance's home page, log in to your account, and then click on your account icon on the top right of the page. Select the "Token" submenu.
![dlcm-token-select](imgs/dlcm_token_select.png "Select your token")
In the resulting pop-up click on the "Copy to clipboard" button to retrieve the token's value.
![dlcm-token-copy](imgs/dlcm_copy_token.png "Copy your token")
Then run the cell below and once prompted, paste it in the form and hit the ENTER key.
%% Cell type:code id:84022790-7590-46cd-93e6-505a0326c651 tags:
``` python
import getpass
my_access_token = getpass.getpass("User token?")
```
%% Cell type:markdown id:32950a4d-8aa4-4a7b-853f-446ef8108130 tags:
Your token is now saved in the variable `my_access_token`, which we are going to use several times throughout the notebook. Note that this token is only valid for 24 hours: after that time span, you'll need to go back to your user profile to get a new one if you want to keep interacting with the backend through the `dlcmapi_client` library.
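To avoid being surprised by an expired token in a long session, one option is to note when you pasted it and compare against the 24-hour lifetime mentioned above. This is only a sketch: `token_expired` and `token_obtained_at` are hypothetical names, and it assumes the clock starts when you copy the token, which may not match exactly how the server counts.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

TOKEN_LIFETIME = timedelta(hours=24)  # validity period stated in this notebook


def token_expired(obtained_at: datetime, now: Optional[datetime] = None) -> bool:
    """Return True once at least 24 hours have passed since `obtained_at`."""
    now = now or datetime.now(timezone.utc)
    return now - obtained_at >= TOKEN_LIFETIME


# Record this right after pasting the token (assumption: the lifetime starts around now)
token_obtained_at = datetime.now(timezone.utc)
print(token_expired(token_obtained_at))
```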
%% Cell type:markdown id:0a6ef378-c1bb-4af2-9926-808830b68909 tags:
### Finding your Organizational Unit ID
The first required ID relates to the Organizational Unit within which the data has to be deposited. The Python library provides a function called `admin_authorized_organizational_units_get`, which returns a collection of all the organizational units your user account has access to. This is an administration function, meaning you need to target the `Admin` module URL of your DLCM instance. The list of DLCM module URLs can be accessed through the "about" button at the bottom right corner of your DLCM instance's home page:
![dlcm-about-button](imgs/dlcm_about_button.png "DLCM About Button")
The page that loads lists all the different modules. Identify the one called admin and copy its URL **up until the last slash**:
![dlcm-modules-list](imgs/dlcm_admin_modules_list.png "DLCM modules you can target")
Combined with the access token, this is sufficient to instantiate a `dlcmapi_client.Configuration` object that will be useful to interact with the administration backend of DLCM:
%% Cell type:code id:3096f6f4-11cf-4a5e-acf3-e693a0c5d6d7 tags:
``` python
import dlcmapi_client
admin_url = 'https://sandbox.dlcm.ch/administration'
admin_conf = dlcmapi_client.Configuration(admin_url)
admin_conf.access_token = my_access_token
```
%% Cell type:markdown id:c448adbf-c051-473e-b9f0-1955f3de826f tags:
This configuration object is used to instantiate an `ApiClient` object, which in turn lets us create an `AdminApi` object that is used to call the function `admin_authorized_organizational_units_get`:
%% Cell type:code id:50c48fd6-0261-42e7-8a9e-00c9b0b65f54 tags:
``` python
with dlcmapi_client.ApiClient(admin_conf) as api_client:
    admin_api = dlcmapi_client.AdminApi(api_client)
    admin_data = admin_api.admin_authorized_organizational_units_get(size=20)
admin_data.to_dict()['data'][:2]  # showing only the first two values to keep the notebook uncluttered
```
%% Cell type:markdown id:960071e3-96e8-48e2-85a6-9b7358e2c0a1 tags:
As can be seen, the returned collection is quite voluminous and contains various fields relating to organizational units (such as the IDs of their default submission and preservation policies). The cell below contains a small Python routine that lists all your organizational units by name and asks you to select one. The routine saves the identifiers and names of the organizational unit and of its default preservation and submission policies in the variables at the bottom of the cell:
%% Cell type:code id:a7aeb468-6d9c-4f6c-8cd8-317864aeab7d tags:
``` python
data = admin_data.to_dict()['data']
print('\033[1mList of your Organizational Units:\033[0m')
for i, elem in enumerate(data):
    print(f'{i:02d} -> {elem["name"]}')
print()
sel = input('Which organizational unit would you like to deposit data in (enter its index number located on the left of the arrows)?')
while not sel.isnumeric() or int(sel) < 0 or int(sel) >= len(data):
    sel = input(f'Please enter a non-negative integer below {len(data)}')
data_selected = data[int(sel)]
# saving orgunit, preservation and submission policies identifiers in dedicated variables
orgunit_id, orgunit_name = data_selected['resId'], data_selected['name']
preservation_id, preservation_name = data_selected['defaultPreservationPolicy']['resId'], data_selected['defaultPreservationPolicy']['name']
submission_id, submission_name = data_selected['defaultSubmissionPolicy']['resId'], data_selected['defaultSubmissionPolicy']['name']
print(f'\nOrganizational Unit "{orgunit_name}" with default submission policy "{submission_name}" and preservation policy "{preservation_name}" chosen')
```
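The same "ask for an index until it is valid" loop comes back several times in this notebook. As a sketch, it could be factored into two small helpers; `parse_selection` and `ask_index` are hypothetical names introduced here for illustration, and the `read` parameter only exists so that the prompt function can be swapped out when testing.

```python
from typing import Callable, Optional


def parse_selection(raw: str, n_choices: int) -> Optional[int]:
    """Return the chosen index if `raw` is a whole number in [0, n_choices), else None."""
    raw = raw.strip()
    # isascii() guards against digit-like characters that int() cannot parse
    if not (raw.isascii() and raw.isdigit()):
        return None
    index = int(raw)
    return index if index < n_choices else None


def ask_index(prompt: str, n_choices: int, read: Callable[[str], str] = input) -> int:
    """Keep prompting until a valid index is entered."""
    choice = parse_selection(read(prompt), n_choices)
    while choice is None:
        retry = f"Please enter an integer from 0 to {n_choices - 1}: "
        choice = parse_selection(read(retry), n_choices)
    return choice


print(parse_selection("3", 5))
print(parse_selection("7", 5))
```

With these helpers, a selection such as the one above would reduce to something like `data_selected = data[ask_index("Which organizational unit?", len(data))]`.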
%% Cell type:markdown id:c7a8be94-3e10-4912-987e-adaece8f31af tags:
### Finding the other IDs
Except for the contributors, similar methods (`admin_licenses_get`, `admin_languages_get`) can be used to retrieve the other identifiers, using the same configuration and API objects. To ease the extraction of data, we define below a function that produces a readable shortlist, given the resource we want to list and the attribute we want to retrieve from it. The identifiers you are prompted to select will be saved to the variables at the bottom of the cell:
%% Cell type:code id:e8c41bb3-4800-40cb-9b88-e9c7f5c056cf tags:
``` python
from typing import Dict

def ids_and_vals_of_dlcm_resource(conf: dlcmapi_client.Configuration,
                                  ApiType: type,
                                  resource_get_func_name: str,
                                  val: str,
                                  amount_of_results: int = 20,
                                  ) -> Dict[str, str]:
    with dlcmapi_client.ApiClient(conf) as api_client:
        curr_api = ApiType(api_client)
        resource_get_function = getattr(curr_api, resource_get_func_name)
        data = resource_get_function(size=amount_of_results)
        return {el['resId']: el[val] for el in data.to_dict()['data']}

data_fetched = []
for func, val in {"admin_licenses_get": "title",
                  "admin_languages_get": "code"}.items():
    resource_name = func.replace('admin_', '').replace('_get', '')
    # pretty printing the correspondence between the index and the value extracted from the current resource
    print("\n\033[1m%s" % (resource_name + " list"))
    ids_vals = list(ids_and_vals_of_dlcm_resource(admin_conf, dlcmapi_client.AdminApi, func, val).items())
    for i, (identifier, label) in enumerate(ids_vals):
        print('\033[0m%2.2d' % i, ' -> ', label)
    sel = input(f"Which {resource_name} would you like to attribute to your deposit (enter its index number located on the left of the arrows)?")
    while not sel.isnumeric() or int(sel) < 0 or int(sel) >= len(ids_vals):
        sel = input(f'Please enter a non-negative integer below {len(ids_vals)}')
    data_selected = ids_vals[int(sel)]
    print(f'"{data_selected[1]}" selected.')
    data_fetched.append(data_selected[0])
# saving the two identifiers retrieved in dedicated variables for later use
license_id, language_id = data_fetched
```
%% Cell type:markdown id:b3940b4d-2276-4757-9630-629ed513a066 tags:
### Fetching Contributors IDs
A similar process is used to retrieve the identifiers of contributors, except that we are not targeting the administration module of DLCM but its preingestion module. That means we need to instantiate a different configuration, built from the preingestion module's URL.
Go back to your modules list, and copy the preingestion URL **up until the last slash**:
![dlcm-modules-list-preing](imgs/dlcm_preingest_modules_list.png "DLCM modules with preingest highlighted")
Create a new configuration object using this URL and reusing your access token:
%% Cell type:code id:5afd8280-ace5-465d-ae64-ee6d2a97772f tags:
``` python
preingest_conf = dlcmapi_client.Configuration('https://sandbox.dlcm.ch/ingestion')
preingest_conf.access_token = my_access_token
```
%% Cell type:markdown id:6cfe8a68-1295-46f0-8cd7-420eb0b7e5fb tags:
We can reuse the previously defined `ids_and_vals_of_dlcm_resource` function, giving it the `PreingestApi` API type and the function name `preingest_contributors_get`, and targeting the `fullName` attribute of the contributor object.
%% Cell type:code id:b15c9b39-2eb4-4fb7-9480-d0caa41f633b tags:
``` python
contributor_data = ids_and_vals_of_dlcm_resource(preingest_conf, dlcmapi_client.PreingestApi, 'preingest_contributors_get', 'fullName', amount_of_results=20)
contributor_data
```
%% Cell type:markdown id:7dc1bcf4-f006-441b-bfb2-7a57d68494ec tags:
Note that the function's parameter `amount_of_results` is set to 20, which is the default number of results returned by the server. There are likely many more than 20 contributors registered in your DLCM instance, so we encourage you to raise this parameter to a value you see fit, to increase the odds of finding the contributors you seek (note that the maximum value is 2000).
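Once many contributors are fetched, scanning the whole list by eye becomes tedious. A minimal sketch of a case-insensitive name filter over a dictionary shaped like `contributor_data` (identifier mapped to full name) is shown here; `find_contributors` is a hypothetical helper and the sample entries are made up.

```python
from typing import Dict


def find_contributors(contributors: Dict[str, str], needle: str) -> Dict[str, str]:
    """Keep only the entries whose full name contains `needle`, ignoring case."""
    needle = needle.casefold()
    return {res_id: name for res_id, name in contributors.items()
            if needle in name.casefold()}


# Made-up sample data with the same shape as `contributor_data` (resId -> fullName)
sample = {"id-001": "Doe, Jane", "id-002": "Dupont, Jean", "id-003": "Doe, John"}
print(find_contributors(sample, "doe"))
```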
As in the previous examples, below is a routine that lists the contributors fetched above and asks you to select the ones you want to save:
%% Cell type:code id:6163d2e6-a4e4-4266-9864-52bc53e0ddfe tags:
``` python
number_of_contributors = int(input("How many contributors would you like to add?"))
while number_of_contributors < 1:
    number_of_contributors = int(input("Enter a number of at least 1:"))
contributors_list = []
cont_ids_name = list(contributor_data.items())
# printing the list once before asking for all the selections
for i, (identifier, val) in enumerate(cont_ids_name):
    print('\033[0m%2.2d' % i, ' -> ', val)
for n in range(number_of_contributors):
    sel = input(f"Which person would you like to set as contributor #{n+1} to your deposit? (enter the index number left of the arrow) ")
    while not sel.isnumeric() or int(sel) < 0 or int(sel) >= len(cont_ids_name):
        sel = input(f'Please enter a non-negative integer below {len(cont_ids_name)}')
    contributors_list.append(cont_ids_name[int(sel)])
print('\nContributors chosen:')
for i, (_, contributor_name) in enumerate(contributors_list):
    print(f'Contributor #{i+1}: {contributor_name}')
```
%% Cell type:markdown id:f2a3b294-3b53-4d62-ad7a-0f747dffbd7e tags:
### Create and fill in an _offline_ Deposit
A deposit object can be instantiated using the library this way:
%% Cell type:code id:411493d1-f867-46ef-b655-9753fd939466 tags:
``` python
import dlcmapi_client
deposit = dlcmapi_client.Deposit()
```
%% Cell type:markdown id:8063381c-2568-4ebb-98db-95c0f1729ae7 tags:
For now, this deposit object is an offline abstraction of an online DLCM deposit. It has several fields that can be set, such as its organizational unit, its title, its description and so forth. To see all the fields of a Deposit object you can interact with, use the following command:
%% Cell type:code id:7352fe6c-fb42-4731-b442-be7cece7ab62 tags:
``` python
[attr for attr in dir(deposit) if not attr.startswith('_')]  # filtering out the default attributes whose names start with "_"
```
%% Cell type:markdown id:61acd91b-4d7e-4f1a-b035-69e8e0dd96e2 tags:
After choosing the right identifiers from the previous listings, run the cell below to copy the values we saved into the deposit object's fields (you can also add code to the cell to set any other available deposit field). As with fetching their identifiers, setting the contributors of a deposit requires a different process, explained later in the notebook.
%% Cell type:code id:931582e1-7059-4dbf-9810-384c220f162f tags:
``` python
#can be set to whatever title and description fits the kind of deposit you'd like to do
deposit.title = input("Enter the title of your deposit:")
deposit.description = input("Enter the description of your deposit:")
#mandatory IDs to be set
deposit.organizational_unit_id = orgunit_id
deposit.submission_policy_id = submission_id
deposit.preservation_policy_id = preservation_id
#optional IDs
deposit.license_id = license_id
deposit.language_id = language_id
#checking the values have been correctly set in the object:
print(deposit)
```
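Before posting, it can be worth checking that none of the mandatory identifiers was left empty. The sketch below uses the three mandatory field names set above; `missing_mandatory_fields` is a hypothetical helper, and a `SimpleNamespace` stands in for the real `Deposit` object so that the snippet runs without a server or the DLCM package.

```python
from types import SimpleNamespace
from typing import List

# Field names taken from the cell above
MANDATORY_FIELDS = ("organizational_unit_id", "submission_policy_id", "preservation_policy_id")


def missing_mandatory_fields(deposit) -> List[str]:
    """Return the names of the mandatory fields that are unset or empty on `deposit`."""
    return [field for field in MANDATORY_FIELDS if not getattr(deposit, field, None)]


# Stand-in object with made-up values; the submission policy is deliberately left unset
fake_deposit = SimpleNamespace(organizational_unit_id="ou-1",
                               submission_policy_id=None,
                               preservation_policy_id="pp-1")
print(missing_mandatory_fields(fake_deposit))
```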
%% Cell type:markdown id:0ad2071d-9ac2-43c9-9511-847ccafa0d08 tags:
### Post a deposit online
To post the deposit, we reuse our preingest configuration to instantiate an `ApiClient`, which in turn is used to instantiate a `PreingestApi` object. This API object gives us access to the method `preingest_deposits_post`, which receives the deposit object we prepared and posts it to the server:
%% Cell type:code id:893495c9-be7c-4414-9ff3-01377027e1b5 tags:
``` python
with dlcmapi_client.ApiClient(preingest_conf) as api_client:
    try:
        preingest_api = dlcmapi_client.PreingestApi(api_client)
        preingest_api.api_client.client_side_validation = False
        res = preingest_api.preingest_deposits_post(deposit=deposit)
        deposit_id = res.to_dict()['res_id']
    except dlcmapi_client.ApiException as e:
        print("Exception when calling method: %s\n" % e)
deposit_id
```
%% Cell type:markdown id:64b2adb4-495c-4fc0-ad25-d733bef39dbb tags:
Notice how we kept the return value of `preingest_deposits_post` in order to read the `res_id` value it holds. This value is the unique identifier the server assigned to the deposit we just posted. It is needed to add contributors to our deposit, which is the last mandatory step to complete it.
### Adding contributors to a preexisting deposit
Adding contributors to a deposit is similar to posting a deposit, except that the function `preingest_deposits_contributors_post` is used, and it expects the deposit's ID and a list of contributor IDs as parameters. Run the cell below to post the contributors that were selected in the "Fetching Contributors IDs" section of the notebook:
%% Cell type:code id:25894453-1b87-4691-9c77-2c0411bde379 tags:
``` python
contributors_id = [i for i,v in contributors_list]
with dlcmapi_client.ApiClient(preingest_conf) as api_client:
    try:
        preingest_api = dlcmapi_client.PreingestApi(api_client)
        preingest_api.api_client.client_side_validation = False
        preingest_api.preingest_deposits_contributors_post(deposit_id=deposit_id, contributors_list=contributors_id)
    except dlcmapi_client.ApiException as e:
        print("Exception when calling method: %s\n" % e)
```
%% Cell type:markdown id:d0b5e84d-9059-4910-8d18-8de0229a126b tags:
You can go back to your preservation space on your DLCM instance to check that the deposit has been correctly created and set up.
%% Cell type:markdown id:f7a1fa88-3c6f-4009-aeee-26642a2b49a9 tags:
### Reserving a DOI (Experimental)
%% Cell type:code id:5480bc3e-216d-47ca-8f96-25a9f592ceb8 tags:
``` python
import dlcmapi_client
import requests
import json

def reserve_doi_for_deposit(preingest_conf: dlcmapi_client.Configuration, deposit_id: str) -> str:
    '''
    Reserve and return a DOI for the deposit identified by "deposit_id" within the preingestion module configured in the parameter "preingest_conf"
    '''
    target_url = f'{preingest_conf.host}/preingest/deposits/{deposit_id}/reserve-doi'
    headers = {"Authorization": f"Bearer {preingest_conf.access_token}"}
    res = requests.post(target_url, headers=headers)
    if res.status_code != 200:
        raise Exception(f'Could not reserve DOI for deposit "{deposit_id}" in module "{preingest_conf.host}", received error code "{res.status_code}"')
    else:
        # the DOI is read from the JSON body of the response
        return json.loads(res.content.decode('utf-8'))['doi']

doi = reserve_doi_for_deposit(preingest_conf, deposit_id)
```
%% Cell type:markdown id:7036ceb9-2b67-4dbc-9114-58600eb5f206 tags:
## Prepare a CSV Template for batch deposits
All the previous steps can easily be done manually through the web interface of your DLCM instance. Using the API package gets truly interesting and time-saving once it is used to create a large number of deposits. To do so, using all the IDs we extracted in the previous cells, we are going to create a "template" CSV file holding those IDs, which you can then manually edit to give each deposit its own title and description. In a later cell, we are going to read that CSV to parse the deposits' data and create them online.
%% Cell type:code id:b497cd72-2a60-4cb3-b2c5-3e5093185f20 tags:
``` python
import pandas as pd
if not contributors_id:
    contributors_id = [i for i,v in contributors_list]
nmb_entries = 3
data = {
    'Title': ['']*nmb_entries,
    'Description': ['']*nmb_entries,
    'OrgUnitId': [orgunit_id]*nmb_entries,
    'PreservationPolicyId': [preservation_id]*nmb_entries,
    'SubmisionPolicyId': [submission_id]*nmb_entries,
    'LicenseId': [license_id]*nmb_entries,
    'LanguageId': [language_id]*nmb_entries,
    'ContributorsIds': [','.join(contributors_id)]*nmb_entries,
}
df = pd.DataFrame(data)
df.to_csv('MyDLCMTemplateForBatchUpload.csv', index=False)
df
```
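Since the contributor IDs are joined with commas inside a comma-separated file, it is worth convincing yourself that they survive a write and read cycle. The sketch below illustrates this with the standard `csv` module, which quotes any field containing the delimiter (as far as we know, `DataFrame.to_csv` applies the same minimal quoting by default). The identifiers and the title are made up.

```python
import csv
import io

contributor_ids = ["id-001", "id-002"]  # made-up identifiers

# Write one row whose last field itself contains commas
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["Title", "ContributorsIds"])
writer.writeheader()
writer.writerow({"Title": "My deposit", "ContributorsIds": ",".join(contributor_ids)})

# Read it back and split the field into individual IDs again
buffer.seek(0)
row = next(csv.DictReader(buffer))
recovered_ids = row["ContributorsIds"].split(",")
print(recovered_ids)
```

If you edit the template in a spreadsheet tool, check that the quotes around the `ContributorsIds` field are still there after saving.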
%% Cell type:markdown id:9791aa22-f263-42c8-9ade-37de1865a0cb tags:
## Make multiple deposits on DLCM from data in a CSV
Customize the CSV file we created in the cell above so that each deposit has a title and a description. Feel free to reuse any of the snippets from previous cells if you need to fetch other types of identifiers or information you'd like to fill into the CSV. Then we can simply load the CSV with pandas (a common Python library for data processing), iterate over each row, create a brand new offline deposit each time, and post it online:
%% Cell type:code id:629ed8f7-49b3-4d47-b970-4b73e30db4b3 tags:
``` python
# if the edited file was saved under another name or with ';' separators, adapt the call, e.g. pd.read_csv('MyDLCMTemplateForBatchUpload2.csv', sep=';')
for idx, row in pd.read_csv('MyDLCMTemplateForBatchUpload.csv').iterrows():
    # another way to initialize data in the Deposit: pass them to the Deposit constructor directly!
    curr_deposit = dlcmapi_client.Deposit(
        title = row['Title'],
        description = row['Description'],
        organizational_unit_id = row['OrgUnitId'],
        submission_policy_id = row['SubmisionPolicyId'],
        preservation_policy_id = row['PreservationPolicyId'],
        license_id = row['LicenseId'],
        language_id = row['LanguageId'],
    )
    # posting the deposit and the contributors using the same api client and preingest api to save some lines of code
    with dlcmapi_client.ApiClient(preingest_conf) as api_client:
        try:
            preingest_api = dlcmapi_client.PreingestApi(api_client)
            preingest_api.api_client.client_side_validation = False
            res = preingest_api.preingest_deposits_post(deposit=curr_deposit)
            deposit_id = res.to_dict()['res_id']
            preingest_api.preingest_deposits_contributors_post(deposit_id=deposit_id, contributors_list=row['ContributorsIds'].split(','))
            deposit_url = f'{preingest_conf.host.replace("ingestion", "")}deposit/{row["OrgUnitId"]}/detail/{deposit_id}'
            print(f'Deposit #{idx} "({row["Title"]})" successfully posted online, you can consult it at {deposit_url}')
        except dlcmapi_client.ApiException as e:
            print("Exception when trying to post deposit #%d while calling method: %s\n" % (idx, e))
```
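Because a half-filled template would create deposits with empty titles, a quick check before posting can save some cleanup. The following is a sketch only: `rows_with_missing_text` is a hypothetical helper working on plain dictionaries of strings (for example rows produced by `csv.DictReader`), and the sample rows are made up. Note that pandas reads empty cells as `NaN` by default, which this string-based check would not catch.

```python
from typing import Dict, List

REQUIRED_TEXT_COLUMNS = ("Title", "Description")  # column names used in the template


def rows_with_missing_text(rows: List[Dict[str, str]]) -> List[int]:
    """Return the indices of rows whose Title or Description is missing or blank."""
    bad = []
    for idx, row in enumerate(rows):
        if any(not str(row.get(col) or "").strip() for col in REQUIRED_TEXT_COLUMNS):
            bad.append(idx)
    return bad


# Made-up rows: the second has no title, the third a blank description
sample_rows = [
    {"Title": "Survey data 2021", "Description": "Raw answers"},
    {"Title": "", "Description": "Cleaned answers"},
    {"Title": "Codebook", "Description": "   "},
]
print(rows_with_missing_text(sample_rows))
```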