This notebook showcases how to create a deposit on a DLCM instance (such as Yareta or OLOS) using the official [DLCM API Python package](https://gitlab.unige.ch/dlcm/community/dlcm-python-api) to interact with a DLCM server.
You can install it using the command indicated at the top of the [package's PyPI page](https://pypi.org/project/dlcm-api/). If you are using a JupyterHub instance administered by UNIGE, run the cell below: if no error message appears, the package is already installed.
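If you prefer a check that does not stop the notebook with an `ImportError`, a minimal sketch like the one below tests whether a module can be found without importing it. It assumes the package's import name is `dlcmapi_client`, as used throughout this notebook. `is_importable` is a hypothetical helper name.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if `module_name` can be found on the current Python path."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if is_importable("dlcmapi_client"):
        print("dlcmapi_client is installed")
    else:
        print("dlcmapi_client is missing: install the dlcm-api package from PyPI")
```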
A deposit on DLCM requires, at minimum, the following pre-existing identifiers before it can be created:
* Its `Organizational Unit ID`, together with that unit's default:
  * `Submission Policy ID`
  * `Preservation Policy ID`
* One or several `Contributor ID`s

Although not mandatory, it is also recommended to prepare your deposit with the following fields:
* A `License ID`
* A `Language ID`

These are all identifiers of pre-existing entities that must be retrieved from your DLCM instance's server. Before delving into the actual deposit creation process, we therefore show how to use the DLCM Python client to fetch all those IDs.
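The requirements listed above can be kept together in a small data structure, so that missing mandatory IDs are caught before anything is posted. This is a minimal sketch: `DepositIds` and its field names are hypothetical and do not come from `dlcmapi_client`.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DepositIds:
    """Hypothetical container for the identifiers a deposit needs."""

    organizational_unit_id: Optional[str] = None
    submission_policy_id: Optional[str] = None
    preservation_policy_id: Optional[str] = None
    contributor_ids: List[str] = field(default_factory=list)
    # Recommended but optional
    license_id: Optional[str] = None
    language_id: Optional[str] = None

    def missing_mandatory(self) -> List[str]:
        """Names of the mandatory fields that are still unset or empty."""
        mandatory = [
            "organizational_unit_id",
            "submission_policy_id",
            "preservation_policy_id",
            "contributor_ids",
        ]
        return [name for name in mandatory if not getattr(self, name)]
```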
### Preparation: retrieve your access token
Some of the interactions we are going to make with DLCM's backend require authentication, which the Python client supports through an _access token_.
To retrieve this token, go to your DLCM instance's home page, log in to your account, and click on your account icon at the top right of the page. Select the "Token" submenu.
Your token is now saved in the variable `my_access_token`, since we are going to use it several times throughout the notebook. Note that this token is only valid for 24 hours: after that, you will need to go back to your user profile to get a new one before interacting with the backend through the `dlcmapi_client` library.
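Pasting the token with `getpass` keeps it out of the notebook's visible output, and recording when it was pasted makes it easy to tell when the 24-hour validity has probably run out. This is a minimal sketch: `ask_for_token` and `token_expired` are hypothetical helpers, and since the expiry is counted from when you paste the token rather than from when it was issued, the check is only an approximation.

```python
import getpass
from datetime import datetime, timedelta
from typing import Callable, Optional, Tuple

# Validity period stated in the text above.
TOKEN_LIFETIME = timedelta(hours=24)


def ask_for_token(prompt: Callable[[str], str] = getpass.getpass) -> Tuple[str, datetime]:
    """Read the token without echoing it and remember when it was obtained."""
    token = prompt("Paste your DLCM access token: ").strip()
    if not token:
        raise ValueError("No token was entered")
    return token, datetime.now()


def token_expired(obtained_at: datetime, now: Optional[datetime] = None) -> bool:
    """True once 24 hours have passed since the token was obtained."""
    now = now if now is not None else datetime.now()
    return now - obtained_at >= TOKEN_LIFETIME
```

In the notebook this would be called as `my_access_token, token_time = ask_for_token()`.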
The first required ID is that of the Organizational Unit in which the data will be deposited. The Python library provides a function called `admin_authorized_organizational_units_get`, which returns a collection of all the organizational units your user account has access to. This is an administration function, meaning you need to target the `Admin` module URL of your DLCM instance. The list of DLCM module URLs can be accessed through the "about" button at the bottom right corner of your DLCM instance's home page:

Combined with the access token, the Admin URL is sufficient to instantiate a `dlcmapi_client.Configuration` object that we will use to interact with DLCM's administration backend:
This configuration object is used to instantiate an `ApiClient` object, which in turn is used to produce an `AdminApi` object, on which we call the function `admin_authorized_organizational_units_get`:
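The Configuration, ApiClient and AdminApi chain described above can be wrapped in a small factory, so that the same code can also build the `PreingestApi` object needed later. This is a sketch under assumptions. It assumes, as is usual for OpenAPI-generated Python clients, that `Configuration` exposes writable `host` and `access_token` attributes, and that each API class takes an `ApiClient` as its only argument. `make_api` is a hypothetical helper, not part of `dlcmapi_client`. The client module is passed in as a parameter, so the function can be tried without a DLCM server.

```python
from typing import Any


def make_api(client_module: Any, api_class_name: str, host: str, token: str) -> Any:
    """Chain Configuration -> ApiClient -> API class for one DLCM module.

    Assumption: `client_module` is an OpenAPI-generated package (such as
    dlcmapi_client) whose Configuration has writable `host` and
    `access_token` attributes.
    """
    configuration = client_module.Configuration()
    configuration.host = host              # module URL, e.g. the Admin URL
    configuration.access_token = token     # token from your user profile
    api_client = client_module.ApiClient(configuration)
    api_class = getattr(client_module, api_class_name)  # e.g. "AdminApi"
    return api_class(api_client)


# Usage in the notebook (admin_url is a hypothetical variable holding the
# Admin module URL):
# import dlcmapi_client
# admin_api = make_api(dlcmapi_client, "AdminApi", admin_url, my_access_token)
```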
As can be seen, the returned collection is quite voluminous and contains various fields relating to organizational units (such as the IDs of their default submission and preservation policies). The cell below contains a small Python routine that lists all your organizational units by name and asks you to select one. The routine saves the identifiers of the chosen Organizational Unit and of its default submission and preservation policies in the three variables at the bottom of the cell:
`print(f'\nOrganizational Unit "{orgunit_name}" with default submission policy "{submission_name}" and preservation policy "{preservation_name}" chosen')`
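The selection routine described above amounts to a numbered menu that keeps asking until it gets a valid answer. The sketch below shows that loop in isolation. `choose_one` is a hypothetical name, and its `prompt` parameter defaults to `input` but can be replaced, for instance to test the function without typing. After picking an organizational unit this way, its ID and default policy IDs are read from the chosen object. The exact attribute names depend on the generated model.

```python
from typing import Callable, Sequence


def choose_one(names: Sequence[str], prompt: Callable[[str], str] = input) -> int:
    """Print a numbered menu and return the 0-based index the user picked."""
    if not names:
        raise ValueError("Nothing to choose from")
    for number, name in enumerate(names, start=1):
        print(f"{number}. {name}")
    while True:
        answer = prompt(f"Select a number between 1 and {len(names)}: ").strip()
        try:
            choice = int(answer)
        except ValueError:
            choice = 0  # treat non-numeric input as invalid
        if 1 <= choice <= len(names):
            return choice - 1
        print("Invalid choice, please try again.")
```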
Except for contributors, similar methods (`admin_licenses_get`, `admin_languages_get`) can be used to retrieve the other identifiers, using the same configuration and API objects. To ease data extraction, the cell below defines a function that produces a readable shortlist, given the type of object we want to list and the attributes we want to retrieve from each one. The identifiers you are prompted to select will be saved to the variables at the bottom of the cell:
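A helper in the spirit of the shortlist function described above might look like the following. This is an independent, minimal sketch, not the notebook's `ids_and_vals_of_dlcm_resource`. `shortlist` is a hypothetical name. It accepts either model objects (read through attributes) or plain dictionaries (read through keys), and uses `res_id` as the identifier attribute by default.

```python
from typing import Any, Iterable, List, Tuple


def _read(resource: Any, name: str) -> Any:
    """Read `name` from a dict key or an object attribute (None if absent)."""
    if isinstance(resource, dict):
        return resource.get(name)
    return getattr(resource, name, None)


def shortlist(resources: Iterable[Any], attributes: Iterable[str],
              id_attr: str = "res_id") -> List[Tuple[Any, ...]]:
    """Return one (id, value1, value2, ...) tuple per resource."""
    attributes = list(attributes)
    return [
        (_read(resource, id_attr), *(_read(resource, a) for a in attributes))
        for resource in resources
    ]
```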
A similar process is used to retrieve the identifiers of contributors, except that we target DLCM's preingestion module instead of its administration module. This means we need to instantiate a different configuration, to which the preingestion module URL is passed.
Go back to your module list, and copy and paste the preingestion URL **up until the last slash**:
We can reuse the previously defined `ids_and_vals_of_dlcm_resource` function giving it the `PreingestApi` API type, the function `preingest_contributors_get`, and targeting the `fullName` attribute of the contributor object.
Note that the function's parameter `amount_of_results` is set to the server's default number of returned results, which is 20. There are likely many more than 20 contributors registered in your DLCM instance, so we encourage you to raise this parameter as you see fit to increase the odds of finding the contributors you seek (note that the maximum value is 2000).
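Once a larger batch of contributors has been fetched, narrowing it down by name locally makes it easier to spot the people you need. The sketch below clamps a requested page size to the range of 1 to 2000 stated above, and filters (ID, full name) pairs case-insensitively. Both function names are hypothetical.

```python
from typing import Iterable, List, Tuple

DEFAULT_PAGE_SIZE = 20   # server default, per the text above
MAX_PAGE_SIZE = 2000     # maximum accepted value, per the text above


def page_size(requested: int = DEFAULT_PAGE_SIZE) -> int:
    """Clamp a requested number of results to the range the server accepts."""
    return max(1, min(int(requested), MAX_PAGE_SIZE))


def search_by_name(pairs: Iterable[Tuple[str, str]], query: str) -> List[Tuple[str, str]]:
    """Keep (id, name) pairs whose name contains `query`, ignoring case."""
    needle = query.casefold()
    return [(res_id, name) for res_id, name in pairs if needle in (name or "").casefold()]
```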
As in the previous examples, below is a routine that lists the fetched contributors and asks you to select the ones you want to save:
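Selecting several contributors at once requires parsing answers such as `1, 3-5`. A minimal sketch of such a parser follows. `parse_selection` and its accepted syntax (comma-separated numbers and inclusive ranges, counted from 1) are choices made for this example, not necessarily what the notebook's routine accepts.

```python
from typing import List


def parse_selection(answer: str, count: int) -> List[int]:
    """Turn input such as "1, 3-5" into sorted, unique 0-based indices.

    Numbers are 1-based as shown in the printed list; ranges are inclusive.
    Raises ValueError for malformed parts or numbers outside 1..count.
    """
    indices = set()
    for part in answer.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate "1,,2" or a trailing comma
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        if not 1 <= start <= end <= count:
            raise ValueError(f"Invalid selection: {part!r}")
        indices.update(range(start - 1, end))
    return sorted(indices)
```

The returned indices can then be used to pick the matching IDs from the list of (ID, name) pairs.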
For now, this deposit object is an offline abstraction of an online DLCM deposit. It has several fields that can be set, such as its organizational unit, its title, its description and so forth. To see all the available fields of a Deposit object, use the following command:
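To get the field names as a plain sorted list, a sketch like the one below can be used. It assumes the generated model exposes a class-level mapping named `openapi_types` (OpenAPI Generator) or `swagger_types` (Swagger Codegen). This is common for generated Python models but not confirmed here for `dlcmapi_client`. Otherwise, the sketch falls back to the object's public, non-callable attributes. `model_fields` is a hypothetical helper.

```python
from typing import Any, List


def model_fields(model: Any) -> List[str]:
    """List the field names of a generated model class or instance."""
    for mapping_name in ("openapi_types", "swagger_types"):
        mapping = getattr(model, mapping_name, None)
        if isinstance(mapping, dict):
            return sorted(mapping)
    # Fallback: public, non-callable attributes of the object.
    return sorted(
        name for name, value in vars(model).items()
        if not name.startswith("_") and not callable(value)
    )
```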
After choosing the right identifiers from the previous listings, run the cell below to copy the required values we saved into the deposit object's fields (you can also add code directly in the cell to set any other available deposit field). As with fetching identifiers, setting the contributors of a deposit requires a different process, explained later in the notebook.
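Setting many fields one by one can be condensed into a loop that skips empty values and refuses unknown field names, which catches misspelled fields early. This is a minimal sketch. `fill_fields` is hypothetical, and any field names you pass to it (for example `title` or `description`) should be checked against the Deposit object's actual field list before use.

```python
from typing import Any, Dict


def fill_fields(deposit: Any, values: Dict[str, Any]) -> Any:
    """Copy non-empty values onto the deposit object, attribute by attribute."""
    for name, value in values.items():
        if value is None or value == "":
            continue  # leave optional fields unset
        if not hasattr(deposit, name):
            raise AttributeError(f"Deposit has no field named {name!r}")
        setattr(deposit, name, value)
    return deposit
```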
To post the deposit, we reuse our `PreIngest` configuration to instantiate an `ApiClient`, which in turn is used to instantiate a `PreingestApi` object. This API object gives us access to the method `preingest_deposits_post`, which receives the deposit object we prepared and posts it to the server:
Notice that we stored the return value of `preingest_deposits_post` in order to read the `res_id` value it holds. This value is the unique identifier the server assigned to the deposit we just posted. We need it to add contributors to the deposit, the last mandatory step to complete it.
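The posting step and the `res_id` lookup can be sketched as a small function that fails loudly when no identifier comes back, instead of silently carrying a `None` into the contributor step. It assumes, as the text above describes, that `preingest_deposits_post` accepts the deposit object as its argument and returns an object with a `res_id` attribute. `post_deposit` itself is a hypothetical helper.

```python
from typing import Any


def post_deposit(preingest_api: Any, deposit: Any) -> str:
    """Post `deposit` and return the identifier the server assigned to it."""
    created = preingest_api.preingest_deposits_post(deposit)
    res_id = getattr(created, "res_id", None)
    if not res_id:
        raise RuntimeError("The server response did not contain a res_id")
    return res_id
```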
### Adding contributors to a preexisting deposit
Adding contributors to a deposit is similar to posting a deposit, except that the function `preingest_deposits_contributors_post` is used. It expects the deposit's ID and a list of contributor IDs as parameters. Run the cell below to post the contributors that were selected in the "Fetching Contributors ID" section of the notebook:
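Before posting, it can help to remove duplicate IDs, for example when the same contributor was picked twice. The sketch below assumes that `preingest_deposits_contributors_post` takes the deposit ID and the list of contributor IDs positionally, in that order, as described above. `add_contributors` is a hypothetical wrapper.

```python
from typing import Any, Iterable, List


def add_contributors(preingest_api: Any, deposit_id: str,
                     contributor_ids: Iterable[str]) -> List[str]:
    """Post de-duplicated contributor IDs to an existing deposit."""
    unique_ids = list(dict.fromkeys(contributor_ids))  # keeps first-seen order
    if not unique_ids:
        raise ValueError("At least one contributor is required")
    preingest_api.preingest_deposits_contributors_post(deposit_id, unique_ids)
    return unique_ids
```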
All the previous steps can easily be done manually through the web interface of your DLCM instance. Using the API package becomes truly worthwhile, and saves time, when creating a large number of deposits. To do so, we are going to create a "template" CSV file holding all the IDs extracted in the previous cells, which you can then edit manually to give each deposit its own title and description. In a later cell, we read that CSV, parse the deposit data and create the deposits online.
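A template of this kind can be written with the standard `csv` module. This is a minimal sketch. The column names and the choice of storing several contributor IDs in one cell, separated by semicolons, are assumptions made for this example. `write_template` is a hypothetical helper that writes `rows` lines sharing the same IDs, with empty title and description cells to fill in by hand.

```python
import csv
from typing import Any, Dict

# Hypothetical column layout chosen for this sketch.
TEMPLATE_COLUMNS = [
    "title", "description",
    "organizational_unit_id", "submission_policy_id", "preservation_policy_id",
    "license_id", "language_id", "contributor_ids",
]


def _cell(value: Any) -> str:
    """Render a value for the CSV; lists of IDs become 'id1;id2'."""
    if isinstance(value, (list, tuple)):
        return ";".join(str(item) for item in value)
    return "" if value is None else str(value)


def write_template(path: str, ids: Dict[str, Any], rows: int) -> None:
    """Write `rows` lines sharing the same IDs, with empty title/description."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=TEMPLATE_COLUMNS)
        writer.writeheader()
        for _ in range(rows):
            row = {column: _cell(ids.get(column)) for column in TEMPLATE_COLUMNS}
            row["title"] = ""        # to be filled in by hand
            row["description"] = ""  # to be filled in by hand
            writer.writerow(row)
```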
## Make multiple deposits on DLCM from data in a CSV
Customize the CSV file created in the cell above so that each deposit has a title and a description. Feel free to reuse any of the code snippets from previous cells if you need to fetch other types of identifiers or information you would like to add to the CSV. We can then simply load the CSV with pandas (a common Python library for data analysis), iterate over each row, create a brand-new offline deposit for each one, and post it online:
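The row-by-row loop can also be sketched with the standard `csv` module instead of pandas, which keeps the example dependency-free. It assumes a CSV with a header row that includes a `title` column and a `contributor_ids` column holding semicolon-separated IDs (a layout chosen for this sketch). Rows without a title are skipped, so a half-edited template does not create empty deposits. The two callables are left to you, so the sketch assumes nothing more about the `dlcmapi_client` API than the notebook shows. `create_deposit` would build a Deposit object from the row's fields, post it, and return its `res_id`. `attach_contributors` would wrap `preingest_deposits_contributors_post`. All function names here are hypothetical.

```python
import csv
from typing import Any, Callable, Dict, List, Tuple


def read_deposit_rows(path: str) -> List[Tuple[Dict[str, str], List[str]]]:
    """Parse the CSV into (deposit fields, contributor IDs) pairs."""
    parsed = []
    with open(path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            if not (row.get("title") or "").strip():
                continue  # skip rows that were not filled in
            raw = row.pop("contributor_ids", "") or ""
            contributors = [c for c in raw.split(";") if c]
            fields = {key: value for key, value in row.items() if value}
            parsed.append((fields, contributors))
    return parsed


def create_all(path: str,
               create_deposit: Callable[[Dict[str, str]], str],
               attach_contributors: Callable[[str, List[str]], Any]) -> List[str]:
    """Create one deposit per filled row; return the new deposit IDs."""
    created = []
    for fields, contributors in read_deposit_rows(path):
        deposit_id = create_deposit(fields)
        attach_contributors(deposit_id, contributors)
        created.append(deposit_id)
    return created
```

With pandas, `pd.read_csv(path, dtype=str, keep_default_na=False)` followed by `DataFrame.iterrows()` yields equivalent rows, with empty cells kept as empty strings rather than `NaN`.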