Troubleshooting Spreadsheet Ingests
Last Updated, May 22, 2023.
This page includes information on commonly encountered issues when ingesting a spreadsheet and how to resolve these errors. In Modern Islandora there is a significant improvement in the detail provided for errors that you may encounter during spreadsheet ingest as it often provides the field name and/or even row number to help you identify where the issue was encountered during the ingest process. When there are significant errors, the system will either not ingest the content at all or will ingest some of the metadata record information with the status of those items being unpublished.
Have an error you can't resolve or need a second set of eyes to help determine what the issue is? Message Michael Howser on the CTDA slack channel and provide the link to the migration group and he will look into what might be causing the error.
Dates
In Modern Islandora, the dates are in EDTF format and validated so you may encounter some messages from your ingests if the system receives a date which is not formatted correctly or is populated to the wrong field.
In the following example, for Create nodes from rows in the csv migration for ID2 we received the following message:
explode: DateTime::__set_state(array( 'date' => '1899-12-30 00:00:43.000000', 'timezone_type' => 3, 'timezone' => 'America/New_York', )) is not a string
This message indicates an issue with ID2 in our spreadsheet ingest with a significant issue with DateTime. Let's review the spreadsheet to determine what the issue may be causing this failed message.
As we can see in the column origin_information, ID2 cell Y3 in the above example includes the data in the correct EDTF format but the date is not populated into the correct subfield within origin_information. As we want to populate this as the date created value, the entry in cell Y3 should be ||2020-08-12. The spreadsheet instead was trying to populate this date value into the event_type field which is not a date field, hence the explode error.
Digital File
The path to the object you want to ingest in the CTDA sandbox is prepended with a the path /opt/ingest_data/csv_image_upload/ followed by the name of the file. This path name is only valid if the file has already been uploaded via SFTP by the CTDA on your behalf in the sandbox environment. If you have files you want to test ingesting via spreadsheet, just slack Michael Howser to have these uploaded so you can test.
In the following example in Messages of Create files from rows in the csv we have two messages for ID2 which are as follows "Destination Full Path: repo-bin://2023-02/095A8516.mp4" and "file_copy: File '/opt/ingest_data/csv_image_upload/095A8516.mp4' does not exist" which both result in a Failed status.
Let's look at the spreadsheet to review these messages. For ID2 in the digital_file column (cell I3) we see a file path populated which looks correct at first glance.
The file path is correct, but in this case the actual file name is incorrect as the extension .mp4 on the original file I was uploaded was actually .MP4. The file extensions as well as the file names and paths are case sensitive so it's a good idea to double check these are correct if you see messages like the ones illustrated above when ingesting content.
Explode
In Modern Islandora, the system provides strict validation of fields and the type of information expected for a field. If there is a text value entered in a field where the system expects a number, or a number not formatted as the system expects, the explode error message will appear in the create nodes from rows in the CSV error message for your spreadsheet ingest.
In the following example, in the Create nodes from rows in the csv migration ID1 and ID2 have failed status messages which indicate explode: 10426925 is not a string.
To troubleshoot this, let's look for this number in the spreadsheet that was ingested. As we can see when viewing the spreadsheet in Microsoft Excel, there is a green triangle in the top left corner of this cell, indicating we have a number entered in the field, and if we hover over the cell, a yellow exclamation alert will appear.
If we click on the yellow exclamation mark, we can change this field to a number by selecting the Convert to Number option. If you have multiple cells with this same issue in the column, highlight the cells to update and then click the yellow exclamation mark to update all the text to numbers in these cells.
Now you can re-upload and re-ingest the spreadsheet and this issue will be resolved.
---
Another example of an explode message you may encounter is a subdelimited_explode message in the Create nodes from rows in the csv migration ID1 has a failed status messages which indicates subdelimited_explode: The value provided to the subdelimited_explode must be a string; actual provided value: DateTime::.. which provides a clue that in a field there is a value that is entered is not what is expected for the field.
Let's take a look at the spreadsheet to see what may have caused this message. When looking at the spreadsheet in the origin_information column for ID1 we discover that the data value is listed and looks to be in EDTF format, but this field has multiple subfields this message clearly shows the information in this field is not populating to the correct field.
Upon closer examination we can see that the date created field value in the spreadsheet is populating to the first subdelimited field for this column, which is actually event type which would not be a date value. To resolve this issue, we will need to use the pipe | subdelimiter to populate this value to the data created subfield, The correct entry for the spreadsheet would be ||2021-06-11 to add the date created to the ingest.
Now we can save and re-ingest the spreadsheet and this message will be resolved.
Model & Bundle
In Modern Islandora, the system strictly validates many more fields than in the Islandora 7.x and this section provides some common errors you may encounter related to model and 'bundle' related issues.
In the following example, the message for the migration Create image media from rows in the csv. indicates as follows: No static mapping found for ''Paged Content'' and no default value provided for destination 'bundle'
When reviewing the ingest spreadsheet, the message indicates that the issue is with ID 1 (cell A2) we see the model was entered as Paged Content which is correct but I accidentally pasted a value for a digital_file which should be empty for Paged Content.
To fix this, the digital_file cell I2 in the example above for the Paged Content ensure this cell remains blank.
If you receive a message related to 'bundle' take a look at the row the message indicates, review the model, and determine if the digital_file is missing or should not be included for that content model.
The following message of the Create image media from rows in the csv "No static mapping found for ''Digital Document'' and no default value provided for destination 'bundle'." indicates
In this example the issue is the model "Document" is not a valid value for this field but rather it should be "Digital Document" for the model.
Notes
To review errors after ingesting a spreadsheet from the Islandora Spreadsheet Ingest page click View and then click on the Migration Group Name for your ingest. This will display the Migrations page which provides a list of important information and messages related to the spreadsheet you ingested. A good place to start is to look for any messages that may have been created from the ingest process. We will click on the number 4 in this example as this corresponds to the Messages column.
Next, we will review these messages
Based on the messages above, it is indicating the issue is with item ID2 in our spreadsheet and the message indications field_note_paragraph giving us a clue the issue is with the notes field and type note validation indicates the value is not valid or missing for the type note.
When we look at the original spreadsheet we ingested we see the following:
From above we can see there is no indication as to the type note for the information entered. To correct this we need to add one of the valid type note fields with the appropriate delimiter.
Below is the corrected version of this field in the spreadsheet that once updated we can conduct a new ingest and the messages will no longer appear and the item will ingest.
Reformatting Quality
Each group, collection (parent), and object within the repository has a node associated with it. The following is an example of an error which indicates an issue with the node number. The message of the Create nodes from rows in the csv states "[node: 504, revision: 1100]: field_reformatting_quality.0=The value you selected is not a valid choice.".
As the message indicates the field_reformatting_quality is not a valid choice, let's look at the spreadsheet to examine this further. As the spreadsheet image indicates below, for ID1 the value entered in the column reformatting_quality cell AC2 the valid value for this field is "access" as the fields are case sensitive. Once this is corrected the error message will be resolved. Valid values for this field are: access preservation replacement