feat: ingestor component for datasets #2040
base: master
…ent (SciCatProject#1673) * fix: optimize condition editing logic in DatasetsFilterSettingsComponent * if user creates duplicated condition, do nothing * add snackbar notification for duplicate condition in DatasetsFilterSettingsComponent * remove unused import * remove panelClass from snackBar * added e2e test for the change
* feat: add the new auth service to prepare for the new sdk * try to fix some ai-bot review suggestions * add the note for the good review suggestion from ai-bot * remove old sdk and adjust types against the new one * fix more types and issues against the new sdk * finalize type error fixes * remove prefix * add the new sdk generation script for local development * start fixing TODOs after newly generated sdk * fixed sdk local generation for linux * update the sdk package version and fix some more types * detect the OS and use the right current directory path * improve types and fix more TODOs * improve types and fix TODOs after backend improvements * finalize TODOs and FIXMEs fixes and type improvements with the new sdk * fix some sourcery-ai comments * fix some of the last TODOs * adapted sdk generation to unix environment * ignore the @SciCatProject that is generated with the sdk * start fixing tests with the new sdk * add needed stub classes and fix some more tests * continue fixing unit tests * try to fix e2e tests and revert some changes that need more attention for now * changes to just run the tests * use latest sdk * update package-lock file * fixing unit tests * fix more unit tests * continue fixing tests * update the sdk * fix last e2e test * fix thumbnail unit tests * revert some change * finalize fixing unit tests * revert the backend image changes after the tests pass * add some improvements in the mocked objects for unit tests based on ai bot suggestion * remove encodeURIComponent in the effects as it seems redundant * fix test files after some changes * try to use mock objects as much as possible * update the sdk version * update package-lock file * update the sdk to latest * BREAKING CHANGE: new sdk release --------- Co-authored-by: martintrajanovski <martin.trajanovski@gmail.com> Co-authored-by: Jay <b331998513@gmail.com>
…tails dashboard onDestroy
This commit rebases and squashes SciCatProject#1585 (7d2a872). It contains the following commits: - update job view to match release-jobs - update job schema and timestamp fields - update jobs-detail page - fix testing and linting
Sorry @sofyalaski, your pull request is larger than the review limit of 150000 diff characters
Great work putting this together @sofyalaski!
sbliven
left a comment
@sofyalaski, can you add some screenshots of the two options to the description? I think that would make it easier for people to review, since the ingestor isn't that easy to run yourself (yet).
We should also start working on user and operator docs for this feature.
I will leave off formally approving this, as I think that should be done by someone outside of OpenEM. It looks good though, and I'm excited to get this merged into main!
| @@ -0,0 +1,183 @@ | |||
| import { Injectable } from "@angular/core"; | |||
Eventually it is planned to publish the Ingestor SDK as a separate package. However, for the time being the API definition lives here in the frontend.
Here is an example of dataset ingestion from the main "Datasets" view.
And with the ingestor component connected on the backend. It's not configured properly, so the transfer hasn't succeeded. Screen.Recording.2025-10-20.at.17.05.45.mov
I ran it locally and it works great, thanks for the excellent work!
| @@ -0,0 +1,39 @@ | |||
| .stepper { | |||
Can we change this file to .scss?
+1
nitrosx
left a comment
A few observations and questions:
- Why do we have so many files for the ingestor page?
- Why do we have so many custom renderers?
- Is the ingestor service specific to your use case?
- I'm a little wary of including the SDK here rather than providing it as an external library.
- Should we define an API for the ingestor service, so everybody can create their own if needed?
Unfortunately, I have a hard time providing an honest review of the code. I might have a better understanding after a demo.
That said, I am not opposed to merging the PR into master, on the condition that this feature is clearly marked as experimental, with a "use at your own risk" warning.
package.json
Outdated
| "@jsonforms/angular": "^3.5.1", | ||
| "@jsonforms/angular-material": "^3.5.1", | ||
| "@jsonforms/core": "^3.5.1", | ||
| "@jsonforms/react": "^3.5.1", |
Why are we using a react package?
| @@ -0,0 +1,39 @@ | |||
| .stepper { | |||
+1
What is this file used for?
| "ingestorComponent": { | ||
| "ingestorEnabled": false, | ||
| "ingestorAutodiscoveryOptions": [ | ||
| { | ||
| "mailDomain": "university.ch", | ||
| "description": "University/facility of Choice", | ||
| "facilityBackend": "http://localhost:8888" | ||
| } | ||
| ] | ||
| }, |
Can we add the option to show the button on the datasets list here and also rename it according to the new options?
@sofyalaski and the CH team, great work!!!
Thanks for all your comments, @Junjiequan and @nitrosx! I will collect your comments here as either todo items (which I agree should be changed) or things I want to respond to.
Minor things that should be added before merging
Bigger things that should have subsequent PRs
Yes, but this can be done after the current POC.
scientific metadata schema
Good idea. I suggest we open a follow-up issue for configurable default scientificMetadataSchema URLs. The first item would probably be a general schema served from the backend.
The first page ('Dataset information') should already correspond to this, right? Or do you mean that you want a key/value schema for scientificMetadata, e.g.
{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"type": "object",
"additionalProperties": {
"anyOf": [
{ "type": "string" },
{ "type": "boolean" },
{ "type": "number" }
]
}
}
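For illustration, the key/value shape this schema describes can be sketched as a TypeScript type guard. This is a hand-written check under assumptions, not a JSON Schema validator and not code from the PR; `ScalarValue` and `isFlatKeyValueMetadata` are hypothetical names.

```typescript
// Values the schema above allows for any property.
type ScalarValue = string | boolean | number;

// Hypothetical type guard: true when `value` is a plain object whose
// properties are all strings, booleans, or numbers (no nesting, no arrays).
function isFlatKeyValueMetadata(
  value: unknown,
): value is Record<string, ScalarValue> {
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    return false;
  }
  return Object.values(value).every(
    (entry) =>
      typeof entry === "string" ||
      typeof entry === "boolean" ||
      typeof entry === "number",
  );
}

// Illustrative inputs only; the field names are made up.
console.log(isFlatKeyValueMetadata({ sample: "apoferritin", dose: 1.5 }));
console.log(isFlatKeyValueMetadata({ detector: { name: "K3" } }));
```

Nested objects and arrays are rejected, which is the main difference from a free-form scientificMetadata object.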
Comments/Responses
We will continue adding features, but I think it's ready to be merged. Since this PR is already reviewed, let's keep it together but try for more atomic PRs in the future.
Note that you do get the list of 'Last used facility backends' below this, and if your email matches one of the mailDomains in the configuration then the list will include that by default. We designed it this way so that it would be possible for some labs or users to install the ingestor service locally and connect to it without registering centrally with SciCat. However, there are issues with this, e.g. due to SSL requirements and OIDC callbacks. We should discuss this again, and might want to go back to a drop-down as a clearer (if less flexible) user experience.
I thought that other configurable action buttons would act on the selection. If so, it might make sense to visually distinguish selection-actions from non-selection-actions. However, configuring all of the frontend buttons consistently seems good.
@sofyalaski Feel free to edit this comment and/or check off todos





Description
This is a big PR that introduces two changes.
config.json file by setting: addDatasetEnabled)
Motivation
At PSI with OpenEM we have been working on a new ingestor backend that will allow data ingestion from sites different from the host of SciCat Catalog. This is represented by Point 1. An addition of the Ingestor backend repo into SciCatProject is planned as well.
Changes:
For point 2:
For point 1:
config.json changes include this new object:
The main option to turn off the component entirely is controlled by the ingestorEnabled value. When it is off, calls to the ingestor are redirected to 404. When turned on, the ingestor component is available at /ingestor/ with a link in the hamburger menu.
ingestorAutodiscoveryOptions is an optional argument and constitutes an array of available facilities running ingestor software.
- facilityBackend is a reachable backend of the ingestor service.
- mailDomain is treated as a regular expression and matched against the email of the logged-in user; on a match, the frontend automatically connects to the respective backend. A regular expression is used so that emails of the form "staff.university.org" or similar also match.
- description is optional, but on a match with mailDomain it will prefill the creationLocation property in the dataset schema.
Ingestor component (when used with the backend) looks similar to Point 2 and represents a set of dialogs for SciCat dataset and scientific metadata ingestion, with most of the information prefilled. For this, it interacts with the ingestor backend, which does all the hard work such as:
scientificMetadata
Tests included
Documentation
official documentation info
If you have updated the official documentation, please provide PR # and URL of the pages where the updates are included
Backend version
Ingestor backend:
https://github.com/SwissOpenEM/Ingestor
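The mailDomain matching described under "Changes" could look roughly like the sketch below. It is not the code in this PR: `IngestorAutodiscoveryOption` and `findFacilityForEmail` are hypothetical names, and matching against only the part of the email after "@" is an assumption. The field names and example values come from the config.json object shown in the review.

```typescript
// Shape of one entry in ingestorAutodiscoveryOptions (fields from config.json).
interface IngestorAutodiscoveryOption {
  mailDomain: string;
  description?: string;
  facilityBackend: string;
}

// Hypothetical helper: returns the first option whose mailDomain, read as a
// regular expression, matches the domain part of the user's email.
function findFacilityForEmail(
  email: string,
  options: IngestorAutodiscoveryOption[],
): IngestorAutodiscoveryOption | undefined {
  // Assumption: only the text after the last "@" is matched.
  const domain = email.slice(email.lastIndexOf("@") + 1).toLowerCase();
  return options.find((option) => {
    try {
      return new RegExp(option.mailDomain, "i").test(domain);
    } catch {
      // An invalid pattern in the configuration is skipped, not fatal.
      return false;
    }
  });
}

const options: IngestorAutodiscoveryOption[] = [
  {
    mailDomain: "university.ch",
    description: "University/facility of Choice",
    facilityBackend: "http://localhost:8888",
  },
];

// A sub-domain address still matches because the pattern is unanchored.
const match = findFacilityForEmail("jane@staff.university.ch", options);
console.log(match?.facilityBackend);
```

Because the pattern is unanchored, "university.ch" also matches longer domains that merely contain it; operators who want stricter matching would need to anchor the expression in their configuration.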