UNETR: Transformers for 3D Medical Image Segmentation #17309
Comments
Hello. What is the status of the implementation? I would like to contribute to it.
Hey @Puranjay-del-Mishra, to the best of my knowledge nobody has started working on it. We'd be very happy for you to take a stab at adding it! You can follow the tutorial here: adding a new model. If you have not contributed to transformers yet, we also recommend reading the contributing guide.
Sure! @LysandreJik
Hey @Puranjay-del-Mishra @LysandreJik I was supposed to submit a PR last week, but I came down with health problems.
Hey @pri1311, go ahead with the PR. All the best.
I'm gonna try this out. Appreciate it.
Hi @NielsRogge,
Yes, sure! Do you need some help?
Thanks! I'll get back to you if I have queries.
Hello @NielsRogge. I have been following all the steps in the guide https://huggingface.co/docs/transformers/add_new_model, and I have already done all the previous steps needed to create a PR. At this moment I have a fork of the whole transformers Hugging Face project on my GitHub, and I have created my "draft" by copying ViT using the command `transformers-cli add-new-model-like`. After that, I created a draft pull request from my dev fork branch to my main fork branch and tried to include you as a reviewer, but it was not possible. Am I missing some steps? Should the pull request be opened directly from my dev fork branch to some branch in the real repository?
Hi @NielsRogge and @LysandreJik, I have been working on this task for the last few weeks and my code is doing the forward pass properly. Now I am implementing the tokenizer, but I have a question. In the original repository they have created many functions to transform input images. Can I include this function/library as a requirement for the Hugging Face tokenizer, or must they be implemented from scratch? Many thanks.
Hi, UNETR is a vision model, so it probably doesn't require a tokenizer; you probably want to create an image processor for this model, is that right? In that case, image processors should be implemented to support minimal inference: they should perform the exact same transformations as the original implementation to prepare data for the model. For computer vision models, this typically involves resizing to a particular size + normalization. An example of an image processor can be found here: https://github.com/huggingface/transformers/blob/main/src/transformers/models/vit/image_processing_vit.py
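To make that concrete, here is a minimal sketch of what the core steps of such an image processor could look like for a 3D volume, using only numpy. The function names, sizes, and normalization values are hypothetical illustrations, not the actual Transformers API or the transforms the original MONAI implementation uses:

```python
import numpy as np

def resize_3d(volume, size):
    """Nearest-neighbour resize of a (D, H, W) volume via index mapping."""
    d, h, w = volume.shape
    td, th, tw = size
    zi = np.arange(td) * d // td
    yi = np.arange(th) * h // th
    xi = np.arange(tw) * w // tw
    return volume[np.ix_(zi, yi, xi)]

def normalize(volume, mean, std):
    """Standard (x - mean) / std normalization."""
    return (volume - mean) / std

# Minimal inference-style pipeline: resize, then normalize.
vol = np.random.rand(64, 64, 64).astype(np.float32)
pixel_values = normalize(resize_3d(vol, (96, 96, 96)), mean=0.5, std=0.5)
print(pixel_values.shape)  # (96, 96, 96)
```

A real image processor would mirror exactly the transforms the original repository applies at inference time; this only shows the general resize + normalize shape.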
Thank you for the answer @NielsRogge. When I was talking about the tokenizer, I did in fact mean the image processor. When you check how the original repository implements the model, you realize they are using some transformations not implemented in the Hugging Face library. These transformations normalize, filter and resize the 3D image in particular ways, with a slightly complex hierarchy of functions that cannot be built from the current functions in "image_processing_utils.py". As far as I can see, there are three options to implement this part in the Hugging Face code:
What is the recommended option? |
Thanks for the nice suggestions! I'll ping @amyeroberts for this, as she's currently working on refactoring our image processing pipelines.
Thank you Niels. Please let me know when you have some info. I'll be working on the refactor of the UNETR decoder, since the forward pass currently uses a dependency on the monai project (the original project) as well.
Discussed this offline with @amyeroberts; here's what she responded: I'd use the third party for now, so that we can remove it easily if need be.
@caleb-vicente Thanks for all your work so far adding this model ❤️ Adding to Niels' comment above: regarding your suggestions, option 1 is the one I would go for: importing specific functionality from the MONAI project. I completely agree we don't want to reinvent the wheel! We already use third-party packages for certain processing, e.g. pytesseract for the LayoutLM models. Like the LayoutLM models, we can add MONAI as an optional dependency. Regarding the transforms in the screenshot above, one thing to consider is that the image processors don't perform augmentation; they are responsible for transforming the data so that it can be fed into the model. In the snippet above,
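As a sketch of the optional-dependency pattern described here, a soft availability check can gate the MONAI import so the processor degrades gracefully when the package is missing. The helper names below are hypothetical; Transformers has its own internal availability-check utilities (e.g. the pytesseract check used for LayoutLM):

```python
import importlib.util

def is_monai_available():
    # Soft check: True only if the optional `monai` package is installed.
    return importlib.util.find_spec("monai") is not None

def require_monai():
    # Raise a helpful error at use time, not at import time.
    if not is_monai_available():
        raise ImportError(
            "This image processor requires the optional `monai` dependency; "
            "install it with `pip install monai`."
        )
```

Guarding at call time keeps `import transformers` working for users who never touch the UNETR image processor, and makes the dependency easy to drop later.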
there's also the consideration about input types: all of the current functions take in and return numpy arrays, and it should be possible to disable any of the transforms. Let me know if there are any other questions you have regarding this :)
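A transform in that style might look like the following: numpy in, numpy out, with a flag to disable it. The function name and the clipping defaults are illustrative only (loosely modeled on a CT intensity-windowing step, not copied from the MONAI pipeline):

```python
import numpy as np

def scale_intensity(volume, a_min=-175.0, a_max=250.0, do_scale=True):
    """Clip to [a_min, a_max] and rescale to [0, 1]; no-op when disabled."""
    if not do_scale:
        return volume
    clipped = np.clip(volume, a_min, a_max)
    return (clipped - a_min) / (a_max - a_min)

vol = np.array([[-200.0, 0.0], [250.0, 500.0]])
print(scale_intensity(vol))                         # values mapped into [0, 1]
print(scale_intensity(vol, do_scale=False) is vol)  # True: transform disabled
```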
Hello @NielsRogge and @amyeroberts, Thank you so much for the answers. Please find a few comments below:
I will keep you updated on the progress or any questions :)
Model description
I would like to add a new model:
Proposed in the paper: UNETR: Transformers for 3D Medical Image Segmentation
UNEt TRansformers (UNETR) utilize a transformer as the encoder to learn sequence representations of the input volume and effectively capture the global multi-scale information, while also following the successful "U-shaped" network design for the encoder and decoder. The transformer encoder is directly connected to a decoder via skip connections at different resolutions to compute the final semantic segmentation output.
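As quick context on the encoder input, the paper splits the volume into non-overlapping cubic patches; with its default 96³ input and 16³ patches, this yields a short token sequence (the helper below is just illustrative arithmetic, not part of any implementation):

```python
def unetr_seq_len(input_size=96, patch_size=16):
    """Number of tokens when a cubic volume is split into cubic patches."""
    n_per_axis = input_size // patch_size
    return n_per_axis ** 3

print(unetr_seq_len())  # 216 tokens fed to the transformer encoder
```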
Open source status
Provide useful links for the implementation
Model Implementation: https://github.com/Project-MONAI/research-contributions/tree/master/UNETR
Pretrained Model: https://drive.google.com/file/d/1kR5QuRAuooYcTNLMnMj80Z9IgSs8jtLO/view?usp=sharing (Based on BTCV dataset)