UNETR: Transformers for 3D Medical Image Segmentation #17309

Open · 2 tasks done
pri1311 opened this issue May 17, 2022 · 18 comments · May be fixed by caleb-vicente/transformers#1
Comments

@pri1311 commented May 17, 2022

Model description

I would like to add a new model:

Proposed in the paper: UNETR: Transformers for 3D Medical Image Segmentation

UNEt TRansformers (UNETR) utilizes a transformer as the encoder to learn sequence representations of the input volume and effectively capture global multi-scale information, while also following the successful "U-shaped" network design for the encoder and decoder. The transformer encoder is directly connected to a decoder via skip connections at different resolutions to compute the final semantic segmentation output.
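For intuition, here is a minimal PyTorch sketch of that design (illustrative only: the layer sizes, skip handling and decoder are heavily simplified relative to the paper and the MONAI implementation):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ToyUNETR(nn.Module):
        # Illustrative sketch, not the MONAI/paper code: a ViT encoder over
        # 3D patches whose hidden states at depths 3/6/9/12 are reshaped
        # into volumes and fused by an upsampling ("U-shaped") decoder.
        def __init__(self, in_ch=1, n_cls=4, img=96, patch=16, dim=96, heads=8):
            super().__init__()
            self.g = img // patch                                  # tokens per axis
            self.embed = nn.Conv3d(in_ch, dim, patch, stride=patch)
            self.pos = nn.Parameter(torch.zeros(1, self.g ** 3, dim))
            self.blocks = nn.ModuleList(
                nn.TransformerEncoderLayer(dim, heads, batch_first=True)
                for _ in range(12)
            )
            self.ups = nn.ModuleList(                              # x2 upsampling stages
                nn.ConvTranspose3d(dim, dim, 2, 2) for _ in range(4)
            )
            self.fuse = nn.ModuleList(                             # merge skip + decoder
                nn.Conv3d(2 * dim, dim, 3, padding=1) for _ in range(3)
            )
            self.head = nn.Conv3d(dim, n_cls, 1)                   # per-voxel classes

        def _vol(self, t):                                         # (B,N,C) -> (B,C,g,g,g)
            B, N, C = t.shape
            return t.transpose(1, 2).reshape(B, C, self.g, self.g, self.g)

        def forward(self, x):                                      # x: (B,1,96,96,96)
            t = self.embed(x).flatten(2).transpose(1, 2) + self.pos
            skips = []
            for i, blk in enumerate(self.blocks, start=1):
                t = blk(t)
                if i in (3, 6, 9, 12):                             # skip connections
                    skips.append(self._vol(t))
            z3, z6, z9, z12 = skips
            out = self.ups[0](z12)                                 # bottleneck, g -> 2g
            for up, fuse, skip in zip(self.ups[1:], self.fuse, (z9, z6, z3)):
                skip = F.interpolate(skip, size=out.shape[2:])     # match resolution
                out = up(fuse(torch.cat([out, skip], dim=1)))      # fuse, then x2
            return self.head(out)                                  # (B,n_cls,96,96,96)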

Open source status

  • The model implementation is available
  • The model weights are available

Provide useful links for the implementation

Model Implementation: https://github.com/Project-MONAI/research-contributions/tree/master/UNETR

Pretrained Model: https://drive.google.com/file/d/1kR5QuRAuooYcTNLMnMj80Z9IgSs8jtLO/view?usp=sharing (Based on BTCV dataset)

@Puranjay-del-Mishra

Hello. What is the status of the implementation? I would like to contribute to it.

@LysandreJik (Member)

Hey @Puranjay-del-Mishra, to the best of my knowledge nobody has started working on it. We'd be very happy for you to take a stab at adding it!

You can follow the tutorial here: adding a new model.

We especially recommend following the add-new-model-like command and guide.

If you have not contributed to transformers yet, we also recommend reading the contributing guide.

@Puranjay-del-Mishra commented May 31, 2022

Sure! @LysandreJik
I'll go through it and give it a shot. Thanks.

@pri1311 (Author) commented May 31, 2022

Hey @Puranjay-del-Mishra @LysandreJik I was supposed to submit a PR last week but I came down with health problems.
I will be sending a PR by the weekend.

@Puranjay-del-Mishra

Hey @pri1311 , go ahead with the PR. All the best.

@Wernstrong67

I'm gonna try this out. Appreciate it.

@arv-77 commented Oct 19, 2022

Hi @NielsRogge,
Can I have a shot at implementing this model?

@NielsRogge (Contributor)

Yes, sure! Do you need some help?

@arv-77 commented Oct 20, 2022

Thanks! I'll get back to you if I have queries

@caleb-vicente

Hello @NielsRogge. I have been following all the steps in the guide https://huggingface.co/docs/transformers/add_new_model and have completed everything up to creating a PR. At this moment I have a fork of the whole transformers project on my GitHub, and I have created my "draft" by copying ViT with the command "transformers-cli add-new-model-like". After that, I created a draft pull request from my dev fork branch to my main fork branch and tried to add you as a reviewer, but it was not possible. Am I missing some steps? Should the pull request be opened directly from my dev fork branch to some branch in the real repository?

Attaching snapshot of the problem:
[Screenshot: error when adding reviewers]

@caleb-vicente linked a pull request Nov 6, 2022 that will close this issue
@caleb-vicente

Hi @NielsRogge and @LysandreJik,

I have been working on this task for the last few weeks and my code now runs the forward pass properly. I am now implementing the tokenizer, but I have a doubt: in the original repository they have created many functions to transform the input images. Can I include this functionality/library as a requirement for the Hugging Face tokenizer, or must it be implemented from scratch?

Many thanks

@NielsRogge (Contributor) commented Dec 12, 2022

Hi,

UNETR is a vision model so it probably doesn't require a tokenizer? You probably want to create an image processor for this model, is that right?

In that case, image processors should be implemented to support minimal inference. They should perform the exact same transformations as the original implementation to prepare data for the model for inference. For computer vision models, this typically involves resizing to a particular size + normalization.

An example of an image processor can be found here: https://github.com/huggingface/transformers/blob/main/src/transformers/models/vit/image_processing_vit.py
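For intuition, the basic shape of such a processor is roughly the following (a minimal numpy sketch, not the actual ViT image processor API; the nearest-neighbour resize is purely illustrative):

    import numpy as np

    class MinimalImageProcessor:
        # Sketch of inference-only preprocessing: resize to a fixed
        # size, then normalize. Names and defaults are illustrative.
        def __init__(self, size=(224, 224), mean=0.5, std=0.5):
            self.size, self.mean, self.std = size, mean, std

        def __call__(self, image, do_resize=True, do_normalize=True):
            image = np.asarray(image, dtype=np.float32)
            if do_resize:
                # nearest-neighbour resize by index sampling; the real
                # processors use proper interpolation backends
                h, w = image.shape[:2]
                ys = np.arange(self.size[0]) * h // self.size[0]
                xs = np.arange(self.size[1]) * w // self.size[1]
                image = image[ys][:, xs]
            if do_normalize:
                image = (image - self.mean) / self.std
            return image

Note the do_resize/do_normalize flags: each transform can be switched off independently, which matters later for wrapping third-party transforms.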

@caleb-vicente commented Dec 12, 2022

Thank you for the answer @NielsRogge.

When I was talking about the tokenizer I did in fact mean the image processor. When you check how the original repository implements the model, you realize they use some transformations that are not implemented in the Hugging Face library. These transformations normalize, filter and resize the 3D image in particular ways, with a slightly complex hierarchy of functions that cannot be reproduced with the current functions in "image_processing_utils.py":
[Screenshots: the MONAI transform pipeline used in the original notebook]

As far as I can see there are three options to implement this part in the Hugging Face code:

  • Use exactly the same functions they use in the original project, importing them from the MONAI library: https://github.com/Project-MONAI/MONAI
  • Copy/paste the code from the MONAI project into image_processing_utils.py and adapt the style and names to make it more legible.
  • Implement the whole thing from scratch. This could be time-consuming, and it would be pretty hard to obtain exactly the same results as the original code.

What is the recommended option?

@NielsRogge (Contributor)

Thanks for the nice suggestions! I'll ping @amyeroberts for this, as she's currently working on refactoring our image processing pipelines.

@caleb-vicente commented Dec 14, 2022

Thank you Niels.

Please let me know when you have some info. I'll be working on the refactor of the UNETR decoder, since the forward pass currently uses a dependency on the MONAI project (the original project) as well.

@NielsRogge (Contributor)

Discussed this offline with @amyeroberts, here's what she responded:

I’d use the third party for now (with the usual xxx_is_available checks) and wrap inside the image processor, e.g.

    import thirdparty

    class MyImageProcessor:
        def transform_1(self, image, *args, **kwargs):
            image = thirdparty.transform_1(image, *args, **kwargs)
            ...

so that we can remove it easily if need be.

Looking at the MONAI library: torch is required. This is fine for implementing the first model, but it shouldn’t be necessary for our TF model users. If the model turns out to be popular, it would be good to remove this dependency so we can port easily. Most of the transforms listed are compositions of standard logic we already have, e.g. CropForeground would only require us to implement the logic to calculate the bounding box.
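As a concrete illustration of that last point, the bounding-box logic behind a CropForeground-style transform is only a few lines of numpy (a sketch, not MONAI's implementation):

    import numpy as np

    def crop_foreground(volume, threshold=0.0):
        # Sketch: crop a volume to the bounding box of voxels whose
        # intensity exceeds `threshold` (the "foreground").
        mask = volume > threshold
        if not mask.any():
            return volume                      # nothing above threshold: no-op
        slices = []
        for axis in range(volume.ndim):
            # reduce the mask over all other axes, then take the extent
            other = tuple(a for a in range(volume.ndim) if a != axis)
            nonzero = np.where(mask.any(axis=other))[0]
            slices.append(slice(nonzero[0], nonzero[-1] + 1))
        return volume[tuple(slices)]

For example, crop_foreground(np.pad(np.ones((2, 2, 2)), 3)) returns the inner 2x2x2 block of ones.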

@amyeroberts (Collaborator) commented Dec 14, 2022

@caleb-vicente Thanks for all your work so far adding this model ❤️

Adding to Niels' comment above:

Regarding your suggestions, option 1 is the one I would go for: importing specific functionality from the MONAI project. I completely agree we don't want to reinvent the wheel! We already use third party packages for certain processing e.g. pytesseract for the LayoutLM models. Like the LayoutLM models, we can add MONAI as an optional dependency.
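For reference, the optional-dependency check is a small importlib probe (a sketch; is_monai_available is hypothetical, though transformers uses this same xxx_is_available idiom for other optional backends):

    import importlib.util

    def is_monai_available():
        # Sketch of the usual xxx_is_available idiom: probe for the
        # optional package without importing it at module load time.
        return importlib.util.find_spec("monai") is not None

    if is_monai_available():
        import monai  # only imported when the optional backend is installed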

Regarding the transforms in the screenshot above, one thing to consider is that image processors don't perform augmentation; they are responsible for transforming the data so that it can be fed into the model, i.e. the UnetrImageProcessor shouldn't include random operations like RandFlipd.

In the snippet:

class MyImageProcessor:
    def transform_1(self, image, *args, **kwargs):
        image = thirdparty.transform_1(image, *args, **kwargs)
        ...

there's also the consideration of input types. All of the current functions take in and return numpy arrays, and it should be possible to disable any of the transforms, e.g. with do_resize=False. As far as I can tell, MONAI will accept both torch and numpy inputs but always returns torch tensors. This is OK for a first implementation before removing the torch dependency, as long as the ability to disable any of the transforms still applies.
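For illustration, one way to satisfy both constraints at once (a sketch; numpy_io is a hypothetical helper, not part of any existing API):

    import numpy as np

    def numpy_io(transform):
        # Sketch: run a wrapped (e.g. MONAI) transform, which may return
        # a torch tensor, while keeping the processor numpy-in/numpy-out.
        def apply(image):
            out = transform(np.asarray(image))
            if hasattr(out, "detach"):         # torch.Tensor / MetaTensor
                out = out.detach().cpu().numpy()
            return np.asarray(out)
        return apply

Each wrapped transform can then sit behind its own do_* flag (do_resize, do_normalize, ...) so any step can still be disabled independently.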

Let me know if there are any other questions you have regarding this :)

@caleb-vicente

Hello @NielsRogge and @amyeroberts,

Thank you so much for the answers. Please find a few comments below:

  • I will implement the optional dependency on the MONAI library.
  • For the first implementation I will use the functions as they are in the library. In later iterations I could simplify some of them using work already done in the Hugging Face library.
  • About data augmentation: I will review it again to see whether any of those operations appear in MONAI's inference phase. In this case the function RandFlipd is used only in training mode in the notebook from which I took the snapshot (sorry for the confusion).
  • I will add a layer on top of MONAI's dependencies so that everything works with numpy arrays where necessary, and I will include the option to disable each transform.

I will keep you updated on the progress or any doubts :)
