UNETR: Transformers for 3D Medical Image Segmentation #17309

Open · 2 tasks done
pri1311 opened this issue May 17, 2022 · 18 comments · May be fixed by caleb-vicente/transformers#1
Comments

@pri1311 commented May 17, 2022

Model description

I would like to add a new model:

Proposed in the paper: UNETR: Transformers for 3D Medical Image Segmentation

UNEt TRansformers (UNETR) utilizes a transformer as the encoder to learn sequence representations of the input volume and effectively capture global multi-scale information, while also following the successful "U-shaped" network design for the encoder and decoder. The transformer encoder is directly connected to a decoder via skip connections at different resolutions to compute the final semantic segmentation output.
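For intuition, here is a minimal PyTorch sketch of that design (illustrative only: the layer sizes, skip handling and decoder are heavily simplified relative to the paper and the MONAI implementation):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ToyUNETR(nn.Module):
        # Illustrative sketch, not the MONAI/paper code: a ViT encoder over
        # 3D patches whose hidden states at depths 3/6/9/12 are reshaped
        # into volumes and fused by an upsampling ("U-shaped") decoder.
        def __init__(self, in_ch=1, n_cls=4, img=96, patch=16, dim=96, heads=8):
            super().__init__()
            self.g = img // patch                                  # tokens per axis
            self.embed = nn.Conv3d(in_ch, dim, patch, stride=patch)
            self.pos = nn.Parameter(torch.zeros(1, self.g ** 3, dim))
            self.blocks = nn.ModuleList(
                nn.TransformerEncoderLayer(dim, heads, batch_first=True)
                for _ in range(12)
            )
            self.ups = nn.ModuleList(                              # x2 upsampling stages
                nn.ConvTranspose3d(dim, dim, 2, 2) for _ in range(4)
            )
            self.fuse = nn.ModuleList(                             # merge skip + decoder
                nn.Conv3d(2 * dim, dim, 3, padding=1) for _ in range(3)
            )
            self.head = nn.Conv3d(dim, n_cls, 1)                   # per-voxel classes

        def _vol(self, t):                                         # (B,N,C) -> (B,C,g,g,g)
            B, N, C = t.shape
            return t.transpose(1, 2).reshape(B, C, self.g, self.g, self.g)

        def forward(self, x):                                      # x: (B,1,96,96,96)
            t = self.embed(x).flatten(2).transpose(1, 2) + self.pos
            skips = []
            for i, blk in enumerate(self.blocks, start=1):
                t = blk(t)
                if i in (3, 6, 9, 12):                             # skip connections
                    skips.append(self._vol(t))
            z3, z6, z9, z12 = skips
            out = self.ups[0](z12)                                 # bottleneck, g -> 2g
            for up, fuse, skip in zip(self.ups[1:], self.fuse, (z9, z6, z3)):
                skip = F.interpolate(skip, size=out.shape[2:])     # match resolution
                out = up(fuse(torch.cat([out, skip], dim=1)))      # fuse, then x2
            return self.head(out)                                  # (B,n_cls,96,96,96)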

Open source status

  • The model implementation is available
  • The model weights are available

Provide useful links for the implementation

Model Implementation: https://github.com/Project-MONAI/research-contributions/tree/master/UNETR

Pretrained Model: https://drive.google.com/file/d/1kR5QuRAuooYcTNLMnMj80Z9IgSs8jtLO/view?usp=sharing (Based on BTCV dataset)

@Puranjay-del-Mishra

Hello. What is the status of the implementation? I would like to contribute to it.

@LysandreJik (Member)

Hey @Puranjay-del-Mishra, to the best of my knowledge nobody has started working on it. We'd be very happy for you to take a stab at adding it!

You can follow the tutorial here: adding a new model.

We especially recommend following the add-new-model-like command and guide.

If you have not contributed to transformers yet, we also recommend reading the contributing guide.

@Puranjay-del-Mishra commented May 31, 2022

Sure! @LysandreJik
I'll go through it and give it a shot. Thanks.

@pri1311 (Author) commented May 31, 2022

Hey @Puranjay-del-Mishra @LysandreJik I was supposed to submit a PR last week but I came down with health problems.
I will be sending a PR by the weekend.

@Puranjay-del-Mishra

Hey @pri1311 , go ahead with the PR. All the best.

@Wernstrong67

I'm gonna try this out. Appreciate it.

@arv-77 commented Oct 19, 2022

Hi @NielsRogge,
Can I have a shot at implementing this model?

@NielsRogge (Contributor)

Yes, sure! Do you need some help?

@arv-77 commented Oct 20, 2022

Thanks! I'll get back to you if I have queries

@caleb-vicente

Hello @NielsRogge. I have been following all the steps in the guide https://huggingface.co/docs/transformers/add_new_model and have completed everything up to creating a PR. At this moment I have a fork of the whole transformers project on my GitHub, and I have created my "draft" by copying ViT with the command "transformers-cli add-new-model-like". After that, I created a draft pull request from my dev fork branch to my main fork branch and tried to add you as a reviewer, but it was not possible. Am I missing some steps? Should the pull request be opened directly from my dev fork branch to some branch in the real repository?

Attaching snapshot of the problem:
[Screenshot: error when adding reviewers]

@caleb-vicente linked a pull request Nov 6, 2022 that will close this issue
@caleb-vicente

Hi @NielsRogge and @LysandreJik,

I have been working on this task for the last few weeks and my code now runs the forward pass properly. I am now implementing the tokenizer, but I have a doubt: in the original repository they have created many functions to transform the input images. Can I include this functionality/library as a requirement for the Hugging Face tokenizer, or must it be implemented from scratch?

Many thanks

@NielsRogge (Contributor) commented Dec 12, 2022

Hi,

UNETR is a vision model so it probably doesn't require a tokenizer? You probably want to create an image processor for this model, is that right?

In that case, image processors should be implemented to support minimal inference. They should perform the exact same transformations as the original implementation to prepare data for the model for inference. For computer vision models, this typically involves resizing to a particular size + normalization.

An example of an image processor can be found here: https://github.com/huggingface/transformers/blob/main/src/transformers/models/vit/image_processing_vit.py
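For intuition, the basic shape of such a processor is roughly the following (a minimal numpy sketch, not the actual ViT image processor API; the nearest-neighbour resize is purely illustrative):

    import numpy as np

    class MinimalImageProcessor:
        # Sketch of inference-only preprocessing: resize to a fixed
        # size, then normalize. Names and defaults are illustrative.
        def __init__(self, size=(224, 224), mean=0.5, std=0.5):
            self.size, self.mean, self.std = size, mean, std

        def __call__(self, image, do_resize=True, do_normalize=True):
            image = np.asarray(image, dtype=np.float32)
            if do_resize:
                # nearest-neighbour resize by index sampling; the real
                # processors use proper interpolation backends
                h, w = image.shape[:2]
                ys = np.arange(self.size[0]) * h // self.size[0]
                xs = np.arange(self.size[1]) * w // self.size[1]
                image = image[ys][:, xs]
            if do_normalize:
                image = (image - self.mean) / self.std
            return image

Note the do_resize/do_normalize flags: each transform can be switched off independently, which matters later for wrapping third-party transforms.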

@caleb-vicente commented Dec 12, 2022

Thank you for the answer @NielsRogge.

When I was talking about the tokenizer I did in fact mean the image processor. When you check how the original repository implements the model, you realize they use some transformations that are not implemented in the Hugging Face library. These transformations normalize, filter and resize the 3D image in particular ways, with a slightly complex hierarchy of functions that cannot be reproduced with the current functions in "image_processing_utils.py":
[Screenshots: the MONAI transform pipeline used in the original notebook]

As far as I can see there are three options to implement this part in the Hugging Face code:

  • Use exactly the same functions they use in the original project, importing them from the MONAI library: https://github.com/Project-MONAI/MONAI
  • Copy/paste the code from the MONAI project into image_processing_utils.py and adapt the style and names to make it more legible.
  • Implement the whole thing from scratch. This could be time-consuming, and it would be pretty hard to obtain exactly the same results as the original code.

What is the recommended option?

@NielsRogge (Contributor)

Thanks for the nice suggestions! I'll ping @amyeroberts for this, as she's currently working on refactoring our image processing pipelines.

@caleb-vicente commented Dec 14, 2022

Thank you Niels.

Please let me know when you have some info. I'll be working on the refactor of the UNETR decoder, since the forward pass currently uses a dependency on the MONAI project (the original project) as well.

@NielsRogge (Contributor)

Discussed this offline with @amyeroberts, here's what she responded:

I’d use the third party for now (with the usual xxx_is_available checks) and wrap inside the image processor, e.g.

    import thirdparty

    class MyImageProcessor:
        def transform_1(self, image, *args, **kwargs):
            image = thirdparty.transform_1(image, *args, **kwargs)
            ...

so that we can remove it easily if need be.

Looking at the MONAI library: torch is required. This is fine for implementing the first model, but it shouldn’t be necessary for our TF model users. If the model turns out to be popular, it would be good to remove this dependency so we can port easily. Most of the transforms listed are compositions of standard logic we already have, e.g. CropForeground would only require us to implement the logic to calculate the bounding box.
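As a concrete illustration of that last point, the bounding-box logic behind a CropForeground-style transform is only a few lines of numpy (a sketch, not MONAI's implementation):

    import numpy as np

    def crop_foreground(volume, threshold=0.0):
        # Sketch: crop a volume to the bounding box of voxels whose
        # intensity exceeds `threshold` (the "foreground").
        mask = volume > threshold
        if not mask.any():
            return volume                      # nothing above threshold: no-op
        slices = []
        for axis in range(volume.ndim):
            # reduce the mask over all other axes, then take the extent
            other = tuple(a for a in range(volume.ndim) if a != axis)
            nonzero = np.where(mask.any(axis=other))[0]
            slices.append(slice(nonzero[0], nonzero[-1] + 1))
        return volume[tuple(slices)]

For example, crop_foreground(np.pad(np.ones((2, 2, 2)), 3)) returns the inner 2x2x2 block of ones.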

@amyeroberts (Collaborator) commented Dec 14, 2022

@caleb-vicente Thanks for all your work so far adding this model ❤️

Adding to Niels' comment above:

Regarding your suggestions, option 1 is the one I would go for: importing specific functionality from the MONAI project. I completely agree we don't want to reinvent the wheel! We already use third party packages for certain processing e.g. pytesseract for the LayoutLM models. Like the LayoutLM models, we can add MONAI as an optional dependency.
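For reference, the optional-dependency check is a small importlib probe (a sketch; is_monai_available is hypothetical, though transformers uses this same xxx_is_available idiom for other optional backends):

    import importlib.util

    def is_monai_available():
        # Sketch of the usual xxx_is_available idiom: probe for the
        # optional package without importing it at module load time.
        return importlib.util.find_spec("monai") is not None

    if is_monai_available():
        import monai  # only imported when the optional backend is installed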

Regarding the transforms in the screenshot above, one thing to consider is that image processors don't perform augmentation; they are responsible for transforming the data so that it can be fed into the model, i.e. the UnetrImageProcessor shouldn't include random operations like RandFlipd.

In the snippet:

class MyImageProcessor:
    def transform_1(self, image, *args, **kwargs):
        image = thirdparty.transform_1(image, *args, **kwargs)
        ...

there's also the consideration of input types. All of the current functions take in and return numpy arrays, and it should be possible to disable any of the transforms, e.g. with do_resize=False. As far as I can tell, MONAI will accept both torch and numpy inputs but always returns torch tensors. This is OK for a first implementation before removing the torch dependency, as long as the ability to disable any of the transforms still applies.
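For illustration, one way to satisfy both constraints at once (a sketch; numpy_io is a hypothetical helper, not part of any existing API):

    import numpy as np

    def numpy_io(transform):
        # Sketch: run a wrapped (e.g. MONAI) transform, which may return
        # a torch tensor, while keeping the processor numpy-in/numpy-out.
        def apply(image):
            out = transform(np.asarray(image))
            if hasattr(out, "detach"):         # torch.Tensor / MetaTensor
                out = out.detach().cpu().numpy()
            return np.asarray(out)
        return apply

Each wrapped transform can then sit behind its own do_* flag (do_resize, do_normalize, ...) so any step can still be disabled independently.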

Let me know if there are any other questions you have regarding this :)

@caleb-vicente

Hello @NielsRogge and @amyeroberts,

Thank you so much for the answers. Please find a few comments below:

  • I will implement the optional dependency on the MONAI library.
  • For the first implementation I will use the functions as they are in the library. In later iterations I could simplify some of them using work already done in the Hugging Face library.
  • About data augmentation: I will review it again to see whether any of those operations appear in MONAI's inference phase. In this case the function RandFlipd is used only in training mode in the notebook from which I took the snapshot (sorry for the confusion).
  • I will add a layer on top of MONAI's dependencies so that everything works with numpy arrays where necessary, and I will include the option to disable each transform.

I will keep you updated on the progress or any doubts :)
