CANET — Architecture and Code from scratch

Solomon

Aug 10, 2022 · 4 min read

In the previous article, we discussed the motivations behind the CANet architecture. In this article, we will go through how the model is designed to capture both global and context details. I have also replicated the architecture from scratch, so we can get an idea of how to build such architectures on our own.

CANet comprises two parts: 1) an Encoder and 2) a Decoder.

In the Encoder, the raw input image passes through the Channel Maps, which are arranged consecutively, followed by the Global Flow, Context Flow, and Feature Selection modules.

In the Decoder, the data is upsampled and concatenated with the Feature Selection Module output, which is upsampled again to produce the segmented output image.

Channel Map

A Channel Map is created by arranging a Conv Block followed by Identity Blocks.

Conv Block
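The post shows the Conv Block as a figure; here is a minimal Keras sketch of the usual ResNet-style conv block (the function name, filter counts, and kernel sizes are my assumptions, not the post's exact configuration):

```python
import tensorflow as tf
from tensorflow.keras import layers

def conv_block(x, filters, stride=2):
    """Conv block: the shortcut is a strided 1x1 convolution, so the
    block can change both the spatial size and the channel count."""
    shortcut = layers.Conv2D(filters, 1, strides=stride)(x)
    shortcut = layers.BatchNormalization()(shortcut)

    y = layers.Conv2D(filters, 3, strides=stride, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.Activation("relu")(y)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.BatchNormalization()(y)

    y = layers.Add()([y, shortcut])
    return layers.Activation("relu")(y)
```

Because the shortcut is itself a strided convolution, this is the block that does the downsampling inside a channel map.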

Identity Block
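The Identity Block differs only in its shortcut; a sketch under the same assumptions as above:

```python
import tensorflow as tf
from tensorflow.keras import layers

def identity_block(x, filters):
    """Identity block: the shortcut is the input itself, so `filters`
    must match the input channel count and the shape is unchanged."""
    y = layers.Conv2D(filters, 3, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.Activation("relu")(y)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.BatchNormalization()(y)

    y = layers.Add()([y, x])
    return layers.Activation("relu")(y)
```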

Encoder

The Encoder has 4 Channel Maps, say C1, C2, C3, C4, and they are arranged such that:

C1's width and height are 1/4 of the original image

C2's width and height are 1/8 of the original image

C3's width and height are 1/8 of the original image

C4's width and height are 1/8 of the original image

This is done by setting the stride parameter of the conv blocks.
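A minimal sketch of how the stride parameter produces these factors, with a single conv standing in for each channel map (sizes assume a 256x256 input; filter counts are my assumptions):

```python
import tensorflow as tf
from tensorflow.keras import layers

def stage(x, filters, stride):
    """Stand-in for one channel map; the stride sets the downsampling."""
    x = layers.Conv2D(filters, 3, strides=stride, padding="same")(x)
    x = layers.BatchNormalization()(x)
    return layers.Activation("relu")(x)

inputs = tf.keras.Input(shape=(256, 256, 3))
c1 = stage(stage(inputs, 64, 2), 64, 2)  # two stride-2 convs -> 1/4 (64x64)
c2 = stage(c1, 128, 2)                   # stride 2 -> 1/8 (32x32)
c3 = stage(c2, 256, 1)                   # stride 1 -> stays at 1/8
c4 = stage(c3, 512, 1)                   # stride 1 -> stays at 1/8
encoder = tf.keras.Model(inputs, [c1, c2, c3, c4])
```

C3 and C4 keep the 1/8 resolution because their stride is 1: only C1 and C2 actually shrink the feature map.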

The output of C4 is then passed to the Chained Context Aggregation Module (CAM).

Chained Context Aggregation Module(CAM)

The three main modules that encode the image information are the Global Flow, the Context Flow, and the Feature Selection Module.

As we discussed in the earlier blog, the Global Flow captures the high-level information, while the Context Flow captures the localized definitions.

Global Flow
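The post shows the Global Flow as a figure; a sketch of the common pattern (global average pooling to capture image-level context, a 1x1 conv, then broadcasting back to the grid — the filter count is my assumption):

```python
import tensorflow as tf
from tensorflow.keras import layers

def global_flow(x, filters=256):
    """Global Flow: pool the whole map down to 1x1 to capture
    image-level context, refine with a 1x1 conv, then upsample
    back to the input grid."""
    h, w = x.shape[1], x.shape[2]
    g = layers.GlobalAveragePooling2D(keepdims=True)(x)   # (B, 1, 1, C)
    g = layers.Conv2D(filters, 1)(g)
    g = layers.BatchNormalization()(g)
    g = layers.Activation("relu")(g)
    return layers.UpSampling2D(size=(h, w), interpolation="bilinear")(g)
```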

Context Flow
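A sketch of a single Context Flow, assuming the common pool-refine-upsample pattern (the CAM chains several of these with increasing pooling rates; filter counts and the two-conv refinement are my simplifications):

```python
import tensorflow as tf
from tensorflow.keras import layers

def context_flow(x, prior, filters=256, rate=2):
    """Context Flow (sketch): fuse the encoder features with the
    previous flow's output, pool to a coarser grid to widen the
    receptive field, refine with convs, then upsample back."""
    f = layers.Concatenate()([x, prior])
    f = layers.AveragePooling2D(pool_size=rate)(f)
    f = layers.Conv2D(filters, 3, padding="same", activation="relu")(f)
    f = layers.Conv2D(filters, 3, padding="same", activation="relu")(f)
    return layers.UpSampling2D(size=rate, interpolation="bilinear")(f)
```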

Feature Selection Module
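A sketch of the Feature Selection Module as squeeze-and-excitation style channel gating (the reduction ratio is my assumption):

```python
import tensorflow as tf
from tensorflow.keras import layers

def feature_selection(x, reduction=4):
    """Feature Selection Module (sketch): compute per-channel weights
    from globally pooled statistics and re-weight the aggregated
    feature channels with them."""
    c = x.shape[-1]
    s = layers.GlobalAveragePooling2D()(x)            # (B, C)
    s = layers.Dense(c // reduction, activation="relu")(s)
    s = layers.Dense(c, activation="sigmoid")(s)      # per-channel gates
    s = layers.Reshape((1, 1, c))(s)
    return layers.Multiply()([x, s])                  # broadcast over H, W
```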

Adapted Global Convolution Network (AGCN)
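The AGCN follows the Global Convolution Network idea; a sketch (kernel size and the residual refinement are my assumptions):

```python
import tensorflow as tf
from tensorflow.keras import layers

def agcn(x, filters, k=7):
    """AGCN (sketch): two separable large-kernel branches (k x 1 then
    1 x k, and the reverse order) approximate a dense k x k convolution
    at a fraction of the cost; a small residual refines the result."""
    a = layers.Conv2D(filters, (k, 1), padding="same")(x)
    a = layers.Conv2D(filters, (1, k), padding="same")(a)
    b = layers.Conv2D(filters, (1, k), padding="same")(x)
    b = layers.Conv2D(filters, (k, 1), padding="same")(b)
    y = layers.Add()([a, b])

    r = layers.Conv2D(filters, 3, padding="same", activation="relu")(y)
    r = layers.Conv2D(filters, 3, padding="same")(r)
    return layers.Add()([y, r])
```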

Connecting all the modules

Now that we have all the modules, let's connect them to create the model.
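An end-to-end wiring sketch, with each module reduced to a few layers so the data flow is visible in one place (filter counts, a single context flow instead of a chain, and the decoder details are my simplifications, not the exact paper configuration):

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def cbr(x, f, stride=1):
    """Conv + BatchNorm + ReLU helper."""
    x = layers.Conv2D(f, 3, strides=stride, padding="same")(x)
    x = layers.BatchNormalization()(x)
    return layers.Activation("relu")(x)

inputs = tf.keras.Input(shape=(256, 256, 3))

# Encoder: channel maps at 1/4, 1/8, 1/8, 1/8 resolution
c1 = cbr(cbr(inputs, 64, 2), 64, 2)      # 1/4
c2 = cbr(c1, 128, 2)                     # 1/8
c3 = cbr(c2, 256)                        # 1/8
c4 = cbr(c3, 512)                        # 1/8

# Global Flow: image-level context broadcast back to the grid
g = layers.GlobalAveragePooling2D(keepdims=True)(c4)
g = layers.Conv2D(256, 1, activation="relu")(g)
g = layers.UpSampling2D(size=(c4.shape[1], c4.shape[2]),
                        interpolation="bilinear")(g)

# One Context Flow (the paper chains several with growing rates)
cf = layers.Concatenate()([c4, g])
cf = layers.AveragePooling2D(2)(cf)
cf = cbr(cf, 256)
cf = layers.UpSampling2D(2, interpolation="bilinear")(cf)

# Aggregate the flows, then SE-style Feature Selection
agg = layers.Add()([g, cf])
s = layers.GlobalAveragePooling2D()(agg)
s = layers.Dense(256, activation="sigmoid")(s)
fsm = layers.Multiply()([agg, layers.Reshape((1, 1, 256))(s)])

# Decoder: upsample, fuse with low-level C1 features, restore full size
d = layers.UpSampling2D(2, interpolation="bilinear")(fsm)   # 1/4
d = layers.Concatenate()([d, c1])
d = cbr(d, 128)
d = layers.UpSampling2D(4, interpolation="bilinear")(d)     # full res
outputs = layers.Conv2D(1, 1, activation="sigmoid")(d)

model = Model(inputs, outputs, name="CANet_sketch")
```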

Model compile with dice loss

import tensorflow as tf

import segmentation_models as sm

from tensorflow.keras.models import Model

iou_score = sm.metrics.iou_score

model_canet2 = Model(inputs=INPUT, outputs=OUTPUT, name='CANet')

loss = sm.losses.cce_dice_loss

adam = tf.keras.optimizers.Adam()

model_canet2.compile(loss=loss, optimizer=adam, metrics=[iou_score])

Prediction

Inference can be made on the test data, and we can plot the images to validate the performance. The first column is the raw image, the second is the ground truth, and the third is the predicted segmentation.

Few tryouts

We can modify the identity blocks in C1, C2, C3, and C4.

Before applying the final sigmoid activation, we can try adding more conv layers, batch normalization, dropout, etc.

We can also try other optimizers, learning rates, weight initializations, or regularizations.

References

https://arxiv.org/abs/2002.12041

