In certain scenarios, you may need to debug issues related to the Apache Camel Cluster Service. A common symptom of a non-working setup is “unacked” records remaining in the blc_notification_state tables, which indicates that the Retry Cluster Service is not working.
If you are not familiar with how Broadleaf leverages Apache Camel’s Cluster Service, you can read more about it on Broadleaf Dev Central.
The retry logic mainly kicks in for two scenarios:
- sending notifications for notification states that were inserted as seed data (or directly in the DB with an “unacked” status), OR
- when the initial message send performed during a CRUD operation fails for some reason (e.g. the broker is temporarily down).
In the second scenario, the initial send usually succeeds, so the retry flow is never reached. As a result, a broken retry flow typically goes unnoticed until an initial send fails.
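The two scenarios can be illustrated with a small, hypothetical Java model. None of these types are Broadleaf classes, and gating the retry pass on cluster leadership is an assumption based on the retry flow depending on the Camel Cluster Service. The sketch only shows why “unacked” rows can pile up unnoticed when retries never run.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical model of the retry flow described above. These are NOT Broadleaf
// classes; the leadership gate on retryPass is an assumption for illustration.
class NotificationRetryModel {

    // Stand-in for one row of a blc_notification_state table.
    static final class NotificationState {
        final String id;
        boolean acked;

        NotificationState(String id, boolean acked) {
            this.id = id;
            this.acked = acked;
        }
    }

    final List<NotificationState> states = new ArrayList<>();

    // Scenario 1: a state inserted as seed data (or directly in the DB) as unacked.
    void insertUnacked(String id) {
        states.add(new NotificationState(id, false));
    }

    // Scenario 2: the send attempted during a CRUD operation. The broker predicate
    // returns true when the send succeeds; on success the state is acked at once
    // and the retry flow never sees it.
    void saveAndSend(String id, Predicate<String> broker) {
        NotificationState state = new NotificationState(id, false);
        states.add(state);
        state.acked = broker.test(id);
    }

    // One retry pass. Assumed to do work only on the node holding leadership
    // through the cluster service; returns how many states were resent and acked.
    int retryPass(boolean isLeader, Predicate<String> broker) {
        if (!isLeader) {
            return 0;
        }
        int resent = 0;
        for (NotificationState state : states) {
            if (!state.acked && broker.test(state.id)) {
                state.acked = true;
                resent++;
            }
        }
        return resent;
    }

    long unackedCount() {
        return states.stream().filter(s -> !s.acked).count();
    }
}
```

In this model, a seeded row and a failed CRUD-time send both stay unacked through any number of non-leader passes and are only cleared once a node holding leadership runs a pass, which matches the symptom described above.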
Debug Tip 1: Check ENV Properties
Specific ENV properties must be set to tell each application which Camel Cluster implementation to use, for example file-based for local development, or kubernetes or zookeeper when deploying to a higher environment (e.g. Kubernetes in the cloud).
Double-check that the appropriate ENV properties are set and injected into your pods (see Broadleaf Dev Central).
Additional Tip: Turn on the Broadleaf Environment Report and verify that your ENV properties are picked up as expected. For example, in your config server, add the following property for all your flex packages:
broadleaf:
  environment:
    report:
      disabled: false
Debug Tip 2: When using the Kubernetes Camel Cluster Service - check leases
If you have configured the Kubernetes Camel Cluster Service, the first thing to check is whether each of your deployed applications is able to create a Kubernetes lease. You can verify this by running kubectl get leases; the output should look similar to the example shown on Broadleaf Dev Central.
Important: each type of flex package should hold one or more leases. For example, if you deploy the Balanced composition, you should see lease holders for the auth, browse, cart, processing, and supporting flex packages. If one or more of these are missing, that likely points to an issue with that flex package type being unable to interact with the Camel Cluster Service.
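To check this for a whole composition at once, you can list each lease with its holder, e.g. kubectl get leases -o custom-columns=NAME:.metadata.name,HOLDER:.spec.holderIdentity, and pass the text to a small helper like the sketch below. It assumes that holder identities contain the flex package name (for example a pod name such as blc-browse-<suffix>); that naming is an assumption, so adapt the matching rule to your deployment.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Sketch: report flex packages that hold no lease, given the text output of
//   kubectl get leases -o custom-columns=NAME:.metadata.name,HOLDER:.spec.holderIdentity
// ASSUMPTION: a holder identity contains the flex package name (e.g. a pod name).
class LeaseHolderCheck {

    static List<String> missingPackages(String kubectlOutput, List<String> expectedPackages) {
        Set<String> holders = new LinkedHashSet<>();
        String[] lines = kubectlOutput.split("\\R");
        // Start at 1 to skip the NAME/HOLDER header row.
        for (int i = 1; i < lines.length; i++) {
            String[] cols = lines[i].trim().split("\\s+");
            // custom-columns prints <none> when a lease currently has no holder.
            if (cols.length >= 2 && !cols[1].equals("<none>")) {
                holders.add(cols[1]);
            }
        }
        List<String> missing = new ArrayList<>();
        for (String pkg : expectedPackages) {
            boolean held = holders.stream().anyMatch(h -> h.contains(pkg));
            if (!held) {
                missing.add(pkg);
            }
        }
        return missing;
    }
}
```

For a Balanced composition you would pass auth, browse, cart, processing, and supporting as the expected packages; any name returned is a flex package type worth investigating first.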
Debug Tip 3: Turn on more granular logging
To see what the Camel Cluster Service implementation is doing, you may find it useful to turn on additional logging. The following log levels may be helpful:
logging:
  level:
    io:
      fabric8: DEBUG
    org:
      apache:
        camel: DEBUG # you may also set this to TRACE for more visibility, but it gets noisy
    com:
      broadleafcommerce:
        common:
          messaging:
            notification: TRACE
Additional Tip: The Flex Package startup logs are another place to spot a problem. The following screenshots were captured with normal INFO-level logging and Kubernetes as the backing cluster implementation:
[Screenshot: logs emitted by a successful startup]
[Screenshot: startup logs for a Flex Package that has a problem communicating with the Kubernetes Camel Cluster Service]
Debug Tip 4: Hook Up a Remote JVM Debugger
If possible, it may be useful to hook up a remote JVM debugger to the pod that is having Camel Cluster Service issues.
To hook up a remote debugger to a pod, the application must have the remote debug port enabled. The default Helm charts for Broadleaf’s flex packages have a property for this. For example, the blc-browse flex package chart has the following property in its values.yml file; set enabled to true to turn on remote debugging:
debug:
  enabled: false
  port: 9004
Next, port-forward the debug port to your local machine, either with kubectl (e.g. kubectl port-forward pod/<pod-name> 9004:9004) or with a tool like Lens.
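Before configuring the IDE, you can confirm that the forwarded port actually reaches a JVM debug agent. A plain TCP connect is not conclusive here, because kubectl accepts the local connection before it reaches the pod. The sketch below instead performs the JDWP handshake defined by the JDWP specification: the debugger sends the 14 ASCII bytes JDWP-Handshake and the target VM replies with the same bytes. The host, the default port 9004 (taken from the values.yml example), and the timeout are assumptions; run it before attaching IntelliJ, since the agent accepts one debugger at a time.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Sketch: check that host:port answers the JDWP handshake (requires Java 11+).
class JdwpHandshakeCheck {

    private static final byte[] HANDSHAKE = "JDWP-Handshake".getBytes(StandardCharsets.US_ASCII);

    static boolean answersJdwp(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            socket.setSoTimeout(timeoutMillis);
            OutputStream out = socket.getOutputStream();
            out.write(HANDSHAKE);
            out.flush();
            InputStream in = socket.getInputStream();
            // A JDWP agent echoes the same 14 bytes; EOF or other bytes mean no agent.
            byte[] reply = in.readNBytes(HANDSHAKE.length);
            return Arrays.equals(reply, HANDSHAKE);
        } catch (IOException e) {
            // Refused, reset, or timed out: nothing answering JDWP on that port.
            return false;
        }
    }

    public static void main(String[] args) {
        int port = args.length > 0 ? Integer.parseInt(args[0]) : 9004;
        System.out.println("JDWP on localhost:" + port + " -> " + answersJdwp("localhost", port, 3000));
    }
}
```

If this reports false while the port-forward is running, the debug agent is most likely not enabled in the pod (or is listening on a different port), which is worth fixing before looking at the IntelliJ configuration.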
Once that is set up, you can start a remote JVM debug session in IntelliJ, using a Remote JVM Debug configuration that attaches to localhost on the forwarded port:
[Screenshot: example IntelliJ debug configuration]
Debug Tip 5: Understanding the Kubernetes Camel Cluster Service Flow
To set appropriate breakpoints, it helps to understand the flow of how the Kubernetes Camel Cluster Service starts up.
The general call stack looks like this:
DeferServiceStartupListener
→ BaseService
→ AbstractCamelClusterService
→ KubernetesClusterView
On initialization, DeferServiceStartupListener holds two service lists: earlyServices and services.
During execution of this class, the doStart method is called twice, once per list: first for the earlyServices list (in the method onCamelContextStarting) and then for the services list (in the method onCamelContextStarted).
The KubernetesClusterService is supposed to be loaded in the services list (not earlyServices). If the KubernetesClusterService class is NOT present in the services list, that indicates a Spring bean loading problem or conflict. Check every place in your codebase that instantiates a CamelClusterService.
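The two-phase ordering and the check above can be pictured with a deliberately simplified, hypothetical model. It is NOT Camel’s DeferServiceStartupListener: the class, its fields, and SomeEarlyService are illustrative names only. In a real debug session you would instead pause in onCamelContextStarted and inspect the listener’s services field directly.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the two-phase startup described above; not Camel code.
class DeferredStartupModel {

    static final class NamedService {
        final String name;

        NamedService(String name) {
            this.name = name;
        }
    }

    final List<NamedService> earlyServices = new ArrayList<>();
    final List<NamedService> services = new ArrayList<>();
    final List<String> startOrder = new ArrayList<>();

    // Phase 1: runs while the context is still starting.
    void onCamelContextStarting() {
        startAll(earlyServices);
    }

    // Phase 2: runs once the context has started; the cluster service belongs here.
    void onCamelContextStarted() {
        startAll(services);
    }

    // Records the order in which services are started.
    private void startAll(List<NamedService> list) {
        for (NamedService s : list) {
            startOrder.add(s.name);
        }
    }

    // Mirrors the check above: is the named service in the deferred "services" list?
    boolean isDeferred(String simpleName) {
        return services.stream().anyMatch(s -> s.name.equals(simpleName));
    }
}
```

If isDeferred("KubernetesClusterService") would be false for your real listener, the service either was never registered or landed in the wrong list, which is the bean loading problem described above.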
Stepping through a successful run of this flow with breakpoints should look something like this:
- DeferServiceStartupListener starts up, iterates through the earlyServices list first, and starts each of them.
- DeferServiceStartupListener next iterates through the services list and starts each of them (it should include the KubernetesClusterService).
- The BaseService#start() method (BaseService is a parent class of KubernetesClusterView) is called.
- The BaseService#start() method (BaseService is a parent class of KubernetesClusterService) is called.
- The AbstractCamelClusterService#doStart() method is called.
- The KubernetesClusterView#doStart() method is called and tries to connect to the running Kubernetes Control Plane API.
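If the last step fails, the pod’s in-cluster credentials are worth checking. Camel’s Kubernetes cluster service builds on the fabric8 Kubernetes client, which, when running inside a pod, typically auto-configures from the KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT environment variables and the mounted service account files. The sketch below only checks that those standard prerequisites exist; it cannot confirm RBAC permissions on leases, which you can check with kubectl auth can-i create leases --as=system:serviceaccount:<namespace>:<service-account> -n <namespace>.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Sketch: list missing in-cluster prerequisites that a Kubernetes client such as
// fabric8 usually relies on when it auto-configures from inside a pod.
class InClusterPrereqCheck {

    static final String TOKEN_PATH = "/var/run/secrets/kubernetes.io/serviceaccount/token";
    static final String CA_PATH = "/var/run/secrets/kubernetes.io/serviceaccount/ca.crt";

    static List<String> problems(Map<String, String> env, Predicate<Path> exists) {
        List<String> problems = new ArrayList<>();
        if (env.get("KUBERNETES_SERVICE_HOST") == null) {
            problems.add("KUBERNETES_SERVICE_HOST is not set");
        }
        if (env.get("KUBERNETES_SERVICE_PORT") == null) {
            problems.add("KUBERNETES_SERVICE_PORT is not set");
        }
        // Missing files often mean automountServiceAccountToken is disabled.
        if (!exists.test(Paths.get(TOKEN_PATH))) {
            problems.add("service account token not mounted at " + TOKEN_PATH);
        }
        if (!exists.test(Paths.get(CA_PATH))) {
            problems.add("cluster CA certificate not mounted at " + CA_PATH);
        }
        return problems;
    }

    public static void main(String[] args) {
        List<String> found = problems(System.getenv(), Files::exists);
        System.out.println(found.isEmpty() ? "In-cluster prerequisites look present" : String.join("\n", found));
    }
}
```

Running main inside the affected pod (or reproducing the same checks by hand) separates a missing-credentials problem from a permissions or networking problem.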