Posts Tagged: ‘karaf’

Coming in Karaf 3.0.0: new enterprise JNDI feature

December 13, 2013 Posted by jbonofre

In previous Karaf versions (2.x), the JNDI support was “basic”.
We just leveraged Aries JNDI to support the osgi:service JNDI scheme, allowing OSGi services to be referenced by JNDI names.

However, we didn’t provide a fully functional JNDI initial context, nor any tooling around JNDI.

As part of the new enterprise features coming with Karaf 3.0.0, the JNDI support is now more “complete”.

Add JNDI support

Like most of the other enterprise features, the JNDI feature is optional. It means that you have to install the jndi feature first:

karaf@root()> feature:install jndi

The jndi feature installs several parts.

Ready to use initial context

As in previous versions, Karaf provides a fully compliant implementation of the OSGi Alliance JNDI Service Specification. This specification details how to advertise InitialContextFactory and ObjectFactory implementations in an OSGi environment. It also defines how to obtain services from the service registry via JNDI.

Now, it’s possible to use the JNDI initial context directly. Karaf provides a fully functional initial context where you can look up both osgi:service scheme names and regular JNDI names.

You can do:

Context context = new InitialContext();
MyBean myBean = (MyBean) context.lookup("my/bean/name");

You can use the osgi:service scheme to access the OSGi service registry using JNDI:

Context context = new InitialContext();
MyBean myBean = (MyBean) context.lookup("osgi:service/mybean");
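
For completeness, here is a hedged, self-contained sketch of such lookups, written as code running inside a bundle deployed in Karaf (the my/bean/name binding is hypothetical, and outside OSGi the osgi:service scheme would not resolve):

import javax.naming.Context;
import javax.naming.InitialContext;
import javax.naming.NamingException;

public class JndiLookupExample {

    // Intended to run inside a bundle deployed in Karaf, once the jndi feature is installed
    public void lookupExamples() {
        try {
            Context context = new InitialContext();
            // Regular JNDI name (hypothetical binding, created for instance with jndi:bind)
            Object myBean = context.lookup("my/bean/name");
            // osgi:service scheme: here we look up the Karaf JndiService listed by jndi:names
            Object jndiService = context.lookup("osgi:service/jndi");
            System.out.println(myBean + " / " + jndiService);
        } catch (NamingException e) {
            // Thrown when a name is not bound or the initial context cannot be created
            e.printStackTrace();
        }
    }
}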

JNDI Service, Commands and MBean

Karaf 3.0.0 provides an OSGi service dedicated to JNDI.

The interface of this JNDI service is org.apache.karaf.jndi.JndiService and it’s registered when installing the jndi feature.

You can manipulate the JNDI service using shell commands.

You can list the JNDI names using jndi:names:

karaf@root()> jndi:names
JNDI Name         | Class Name                                    
------------------------------------------------------------------
osgi:service/jndi | org.apache.karaf.jndi.internal.JndiServiceImpl

You can create a new JNDI name from an existing one (a kind of alias) using the jndi:alias command:

karaf@root()> jndi:alias osgi:service/jndi local/service/jndi
karaf@root()> jndi:names
JNDI Name          | Class Name                                    
-------------------------------------------------------------------
osgi:service/jndi  | org.apache.karaf.jndi.internal.JndiServiceImpl
local/service/jndi | org.apache.karaf.jndi.internal.JndiServiceImpl

For instance, here, we bind a name from the “special” osgi:service scheme as a “regular” JNDI name.

You can directly bind an OSGi service (identified by its service.id) to a JNDI name:

karaf@root()> jndi:bind 344 local/service/kar
karaf@root()> jndi:names
JNDI Name         | Class Name                                    
------------------------------------------------------------------
local/service/kar | org.apache.karaf.kar.internal.KarServiceImpl  
osgi:service/jndi | org.apache.karaf.jndi.internal.JndiServiceImpl

You can alias the local/service/kar name directly as service/kar:

karaf@root()> jndi:alias local/service/kar service/kar
karaf@root()> jndi:names
JNDI Name         | Class Name                                    
------------------------------------------------------------------
local/service/kar | org.apache.karaf.kar.internal.KarServiceImpl  
service/kar       | org.apache.karaf.kar.internal.KarServiceImpl  
osgi:service/jndi | org.apache.karaf.jndi.internal.JndiServiceImpl

You can unbind the service/kar name:

karaf@root()> jndi:unbind service/kar
karaf@root()> jndi:names
JNDI Name         | Class Name                                    
------------------------------------------------------------------
local/service/kar | org.apache.karaf.kar.internal.KarServiceImpl  
osgi:service/jndi | org.apache.karaf.jndi.internal.JndiServiceImpl

You can also get all JNDI names and manipulate the JNDI service using a new JMX JNDI MBean. The object name to use is org.apache.karaf:type=jndi,name=*.
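
To illustrate how a management client could discover this MBean, here is a hedged sketch using the standard JMX remote API. The service URL, port, instance name (karaf-root), and credentials are assumptions: they depend on your etc/org.apache.karaf.management.cfg and etc/users.properties.

import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class JndiMBeanClient {
    public static void main(String[] args) throws Exception {
        // Typical Karaf JMX URL; adjust host, port, and instance name to your setup
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:1099/karaf-root");
        Map<String, Object> env = new HashMap<>();
        env.put(JMXConnector.CREDENTIALS, new String[] { "karaf", "karaf" });
        try (JMXConnector connector = JMXConnectorFactory.connect(url, env)) {
            MBeanServerConnection connection = connector.getMBeanServerConnection();
            // Find the JNDI MBean(s) matching the documented object name pattern
            Set<ObjectName> names =
                    connection.queryNames(new ObjectName("org.apache.karaf:type=jndi,name=*"), null);
            for (ObjectName name : names) {
                // Print the object name and the implementation class behind the MBean
                System.out.println(name + " -> " + connection.getMBeanInfo(name).getClassName());
            }
        }
    }
}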

Conclusion

One of our purposes for Karaf 3.0.0 is to provide more services, commands, and MBeans, to make Karaf a more complete enterprise OSGi container.

While we already provide a bunch of features, a lot of them are not really “visible” to the end users because of “missing” commands or MBeans.

It’s a key point for Karaf 3.x releases.

Coming in Karaf 3.0.0: RBAC support for OSGi services and console commands

December 12, 2013 Posted by jbonofre

In a previous post, we saw a new Karaf feature: support of user groups and Role-Based Access Control (RBAC) for the JMX layer.

We extended the RBAC support to OSGi services and, as a side effect, to the console commands (since a console command is also an OSGi service).

RBAC for OSGi services

The JMX RBAC support uses an MBeanServerBuilder. The KarafMBeanServerBuilder “intercepts” the calls to the MBeans, checks the ACL definitions (in the etc/jmx.acl.*.cfg configuration files), and decides whether the call can be performed or not.

Regarding the RBAC support for OSGi services, we use a similar mechanism.

The Karaf Service Guard provides a service listener which intercepts the service calls, and checks whether the call to the service can be performed or not.

The list of “secured” OSGi services is defined in the karaf.secured.services property in etc/system.properties (using an LDAP filter syntax).

By default, we only “intercept” (and so secure) the command OSGi services:

karaf.secured.services = (&(osgi.command.scope=*)(osgi.command.function=*))

The RBAC definitions themselves are stored in etc/org.apache.karaf.service.acl.*.cfg configuration files, similar to the etc/jmx.acl*.cfg configuration files used for JMX. The syntax in these files is the same.
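
As a purely hypothetical illustration of that operation = roles syntax (the method names below are made up, and the complete file format may require additional properties such as a service filter, so check the Karaf documentation):

# method = roles
doSomething = admin
getStatus = manager,viewer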

RBAC for console commands

As the console commands are actually OSGi services, the direct application of the OSGi services RBAC support is to secure the console commands.

By default, we secure only the OSGi services associated with the console commands (as defined earlier by the karaf.secured.services property).

The RBAC definitions for the console commands are stored in the etc/org.apache.karaf.command.acl.*.cfg configuration files.

You can define one configuration file per command scope. For instance, the etc/org.apache.karaf.command.acl.bundle.cfg configuration file defines the RBAC for the bundle:* commands.

For instance, in the etc/org.apache.karaf.command.acl.bundle.cfg configuration file, we can define:

install = admin
refresh[/.*[-][f].*/] = admin
refresh = manager
restart[/.*[-][f].*/] = admin
restart = manager
start[/.*[-][f].*/] = admin
start = manager
stop[/.*[-][f].*/] = admin
stop = manager
uninstall[/.*[-][f].*/] = admin
uninstall = manager
update[/.*[-][f].*/] = admin
update = manager
watch = admin

The format is command[option]=role.

For instance, in this file we:

  • limit the bundle:install and bundle:watch commands to users with the admin role
  • limit the bundle:refresh, bundle:restart, bundle:start, bundle:stop, bundle:uninstall, and bundle:update commands used with the -f option (meaning executed against “system” bundles) to users with the admin role
  • allow all other invocations (not matching the two previous rules) for users with the manager role

By default, we define RBAC for:

  • bundle:* commands (in the etc/org.apache.karaf.command.acl.bundle.cfg configuration file)
  • config:* commands (in the etc/org.apache.karaf.command.acl.config.cfg configuration file)
  • feature:* commands (in the etc/org.apache.karaf.command.acl.feature.cfg configuration file)
  • jaas:* commands (in the etc/org.apache.karaf.command.acl.jaas.cfg configuration file)
  • kar:* commands (in the etc/org.apache.karaf.command.acl.kar.cfg configuration file)
  • shell:* commands (in the etc/org.apache.karaf.command.acl.shell.cfg configuration file)
  • system:* commands (in the etc/org.apache.karaf.command.acl.system.cfg configuration file)

These RBAC rules apply to both the “local” console and the remote SSH console.

As you don’t really log on to the “local” console, we have to define the “roles” used by the “local” console.

These “local” roles are defined by the karaf.local.roles property in the etc/system.properties configuration file:

karaf.local.roles = admin,manager,viewer

We can see that, when we use the “local” console, the “implicit local user” will have the admin, manager, and viewer roles.

Coming in Karaf 3.0.0: subshell and completion mode

October 10, 2013 Posted by jbonofre

If you are a Karaf user, you probably know that Karaf is very extensible: you can add features in Karaf to provide new functionalities.

For instance, you can install Camel, ActiveMQ, CXF, Cellar, etc in your Karaf runtime.

Most of these features provide new commands:
– Camel provides camel:* commands to manipulate the Camel Context, the routes, etc.
– CXF provides cxf:* commands to manipulate the CXF buses, endpoints, etc.
– ActiveMQ provides activemq:* commands to manipulate brokers.
– Cellar provides cluster:* commands to manipulate cluster nodes, cluster groups, etc.
– and so on

If you install a few features like these, the number of commands available in the Karaf shell console becomes really impressive, and it’s not always easy to find the one you need.

That’s why subshell support has been introduced.

Subshell

Karaf now uses the command scopes to create subshells “on the fly”: the commands are grouped by subshell. As you will see later, depending on the completion mode that you use, you will be able to see only the commands of the current subshell, and to change from one subshell to another.

Let’s take an example. In Karaf itself, we have commands to manipulate bundles and commands to manipulate features, for instance:

  • bundle:list lists the bundles
  • bundle:start starts bundles
  • bundle:stop stops bundles
  • feature:list lists the Karaf features
  • feature:repo-list lists the Karaf features repositories

In previous Karaf versions, to list bundles and features, you did something like this:


karaf@root> osgi:list
...
karaf@root> features:list
...

In Karaf 3.0.0, you can still do the same (just using the new name of the commands):


karaf@root()> bundle:list
...
karaf@root()> feature:list
...

But you can also use subshell:


karaf@root()> bundle
karaf@root(bundle)> list
...
karaf@root(bundle)> feature
karaf@root(feature)> list
...

or


karaf@root()> bundle
karaf@root(bundle)> list
...
karaf@root(bundle)> exit
karaf@root()> feature
karaf@root(feature)> list
...

We can note several things here:

  • You have commands to go into a subshell. These commands are created on the fly by Karaf using the scope of the commands. Here, we use the bundle and feature commands to go into the bundle and feature subshells.
  • You can see your current subshell location directly in the prompt:

    karaf@root(bundle)>

    We can see here that we are in the bundle subshell.
  • We can switch directly from one subshell to another using the subshell command:

    karaf@root(bundle)> feature
    karaf@root(feature)>
  • You have a new exit command to get out of the current subshell and return to the root level.

You have the choice between different completion modes, depending on the behaviour that you prefer.

Completion Mode

The completion mode defines the behaviour of the TAB key to complete commands.

You have three different modes available:

  • GLOBAL
  • FIRST
  • SUBSHELL

You can define your default completion mode using the completionMode property in etc/org.apache.karaf.shell.cfg file. By default, you have:


completionMode = GLOBAL

But, you can also change the completion mode “on the fly” (while using the Karaf shell console) using a new command: shell:completion:


karaf@root()> shell:completion
GLOBAL
karaf@root()> shell:completion FIRST
karaf@root()> shell:completion
FIRST

shell:completion without an argument displays the current completion mode; you can also pass the new completion mode that you want to use.

GLOBAL completion mode

GLOBAL completion mode is the default one in Karaf 3.0.0 (mostly for transition purpose).

GLOBAL mode doesn’t really use subshell: it’s the same behavior as in previous Karaf versions.

When you press the TAB key, whatever subshell you are in, the completion will display all commands and all aliases:


karaf@root()> <TAB>
karaf@root()> Display all 273 possibilities? (y or n)
...
karaf@root()> feature
karaf@root(feature)> <TAB>
karaf@root(feature)> Display all 273 possibilities? (y or n)
...

FIRST completion mode

FIRST completion mode is an alternative to the GLOBAL completion mode.

If you type the TAB key on the root level subshell, the completion will display the commands and the aliases from all subshells (as in GLOBAL mode). However, if you type the TAB key when you are in a subshell, the completion will display only the commands of the current subshell:


karaf@root()> shell:completion FIRST
karaf@root()> <TAB>
karaf@root()> Display all 273 possibilities? (y or n)
...
karaf@root()> feature
karaf@root(feature)> <TAB>
karaf@root(feature)>
info install list repo-add repo-list repo-remove uninstall version-list
karaf@root(feature)> exit
karaf@root()> log
karaf@root(log)> <TAB>
karaf@root(log)>
clear display exception-display get log set tail

SUBSHELL completion mode

SUBSHELL completion mode is the real subshell mode (to be honest, it’s my preferred one ;)).

If you type the TAB key on the root level, the completion displays the subshell commands (to go into a subshell), and the global aliases. Once you are in a subshell, if you type the TAB key, the completion displays the commands of the current subshell:


karaf@root()> shell:completion SUBSHELL
karaf@root()> <TAB>
karaf@root()>
* bundle cl config dev feature help instance jaas kar la ld lde log log:list man package region service shell ssh system
karaf@root()> bundle
karaf@root(bundle)> <TAB>
karaf@root(bundle)>
capabilities classes diag dynamic-import find-class headers info install list refresh requirements resolve restart services start start-level stop
uninstall update watch
karaf@root(bundle)> exit
karaf@root()> camel
karaf@root(camel)> <TAB>
karaf@root(camel)>
backlog-tracer-dump backlog-tracer-info backlog-tracer-start backlog-tracer-stop context-info context-list context-start context-stop endpoint-list route-info route-list route-profile route-reset-stats
route-resume route-show route-start route-stop route-suspend

Tips

The “old” fully qualified command names are still valid, so you don’t have to change anything in your scripts; you can still use:


karaf@root()> feature:install
karaf@root()> ssh:ssh
...

You have the choice: use the completion mode that you prefer; you can always change the mode whenever you want using the shell:completion command.

My preference is for the SUBSHELL completion mode. Using this mode, you don’t see a bunch of commands at the root level, just the subshell switch commands. I think it’s clear and straightforward. When you “extend” your Karaf runtime with a lot of additional features, it’s interesting to have the commands grouped by subshell.

Coming in Karaf 3.0.0: JAAS users, groups, roles, and ACLs

October 4, 2013 Posted by jbonofre

This week I worked with David Bosschaert. David proposed a patch for Karaf 3.0.0 to add the notion of groups and use ACLs for JMX.

He posted a blog entry about that: http://coderthoughts.blogspot.fr/2013/10/jmx-role-based-access-control-for-karaf.html.

David’s post is very detailed, mostly in terms of implementation, the usage of the interceptor, etc. This post is more about the pure end-user usage: how to configure groups, JMX ACLs, etc.

JAAS users, groups, and roles

Karaf uses JAAS for user authentication and authorisation. By default, it uses the PropertiesLoginModule, which uses the etc/users.properties file to store the users.

The etc/users.properties file has the following format:


user=password,role

For instance:


karaf=karaf,admin

This means we have a user karaf, with password karaf, and the admin role.

Actually, the roles were not really used in Karaf: for instance, when you use SSH or JMX, Karaf checks the principal and credentials (basically the username and password) but it doesn’t really use the roles. All users have exactly the same permissions (basically all permissions): they can execute any shell command, access any MBean, and call any operation on these MBeans.

Moreover, the roles were “only” assigned per user. It means that we had to duplicate the same roles list for two different users: it was the only way to assign the same roles to different users.

So, in addition to users and roles, we introduced JAAS groups.

A user can be a member of a group or have roles assigned directly (as previously).

A group typically has one or more roles assigned. A user that is part of that group gets these roles too.
Finally, a user has the union of the roles associated with his groups, together with his own roles.

Basically, the etc/users.properties file doesn’t change in terms of format. We just introduced a prefix to identify a group: _g_. A “user” with the _g_: prefix is actually a group.
So a group is defined like a user, and it’s possible to use a group in the list of roles of a user:


# users
karaf = karaf,_g_:admingroup
manager = manager,_g_:managergroup
other = other,_g_:managergroup,otherrole

#groups
_g_\:admingroup = admin,viewer,manager
_g_\:managergroup = viewer,manager

We updated the jaas:* shell commands to be able to manage groups, roles, and users:


karaf@root> jaas:realm-manage --realm karaf
karaf@root> jaas:group-add managergroup
karaf@root> jaas:group-add --help
karaf@root> jaas:user-add joe joe
karaf@root> jaas:group-add joe managergroup
karaf@root> jaas:group-role-add managergroup manager
karaf@root> jaas:group-role-add managergroup viewer
karaf@root> jaas:update
karaf@root> jaas:realm-manage --realm karaf
karaf@root> jaas:user-list
User Name | Group        | Role
---------------------------------
karaf     | admingroup   | admin
karaf     | admingroup   | manager
karaf     | admingroup   | viewer
joe       | managergroup | manager
joe       | managergroup | viewer
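
As a hedged illustration, after jaas:update the etc/users.properties file would contain an entry for joe along these lines (the clear-text password is only for the example):

joe = joe,_g_:managergroup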

Thanks to the groups, it’s possible to factorize the roles, and easily share the same roles between different users.

Define JMX ACLs based on roles

As explained before, the roles were not really used by Karaf. On the JMX layer, for instance, using jconsole with the karaf user, you were able to see all MBeans and perform all operations.

So, we introduced the support of ACLs (access control lists) on JMX.

Now, whenever a JMX operation is invoked, the roles of the current user are checked against the required roles for this operation.

The ACLs are defined using configuration files in the Karaf etc folder.

The ACL configuration file is prefixed with jmx.acl and completed with the MBean ObjectName that it applies to.

For example, to define the ACL on the MBean foo.bar:type=Test, you will create a configuration file named etc/jmx.acl.foo.bar.Test.cfg.
It’s possible to define more generic configuration files: one per domain (jmx.acl.foo.bar.cfg) applied to all MBeans in this domain, or the most generic one (jmx.acl.cfg) applied to all MBeans.

A very simple configuration file looks like:


# operation = roles
test = admin
getVal = manager,viewer

The configuration file supports different syntax to provide fine-grained operation ACL:

  • Specific match for the invocation, including arguments value:

    test(int)["17"] = role1

    It means that only users with role1 assigned will be able to invoke the test operation with 17 as argument value.
  • Regex match for the invocation:

    test(int)[/[0-9]/] = role2

    It means that only users with role2 assigned will be able to invoke the test operation with argument between 0 and 9.
  • Signature match for the invocation:

    test(int) = role3

    It means that only users with role3 assigned will be able to invoke test operation.
  • Method name match for the invocation:

    test = role4

    It means that only the users with role4 assigned will be able to invoke any test operations (whatever the list of arguments is).
  • A method name wildcard match:

    te* = role5

    It means that only the users with role5 assigned will be able to invoke any operations matching te* expression.

Karaf looks for required roles using the following process:

  1. The most specific configuration file is tried first (etc/jmx.acl.foo.bar.Test.cfg).
  2. If no matching definition is found in the specific configuration file, a more generic configuration file is inspected. In our case, Karaf will use etc/jmx.acl.foo.bar.cfg.
  3. If no matching definition is found in the domain specific configuration file, the most generic configuration file is inspected, etc/jmx.acl.cfg.

The ACLs work for any kind of MBean, including the ones from the JVM itself. For instance, it’s possible to create an etc/jmx.acl.java.lang.Memory.cfg configuration file containing:


gc = manager

It means that only the users with the manager role assigned will be able to invoke the gc operation of the JVM Memory MBean.

It’s also possible to define more advanced configurations. For instance, say we want bundles with an ID between 0 and 49 to be stoppable only by an admin, while the other bundles can be stopped by a manager. To do so, we create an etc/jmx.acl.org.apache.karaf.bundle.cfg configuration file containing:


stop(java.lang.String)[/([1-4])?[0-9]/] = admin
stop = manager

The etc/jmx.acl.cfg configuration file is the global configuration for the invocations of any MBean that doesn’t have a more specific ACL.
By default, we define this configuration:


list* = viewer
get* = viewer
is* = viewer
set* = admin
* = admin

We introduced a new MBean: org.apache.karaf:type=security,area=jmx.
The purpose of this MBean is to check whether the current user can access a certain MBean or invoke a specific operation on it.
This MBean can be used by management clients to decide whether to show certain MBeans or operations to the end user.
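
To give an idea of how a management client could use it, here is a hedged Java sketch. It assumes a JMX connection has already been established (for instance with JMXConnectorFactory) and that the MBean exposes a canInvoke(String objectName, String methodName) operation; check the actual MBean metadata of your Karaf version before relying on it.

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

public class JmxAclCheck {

    // connection: an already established MBeanServerConnection to the Karaf instance
    public static boolean canInvokeGc(MBeanServerConnection connection) throws Exception {
        ObjectName securityMBean = new ObjectName("org.apache.karaf:type=security,area=jmx");
        // Assumption: a canInvoke(String objectName, String methodName) operation exists;
        // verify it against the MBean metadata before using it in production
        Object result = connection.invoke(
                securityMBean,
                "canInvoke",
                new Object[] { "java.lang:type=Memory", "gc" },
                new String[] { String.class.getName(), String.class.getName() });
        return Boolean.TRUE.equals(result);
    }
}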

What’s next ?

Now, David and I are working on ACL/RBAC for:

  • shell commands: as we have ACL for MBeans, it makes sense to apply the same for shell commands.
  • OSGi services: the same can be applied to any OSGi service.

I would like to thank David for this great job. It’s a great addition to Karaf and a new very strong reason to promote Karaf 3 😉

Karaf and Pax Web: disabling reverse lookup

September 29, 2013 Posted by jbonofre

Karaf can be a full WebContainer just by installing the war feature:


features:install war

The war feature will install Pax Web and the Jetty web server. You can configure Pax Web using the etc/org.ops4j.pax.web.cfg configuration file. In this configuration, you can define a Jetty configuration file (like jetty.xml) using the following property:


org.ops4j.pax.web.config.file=${karaf.base}/etc/jetty.xml

Now, using etc/jetty.xml, you have complete access to the Jetty configuration; in particular, you can define the connector configuration.

On the “default” connector (bound to port 8181 by default), you can set “advanced” configuration options.

An interesting setting is the reverse lookup. Depending on your network, DNS resolution may not work. By default, Jetty will try to do a reverse DNS resolution, and if you can’t reach a DNS server from the machine, you may encounter “bad response times”, because you will have to wait for the timeout of each DNS lookup.
In that case, it makes sense to disable the reverse lookup. You can disable it per Jetty connector, using etc/jetty.xml and adding the resolveNames option on the connector:

  <Call name="addConnector">
    <Arg>
      <New class="org.eclipse.jetty.server.nio.SelectChannelConnector">
        <Set name="host"><Property name="jetty.host" /></Set>
        <Set name="port"><Property name="jetty.port" default="8040"/></Set>
        <Set name="maxIdleTime">300000</Set>
        <Set name="Acceptors">2</Set>
        <Set name="statsOn">false</Set>
        <Set name="confidentialPort">8443</Set>
        <Set name="lowResourcesConnections">20000</Set>
        <Set name="lowResourcesMaxIdleTime">5000</Set>
        <Set name="resolveNames">false</Set>
      </New>
    </Arg>
  </Call>

Pax Logging: loggers log level

September 29, 2013 Posted by jbonofre

As you probably know, Apache Karaf uses Pax Logging as logging system.

Pax Logging is an OPS4J project (Open Participation Software 4 Java) which provides a fully OSGi compliant logging framework. Pax Logging leverages a bunch of logging frameworks like SLF4J, Logback, Log4j, Avalon, etc. It gathers all the configuration and the actual logging mechanisms in a central place. It means that in your applications/bundles you can use SLF4J or Log4j, it doesn’t matter: under the hood you will use Pax Logging.

Karaf provides a bunch of shell commands and an MBean for logging:

  • log:display to see the log
  • log:display-exception to see only the exceptions
  • log:tail to display and “follow on the fly” the log
  • log:set to change the log level of a particular logger (or the rootLogger)
  • log:get to get the current log level of a particular logger (or the rootLogger)

The default configuration is a Log4j configuration described in etc/org.ops4j.pax.logging.cfg. This is where you define the loggers with their level, and the appenders with the conversion pattern.
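
As a simplified and hedged sketch of what this file roughly looks like (check the actual file shipped with your Karaf version), the root logger and the file appender are defined along these lines:

# Root logger
log4j.rootLogger = INFO, out, osgi:*

# File appender with its conversion pattern
log4j.appender.out = org.apache.log4j.RollingFileAppender
log4j.appender.out.layout = org.apache.log4j.PatternLayout
log4j.appender.out.layout.ConversionPattern = %d{ISO8601} | %-5.5p | %-16.16t | %m%n
log4j.appender.out.file = ${karaf.data}/log/karaf.log
log4j.appender.out.maxFileSize = 1MB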

However, sometimes you may want to disable logging for a particular class or package. A typical example is when you use the Karaf web container (provided by Pax Web), and you have a monitoring tool (like Nagios or Zabbix) which accesses a URL in a “bad manner”. By “bad manner”, I mean that the monitoring tool just sends a “ping” most of the time, not a complete valid HTTP request.

In that case, you may see WARN messages in the log, coming from the Jetty web server. The messages look like:


22:25:20,948 | WARN | tp2029485198-177 | pse.jetty.servlet.ServletHandler 514 | 54 - org.eclipse.jetty.util - 7.6.7.v20120910 | /system/console/bundles
java.lang.reflect.UndeclaredThrowableException
    at org.ops4j.pax.web.service.internal.$Proxy10.service(Unknown Source)[71:org.ops4j.pax.web.pax-web-runtime:1.1.4]
    at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:652)[62:org.eclipse.jetty.servlet:7.6.7.v20120910]
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:447)[62:org.eclipse.jetty.servlet:7.6.7.v20120910]
...

As you know the source of this WARN message, you may want to “increase” the log level to ERROR (to avoid seeing the WARN messages), or to completely disable the log messages coming from the Jetty ServletHandler.

To change the log level, in etc/org.ops4j.pax.logging.cfg, you can create a new logger dedicated to jetty, and define the log level for this logger:


log4j.logger.org.eclipse.jetty=ERROR

or you can completely disable the logging coming from the servlet handler:


log4j.logger.org.eclipse.jetty.servlet.ServletHandler=OFF

OFF is a “special” log level which disables logging entirely.

Another “use case” for this concerns the sshd server embedded in Karaf. You may know that you can access Karaf using a simple SSH client (OpenSSH on Unix, PuTTY on Windows, or the client provided with Karaf). By default, the Karaf sshd server logs all session connections at DEBUG level. So if you turn the rootLogger to DEBUG, you will see a lot of “noise” in the log. It makes sense, then, to change the sshd server log level to INFO, just for the channel session:


log4j.logger.org.apache.mina.sshd.server.channel.ChannelSession=INFO

Apache Hadoop and Karaf, Article 1: Karaf as HDFS client

July 8, 2013 Posted by jbonofre

Maybe some of you remember that, a couple of months ago, I posted some messages on the Hadoop mailing list about OSGi support in Hadoop (http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201202.mbox/%3C4F3285F1.2000704@nanthrax.net%3E).

In order to move forward on this topic, instead of a big refactoring, I started to work on standalone, atomic bundles that we can deploy in Karaf. The purpose is to avoid changing Hadoop core, while providing good Hadoop support directly in Karaf.

I worked on Hadoop trunk (3.0.0-SNAPSHOT) and prepared patches (https://issues.apache.org/jira/browse/HADOOP-9706).

I also deployed bundles on my Maven repository to give users the possibility to directly deploy hadoop-karaf in a running Karaf instance.

The purpose is to explain what you can do, the value of this, and maybe you will vote to “include” it in Hadoop directly 😉

To explain exactly what you can do, I prepared a series of blog posts:

  • Article 1: Karaf as HDFS client. This is the first post. We will see the hadoop-karaf bundle installation, the hadoop and hdfs Karaf shell commands, and how you can use HDFS to store bundles or features using the HDFS URL handler.
  • Article 2: Karaf as MapReduce job client. We will see how to run MapReduce jobs directly from Karaf, and the “hot-deploy-and-run” of MapReduce jobs using the Hadoop deployer.
  • Article 3: Exposing Hadoop, HDFS, Yarn, and MapReduce features as OSGi services. We will see how to use Hadoop features programmatically thanks to OSGi services.
  • Article 4: Karaf as an HDFS datanode (and possibly namenode). Here, more than using Karaf as a simple HDFS client, Karaf will be part of HDFS, acting as a datanode and/or namenode.
  • Article 5: Karaf, Camel, Hadoop all together. In this article, we will use the Hadoop OSGi services now available in Karaf inside Camel routes (plus the camel-hdfs component).
  • Article 6: Karaf as complete Hadoop container. I will explain here what I did in Hadoop to add a complete support of OSGi and Karaf.

Karaf as HDFS client

Just a reminder about HDFS (Hadoop Distributed FileSystem).

HDFS is composed of:
– a namenode hosting the metadata of the filesystem (directories, block locations, file permissions or modes, …). There is only one namenode per HDFS, and the metadata is stored in memory by default.
– a set of datanodes hosting the file blocks. Files are composed of blocks (like in all filesystems). The blocks are located on different datanodes. The blocks can be replicated.

An HDFS client connects to the namenode to execute actions on the filesystem (ls, rm, mkdir, cat, …).

Preparing HDFS

The first step is to set up the HDFS filesystem.

I’m going to use a “pseudo-cluster”: an HDFS with the namenode and a single datanode on one machine.
To do so, I configure the $HADOOP_INSTALL/etc/hadoop/core-site.xml file like this:


<configuration>

  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost/</value>
  </property>

</configuration>

For a pseudo-cluster, we set up only one replica per block (as we have only one datanode) in the $HADOOP_INSTALL/etc/hadoop/hdfs-site.xml file:

<configuration>

  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>

</configuration>

Now, we can format the namenode:


$HADOOP_INSTALL/bin/hdfs namenode -format

and start the HDFS (both namenode and datanode):


$HADOOP_INSTALL/sbin/start-dfs.sh

Now, we can connect to the HDFS and create a first folder:


$HADOOP_INSTALL/bin/hadoop fs -mkdir /bundles
$HADOOP_INSTALL/bin/hadoop fs -ls /
Found 1 items
drwxr-xr-x - jbonofre supergroup 0 2013-07-07 22:18 /bundles

Our HDFS is up and running.

Configuration and installation of hadoop-karaf

I created the hadoop-karaf bundle as standalone. It means that it embeds a lot of dependencies internally (directly in the bundle classloader).

The purpose is to:

  1. avoid altering anything in Hadoop core. Thanks to this approach, I can provide the hadoop-karaf bundle for different Hadoop versions, and I don’t need to alter Hadoop itself.
  2. ship all dependencies in the same bundle classloader. Of course it’s not ideal in terms of OSGi, but to provide a very easy and ready-to-use bundle, I gathered most of the dependencies in the hadoop-karaf bundle.

I worked on trunk directly (for now, if you are interested I can provide hadoop-karaf for existing Hadoop releases): Hadoop 3.0.0-SNAPSHOT.

Before deploying the hadoop-karaf bundle, we have to prepare the Hadoop configuration. In order to integrate with Karaf, I implemented a mechanism to create and populate the Hadoop configuration from OSGi ConfigAdmin.
The only requirement for the user is to create an org.apache.hadoop PID in the Karaf etc folder containing the Hadoop properties. In practice, it means just creating a $KARAF_INSTALL/etc/org.apache.hadoop.cfg file containing:


fs.default.name = hdfs://localhost/

If you don’t want to compile the hadoop-karaf bundle yourself, you can use the artifact that I deployed on my Maven repository (http://maven.nanthrax.net/org/apache/hadoop/hadoop-karaf/3.0.0-SNAPSHOT/hadoop-karaf-3.0.0-20130708.050912-1.jar).

To do this, you have to edit etc/org.ops4j.pax.url.mvn.cfg and add my repository to the org.ops4j.pax.url.mvn.repositories property:


org.ops4j.pax.url.mvn.repositories = \
  http://maven.nanthrax.net/@snapshots@id=maven, \
  http://repo1.maven.org/maven2@id=central, \
  ...

Now, we can start Karaf as usual:


$KARAF_INSTALL/bin/karaf

NB: I use Karaf 2.3.1.

We can now install the hadoop-karaf bundle:


karaf@root> osgi:install -s mvn:org.apache.hadoop/hadoop-karaf/3.0.0-SNAPSHOT
karaf@root> la|grep -i hadoop
[ 54] [Active ] [Created ] [ 80] Apache Hadoop Karaf (3.0.0.SNAPSHOT)

hadoop:* and hdfs:* commands

The hadoop-karaf bundle comes with new Karaf shell commands.

For this first blog post, we are going to use only one command: hadoop:fs.

The hadoop:fs command allows you to use HDFS directly from Karaf (it’s a wrapper around hadoop fs):


karaf@root> hadoop:fs -ls /
Found 1 items
drwxr-xr-x - jbonofre supergroup 0 2013-07-07 22:18 /bundles
karaf@root> hadoop:fs -df
Filesystem        Size        Used    Available   Use%
hdfs://localhost  5250875392  307200  4976799744  0%

HDFS URL handler

Another thing provided by the hadoop-karaf bundle is a URL handler to directly support hdfs URLs.

It means that you can use hdfs URLs in Karaf commands, such as osgi:install, features:addurl, …

It also means that you can use HDFS to store your Karaf bundles, features, or configuration files.

For instance, we can copy an OSGi bundle in the HDFS:


$HADOOP_INSTALL/bin/hadoop fs -copyFromLocal ~/.m2/repository/org/apache/servicemix/bundles/org.apache.servicemix.bundles.commons-lang/2.4_6/org.apache.servicemix.bundles.commons-lang-2.4_6.jar /bundles/org.apache.servicemix.bundles.commons-lang-2.4_6.jar

The commons-lang bundle is now available in the HDFS. We can check that directly in Karaf using the hadoop:fs command:


karaf@root> hadoop:fs -ls /bundles
Found 1 items
-rw-r--r-- 1 jbonofre supergroup 272039 2013-07-07 22:18 /bundles/org.apache.servicemix.bundles.commons-lang-2.4_6.jar

Now, we can install the commons-lang bundle in Karaf directly from HDFS, using a hdfs URL:


karaf@root> osgi:install hdfs:/bundles/org.apache.servicemix.bundles.commons-lang-2.4_6.jar
karaf@root> la|grep -i commons-lang
[ 55] [Installed ] [ ] [ 80] Apache ServiceMix :: Bundles :: commons-lang (2.4.0.6)

If we list the bundles with their location, we can see the hdfs URL support in action:


karaf@root> la -l
...
[ 53] [Active ] [Created ] [ 30] mvn:org.apache.karaf.management.mbeans/org.apache.karaf.management.mbeans.dev/2.3.1
[ 54] [Active ] [Created ] [ 80] mvn:org.apache.hadoop/hadoop-karaf/3.0.0-SNAPSHOT
[ 55] [Installed ] [ ] [ 80] hdfs:/bundles/org.apache.servicemix.bundles.commons-lang-2.4_6.jar

Conclusion

This first blog post shows how to use Karaf as an HDFS client. The big advantage is that the hadoop-karaf bundle doesn’t change anything in Hadoop core, and so I can provide it for Hadoop 0.20.x, 1.x, 2.x, or trunk (3.0.0-SNAPSHOT).
In Article 3, you will see how to leverage HDFS directly as OSGi services (and so use it in your bundles, Camel routes, …).

Again, if you think this article series is interesting, and you would like to see Karaf support in Hadoop, feel free to post a comment, send a message on the Hadoop mailing list, or whatever else to promote it 😉

Apache Karaf Cellar 2.3.0 released

May 24, 2013 Posted by jbonofre

The latest Cellar release (2.2.5) didn’t work with the new Karaf branch and release: 2.3.0.

While the first purpose of Cellar 2.3.0 is to be able to work with Karaf 2.3.x, it’s actually more than that.

Let’s take a tour of the new Apache Karaf Cellar 2.3.0.

Apache Karaf 2.3.x support

Cellar 2.3.0 is fully compatible with Karaf 2.3.x branch.

Starting from Karaf 2.3.2, Cellar can be installed “out of the box”.
If you want to use Cellar with Karaf 2.3.0 or Karaf 2.3.1, in order to avoid a Cellar bootstrap issue, you have to add the following property in etc/config.properties:


org.apache.aries.blueprint.synchronous=true

Upgrade to Hazelcast 2.5

As you may know, Cellar is a clustered provisioning tool powered by Hazelcast.

We did a big jump: from Hazelcast 1.9 to Hazelcast 2.5.

Hazelcast 2.5 brings a lot of bug fixes and interesting new features. You can find more details here: http://www.hazelcast.com/docs/2.5/manual/multi_html/ch18s04.html.

In Cellar, all Hazelcast configuration is performed using a single file: etc/hazelcast.xml.

Hazelcast 2.5 gives you more properties to configure your cluster and the behaviour of the cluster events. The default configuration is more than enough for most use cases, but thanks to this Hazelcast version, you now have the possibility to perform fine tuning.

Moreover, some new features are interesting for Cellar, especially:

  • IPv6 support
  • more complete backup support, when a node is disconnected from the cluster
  • better security and encryption support
  • higher tolerance to connection failures
  • parallel IO support

Cluster groups persistence

In previous Cellar versions, the cluster groups were not stored, and relied only on the cluster state. It means that it was possible to lose an existing cluster group if the group didn’t have any node.

Now, each node stores the cluster groups list, and its membership.

This way, the cluster groups are persistent: we can restart the cluster and we won’t lose the “empty” cluster groups.

Cluster event producers, consumers, handlers status persistence

A Cellar node uses different components to manage cluster events:

  • the producer (one per node) is responsible for broadcasting cluster events to the other nodes
  • the consumer (one per node) receives cluster events and delegates the handling of the event to a handler
  • handlers (one per resource type) handle specific cluster events (features, bundles, etc.) and update the node’s local state

The user has complete control over the producer, consumer, and handlers: you can stop or start the node’s producer, consumer, or handlers.

The problem is that the state of the producer/consumer/handlers was not persistent. It means that a restart of the node would reset the producer/consumer/handlers to the default state (and not the previous one).
To avoid this issue, the producer/consumer/handler states are now persisted on the local node.

Smart synchronization

The synchronization of the different resources supported by Cellar is now better than before. Cellar now checks the local state of the node and computes a kind of diff between the local state and the state on the cluster. If the states differ, Cellar updates the local state to match what is described on the cluster.

For configurations especially, to avoid high CPU consumption, some properties are not considered during the synchronization because they are local to the node (for instance, service.factoryPid).

A new command has been introduced (cluster:sync) to “force” the synchronization of the local node with the cluster. It’s useful when the node has been disconnected from the cluster and you want to re-sync as soon as possible.
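
Its basic invocation is simply the following (output and available options depend on the Cellar version):

karaf@root> cluster:sync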

Improvement on Cellar Cloud support

My friend Achim (Achim Nierbeck) did a great job on the Cellar Cloud support.
First, he fixed some issues that we had in this module.

He gave a great demo during JAX: Integration In the Cloud With Camel, Karaf and Cellar.

Improvement on the cluster:* commands and MBeans

In order to be closer to the Karaf core commands, the cluster:* commands (and MBeans) now provide exactly the same options that you can find in the Karaf core commands.

And more is coming …

The first purpose of Cellar 2.3.0 is to provide a version ready to run on Karaf 2.3.x, and to ensure stability. So I postponed some new features and improvements to Cellar 2.3.1.

In the meantime, I also released Cellar 2.2.6, containing mostly bug fixes (for those who still use Karaf 2.2.x with Cellar 2.2.x).

Load balancing with Apache Karaf Cellar, and mod_proxy_balancer

February 3, 2013 Posted by jbonofre

Thanks to Cellar, you can deploy your applications, CXF services, Camel routes, … on several Karaf nodes.

When you use Cellar with web applications, or CXF/HTTP endpoints, a “classic” need is to load balance the HTTP requests across the Karaf nodes.

You have different ways to do that:
– using the Camel Load Balancer EIP: it’s an interesting EIP, working with any kind of endpoint. However, it requires a Karaf instance running the load balancer routes, which is not always possible depending on the security policy (for instance, putting it in a DMZ)
– using hardware appliances like F5, Juniper, or Cisco: it’s a very good, “classic” solution in network teams. However, it requires expensive hardware, not easy to buy and set up for tests or “small” solutions.
– using Apache httpd with mod_proxy_balancer: it’s the solution that I’m going to detail. It’s a very stable solution, powerful and easy to set up. And it costs nothing 😉

For instance, say you have three Karaf nodes, exposing services at the following addresses:
– http://192.168.134.3:8040/services
– http://192.168.134.4:8040/services
– http://192.168.134.5:8040/services

We want to load balance those three nodes.

On a dedicated server (it could also be one of the machines hosting Karaf), we just install Apache httpd:


# on Debian/Ubuntu system
aptitude install apache2


# on RHEL/CentOS/Fedora system
yum install httpd
# enable network connect on httpd
/usr/sbin/setsebool -P httpd_can_network_connect 1

Apache httpd comes with the mod_proxy, mod_proxy_http, and mod_proxy_balancer modules. Just check that those modules are loaded in the main httpd.conf.
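
If they are not, the corresponding LoadModule lines look like the following (the module paths vary between distributions; on Debian/Ubuntu you would rather use a2enmod proxy proxy_http proxy_balancer):

LoadModule proxy_module modules/mod_proxy.so
LoadModule proxy_http_module modules/mod_proxy_http.so
LoadModule proxy_balancer_module modules/mod_proxy_balancer.so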

You can now create a new configuration for your load balancer (directly in the main httpd.conf or by creating a conf file in etc/httpd/conf.d):


<Proxy balancer://mycluster>
  BalancerMember http://192.168.134.3:8040
  BalancerMember http://192.168.134.4:8040
  BalancerMember http://192.168.134.5:8040
</Proxy>
ProxyPass /services balancer://mycluster

The load balancer will proxy the /services requests to the different Karaf nodes.

By default, the mod_proxy_balancer module uses a byrequests algorithm: all nodes will receive the same number of requests.
You can switch to bytraffic (using lbmethod=bytraffic in the proxy configuration): in that case, all nodes will receive the same amount of traffic (in KB), as shown below.
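
Here is a sketch of the previous balancer configuration switched to bytraffic (check the mod_proxy_balancer documentation of your httpd version for the supported lbmethod values):

<Proxy balancer://mycluster>
  BalancerMember http://192.168.134.3:8040
  BalancerMember http://192.168.134.4:8040
  BalancerMember http://192.168.134.5:8040
  ProxySet lbmethod=bytraffic
</Proxy>
ProxyPass /services balancer://mycluster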

The mod_proxy_balancer module supports session “affinity” if your application needs it.
When a request is proxied to some back-end, all following requests from the same user should be proxied to the same back-end.
For instance, you can use a cookie header to define the session affinity:


Header add Set-Cookie "ROUTEID=.%{BALANCER_WORKER_ROUTE}e; path=/" env=BALANCER_ROUTE_CHANGED
<Proxy balancer://mycluster>
  BalancerMember http://192.168.134.3:8040 route=1
  BalancerMember http://192.168.134.4:8040 route=2
ProxySet stickysession=ROUTEID
</Proxy>
ProxyPass /myapp balancer://mycluster

The mod_proxy_balancer module also provides a web manager allowing you to see whether your Karaf nodes are up or not, the number of requests received by each node, and the current lbmethod in use.

To enable this balancer manager, you just have to add a dedicated handler:


<Location /balancer-manager>
  SetHandler balancer-manager
  Order allow,deny
  Allow from all
</Location>

Point your browser to http://host/balancer-manager and you will see the manager page.

You can find more information about mod_proxy_balancer here: http://httpd.apache.org/docs/2.2/mod/mod_proxy_balancer.html.

Apache httpd with mod_proxy_balancer is an easy and good HTTP load balancer solution in front of Karaf and Cellar.

Multiple HTTP connectors in Apache Karaf

February 3, 2013 Posted by jbonofre

Installing the http feature in Karaf leverages Pax Web to embed a Jetty webcontainer.

By default, Karaf creates a Jetty connector on HTTP port 8181 (and 8443 for HTTPS). You can change these port numbers by providing an etc/org.ops4j.pax.web.cfg file.
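
For instance (a sketch using the standard Pax Web properties; check the Pax Web documentation for your version):

# etc/org.ops4j.pax.web.cfg
org.osgi.service.http.port = 8181
org.osgi.service.http.port.secure = 8443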

But you can also create new connectors in the embedded Jetty.

You may see several advantages for multiple connectors:

  • you can isolate a set of applications, CXF services, Camel routes on a dedicated port number
  • you can setup a different configuration for each connector. For instance, you can create two SSL connectors, each with a different keystore, truststore, …

The etc/jetty.xml configuration file is where you can create custom Jetty configuration.

NB: if you want to have both etc/org.ops4j.pax.web.cfg and etc/jetty.xml, don’t forget to reference jetty.xml in org.ops4j.pax.web.cfg using the org.ops4j.pax.web.config.file property, for instance:


# in etc/org.ops4j.pax.web.cfg
org.ops4j.pax.web.config.file=${karaf.home}/etc/jetty.xml

To configure a new connector, you can add an addConnector call in this configuration. For instance, we can create a new connector on HTTP port 9191 (and HTTPS port 9443):


  <Call name="addConnector">
    <Arg>
      <New class="org.eclipse.jetty.server.nio.SelectChannelConnector">
        <Set name="host">0.0.0.0</Set>
        <Set name="port">9191</Set>
        <Set name="maxIdleTime">300000</Set>
        <Set name="Acceptors">1</Set>
        <Set name="statsOn">false</Set>
        <Set name="confidentialPort">9443</Set>
        <Set name="name">myConnector</Set>
      </New>
    </Arg>
  </Call>

Now, Karaf will listen on 8181 and 9191 (for http), 8443 and 9443 (for https).

You can also define a connector dedicated to HTTPS, with a dedicated configuration for this connector, especially the keystore, truststore, and client authentication:


  <Call name="addConnector">
    <Arg>
      <New class="org.eclipse.jetty.server.ssl.SslSelectChannelConnector">
        <Set name="port">9443</Set>
        <Set name="maxIdleTime">30000</Set>
        <Set name="keystore">./etc/keystore</Set>
        <Set name="password">password</Set>
        <Set name="keyPassword">password</Set>
      </New>
    </Arg>
  </Call>

By default, a web application will be bound on all connectors. If you want your web application to use a specific connector, you have to define it in the MANIFEST using the following properties:


Web-Connectors: myConnector
Web-VirtualHosts: localhost

If you use CXF services or Camel routes, and you use a connector hostname and port number in the endpoint, the corresponding connector will be used.

For instance, the following CXF endpoint of a Camel route will use myConnector:


...
  <cxf:cxfEndpoint id="cxfEndpoint" address="http://localhost:9191/services/myservice" wsdlUrl="..."/>
...

Karaf allows fine-grained Jetty configuration. Karaf becomes a real, complete web container, with custom configuration on several connectors. It’s especially interesting for SSL, where each connector can have a dedicated keystore, truststore, and client authentication configuration.