Hi all, We're currently using Quartz in a distributed environment. I've written my own JobStore that uses a combination of Zookeeper for distributed scheduling/locking and Cassandra for persistence of the job data. Quartz has proven to be overkill and frustrating for our needs in a distributed environment. The Spring TaskScheduler system is more in line with what we need. We essentially treat tasks as asynchronous scheduled jobs for processing high volumes of small data sets. Since I'm migrating our logic away from Quartz, Spring Batch looks appealing, especially its job steps. How difficult is it to implement your own execution environment, and are there any good examples?
Thanks,
Todd
How long is a piece of string? I've certainly seen quite a few custom execution environments, integrating with various platforms and tools, so I doubt if there are any barriers from the framework. Spring encourages you to write your own code, after all.
The standard approach to cluster synchronization in Spring Batch isn't that different from Quartz - we have a shared JobRepository with an out-of-the-box JDBC implementation. If that didn't work for you before you will have to do some work with Zookeeper and Cassandra again. There's no free lunch here. At least the Spring Batch APIs should be nicer to work with.
Hi Dave, Essentially I envision the implementation being similar to our in-house SEDA framework. We use a P2P architecture in our Java application runtime nodes, and use ZK to coordinate peers and leader nodes. We write all data to Cassandra, but use queues to signal nodes when they have data to process. This has worked very well for us and scales well, since we only use ZK as a signaling mechanism in our cluster.
Given the step architecture of the Batch project, this seems like a natural fit. I've gone through the examples, but I can't seem to find any documentation on creating my own middle tier. Is there any good developer documentation that explains the interfaces that need to be implemented?
I'm assuming I need to implement the following from the *repository.dao package to persist the data to Cassandra:
ExecutionContextDao
JobExecutionDao
JobInstanceDao
StepExecutionDao
Which class will I implement to perform the step signaling and job queuing to nodes?
Thanks,
Todd
Those DAOs are the right ones for the SimpleJobRepository. I'm not sure how you expect transactions to work, but there will probably be restrictions to do with restartability in some failure modes. Also, one of the main responsibilities of the JobRepository is to prevent simultaneous executions of the same JobInstance. We do that in the JDBC implementation using transaction isolation, but you would probably have to co-ordinate through Zookeeper (I guess).
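To make the "prevent simultaneous executions of the same JobInstance" requirement concrete, here is a minimal sketch of the check a non-JDBC repository would need. It is not Spring Batch code: JobInstanceGuard, instanceKey, tryClaim and release are hypothetical names, and the in-memory map only stands in for whatever cluster-wide store (Zookeeper in this thread) would actually hold the claim.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical guard: allows at most one running execution per job instance.
// The in-memory map is an assumption standing in for a cluster-wide store.
final class JobInstanceGuard {

    // Key: job name plus identifying parameters. Value: id of the node holding the claim.
    private final ConcurrentMap<String, String> running = new ConcurrentHashMap<>();

    // Builds the key that identifies a job instance (the format is an assumption).
    static String instanceKey(String jobName, String parameters) {
        return jobName + "?" + parameters;
    }

    // Atomically claims the instance; returns false if some node already holds it.
    boolean tryClaim(String instanceKey, String nodeId) {
        return running.putIfAbsent(instanceKey, nodeId) == null;
    }

    // Releases the claim, but only if this node is the current holder.
    boolean release(String instanceKey, String nodeId) {
        return running.remove(instanceKey, nodeId);
    }

    public static void main(String[] args) {
        JobInstanceGuard guard = new JobInstanceGuard();
        String key = instanceKey("importJob", "date=2011-03-01");
        System.out.println(guard.tryClaim(key, "node-a")); // first claim succeeds
        System.out.println(guard.tryClaim(key, "node-b")); // second node is rejected
        guard.release(key, "node-a");
        System.out.println(guard.tryClaim(key, "node-b")); // free again after release
    }
}
```

In a real cluster the claim would also have to disappear when the holding node dies (for example by tying it to that node's session), otherwise a crashed execution could block restarts indefinitely.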
To launch a job you use the JobLauncher interface and the implementation we provide should be fine and well documented. You might find the JobService useful in Spring Batch Admin as well. The rest of the environment is up to you (i.e. there are no Spring Batch interfaces to do with queueing or scheduling - we see those as external concerns up to now at least).
Hi Dave, Thanks for the input. I've played a bit more, and I've encountered 2 major issues.
1. Using Maven 2 with the latest release, 2.1.6.RELEASE, causes issues with transitive dependencies in Spring 3.0.5.RELEASE and the spring-beans dependency. Is the latest Batch release not compatible with Spring 3?
2. I noticed that scheduling itself isn't actually covered by Batch. This is where our biggest issue lies at the moment. With Quartz and its triggers/trigger groups, the nodes spend more time waiting on locks to change trigger and group states than actually processing. Are there any examples of implementing Batch with the TaskScheduler?
sprin...task-scheduler
This is probably the way we're going to move forward. This allows us to schedule tasks without the notion of "triggers" in Quartz, and hence avoid the locking issues.
Thanks,
Todd
Loads of people use Spring 3 with Batch 2.1.*. What's the issue?
Spring Batch does not have a scheduler precisely because there is such good support for it elsewhere (in Spring 3 for instance), so you are definitely on the right track.
The Spring task scheduler does not have built-in support for locking, but you need something that plays that role (I assume it's Zookeeper in your case). Quartz is wasteful of RDBMS resources, but at least you can see why they do it. You will need to prevent two identical Job executions triggering from different nodes in your cluster at the same time, and if you use the JDBC JobRepository you will get that for free. Is it worth the price of ditching that without even trying it?
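As a rough illustration of the point about identical executions firing from different nodes, the sketch below has several simulated nodes schedule the same firing and lets only the first one to claim it proceed. It uses the JDK's ScheduledExecutorService as a stand-in for the Spring TaskScheduler, and a shared map as a stand-in for Zookeeper or a database row; FiringClaimDemo, fireOnAllNodes and the "jobName@fireTime" key format are all made up for the example.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo: every node schedules the same firing, only one launches it.
final class FiringClaimDemo {

    // Runs one firing on `nodes` simulated nodes and returns how many actually launched.
    static int fireOnAllNodes(int nodes, String jobName, long fireTimeMillis) {
        // Stand-in for a store shared by the whole cluster (assumption).
        ConcurrentMap<String, String> claims = new ConcurrentHashMap<>();
        AtomicInteger launches = new AtomicInteger();
        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(nodes);
        // One key per (job, fire time), so each firing can be claimed exactly once.
        String firingKey = jobName + "@" + fireTimeMillis;
        for (int i = 0; i < nodes; i++) {
            String nodeId = "node-" + i;
            scheduler.schedule(() -> {
                // Only the node that wins the atomic claim launches the job.
                if (claims.putIfAbsent(firingKey, nodeId) == null) {
                    launches.incrementAndGet(); // the real job launch would go here
                }
            }, 10, TimeUnit.MILLISECONDS);
        }
        scheduler.shutdown(); // already-scheduled delayed tasks still run by default
        try {
            scheduler.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for firings", e);
        }
        return launches.get();
    }

    public static void main(String[] args) {
        System.out.println(fireOnAllNodes(4, "nightlyImport", 1_000L)); // prints 1
    }
}
```

Unlike a lock that is released when the job finishes, the claim here is never released: it is keyed by fire time, so a slow node that wakes up late for the same firing still loses.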