Big, distributed, and fast: Ehcache sucks up search
Java for the NoSQL generation
Open sourcers running the Ehcache distributed Java cache can now search their data in near real-time by harnessing lashed-together servers.
Terracotta, which bought the Ehcache project in 2009, has released Ehcache 2.4. It features an API extension that lets you perform object-level queries of data held in memory. The API is backwards compatible, so it should work with older versions of Ehcache.
According to Terracotta, this lets you avoid the performance bottlenecks encountered when crunching large amounts of data on a single server. Also, by using general-purpose caching with your existing servers, you can sidestep expensive hardware appliances that funnel data through hundreds of cores, terabytes of memory, and InfiniBand.
Terracotta said its architecture targets people crunching terabytes of data, rather than petabytes, and it claims searches that took 48 seconds can now be executed in just half a second. Because Ehcache is built in Java, you can also build search queries in Java rather than in a separate query language.
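To give a feel for what building a query in Java rather than in a separate query language means, here is a minimal sketch in plain Java. It is not the Ehcache search API: the `Person` class, the `search` helper, and the sample data are all hypothetical, and an ordinary map stands in for the cache. It only illustrates the general idea of composing search criteria as Java objects and running them against objects held in memory.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Illustrative only: a plain map stands in for a cache, and none of these
// names come from Ehcache itself.
class InMemoryQuerySketch {

    // Hypothetical cached value type.
    public static final class Person {
        public final String name;
        public final int age;
        public final String city;

        public Person(String name, int age, String city) {
            this.name = name;
            this.age = age;
            this.city = city;
        }
    }

    // Returns the keys of all entries whose value matches the criteria,
    // sorted so the result order is deterministic.
    public static List<String> search(Map<String, Person> cache,
                                      Predicate<Person> criteria) {
        return cache.entrySet().stream()
                .filter(entry -> criteria.test(entry.getValue()))
                .map(Map.Entry::getKey)
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Person> cache = new ConcurrentHashMap<>();
        cache.put("k1", new Person("Ada", 35, "London"));
        cache.put("k2", new Person("Bob", 28, "London"));
        cache.put("k3", new Person("Cy", 41, "Paris"));

        // Criteria are ordinary Java objects, so they can be composed in code.
        Predicate<Person> atLeast30 = p -> p.age >= 30;
        Predicate<Person> inLondon = p -> p.city.equals("London");

        System.out.println(search(cache, atLeast30.and(inLondon))); // prints [k1]
    }
}
```

The sketch scans every entry on a single machine; the speed-ups Terracotta claims come from its own search implementation running across servers, which this example does not attempt to model.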
Terracotta said it's in talks with business-intelligence tools vendors to plug their tools into the API to enable more sophisticated slicing and dicing of data.
The Ehcache project is used in about 70 per cent of Java caching. The ability to query data held in memory using the system comes as big-data providers look for ways to help customers make sense of the information quickly amassing in their big-data silos.
Customers have been deploying a host of NoSQL architectures to catch data because NoSQL is seen as faster and more scalable than SQL databases in large server farms and on large web sites.
Inevitably, people now want to query the information gathered - data like searches, personal updates, and Tweets - but the search tools have been lacking.
Last month, open-source BI vendor Jaspersoft announced its Native Reporting Big Data project to build connectors that can natively query data in NoSQL databases and other stores.
Jaspersoft's project currently offers connectors for NoSQL databases Cassandra, CouchDB, MongoDB, Riak, and Neo4j; the Hadoop and Infinispan data-crunching frameworks; key-value store Redis; and massively parallel processing (MPP) analytic database Vertica, bought by Hewlett Packard this week.
Terracotta says Ehcache 2.4 is different from the NoSQL stable because it offers "tried and tested" enterprise Java that fits into existing Java architectures with "strongly consistent" data across different nodes. "We see ourselves as a bridge between the traditional and the new," Terracotta chief executive Amit Pandey said. ®
"Java for the NoSQL generation"
What? Java created the NoSQL generation. Or rather, OOP created the NoSQL generation.
When you spend your college years working on smallish data sets in an OOP language on a modern machine, you get used to working in RAM. Occasionally you want to "persist" data.
As a result, no one learns about RDBMS anymore. The uni kids have no idea why atomicity, consistency, isolation and durability are useful and important, so now it takes forever to get the kids to a useful stage.
Running the whole dataset in RAM
Because of *course* you'll always have enough RAM to do this.
That's a script kiddie's view of the world.
WTF are they teaching on *proper* CS courses these days?
Since this is the WWW, I was surprised this article had not one link to either Terracotta or Ehcache.
The "read more" links at the bottom of the page that point to more resources on this site may be helpful, but they are not what I was looking for.
You don't have to repeat the links over and over again in the article, but the first paragraph should link to the main subjects, with further links inserted as other subjects come up.