Space4J - Java Persistence
Messages posted by: saoj

You are getting an OutOfMemoryError, so you have to increase your maximum heap size with the -Xmx option (-Xms only sets the initial heap size) or, eventually, your physical RAM. Search for the Java heap size command-line options.
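To confirm that the heap setting took effect, a small check like the one below can help. It is only a sketch using the standard Runtime API; the class name HeapInfo is made up for this example and is not part of Space4J.

```java
// Prints the heap limits of the running JVM so you can verify the -Xmx setting.
// HeapInfo is a hypothetical helper class, not part of Space4J.
class HeapInfo {
    // Maximum heap the JVM will attempt to use, in megabytes (controlled by -Xmx).
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    // Heap currently allocated and not free, in megabytes.
    // This is an approximation because uncollected garbage is included.
    static long usedHeapMb() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("max heap (MB):  " + maxHeapMb());
        System.out.println("used heap (MB): " + usedHeapMb());
    }
}
```

Running it as `java -Xmx1g HeapInfo` should report a maximum close to 1024 MB.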

Amount of RAM should not be a problem, unless you have a really huge database. In this case you should probably use a regular relational database.

-Sergio

Are all serializable objects held in memory while working with Space4J?


Yes, all objects are in memory, but they are not serialized in memory. They get serialized when they are logged to disk in a command or when a snapshot is taken.


I mean, let's suppose that I've added 1K of objects into space4j...My question is, are all of these objects in memory? Or does it mean that half of these objects (the most used) is in memory and the second half is saved on HD???


All objects are always in memory, even if they are stale objects that are not being accessed.

I thought about a *passivation* strategy for Space4J in the past that would swap unused/old objects from memory to disk. But that complicates the whole thing and loses the focus of Space4J, which is in-memory access straight through collections.

If you have a logging table that grows indefinitely and the information is not accessed regularly, you will be better off with a text log or a relational database. Keeping this information in memory would just be a waste of RAM.


1) All objects are serialized in the "./space4_db" directory. Is it somehow possible to set the directory and file where the objects will be serialized?


Download the beta jar from http://www.space4j.org/beta/space4j.jar and use the SimpleLogger.setDir("c:\\mydb") for example. You can also use a relative path like SimpleLogger.setDir("mydb").


2) I want to use REGULAR index (unique-index, non-sorted), but I didn't find an example of how to get my stored value....Of course I checked an example on this forum

User u = usersById.get(2345);

but I don't understand one thing, what type of the object is usersById???? Can you provide full example where REGULAR index is used.


usersById in this example is the map which is returned by the Index object.



Once you have created the index object by calling im.createIndex(indx, space4j); you can get its map to perform the lookup:



Note that you need a key object. You can even have composite indexes with composite keys (more than one attribute).

Also note that it is recommended that you store your objects in the space inside Maps, not Lists. The reason for that is the same reason for always having a Primary Key in a database table.
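To illustrate both notes with plain Java collections (this is NOT the Space4J Index API, and User, NameKey and indexByName are names invented for this sketch): the objects live in a map keyed by id, and a second map acts as a composite index whose key object combines two attributes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Plain-collections sketch; none of these classes come from Space4J.
class CompositeKeyLookup {
    static final class User {
        final int id;
        final String firstName;
        final String lastName;
        User(int id, String firstName, String lastName) {
            this.id = id;
            this.firstName = firstName;
            this.lastName = lastName;
        }
    }

    // Hypothetical composite key: two attributes that together identify a user.
    // equals and hashCode must be consistent, otherwise map lookups will miss.
    static final class NameKey {
        final String firstName;
        final String lastName;
        NameKey(String firstName, String lastName) {
            this.firstName = firstName;
            this.lastName = lastName;
        }
        @Override public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof NameKey)) return false;
            NameKey other = (NameKey) o;
            return firstName.equals(other.firstName) && lastName.equals(other.lastName);
        }
        @Override public int hashCode() {
            return Objects.hash(firstName, lastName);
        }
    }

    // Builds a unique index from the composite key to the user.
    static Map<NameKey, User> indexByName(Map<Integer, User> usersById) {
        Map<NameKey, User> index = new HashMap<>();
        for (User u : usersById.values()) {
            index.put(new NameKey(u.firstName, u.lastName), u);
        }
        return index;
    }

    public static void main(String[] args) {
        // Primary storage is a Map keyed by id (the "primary key"), not a List.
        Map<Integer, User> usersById = new HashMap<>();
        usersById.put(2345, new User(2345, "John", "Smith"));
        usersById.put(2346, new User(2346, "Mary", "Jones"));

        Map<NameKey, User> usersByName = indexByName(usersById);
        User u = usersByName.get(new NameKey("Mary", "Jones"));
        System.out.println(u.id);
    }
}
```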

I have been thinking about this and my bet is that the bottleneck is on disk I/O.

Disk I/O after a write (or a commit in a relational DB) is pretty much unavoidable, because you need to make sure data is persisted and safe in case of a crash.

I wonder which is faster: a relational database doing multiple inserts and commits in sequence, or Space4J.

If you are really concerned about this speed, you can perform the disk I/O asynchronously, but the price is clear: you will not be sure when your write is persisted and safe on disk. After a crash, you may end up losing some recent writes that were reported as successful.
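The trade-off can be sketched with standard java.nio calls. This is not how Space4J's logger is written; CommandLogSketch and its methods are hypothetical names used only to contrast a write that waits for the disk with one that does not.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical append-only log; a sketch, not Space4J code.
class CommandLogSketch implements AutoCloseable {
    private final FileChannel channel;
    // A single writer thread keeps asynchronous entries in submission order.
    private final ExecutorService writer = Executors.newSingleThreadExecutor();

    CommandLogSketch(Path file) throws IOException {
        channel = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.APPEND);
    }

    // Synchronous write: returns only after the bytes were forced to the storage device.
    // Safe after a crash, but the caller pays the disk latency on every write.
    synchronized void writeSync(String entry) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap((entry + "\n").getBytes(StandardCharsets.UTF_8));
        while (buf.hasRemaining()) {
            channel.write(buf);
        }
        channel.force(true);
    }

    // Asynchronous write: returns immediately with a Future.
    // Until the Future completes, a crash can lose the entry.
    Future<?> writeAsync(String entry) {
        return writer.submit(() -> {
            writeSync(entry);
            return null;
        });
    }

    @Override
    public void close() throws IOException {
        writer.shutdown();
        try {
            writer.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        channel.close();
    }
}
```

A caller that needs durability for a particular entry can still block on the returned Future, so the asynchronous path only loses safety where the caller chooses not to wait.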

So two questions remain:

1) Is Space4J as fast as any regular relational database when it comes to multiple and consecutive inserts/commits?

2) Is there a scenario where doing async disk i/o is actually desirable considering its cost?

Let me know your thoughts...



Hi Fred!

I would not expect this slowness. Now I am curious what the bottleneck is: I/O?

You would have to take that Space4J source code and benchmark the writing process, step by step, to find out where the bottleneck is.

-Sergio
xolotl wrote:
By the way, the fuss over simplistic designs like Prevayler or Space4j tends to obscure
the fact that there are fully-fledged transactional db engines designed to operate
in-memory although not in-process, an example that comes to mind being TimesTen
(now an Oracle property). So you can have your cake and eat it, too.


I agree. It is not about speed anymore, I would think. It is about a simple way to perform reads straight from collections.

SQL, JDBC Driver, ORM, connection pooling may be too much for many apps.

Prevayler had a not so good attitude that it was the salvation of the world and that databases were evil.

This was a joke in my personal opinion. Databases are awesome!

Space4J is just another option...

xolotl wrote:
Can I modify my data while someone else is reading it? Yes! Does that mean that readers might see the ground shift under them, e.g., while they are iterating over a collection?


Good question. I am not an expert on java.util.concurrent collections. We would have to ask Doug Lea this question. However, I imagine that if you are iterating (reading) and someone else is modifying, the results are kind of unpredictable. But you will NOT get a ConcurrentModificationException.

I wonder how a relational database handles that situation... This topic is not clear to me, but I am willing to learn from the experts... It has to do with isolation level.
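A small experiment with the standard collections shows the difference. It is only a sketch (IterationDemo is a made-up class and it runs in a single thread): a java.util.HashMap fails fast when it is structurally modified during iteration, while a ConcurrentHashMap keeps iterating. Whether the entry added part-way through is seen by the running iterator is not guaranteed either way.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical demo class; uses only standard JDK collections.
class IterationDemo {
    // Fills the map, then adds a new key while iterating over it.
    // Returns true if the iteration finished without a ConcurrentModificationException.
    static boolean iterateWhileModifying(Map<Integer, String> map) {
        for (int i = 0; i < 3; i++) {
            map.put(i, "user" + i);
        }
        try {
            for (Integer key : map.keySet()) {
                // The first call adds a new key (a structural change);
                // later calls only overwrite the same key.
                map.put(-1, "added while iterating over key " + key);
            }
            return true;
        } catch (ConcurrentModificationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("HashMap survived:           "
                + iterateWhileModifying(new HashMap<>()));
        System.out.println("ConcurrentHashMap survived: "
                + iterateWhileModifying(new ConcurrentHashMap<>()));
    }
}
```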



1. What do you mean? An exception when executing a command will not log this command. Can you explain better?

2. Any help implementing a generics layer will be highly appreciated. Generics is good stuff and Space4J should take advantage of it. Let me know if you want to commit something to the project about that.

3. Not that I know of. But the stress test should be a good indication that it is reliable under heavy load. Of course it can have hidden bugs, but a bug does not exist until it introduces itself in a real-case scenario.

agile.guo wrote:
In my application, there will be many indexes to be created, and some indexes should be created dynamically. Is there any problem with that?


No problem at all. Indexes can be created dynamically. Just like with any relational database, the indexing process can take long if your dataset is large. I would not worry about anything < 500k.

If you are indexing a huge dataset, it is better to do it offline, with your system off, to avoid write locks. Same problem with any relational database.


And how can I calculate the space used by indexes?


Not easily. Java will not easily tell you the amount of memory an object is taking. There are hacks to do this. Google it and if you get a good solution please share with me.

findUsers iterates through all the found items before it returns the result, which is in turn iterated again.

See the problem?

A better approach instead of iterating through all elements and sticking them in a new list object would be to create a custom iterator so that the findUsers method would return this iterator.

This is not a problem for small results, but for big results (>100k) this is not a good idea.

Did you get it?

To summarize: findUsers returns a collection that we just iterate through. Returning an iterator would be more appropriate for large subsets.

But, if you assume that you want a Collection with the returned elements, then this implementation is appropriate.
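The custom iterator idea from this post could look roughly like the sketch below. FilteringIterator is a name made up for the example and is not part of Space4J; it wraps the underlying collection's iterator and finds each match only when the caller asks for it, so no intermediate list is built.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

// Hypothetical lazy filter: elements are tested one at a time, on demand.
class FilteringIterator<T> implements Iterator<T> {
    private final Iterator<T> source;
    private final Predicate<T> matches;
    private T nextMatch;
    private boolean hasNextMatch;

    FilteringIterator(Iterator<T> source, Predicate<T> matches) {
        this.source = source;
        this.matches = matches;
        advance();
    }

    // Moves forward in the source until the next matching element, if any.
    private void advance() {
        hasNextMatch = false;
        nextMatch = null;
        while (source.hasNext()) {
            T candidate = source.next();
            if (matches.test(candidate)) {
                nextMatch = candidate;
                hasNextMatch = true;
                return;
            }
        }
    }

    @Override
    public boolean hasNext() {
        return hasNextMatch;
    }

    @Override
    public T next() {
        if (!hasNextMatch) {
            throw new NoSuchElementException();
        }
        T result = nextMatch;
        advance();
        return result;
    }
}
```

A findUsers-style method would then return `new FilteringIterator<>(users.values().iterator(), u -> ...)` instead of a filled list, where `users` and the predicate stand for whatever collection and condition the application uses.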

Sorry for the late response.

Yes you can use Space4J with your license. No problem...

See the PhoneBook example to have an idea how you can easily persist objects with Space4J. Then decide if you want to follow that approach instead of the standard relational database approach.

-Sergio
We know that it is impossible to abandon your good old relational database. You will probably need to do offline reports and heavy queries, and a relational database is perfect for that. So what you need is to replicate all your Space4J commands to a relational database. To do this you can implement the executeSQL method for your commands so that you can later read the logs and execute them against a relational database.

Check the method below from the Command interface:



ToDo: Create a standalone program that reads the logs and executes them against a JDBC connection.

Clustering can be useful for load balancing and fault tolerance. It also allows a snapshot to be taken without placing the whole system in read-only mode. That's because you can use a slave to take the snapshot while the rest of the nodes continue to run like nothing has happened. In other words, only the slave taking the snapshot needs to enter read-only mode.

Creating a Space4J cluster is simple. Refer to the PhoneBookCluster.java program for an example. We also have a stress program to test the cluster feature.

The source code for PhoneBookCluster.java is below:




When your collection becomes too big, even in-memory searches can become expensive. Imagine a Java collection with 1 million users where you need to find all users with the last name equal to "Smith". Iterating through all elements is clearly not an option. What you need is INDEXES.

Space4J supports 4 different types of indexes through maps. In other words, when you create an index, a new Java Map is created to store the index.

The four types of index are:

Index.TYPE.REGULAR = Unique index non-sorted. This is used to get a user by its id, for example, where each user has a unique id.


Index.TYPE.SORTED = Unique index sorted. This is used to get a list of users in a range of ids (ex: 4000 < id < 5000). Also used when you need to fetch users sorted by the indexed field.


Index.TYPE.MULTI = Non-unique index non-sorted. This is used to get all users with age equal to 23.


Index.TYPE.MULT_SORTED = Non-unique index sorted. This is used to get all users with age >= 23 and age <= 40. Also used when you need to fetch users sorted by the indexed field.
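As a rough mental model of the four types, here is a sketch using only java.util maps. It does not use the Space4J Index API, and the choice of HashMap and TreeMap is an assumption made for illustration; IndexKindsSketch and User are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Plain-collections sketch of the four index kinds; not Space4J code.
class IndexKindsSketch {
    static final class User {
        final int id;
        final int age;
        User(int id, int age) { this.id = id; this.age = age; }
    }

    // Unique, non-sorted: one user per id, direct lookup by key.
    static Map<Integer, User> uniqueById(Collection<User> users) {
        Map<Integer, User> index = new HashMap<>();
        for (User u : users) index.put(u.id, u);
        return index;
    }

    // Unique, sorted: supports range queries and ordered traversal by id.
    static NavigableMap<Integer, User> sortedById(Collection<User> users) {
        NavigableMap<Integer, User> index = new TreeMap<>();
        for (User u : users) index.put(u.id, u);
        return index;
    }

    // Non-unique, non-sorted: each age maps to the list of users with that age.
    static Map<Integer, List<User>> multiByAge(Collection<User> users) {
        Map<Integer, List<User>> index = new HashMap<>();
        for (User u : users) index.computeIfAbsent(u.age, k -> new ArrayList<>()).add(u);
        return index;
    }

    // Non-unique, sorted: like multiByAge, but ages can be queried by range.
    static NavigableMap<Integer, List<User>> multiSortedByAge(Collection<User> users) {
        NavigableMap<Integer, List<User>> index = new TreeMap<>();
        for (User u : users) index.computeIfAbsent(u.age, k -> new ArrayList<>()).add(u);
        return index;
    }

    public static void main(String[] args) {
        List<User> users = Arrays.asList(
                new User(3999, 22), new User(4500, 23),
                new User(4700, 23), new User(5000, 41));

        // A user by its unique id.
        System.out.println(uniqueById(users).get(4500).age);
        // Ids in the open range 4000 < id < 5000.
        System.out.println(sortedById(users).subMap(4000, false, 5000, false).keySet());
        // All users with age equal to 23.
        System.out.println(multiByAge(users).get(23).size());
        // Ages in the closed range 23 <= age <= 40.
        System.out.println(multiSortedByAge(users).subMap(23, true, 40, true).keySet());
    }
}
```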


The PhoneBookIndexing.java program provides examples of how to create and use indexes for your collections in memory. Below is the source code:

To stress test Space4J's cluster implementation, let's place a master node and a slave node under stress doing many reads and updates. Each node will have a predefined number of threads doing insert, delete, search and iteration at the same time in the Java collection in memory. A snapshot will also be taken from time to time by the slave node.

You can download the PhoneBookClusterStress.java program by clicking here. The source code is also listed below. (Note that to compile and run this program you must include the space4j.jar in the classpath.)

To run this example you need to open two shell/DOS windows, one for the master node and one for the slave node. You can also run each node on a separate machine over the network.

The arguments you must pass to each program are:

-master or -slave = to indicate if this node is the master or the slave
IP address of the master
number of insert threads
number of delete threads
number of select threads
number of search threads
thread sleep time = how long each thread will sleep after each action
collection initial size = how many elements the collection should have before the threads start
snapshot time = how often a snapshot should be taken

After executing the programs for some time, hit CTRL+C to print the statistics. Here is an example:

Master Program:

Slave Program:

The source code of the PhoneBookClusterStress.java program:


 