Hi Jonathan,
I'm currently building a social website for a company that expects about 500,000 users in the system. The system will be hosted in the cloud on Amazon Web Services. According to my calculations, the size of the "database" will grow to 8 GB. I'm very interested in using Space4J for persistence. Do you have any experience or feedback with this kind of system?
This is an area where Space4J can solve some problems. For example, if you have to traverse the friends graph to find the friends of friends of friends of friends, that can be heavy on a database, but if you have the graph of friends in memory then you are just following object references.
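To illustrate what following object references looks like, here is a minimal sketch in plain Java. It does not use any Space4J API; the User and FriendGraph classes and the friendsWithin method are hypothetical names made up for this example.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical domain class: each user holds direct references to friends.
class User {
    final String name;
    final List<User> friends = new ArrayList<>();

    User(String name) {
        this.name = name;
    }

    // Friendship is symmetric, so link both directions.
    static void befriend(User a, User b) {
        a.friends.add(b);
        b.friends.add(a);
    }
}

class FriendGraph {
    // Breadth-first walk: everyone reachable from 'start' in at most
    // maxDepth hops, excluding 'start' itself.
    static Set<User> friendsWithin(User start, int maxDepth) {
        Set<User> visited = new HashSet<>();
        visited.add(start);
        List<User> frontier = new ArrayList<>();
        frontier.add(start);
        for (int depth = 0; depth < maxDepth && !frontier.isEmpty(); depth++) {
            List<User> next = new ArrayList<>();
            for (User u : frontier) {
                for (User f : u.friends) {
                    // add() returns false if the user was already seen
                    if (visited.add(f)) {
                        next.add(f);
                    }
                }
            }
            frontier = next;
        }
        visited.remove(start);
        return visited;
    }
}
```

Calling friendsWithin(someUser, 4) would give the friends of friends of friends of friends without a single join, just pointer hops in memory.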
Problem: if you have too many users you may run out of RAM. You have to think about that, and you have to take the size of the indexes into account as well.
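A rough way to reason about this is a back-of-the-envelope check like the one below. It is only a sketch: the HeapBudget class is hypothetical, and the 25% index overhead is a placeholder assumption, not a measured Space4J figure. You would have to measure your own indexes.

```java
class HeapBudget {
    // Average number of bytes available per user for a given data size.
    static long bytesPerUser(long totalBytes, long users) {
        return totalBytes / users;
    }

    // True if the data plus an assumed overhead fraction (indexes etc.)
    // fits in the given heap size.
    static boolean fits(long dataBytes, double overheadFraction, long maxHeapBytes) {
        return dataBytes * (1.0 + overheadFraction) <= maxHeapBytes;
    }

    public static void main(String[] args) {
        long data = 8L * 1024 * 1024 * 1024; // the 8 GB estimate from the question
        long users = 500_000L;               // the user count from the question
        System.out.println("bytes per user: " + bytesPerUser(data, users));
        // 0.25 is a placeholder overhead; Runtime.maxMemory() is the JVM's heap limit.
        System.out.println("fits in this JVM's heap: "
                + fits(data, 0.25, Runtime.getRuntime().maxMemory()));
    }
}
```

With the numbers from the question this works out to roughly 17 KB per user, so the heap limit of the chosen machine (the -Xmx setting) is the figure to watch.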
Can an index be built for a map that already contains data? If yes, what is the expected behavior: is access to the indexed map blocked? Is there a way to be notified when the index is ready?
Yes. It works exactly as it does in a relational database when you index an existing table: access is blocked until indexing is finished. The method that performs the indexing does not return until the index is complete; in other words, it is a synchronous operation. This is a blocking UPDATE in Space4J (see below).
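To make "synchronous" concrete, here is a minimal sketch in plain Java of building an index over a map that already holds data. It is not the Space4J indexing API; IndexBuilder and buildIndex are hypothetical names used only for illustration. The point is that the call returns only once the index is complete, so the return itself is the notification.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class IndexBuilder {
    // Scans every existing value in 'primary' and groups the values by the
    // attribute extracted with 'keyOf'. The method returns only after the
    // whole map has been scanned.
    static <K, V, A> Map<A, List<V>> buildIndex(Map<K, V> primary,
                                                Function<? super V, ? extends A> keyOf) {
        Map<A, List<V>> index = new HashMap<>();
        for (V value : primary.values()) {
            A attribute = keyOf.apply(value);
            index.computeIfAbsent(attribute, k -> new ArrayList<>()).add(value);
        }
        return index;
    }
}
```

For example, buildIndex(namesById, String::length) would group existing names by their length; the caller can use the result as soon as the method returns.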
When I get an object from Space4J, is it a copy? If not, how does it work when two threads want to update the same object? What is the good practice?
The object is not a copy; it is the object itself. The Space4J concept is only possible because UPDATES are serialized; in other words, they are atomic and isolated, happening one at a time. No two UPDATES are ever executed concurrently. READS, on the other hand, are executed concurrently, thanks to the new Java 1.6 concurrent collections. Space4J is like Oracle: updates only block updates. Readers don't block or get blocked by anything.
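The general idea of serialized updates with concurrent reads can be sketched in plain Java as follows. This is not Space4J's implementation or API; SerializedStore, increment, and demo are hypothetical names, and a single lock stands in for whatever mechanism serializes updates.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class SerializedStore {
    private final ConcurrentMap<String, Integer> data = new ConcurrentHashMap<>();
    private final Object updateLock = new Object();

    // Reads take no lock; the concurrent map makes them safe.
    int read(String key) {
        Integer value = data.get(key);
        return value == null ? 0 : value;
    }

    // Every update goes through the same lock, so updates run one at a
    // time and the read-modify-write below cannot interleave.
    void increment(String key) {
        synchronized (updateLock) {
            data.put(key, read(key) + 1);
        }
    }

    // Starts several threads that all update the same key and returns
    // the final value once they have finished.
    static int demo(int threads, int incrementsPerThread) {
        SerializedStore store = new SerializedStore();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    store.increment("visits");
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return store.read("visits");
    }
}
```

So a reasonable practice for two threads that want to change the same object is to route every change through the serialized update path rather than mutating the object directly from a reader thread.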
Is synchronous replication across the cluster supported?
Space4J has a cluster ring built in. You do not need to do anything special to make a cluster. The good thing about a cluster, besides fault tolerance and load balancing, is that it allows a snapshot of the data to be taken without putting the whole system in read-only mode. Only one cluster node has to enter read-only mode, and the rest of the system continues to work normally while the snapshot is being taken on that node. You can see a cluster example here: http://s4j.mentaframework.org/posts/list/6.page