Storage
This page gives more technical detail about the Storage component. It is also a draft: a place to take notes on ideas and questions as they arise, so some of the questions below are still unanswered.
When the client wants to store a file, the storage node will store the file in a multi-level directory based on the hash of the file. This is calculated by the storage node.
Example:
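If the file hash is 749b772032659c62b866a7ea7253a87d, the file is stored under A8/7D/<fileid>. In this example the two directory levels match the last four hexadecimal digits of the hash (a87d), uppercased and split into pairs.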
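The mapping from hash to directory can be sketched as follows. This is only an illustration: the rule (last four hex digits, uppercased, two per level) is inferred from the single example above, and the function name storage_path is hypothetical.

```python
def storage_path(file_hash: str, file_id: str) -> str:
    """Return the relative path under which a file would be stored.

    Assumption: the two directory levels are the last four hex digits
    of the hash, uppercased and split into pairs (inferred from the
    example in the text, not from a confirmed specification).
    """
    tail = file_hash[-4:].upper()
    return f"{tail[:2]}/{tail[2:]}/{file_id}"


print(storage_path("749b772032659c62b866a7ea7253a87d", "123456"))
# A8/7D/123456
```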
Public API
GET
The client API should call this method to retrieve a file.
Format:
http://<storagenode>/sGET?id=<fileid>&hash=<filehash>
Parameters and return value:
- Parameters: {id, hash}
- Return: file contents
Example:
http://storagenode/sGET?id=123456&hash=749b772032659c62b866a7ea7253a87d
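A client could build and call the GET URL roughly like this. It is a minimal sketch using only the Python standard library; the helper names build_get_url and fetch_file are hypothetical, and fetch_file assumes the node answers with the raw file contents in the response body.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_get_url(node: str, file_id: str, file_hash: str) -> str:
    """Build the sGET URL for a given storage node, file id and hash."""
    query = urlencode({"id": file_id, "hash": file_hash})
    return f"http://{node}/sGET?{query}"


def fetch_file(node: str, file_id: str, file_hash: str) -> bytes:
    """Download the file contents (assumes the body is the raw file)."""
    with urlopen(build_get_url(node, file_id, file_hash), timeout=10) as resp:
        return resp.read()


print(build_get_url("storagenode", "123456", "749b772032659c62b866a7ea7253a87d"))
# http://storagenode/sGET?id=123456&hash=749b772032659c62b866a7ea7253a87d
```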
PUTB
The client API should call this method to put a file. This method is blocking, meaning that replication and availability are confirmed before the request returns. In other words, after the client sends the file, the storage node replicates it to the other nodes (based on the replication policy) and only then returns OK to the client API, or ERROR if replication fails.
Format:
http://<storagenode>/sPUTB?id=<fileid>&hash=<hash>&size=<filesize>&nodes=<nodelist>
Parameters and return value:
- Parameters: {id, hash, size, nodes}
- Return: {OK} on success, ERROR otherwise
Example:
http://storagenode/sPUTB?id=123456&hash=749b772032659c62b866a7ea7253a87d&size=234&nodes=a/b/c
The HTTP method is POST and the form encoding should be multipart/form-data.
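The sketch below shows one way a client might assemble such a request with the Python standard library. Several details are assumptions, not part of this page: the form field name "file", the "/" separator for the node list (taken from the example URL), and the helper name build_put_request. The same helper would work for the non-blocking variant by passing a different method name.

```python
import uuid
from urllib.parse import urlencode
from urllib.request import Request


def build_put_request(node, method, file_id, file_hash, data, nodes):
    """Build a multipart/form-data POST for sPUTB (or sPUTNB).

    Assumptions: the file travels in a form field called "file" and
    the node list is joined with "/" as in the example URL.
    """
    query = urlencode(
        {"id": file_id, "hash": file_hash, "size": len(data),
         "nodes": "/".join(nodes)},
        safe="/",  # keep the "/" separators readable in the query string
    )
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{file_id}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return Request(
        f"http://{node}/{method}?{query}",
        data=head + data + tail,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )


req = build_put_request("storagenode", "sPUTB", "123456",
                        "749b772032659c62b866a7ea7253a87d",
                        b"hello", ["a", "b", "c"])
print(req.get_method(), req.full_url)
```

The request object can then be sent with urllib.request.urlopen(req); for sPUTB the call would only return once the node has finished replicating.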
PUTNB
The client API should call this method to put a file. This method is non-blocking, meaning that replication and availability are not confirmed when the request returns. In other words, after the client sends the file, it gets an OK immediately, and the storage node starts the replication process in the background.
Format:
http://<storagenode>/sPUTNB?id=<fileid>&hash=<hash>&size=<filesize>&nodes=<nodelist>
Parameters and return value:
- Parameters: {id, hash, size, nodes}
- Return: {OK}
Example:
http://storagenode/sPUTNB?id=123456&hash=749b772032659c62b866a7ea7253a87d&size=234&nodes=a/b/c
The HTTP method is POST and the form encoding should be multipart/form-data.
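The difference between the blocking and non-blocking variants can be illustrated with a small server-side sketch. It is written in Python purely for illustration (the real node uses Erlang internals), and replicate, put_blocking and put_non_blocking are hypothetical names; replicate only records what it would have copied.

```python
import threading


def replicate(file_id, nodes, log):
    """Stand-in for the real transfer: record each (file, node) copy."""
    for node in nodes:
        log.append((file_id, node))


def put_blocking(file_id, nodes, log):
    """PUTB-style: replicate first, answer afterwards."""
    replicate(file_id, nodes, log)
    return "OK"


def put_non_blocking(file_id, nodes, log):
    """PUTNB-style: answer at once, replicate in a background thread."""
    worker = threading.Thread(target=replicate, args=(file_id, nodes, log))
    worker.start()
    return "OK", worker


log = []
status, worker = put_non_blocking("123456", ["b", "c"], log)
worker.join()  # only the demo waits; a real client would not
print(status, log)
```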
Replication
Replication is started by the primary storage node. The client API should send the file along with the information provided by Cerebrum. When the storage node receives a file, it replicates it to the other nodes (based on the replication policy) using Erlang internals (not the public API). If the replication is successful, it notifies Cerebrum, again using Erlang internals (not the public API), that the file is online and replicated. It then returns an OK to the client API.
Question: What if the destination storage node is offline?
Possible answer: Keep a state per file entry (online, offline). When replication completes, it changes the state of the file to online. If a destination storage node is offline, the file entry is not marked as online. Monitoring can look for offline files and initiate replication once the storage node comes back online.
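The per-file state idea could look roughly like the sketch below. It is only one possible reading of the answer above: FileEntry, try_replicate and monitor are hypothetical names, the set of reachable nodes is passed in by hand, and the actual transfer is replaced by a set update.

```python
from dataclasses import dataclass, field


@dataclass
class FileEntry:
    file_id: str
    targets: list                      # nodes that should hold a replica
    replicated: set = field(default_factory=set)
    state: str = "offline"             # "online" once every target has a copy


def try_replicate(entry, reachable_nodes):
    """Copy to every reachable target; go online only when all are done."""
    for node in entry.targets:
        if node in reachable_nodes:
            entry.replicated.add(node)  # stand-in for the real transfer
    if entry.replicated == set(entry.targets):
        entry.state = "online"
    return entry.state


def monitor(entries, reachable_nodes):
    """Retry all offline entries; return the ids that came online."""
    recovered = []
    for entry in entries:
        if entry.state == "offline":
            if try_replicate(entry, reachable_nodes) == "online":
                recovered.append(entry.file_id)
    return recovered


entry = FileEntry("123456", ["b", "c"])
print(try_replicate(entry, {"b"}))     # node c is down, entry stays offline
print(monitor([entry], {"b", "c"}))    # node c is back, entry is recovered
```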