P2P Storage SystemPresented by:Aakash Therani, Ankit Jasuja & Manish Shah

What is a P2P storage system? Peer-to-Peer(P2P) storage systems leverage the combinedstorage capacity of a network of storage devices(peers)contributed typically by autonomous end-users as acommon pool of storage space to store and share content. Applications Distributed file systems Content sharing Back-up & archival storage Peer data management systems

What is a P2P storage system? Peer-to-Peer(P2P) storage systems leverage the combinedstorage capacity of a network of storage devices(peers)contributed typically by autonomous end-users as acommon pool of storage space to store and share content. Applications Distributed file systems Content sharing Back-up & archival storage Peer data management systems

Designing P2P Storage SystemsFactors to keep in mind while designing p2p storage systems Persistent Storage Availability- in the presence of network partitions Durability- against failure and attack Security Issues Access control Protection against content pollution Transactions Concurrency Control Fault Tolerance

Cloud Storage v/s P2P Storage When data is stored at server clusters within the internet, this kindof data storage is referred to as cloud storage. Cloud Storage Products Amazon S3- Amazon S3 (Simple Storage Service) is a web service that offers cloudstorage through a simple HTTP-based interface. Dropbox- Dropbox is a cloud storage provider and file synchronization tool usingthe Amazon S3 storage facility as a back-end. When relying on the members of a group storing each other’sdata, it is called peer-to-peer (p2p) storage. P2P Storage Products Wuala- Wuala [37] is a commercial, distributed storage service that allows usersto trade storage capacity in a P2P way

Classification of Storage ProductsProducts can be classified based on the types of storage needs:1) Backup- Using the service as a backup facility for files stored locally on a computer(which is part of the peer network). This may involve keeping track of versions of files,as they change over time.2) File Synchronization- Keeping the same file tree that exists on a number ofdifferent computers in sync. When one file is changed on one computer, the copy ofthat file on the other computers is automatically updated. This type of functionalitymust deal with conflicts, e.g., in case the same file is changed on multiple computers atthe same time.3) Distributed file system-The online storage capacity is used to implement adistributed file system. One or more computers access the storage in a manner that isvery similar to local file systems .4) Content Sharing-Parts of the file tree stored online are used to share data withother people. By providing credentials to others, they can use the storage facility to readthe part of the tree they were granted access to .

OceanStore: An Architecture for Global-ScalePersistent Storage

OceanStore: A True Data Utility Utility model: consumers pay a monthly fee in exchange foraccess to persistent storage Highly available data from anywhere Automatic replication for disaster recovery Strong security Providers would buy and sell capacity among themselves formobile users Deep archival storage: use excess of storage space to ease datamanagement

Ubiquitous Computing

Two Unique Goals1) Ability to be constructed from an untrusted infrastructure Servers may crash without warningAll information entering the infrastructure must be encryptedServers participate in protocols for distributed consistencymanagement2) Support for Nomadic Data Locality is of utmost importancePromiscuous Caching: Data can be cached anywhere, anytimeContinuous introspective monitoring to manage caching &locality

System Overview Persistent object: The fundamental unit in OceanStore Each object is named by a Globally Unique Identifier (GUID) Objects are replicated and stored on multiple servers Floating replicas: Replicas are independent of the server Two mechanisms to locate a replica1) A fast, probabilistic algorithm to find the object near therequesting machine2) If (1) fails, then it is located through a slower,deterministic algorithm

Underlying Technologies Naming Access Control Data Location and Routing Data Update Deep Archival Storage Introspection

NamingGUID: psuedo-random, fixed-length bit string Decentralized & resistant to attempts by adversaries Self-certifying path names GUID hash(owner’s key, filename)GUID of a server is a secure hash of its keyGUID of a data fragment is a secure hash of the data content

Access ControlOceanStore supports two primitive types of access controls1) Reader Restriction Encrypt non-public data and distribute the key to userswith read access Problem: There is no way to make a reader forget whathe has read2) Writer Restriction Through ACLs specified for each object by its owner Each user has a signing key, ACLs use that key forgranting accessNote: Reads are restricted at clients via key distribution, whilewrites are restricted at servers by ignoring unauthorized updates

Data Location and Routing Objects can reside on any of the OceanStore servers Use query routing to locate objects Every object is identified by one or more GUIDs Different replicas of the same object has the same GUID OceanStore messages are labeled with A destination GUID (built on top of IP) A random number A small predicate

Distributed Routing in OceanStore Routing is a two phase process. Data location and routing combined Advantage being we avoid multiple round trip time Routing itself is 2 tiered Fast probabilistic algorithm and slow reliable hierarchical method.

Bloom Filters Based on the idea of hill-climbing If a query cannot be satisfied by a server, local information is useto route the query to a likely neighbor- Via a modified version of a Bloom filter

Attenuated Bloom Filters An attenuated Bloom filter of depth D is an array of D normalBloom filters ith Bloom filter is the union of all the Bloom filters for all of thenodes at a distance i One filter per network edge

Attenuated Bloom Filters Lookup 11010

The Global Algorithm: Wide-Scale Distributed DataLocation Plaxton’s randomized hierarchical distributed data structure Resolve one digit of the node id at a time Links form a series of random embedded trees, with each node asthe root of one of these trees. Neighbor links can be used to route from anywhere to a given node If information about the GUID (such as its location) were stored atits root, then anyone could find this information simply by followingneighbor links until they reached the root node for the GUID.

The Global Algorithm: Wide-Scale Distributed DataLocation

Achieving Locality When a replica is placed somewhere in the system, its location is“published” to the routing infrastructure. The publishing process works its way to the object’s root anddeposits a pointer at every hop alongthe way. Each new replica only needs to traverse O(log(n)) hops to reach theroot, where n is the number of the servers When someone searches for information, they climb the tree untilthey run into a pointer, after which they route directly to the object.

Achieving Fault Tolerance Avoid failures at roots Each root GUID is hashed with a small number of different saltvalues Make it difficult to target a single GUID for DoS attacks If failures are detected, just jump to any node to reach the root OceanStore continually monitors and repairs broken pointers

Advantages of Distributed Information Redundant paths to roots Scalable with a combination of probabilistic and global algorithms Easy to locate and recover failed components Plaxton links form a natural substrate for admission controls andmulticasts

Achieving Maintenance-Free Operation Recursive node insertion and removal Replicated roots Use beacons to detect faults Time-to-live fields to update routes Second-chance algorithm to avoid false diagnoses of failedcomponents Avoid the cost of recovering lost nodes Automatic reconstruction of data for failed servers

Update: Format and Semantics An update: a list of predicates associated with actions A set of predicates is evaluated in order The actions of the earliest true predicate are atomically applied Update is logged if it commits or aborts. Predicates: compare-version, compare-block, compare-size, search Actions: replace-block, insert-block, delete-block, append

Serializing Updates in an UntrustedInfrastructure Use a small primary tier of replicas to serialize updates Runs Byzantine agreement protocol Minimize communication Meanwhile, a secondary tier of replicas optimistically propagateupdates among themselves Final ordering from primary tier is multicasted to secondaryreplicas

UpdatePath of an update:a)After generating an update, a client sends it directly to the object’s inner ringb)While inner ring performs a Byzantine agreement to commit the update, secondary nodes propagatethe update among themselvesc)The result of update is multicast down the dissemination tree to all secondary nodes

The Full Update Path

Update commitment Fault tolerance: Guarantees fault tolerance if less than one third of the serversin the inner ring is malicious Secondary nodes do not participate in the Byzantine protocol,but receive consistency information

A Direct Path to Clients and Archival Storage Updates flow directly from a client to the primary tier, where theyare serialized and then multicast to the secondary servers down thedissemination tree Updates are tightly coupled with archival Archival fragments are generated at serialization time, signed,encoded and distributed with updates

Deep Archival Storage Data is fragmented Each fragment is an object Erasure coding is used to increase reliability Administrative domains are ranked by their reliability andtrustworthiness Avoid locations with correlated failures

Erasure Codes Erasure coding is a process that treats input data as a series offragments (say n) and transforms these fragments into a greaternumber of fragments (say 2nor 4n)nMessageEncoding AlgorithmcnEncodingTransmission nReceivedDecoding AlgorithmnMessage

Introspectioncomputationoptimizationobservation Observation modules monitor the activity of a running system andtrack system behavior Optimization modules adjust the computation

IntrospectionEvent handlers summarizes local events. These summaries arestored in a database. The information in the database is periodicallyanalyzed and necessary actions are taken. A summary is sent toother nodes.

Uses of Introspection Cluster recognition Identify related files Replica management Adjust replication factors Migrate floating replicas

Introspection If a replica becomes unavailable: Clients will receive service from a more distant replica This produces extra load on distant replicas Introspective mechanism detects this and new replicas arecreated Above actions provide fault tolerance and automatic repair

Applications Groupware applications Personal information management tools Email Contact lists Calendars Distributed design tools

ConclusionDifferent from other systems : Utility model Untrusted infrastructure Truly nomadic data Use of introspection Prevention of denial of service attacks Rapid response to regional outages Analysis of access patterns

Dynamo: Amazon’s Highly Available Keyvalue Store

MotivationBuild a distributed storage system: Scale Simple: key-value Highly available Guarantee Service Level Agreements (SLA)Service Level Agreements (SLA) Application can deliver its functionality in abounded time: Everydependency in the platform needs to deliver its functionality witheven tighter bounds. Example: service guaranteeing that it will provide a response within300ms for 99.9% of its requests for a peak client load of 500 requestsper second.

Design Consideration1) Sacrifice strong consistency for availability2) Conflict resolution is executed during read instead of write, i.e.“always writeable”.3) Other principles: Incremental scalability. Symmetry. Decentralization. Heterogeneity.

Partition Algorithm Consistent hashing: the output range of a hash function is treatedas a fixed circular space or ring. Virtual Nodes: Each node can be responsible for more than onevirtual node. Advantages of using virtual nodes If a node becomes unavailable the load handled by this node isevenly dispersed across the remaining available nodes. When a node becomes available again, the newly available nodeaccepts a roughly equivalent amount of load from each of the otheravailable nodes. The number of virtual nodes that a node is responsible can decidedbased on its capacity, accounting for heterogeneity in the physicalinfrastructure.

Data Versioning & Vector Clock A put() call may return to its caller before the update has been applied atall the replicas A get() call may return many versions of the same object. Challenge: an object having distinct version sub-histories, which thesystem will need to reconcile in the future. Solution: uses vector clocks in order to capture causality betweendifferent versions of the same object. A vector clock is a list of (node, counter) pairs. Every version of every object is associated with one vector clock. If the counters on the first object’s clock are less-than-or-equal to all ofthe nodes in the second clock, then the first is an ancestor of the secondand can be forgotten.

Execution1)2)3)Read / Write request on a key Arrives at a node (coordinator) Ideally the node responsible for the particular key Else forwards request to the node responsible for that key and that node will becomethe coordinator The first N healthy and distinct nodes following the key position are considered for therequest Quorums are used R – Read Quorum W – Write Quorum R W NWrites Requires generation of a new vector clock by coordinator Coordinator writes locally Forwards to N nodes, if W-1 respond then the write was successfulReads Forwards to N nodes, if R-1 respond then forwards to user Only unique responses forwarded User handles merging if multiple versions exist

FreeNet: A Distributed AnonymousInformation Storage and Retrieval System

FreeNetIntroduction:P2P network for anonymous publishing and retrieval of data Decentralized Nodes collaborate in storage and routing Data centric routing Adapts to demands Addresses privacy & availability concernsFeatures: Anonymity for producers and consumers Deniability for information stores Resistance to denial attacks Efficient storing and routing Does NOT providePermanent file storageLoad balancingAnonymity for general n/w usage

ArchitectureRequest:1.Key2.Hops to live3.ID4.DepthEach node – local data store routingtableRequest file through locationindependent keysRouting - chain of proxy requests decision is localGraph structure actively evolves overtime

Keys and SearchingProblems with SSK - updating, versioningContent Hash Keys (CHK)Encrypted by a random encryption keyPublish CHK decryption keyCHK SSK easily updateable files2 step process – publish file, publish pointerResults in pointers to newer versionOlder versions accessed thru CHKCan be used for splitting files

File retrievingcabfedLocation of keys: Hypertext spider Indirect files – published with KSK ofsearch words Publish bookmarksFile retrievalRequest forwarded to node in RT withclosest lexicographic match for the binarykeyRequest routing follows steepest-ascenthill climbing: first choice failure backtrack second choiceTimers, hops - curtail request threadsFiles cached all along the retrieval pathSelf-reinforcing cycle – results in key expertise

Data Management Finite data stores - nodes resort to LRU Routing table entries linger after data eviction Outdated (or unpopular) docs disappear automatically Bipartite eviction – short term policy New files replace most recent files Prevents established files being evicted by attacks

Protocol and SecurityPROTOCOL Nodes with frequently changing IPs use ARKs Return address specified in requests – threat? Messages do not always terminate when hops-to-live reaches 1 Depth is initialized by original requestor to arbitrarily small value Request state maintained at each node – timers – LRUSECURITY File integrity - KSK vulnerable to dictionary attacks DOS attacks – Hash Cash to slow down Attempts to displace valid files are constrained by the insertprocedure

Thank You.!!!

(which is part of the peer network). This may involve keeping track of versions of files, as they change over time. 2) File Synchronization-Keeping the same file tree that exists on a number of different computers in sync. When one file is changed on one computer, the copy of that file