draft-ietf-nfsv4-rfc5664bis-01.txt   draft-ietf-nfsv4-rfc5664bis-02.txt 
NFSv4 B. Halevy NFSv4 B. Halevy
Internet-Draft Tonian Internet-Draft PrimaryData
Intended status: Standards Track B. Harrosh Intended status: Standards Track B. Harrosh
Expires: September 29, 2012 B. Welch Expires: April 8, 2014 B. Welch
Panasas Panasas
March 28, 2012 October 05, 2013
Object-Based Parallel NFS (pNFS) Operations Object-Based Parallel NFS (pNFS) Operations
draft-ietf-nfsv4-rfc5664bis-01 draft-ietf-nfsv4-rfc5664bis-02
Abstract Abstract
Parallel NFS (pNFS) extends Network File System version 4 (NFSv4) to Parallel NFS (pNFS) extends Network File System version 4 (NFSv4) to
allow clients to directly access file data on the storage used by the allow clients to directly access file data on the storage used by the
NFSv4 server. This ability to bypass the server for data access can NFSv4 server. This ability to bypass the server for data access can
increase both performance and parallelism, but requires additional increase both performance and parallelism, but requires additional
client functionality for data access, some of which is dependent on client functionality for data access, some of which is dependent on
the class of storage used, a.k.a. the Layout Type. The main pNFS the class of storage used, a.k.a. the Layout Type. The main pNFS
operations and data types in NFSv4 Minor version 1 specify a layout- operations and data types in NFSv4 Minor version 1 specify a layout-
type-independent layer; layout-type-specific information is conveyed type-independent layer; layout-type-specific information is conveyed
using opaque data structures whose internal structure is further using opaque data structures whose internal structure is further
defined by the particular layout type specification. This document defined by the particular layout type specification. This document
specifies the NFSv4.1 Object-Based pNFS Layout Type as a companion to specifies the NFSv4.1 Object-Based pNFS Layout Type as a companion to
the main NFSv4 Minor version 1 specification. the main NFSv4 Minor version 1 specification. This document has been
updated since the initial version to clarify and fix some of the
RAID-related computations so they match current implementations.
Status of this Memo Status of this Memo
This Internet-Draft is submitted to IETF in full conformance with the This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79. provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/. Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on September 29, 2012. This Internet-Draft will expire on April 8, 2014.
Copyright Notice Copyright Notice
Copyright (c) 2012 IETF Trust and the persons identified as the Copyright (c) 2013 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License. described in the Simplified BSD License.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.1. Requirements Language . . . . . . . . . . . . . . . . . . 4 1.1. Requirements Language . . . . . . . . . . . . . . . . . . 4
1.2. Overview of Changes . . . . . . . . . . . . . . . . . . . 4
2. XDR Description of the Objects-Based Layout Protocol . . . . . 4 2. XDR Description of the Objects-Based Layout Protocol . . . . . 4
2.1. Code Components Licensing Notice . . . . . . . . . . . . . 5 2.1. Code Components Licensing Notice . . . . . . . . . . . . . 5
3. Basic Data Type Definitions . . . . . . . . . . . . . . . . . 6 3. Basic Data Type Definitions . . . . . . . . . . . . . . . . . 6
3.1. pnfs_osd_objid4 . . . . . . . . . . . . . . . . . . . . . 6 3.1. pnfs_osd_objid4 . . . . . . . . . . . . . . . . . . . . . 6
3.2. pnfs_osd_version4 . . . . . . . . . . . . . . . . . . . . 7 3.2. pnfs_osd_version4 . . . . . . . . . . . . . . . . . . . . 7
3.3. pnfs_osd_object_cred4 . . . . . . . . . . . . . . . . . . 8 3.3. pnfs_osd_object_cred4 . . . . . . . . . . . . . . . . . . 8
3.4. pnfs_osd_raid_algorithm4 . . . . . . . . . . . . . . . . . 9 3.4. pnfs_osd_raid_algorithm4 . . . . . . . . . . . . . . . . . 9
4. Object Storage Device Addressing and Discovery . . . . . . . . 9 4. Object Storage Device Addressing and Discovery . . . . . . . . 9
4.1. pnfs_osd_targetid_type4 . . . . . . . . . . . . . . . . . 10 4.1. pnfs_osd_targetid_type4 . . . . . . . . . . . . . . . . . 10
4.2. pnfs_osd_deviceaddr4 . . . . . . . . . . . . . . . . . . . 11 4.2. pnfs_osd_deviceaddr4 . . . . . . . . . . . . . . . . . . . 11
4.2.1. SCSI Target Identifier . . . . . . . . . . . . . . . . 11 4.2.1. SCSI Target Identifier . . . . . . . . . . . . . . . . 11
4.2.2. Device Network Address . . . . . . . . . . . . . . . . 12 4.2.2. Device Network Address . . . . . . . . . . . . . . . . 12
5. Object-Based Layout . . . . . . . . . . . . . . . . . . . . . 12 5. Object-Based Layout . . . . . . . . . . . . . . . . . . . . . 13
5.1. pnfs_osd_data_map4 . . . . . . . . . . . . . . . . . . . . 13 5.1. pnfs_osd_data_map4 . . . . . . . . . . . . . . . . . . . . 13
5.2. pnfs_osd_layout4 . . . . . . . . . . . . . . . . . . . . . 14 5.2. pnfs_osd_layout4 . . . . . . . . . . . . . . . . . . . . . 14
5.3. Data Mapping Schemes . . . . . . . . . . . . . . . . . . . 15 5.3. Data Mapping Schemes . . . . . . . . . . . . . . . . . . . 15
5.3.1. Simple Striping . . . . . . . . . . . . . . . . . . . 15 5.3.1. Simple Striping . . . . . . . . . . . . . . . . . . . 15
5.3.2. Nested Striping . . . . . . . . . . . . . . . . . . . 16 5.3.2. Nested Striping . . . . . . . . . . . . . . . . . . . 16
5.3.3. Mirroring . . . . . . . . . . . . . . . . . . . . . . 18 5.3.3. Mirroring . . . . . . . . . . . . . . . . . . . . . . 18
5.4. RAID Algorithms . . . . . . . . . . . . . . . . . . . . . 19 5.4. RAID Algorithms . . . . . . . . . . . . . . . . . . . . . 19
5.4.1. PNFS_OSD_RAID_0 . . . . . . . . . . . . . . . . . . . 19 5.4.1. PNFS_OSD_RAID_0 . . . . . . . . . . . . . . . . . . . 19
5.4.2. PNFS_OSD_RAID_4 . . . . . . . . . . . . . . . . . . . 19 5.4.2. PNFS_OSD_RAID_4 . . . . . . . . . . . . . . . . . . . 19
5.4.3. PNFS_OSD_RAID_5 . . . . . . . . . . . . . . . . . . . 20 5.4.3. PNFS_OSD_RAID_5 . . . . . . . . . . . . . . . . . . . 20
skipping to change at page 4, line 35 skipping to change at page 4, line 35
scheme that allows the pNFS server to control what operations and scheme that allows the pNFS server to control what operations and
what objects can be used by clients. This scheme is described in what objects can be used by clients. This scheme is described in
more detail in the "Security Considerations" section (Section 13). more detail in the "Security Considerations" section (Section 13).
1.1. Requirements Language 1.1. Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [2]. document are to be interpreted as described in RFC 2119 [2].
1.2. Overview of Changes
This document is an update to the initial RFC. The primary area of
change is the clarification and correction of the RAID-related
equations and algorithms in Section 5.3. The equations were restated
for clarity, and in a few places minor corrections were made to
ensure that this spec accurately matches current implementations. In
addition, minor corrections have been made to other sections.
2. XDR Description of the Objects-Based Layout Protocol 2. XDR Description of the Objects-Based Layout Protocol
This document contains the external data representation (XDR [3]) This document contains the external data representation (XDR [3])
description of the NFSv4.1 objects layout protocol. The XDR description of the NFSv4.1 objects layout protocol. The XDR
description is embedded in this document in a way that makes it description is embedded in this document in a way that makes it
simple for the reader to extract into a ready-to-compile form. The simple for the reader to extract into a ready-to-compile form. The
reader can feed this document into the following shell script to reader can feed this document into the following shell script to
produce the machine readable XDR description of the NFSv4.1 objects produce the machine readable XDR description of the NFSv4.1 objects
layout protocol: layout protocol:
skipping to change at page 6, line 16 skipping to change at page 6, line 25
/// * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, /// * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
/// * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT /// * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
/// * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR /// * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
/// * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS /// * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
/// * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF /// * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
/// * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, /// * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
/// * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING /// * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
/// * IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF /// * IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
/// * ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. /// * ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
/// * /// *
/// * This code was derived from draft-ietf-nfsv4-rfc5664bis-01. /// * This code was derived from draft-ietf-nfsv4-rfc5664bis-02.
[[RFC Editor: please insert RFC number if needed]] [[RFC Editor: please insert RFC number if needed]]
/// * Please reproduce this note if possible. /// * Please reproduce this note if possible.
/// */ /// */
/// ///
/// /* /// /*
/// * pnfs_osd_prot.x /// * pnfs_osd_prot.x
/// */ /// */
/// ///
/// %#include <nfs4_prot.x> /// %#include <nfs4_prot.x>
/// ///
skipping to change at page 7, line 21 skipping to change at page 7, line 34
3.2. pnfs_osd_version4 3.2. pnfs_osd_version4
/// enum pnfs_osd_version4 { /// enum pnfs_osd_version4 {
/// PNFS_OSD_MISSING = 0, /// PNFS_OSD_MISSING = 0,
/// PNFS_OSD_VERSION_1 = 1, /// PNFS_OSD_VERSION_1 = 1,
/// PNFS_OSD_VERSION_2 = 2 /// PNFS_OSD_VERSION_2 = 2
/// }; /// };
/// ///
pnfs_osd_version4 is used to indicate the OSD protocol version or pnfs_osd_version4 is used to indicate the OSD protocol version used
whether an object is missing (i.e., unavailable). Some of the to access an object, or whether an object is missing (i.e.,
object-based layout- supported RAID algorithms encode redundant unavailable). Some of the RAID algorithms supported by object-based
information and can compensate for missing components, but the data layouts encode redundant information and can compensate for missing
placement algorithm needs to know what parts are missing. components, but the data placement algorithms need to be aware of the
logical positions of the missing components.
At this time, the OSD standard is at version 1.0, and we anticipate a The 1.0 version of the OSD standard has been ratified. The 2.0
version 2.0 of the standard (SNIA T10/1729-D [14]). The second version of the OSD standard has reached final draft status, but has
generation OSD protocol has additional proposed features to support not been fully ratified. However, current object-based pNFS
more robust error recovery, snapshots, and byte-range capabilities. implementations adhere to the OSD 2.0 protocol (SNIA T10/1729-D
Therefore, the OSD version is explicitly called out in the [14]). The second generation OSD protocol has additional features to
support more robust error recovery, snapshots, and byte-range
capabilities. For completeness, and to allow for future revisions in
the OSD protocol, the OSD version is explicitly called out in the
information returned in the layout. (This information can also be information returned in the layout. (This information can also be
deduced by looking inside the capability type at the format field, deduced by looking inside the capability type at the format field,
which is the first byte. The format value is 0x1 for an OSD v1 which is the first byte. The format value is 0x1 for an OSD v1
capability. However, it seems most robust to call out the version capability.)
explicitly.)
3.3. pnfs_osd_object_cred4 3.3. pnfs_osd_object_cred4
/// enum pnfs_osd_cap_key_sec4 { /// enum pnfs_osd_cap_key_sec4 {
/// PNFS_OSD_CAP_KEY_SEC_NONE = 0, /// PNFS_OSD_CAP_KEY_SEC_NONE = 0,
/// PNFS_OSD_CAP_KEY_SEC_SSV = 1 /// PNFS_OSD_CAP_KEY_SEC_SSV = 1
/// }; /// };
/// ///
/// struct pnfs_osd_object_cred4 { /// struct pnfs_osd_object_cred4 {
/// pnfs_osd_objid4 oc_object_id; /// pnfs_osd_objid4 oc_object_id;
skipping to change at page 11, line 7 skipping to change at page 11, line 7
/// enum pnfs_osd_targetid_type4 { /// enum pnfs_osd_targetid_type4 {
/// OBJ_TARGET_ANON = 1, /// OBJ_TARGET_ANON = 1,
/// OBJ_TARGET_SCSI_NAME = 2, /// OBJ_TARGET_SCSI_NAME = 2,
/// OBJ_TARGET_SCSI_DEVICE_ID = 3 /// OBJ_TARGET_SCSI_DEVICE_ID = 3
/// }; /// };
/// ///
4.2. pnfs_osd_deviceaddr4 4.2. pnfs_osd_deviceaddr4
The "pnfs_osd_deviceaddr4" data structure is returned by the server
as the storage-protocol-specific opaque field da_addr_body in the
"device_addr4" structure by a successful GETDEVICEINFO operation (see
NFSv4.1 [6]).
The specification for an object device address is as follows: The specification for an object device address is as follows:
/// union pnfs_osd_targetid4 switch (pnfs_osd_targetid_type4 oti_type) { /// union pnfs_osd_targetid4 switch (pnfs_osd_targetid_type4 oti_type) {
/// case OBJ_TARGET_SCSI_NAME: /// case OBJ_TARGET_SCSI_NAME:
/// string oti_scsi_name<>; /// string oti_scsi_name<>;
/// ///
/// case OBJ_TARGET_SCSI_DEVICE_ID: /// case OBJ_TARGET_SCSI_DEVICE_ID:
/// opaque oti_scsi_device_id<>; /// opaque oti_scsi_device_id<>;
/// ///
/// default: /// default:
skipping to change at page 19, line 24 skipping to change at page 19, line 24
pnfs_osd_raid_algorithm4 determines the algorithm and placement of pnfs_osd_raid_algorithm4 determines the algorithm and placement of
redundant data. This section defines the different redundancy redundant data. This section defines the different redundancy
algorithms. Note: The term "RAID" (Redundant Array of Independent algorithms. Note: The term "RAID" (Redundant Array of Independent
Disks) is used in this document to represent an array of component Disks) is used in this document to represent an array of component
objects that store data for an individual file. The objects are objects that store data for an individual file. The objects are
stored on independent object-based storage devices. File data is stored on independent object-based storage devices. File data is
encoded and striped across the array of component objects using encoded and striped across the array of component objects using
algorithms developed for block-based RAID systems. algorithms developed for block-based RAID systems.
The use of per-file RAID encoding in the object-layout for pNFS
imposes an additional responsibility on the file system client. The
pNFS client SHOULD generate the redundant data and write it to
storage along with the file data according to the RAID parameters
returned in the layout. However, various error conditions may
prevent the client from meeting its obligations, and this is
supported by the error information in the pnfs_osd_ioerr4 structure
(see Section 8.1). An explicit error return from the client, or an
implicit error caused by a client's failure to return a layout MUST
trigger recovery action by the server to prevent access to invalid
data (see Section 7). It is the server's responsibility to only
grant layout information to files that can be safely accessed, and to
deny access to files that are in an inconsistent state.
5.4.1. PNFS_OSD_RAID_0 5.4.1. PNFS_OSD_RAID_0
PNFS_OSD_RAID_0 means there is no parity data, so all bytes in the PNFS_OSD_RAID_0 means there is no parity data, so all bytes in the
component objects are data bytes located by the above equations for C component objects are data bytes located by the above equations for C
and O. If a component object is marked as PNFS_OSD_MISSING, the pNFS and O. If a component object is marked as PNFS_OSD_MISSING, an I/O
client MUST either return an I/O error if this component is attempted error MUST be returned if this component is accessed. In this case,
to be read or, alternatively, it can retry the READ against the pNFS the generic NFS client layer MAY elect to retry this operation
server. against the pNFS server.
5.4.2. PNFS_OSD_RAID_4 5.4.2. PNFS_OSD_RAID_4
PNFS_OSD_RAID_4 means that the last component object, or the last in PNFS_OSD_RAID_4 means that the last component object, or the last in
each group (if odm_group_width is greater than zero), contains parity each group (if odm_group_width is greater than zero), contains parity
information computed over the rest of the stripe with an XOR information computed over the rest of the stripe with an XOR
operation. If a component object is unavailable, the client can read operation. If a component object is unavailable, the client can read
the rest of the stripe units in the damaged stripe and recompute the the rest of the stripe units in the damaged stripe and recompute the
missing stripe unit by XORing the other stripe units in the stripe. missing stripe unit by XORing the other stripe units in the stripe.
Or the client can replay the READ against the pNFS server that will Or the client can replay the READ against the pNFS server that will
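The XOR reconstruction described for PNFS_OSD_RAID_4 can be illustrated with a minimal sketch. It is not taken from the draft: the function names are hypothetical, stripe units are modeled as equal-length byte strings, and the mapping of units to component objects is left out.

```python
from functools import reduce


def xor_units(units):
    """XOR a list of equal-length stripe units together, byte by byte."""
    if not units:
        raise ValueError("need at least one stripe unit")
    length = len(units[0])
    if any(len(u) != length for u in units):
        raise ValueError("stripe units must have equal length")
    # zip(*units) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*units))


def compute_parity(data_units):
    """Parity unit of a stripe: XOR of all data units (hypothetical helper)."""
    return xor_units(data_units)


def reconstruct_missing(surviving_units):
    """Recompute one missing unit from every other unit in the stripe.

    surviving_units must hold all remaining data units plus the parity unit.
    """
    return xor_units(surviving_units)
```

Because XOR is its own inverse, the same routine serves both to generate the parity unit and to rebuild a single missing unit; a second missing unit in the same stripe cannot be recovered this way.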
skipping to change at page 22, line 13 skipping to change at page 22, line 13
Qdev = (I + 1) % W Qdev = (I + 1) % W
5.4.5. RAID Usage and Implementation Notes 5.4.5. RAID Usage and Implementation Notes
RAID layouts with redundant data in their stripes require additional RAID layouts with redundant data in their stripes require additional
serialization of updates to ensure correct operation. Otherwise, if serialization of updates to ensure correct operation. Otherwise, if
two clients simultaneously write to the same logical range of an two clients simultaneously write to the same logical range of an
object, the result could include different data in the same ranges of object, the result could include different data in the same ranges of
mirrored tuples, or corrupt parity information. It is the mirrored tuples, or corrupt parity information. It is the
responsibility of the metadata server to enforce serialization responsibility of the metadata server to enforce serialization
requirements such as this. For example, the metadata server may do requirements. Serialization MUST occur at the RAID stripe boundary
so by not granting overlapping write layouts within mirrored objects. for write operations to avoid corrupting parity by concurrent updates
to the same stripe. Mirrors do not have explicit stripe boundaries,
so it is sufficient to serialize writes to the same byte ranges.
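As a rough illustration of the serialization rule, the sketch below shows how a metadata server might decide whether two writes need to be serialized. It is an assumption-laden example rather than part of the specification: the helper names are hypothetical, a write is an (offset, length) pair in file bytes, and a stripe is taken to span stripe_unit * data_width bytes of file data, where data_width is the number of non-parity components per stripe.

```python
def stripes_touched(offset, length, stripe_unit, data_width):
    """Return (first, last) stripe numbers covered by a byte range."""
    if length <= 0:
        raise ValueError("length must be positive")
    stripe_size = stripe_unit * data_width  # file bytes per full stripe
    first = offset // stripe_size
    last = (offset + length - 1) // stripe_size
    return first, last


def parity_writes_conflict(w1, w2, stripe_unit, data_width):
    """Parity layouts: writes conflict if they share any stripe."""
    f1, l1 = stripes_touched(w1[0], w1[1], stripe_unit, data_width)
    f2, l2 = stripes_touched(w2[0], w2[1], stripe_unit, data_width)
    return f1 <= l2 and f2 <= l1


def mirror_writes_conflict(w1, w2):
    """Mirrored layouts: writes conflict only if their byte ranges overlap."""
    (o1, n1), (o2, n2) = w1, w2
    return o1 < o2 + n2 and o2 < o1 + n1
```

With a 64 KiB stripe unit and four data components, two 4 KiB writes at offsets 0 and 131072 fall in the same stripe and would be serialized under a parity layout, although their byte ranges do not overlap and a mirrored layout could allow them concurrently.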
Many alternative encoding schemes exist for P>=2 [22]. These involve Many alternative encoding schemes exist for P>=2 [22]. These involve
P or Q equations different than those used in PNFS_OSD_RAID_PQ. P or Q equations different than the Reed-Solomon encoding used in
Thus, if one of these schemes is to be used in the future, a distinct PNFS_OSD_RAID_PQ. Thus, if one of these schemes is to be used in the
value must be added to pnfs_osd_raid_algorithm4 for it. While Reed- future, a distinct value must be added to pnfs_osd_raid_algorithm4
Solomon codes are well understood, recently discovered schemes such for it.
as Liberation codes are more computationally efficient for small
group_widths, and Cauchy Reed-Solomon codes are more computationally
efficient for higher values of P.
6. Object-Based Layout Update 6. Object-Based Layout Update
layoutupdate4 is used in the LAYOUTCOMMIT operation to convey updates layoutupdate4 is used in the LAYOUTCOMMIT operation to convey updates
to the layout and additional information to the metadata server. It to the layout and additional information to the metadata server. It
is defined in the NFSv4.1 [6] as follows: is defined in the NFSv4.1 [6] as follows:
struct layoutupdate4 { struct layoutupdate4 {
layouttype4 lou_type; layouttype4 lou_type;
opaque lou_body<>; opaque lou_body<>;
skipping to change at page 24, line 9 skipping to change at page 24, line 9
However, if I/O errors were encountered, the server better not However, if I/O errors were encountered, the server better not
attempt to write the new attributes on the storage devices until it attempt to write the new attributes on the storage devices until it
receives the I/O error report; therefore, the client MUST set the receives the I/O error report; therefore, the client MUST set the
olu_ioerr_flag to true. Note that in this case, the client SHOULD olu_ioerr_flag to true. Note that in this case, the client SHOULD
send both the LAYOUTCOMMIT and LAYOUTRETURN operations in the same send both the LAYOUTCOMMIT and LAYOUTRETURN operations in the same
COMPOUND RPC. COMPOUND RPC.
7. Recovering from Client I/O Errors 7. Recovering from Client I/O Errors
The pNFS client may encounter errors when directly accessing the The pNFS client may encounter errors when directly accessing the
object storage devices. However, it is the responsibility of the object storage devices. A well-behaved client will report any such
metadata server to handle the I/O errors. When the errors promptly by executing a LAYOUTRETURN. When the
LAYOUT4_OSD2_OBJECTS layout type is used, the client MUST report the LAYOUT4_OSD2_OBJECTS layout type is used, the client MUST report the
I/O errors to the server at LAYOUTRETURN time using the I/O errors to the server at LAYOUTRETURN time using the
pnfs_osd_ioerr4 structure (see Section 8.1). pnfs_osd_ioerr4 structure (see Section 8.1).
The metadata server analyzes the error and determines the required It is the responsibility of the metadata server to handle the I/O
errors. The server MUST analyze the error and perform the required
recovery operations such as repairing any parity inconsistencies, recovery operations such as repairing any parity inconsistencies,
recovering media failures, or reconstructing missing objects. recovering media failures, or reconstructing missing objects.
The metadata server SHOULD recall any outstanding layouts to allow it The metadata server SHOULD recall any outstanding layouts to allow it
exclusive write access to the stripes being recovered and to prevent exclusive write access to the stripes being recovered and to prevent
other clients from hitting the same error condition. In these cases, other clients from hitting the same error condition. In these cases,
the server MUST complete recovery before handing out any new layouts the server MUST complete recovery before handing out any new layouts
to the affected byte ranges. to the affected byte ranges.
Although it MAY be acceptable for the client to propagate a The client SHOULD attempt to compensate for the error before giving
corresponding error to the application that initiated the I/O up and reflecting an error to the application. The first step in
operation and drop any unwritten data, the client SHOULD attempt to error recovery is to return the layout with LAYOUTRETURN and the
retry the original I/O operation by requesting a new layout using associated error information. The second step is to request a new
LAYOUTGET and retry the I/O operation(s) using the new layout, or the layout using LAYOUTGET and then retry the I/O operation with the new
client MAY just retry the I/O operation(s) using regular NFS READ or layout. Finally, if the error persists, the client may choose to
WRITE operations via the metadata server. The client SHOULD attempt retry the I/O operation using regular NFS READ or WRITE operations
to retrieve a new layout and retry the I/O operation using OSD via the metadata server.
commands first and only if the error persists, retry the I/O
operation via the metadata server.
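The client-side recovery sequence in the updated text (LAYOUTRETURN with error information, then LAYOUTGET and retry, then fall back to the metadata server) can be sketched as follows. This is only an outline under stated assumptions: every callable passed in is a hypothetical stand-in for the client's real NFSv4.1 and OSD machinery, I/O failures are modeled as IOError, and returning the second layout after a second failure is one reasonable reading of the requirement to report errors at LAYOUTRETURN time.

```python
def io_with_recovery(do_osd_io, layout, layout_return, layout_get, do_mds_io):
    """Attempt direct OSD I/O, compensating for errors before giving up.

    do_osd_io(layout)        -> result, raises IOError on failure (hypothetical)
    layout_return(layout, e) -> sends LAYOUTRETURN with error info (hypothetical)
    layout_get()             -> sends LAYOUTGET, returns a new layout (hypothetical)
    do_mds_io()              -> regular NFS READ/WRITE via the metadata server
    """
    try:
        return do_osd_io(layout)
    except IOError as err:
        # Step 1: return the layout together with the error information.
        layout_return(layout, err)

    # Step 2: request a new layout and retry the I/O against the OSDs.
    new_layout = layout_get()
    try:
        return do_osd_io(new_layout)
    except IOError as err:
        layout_return(new_layout, err)

    # Step 3: the error persists, so go through the metadata server.
    return do_mds_io()
```

If do_mds_io also fails, its exception propagates to the caller, which corresponds to reflecting the error to the application.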
8. Object-Based Layout Return 8. Object-Based Layout Return
layoutreturn_file4 is used in the LAYOUTRETURN operation to convey layoutreturn_file4 is used in the LAYOUTRETURN operation to convey
layout-type specific information to the server. It is defined in the layout-type specific information to the server. It is defined in the
NFSv4.1 [6] as follows: NFSv4.1 [6] as follows:
struct layoutreturn_file4 { struct layoutreturn_file4 {
offset4 lrf_offset; offset4 lrf_offset;
length4 lrf_length; length4 lrf_length;
skipping to change at page 37, line 36 skipping to change at page 37, line 36
Appendix A. Acknowledgments Appendix A. Acknowledgments
Todd Pisek was a co-editor of the initial versions of this document. Todd Pisek was a co-editor of the initial versions of this document.
Daniel E. Messinger, Pete Wyckoff, Mike Eisler, Sean P. Turner, Brian Daniel E. Messinger, Pete Wyckoff, Mike Eisler, Sean P. Turner, Brian
E. Carpenter, Jari Arkko, David Black, and Jason Glasgow reviewed and E. Carpenter, Jari Arkko, David Black, and Jason Glasgow reviewed and
commented on this document. commented on this document.
Authors' Addresses Authors' Addresses
Benny Halevy Benny Halevy
Tonian, Inc. Primary Data
Email: bhalevy@tonian.com Email: bhalevy@primarydata.com
URI: http://www.tonian.com/ URI: http://www.primarydata.com/
Boaz Harrosh Boaz Harrosh
Panasas, Inc. Panasas, Inc.
1501 Reedsdale St. Suite 400 1501 Reedsdale St. Suite 400
Pittsburgh, PA 15233 Pittsburgh, PA 15233
USA USA
Phone: +1-412-323-3500 Phone: +1-412-323-3500
Email: bharrosh@panasas.com Email: bharrosh@panasas.com
URI: http://www.panasas.com/ URI: http://www.panasas.com/
Brent Welch Brent Welch
Panasas, Inc. Panasas, Inc.
6520 Kaiser Drive 969 W. Maude Ave
Fremont, CA 95444 Sunnyvale, CA 94095
USA USA
Phone: +1-650-608-7770 Phone: +1-408-215-6715
Email: welch@panasas.com Email: welch@acm.org
URI: http://www.panasas.com/ URI: http://www.panasas.com/
 End of changes. 27 change blocks. 
55 lines changed or deleted 87 lines changed or added
