No replica created with composable resources #108

Open
ingridbr opened this issue Nov 13, 2020 · 11 comments

@ingridbr

ingridbr commented Nov 13, 2020

We are testing NFSRODS (v1.0.0) against an iRODS 4.2.8 cluster with the following composable resource tree defined:

default:passthru
└── tier1-p-irods-2020-pilot:passthru
    └── tier1-p-irods-2020-pilot-replication:replication
        └── tier1-p-irods-posix:passthru
            └── tier1-p-irods-posix-1-4:random
                ├── tier1-p-irods-posix-1-a-2-a:replication
                │   ├── tier1-p-irods-posix-1-a-weight:passthru
                │   │   └── tier1-p-irods-posix-1-a:unixfilesystem
                │   └── tier1-p-irods-posix-2-a-weight:passthru
                │       └── tier1-p-irods-posix-2-a:unixfilesystem
                ├── tier1-p-irods-posix-1-b-4-b:replication
                │   ├── tier1-p-irods-posix-1-b-weight:passthru
                │   │   └── tier1-p-irods-posix-1-b:unixfilesystem
                │   └── tier1-p-irods-posix-4-b-weight:passthru
                │       └── tier1-p-irods-posix-4-b:unixfilesystem
                ├── tier1-p-irods-posix-3-a-4-a:replication
                │   ├── tier1-p-irods-posix-3-a-weight:passthru
                │   │   └── tier1-p-irods-posix-3-a:unixfilesystem
                │   └── tier1-p-irods-posix-4-a-weight:passthru
                │       └── tier1-p-irods-posix-4-a:unixfilesystem
                └── tier1-p-irods-posix-3-b-2-b:replication
                    ├── tier1-p-irods-posix-2-b-weight:passthru
                    │   └── tier1-p-irods-posix-2-b:unixfilesystem
                    └── tier1-p-irods-posix-3-b-weight:passthru
                        └── tier1-p-irods-posix-3-b:unixfilesystem

and we are running the NFSRODS server with this configuration:

{
    "nfs_server": {
        "port": 2049,
        "irods_mount_point": "/ourZone",
        "user_information_refresh_time_in_milliseconds": 3600000,
        "file_information_refresh_time_in_milliseconds": 1000,
        "user_access_refresh_time_in_milliseconds": 1000
    },

    "irods_client": {
        "zone": "ourZone",
        "host": "irodsServer",
        "port": 1247,
        "default_resource": "default",
        "ssl_negotiation_policy": "CS_NEG_REFUSE",
        "connection_timeout_in_seconds": 600,
        "proxy_admin_account": {
            "username": "rods",
            "password": "XXXXXXX"
        }
    }
}

With this setup, when we copy a file to the NFS mount point using the cp command, we see that only one of the replicas is created:

$ cp test.txt /vsc-mounts/leuven-irods-tier1-p-mnt/home/vsc30706/testnfsrods
$ ils -L /kuleuven_tier1_pilot/home/vsc30706/testnfsrods
/kuleuven_tier1_pilot/home/vsc30706/testnfsrods:
  vsc30706          0 **default**;tier1-p-irods-2020-pilot;tier1-p-irods-2020-pilot-replication;tier1-p-irods-posix;tier1-p-irods-posix-1-4;tier1-p-irods-posix-3-b-2-b;tier1-p-irods-posix-3-b-weight;tier1-p-irods-posix-3-b           20 2020-11-13.10:30 & test.txt
        generic    /irods/b/home/vsc30706/testnfsrods/test.txt

When doing an iput of the same file into the same directory, both replicas are created (as expected):

$ iput test.txt /kuleuven_tier1_pilot/home/vsc30706/testnfsrods/
$ ils -L /kuleuven_tier1_pilot/home/vsc30706/testnfsrods/
/kuleuven_tier1_pilot/home/vsc30706/testnfsrods:
  vsc30706          0 **default**;tier1-p-irods-2020-pilot;tier1-p-irods-2020-pilot-replication;tier1-p-irods-posix;tier1-p-irods-posix-1-4;tier1-p-irods-posix-3-a-4-a;tier1-p-irods-posix-3-a-weight;tier1-p-irods-posix-3-a           20 2020-11-13.10:33 & test.txt
        generic    /irods/a/home/vsc30706/testnfsrods/test.txt
  vsc30706          1 **default**;tier1-p-irods-2020-pilot;tier1-p-irods-2020-pilot-replication;tier1-p-irods-posix;tier1-p-irods-posix-1-4;tier1-p-irods-posix-3-a-4-a;tier1-p-irods-posix-4-a-weight;tier1-p-irods-posix-4-a           20 2020-11-13.10:33 & test.txt
        generic    /irods/a/home/vsc30706/testnfsrods/test.txt

In both cases the same resource is used (default), but with NFSRODS the second replica is not created.
We do not see any errors in the iRODS log.

@trel
Member

trel commented Nov 17, 2020

That is surprising - we'll look into reproducing it here.

@trel
Member

trel commented Nov 17, 2020

As a workaround in the meantime, you can run iadmin modresc default rebalance to heal any missing/stale replicas.

@trel trel changed the title No replica created whith composable resources No replica created with composable resources Nov 17, 2020
@trel trel added the bug Something isn't working label Nov 17, 2020
@trel
Member

trel commented Dec 10, 2020

Thought streaming (open/write/close) was the culprit...

but... istream also appears to work as expected against a similar tree in 4.2.8 with pt defined as the default resource:

$ ienv | grep version
irods_version - 4.2.8
$ ilsresc
demoResc:unixfilesystem
pt:passthru
└── r1:random
    ├── repl1:replication
    │   ├── ufs1:unixfilesystem
    │   └── ufs11:unixfilesystem
    └── repl2:replication
        ├── ufs2:unixfilesystem
        └── ufs22:unixfilesystem
$ iput rules.ninja withput
$ echo "perhaps" | istream write withstream
$ ils -l
/tempZone/home/rods:
  rods              0 pt;r1;repl2;ufs22        51592 2020-12-09.21:10 & withput
  rods              1 pt;r1;repl2;ufs2        51592 2020-12-09.21:10 & withput
  rods              0 pt;r1;repl1;ufs11            8 2020-12-09.21:10 & withstream
  rods              1 pt;r1;repl1;ufs1            8 2020-12-09.21:10 & withstream
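A listing like the one above can be checked mechanically for data objects that have fewer replicas than the replication resource should produce. The following is a minimal sketch, assuming the seven-column `ils -l` layout shown in this thread (owner, replica number, resource hierarchy, size, timestamp, status, name) and an expected count of two replicas. `count_replicas` and `under_replicated` are hypothetical helpers, not part of iRODS.

```python
from collections import Counter

# Sample copied from the listing above.
SAMPLE = """\
/tempZone/home/rods:
  rods              0 pt;r1;repl2;ufs22        51592 2020-12-09.21:10 & withput
  rods              1 pt;r1;repl2;ufs2        51592 2020-12-09.21:10 & withput
  rods              0 pt;r1;repl1;ufs11            8 2020-12-09.21:10 & withstream
  rods              1 pt;r1;repl1;ufs1            8 2020-12-09.21:10 & withstream
"""


def count_replicas(listing: str) -> Counter:
    """Count replica lines per data object name (hypothetical helper)."""
    counts = Counter()
    for line in listing.splitlines():
        # Assumption: every replica line has exactly the seven columns
        # seen in this thread; the name may contain spaces, so it is last.
        fields = line.split(None, 6)
        if len(fields) != 7 or not fields[1].isdigit():
            continue  # collection header or an unrecognized line
        counts[fields[6]] += 1
    return counts


def under_replicated(listing: str, expected: int = 2) -> list:
    """Names of data objects with fewer than `expected` replicas."""
    return [name for name, n in count_replicas(listing).items() if n < expected]


print(dict(count_replicas(SAMPLE)), under_replicated(SAMPLE))
```

Feeding it the output of `ils -l` on a collection written through NFSRODS would list the objects that need a rebalance. Lines that do not match the assumed layout are silently skipped, so treat an empty result as a hint rather than proof.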

@alanking
Contributor

This is likely caused by the openType not being set in the dataObjInp when calling rsDataObjOpen. fileModified is only triggered when the openType is set to CREATE_TYPE or OPEN_FOR_WRITE_TYPE:

https://github.com/irods/irods/blob/9eb6c23df45cdedff8ee9c3af71a65d304635037/server/api/src/rsModDataObjMeta.cpp#L67-L69

The regParam passed to this API is constructed using the keywords found in the L1 descriptor, which is populated in open.

See replica_close API plugin:

https://github.com/irods/irods/blob/791e413f5401103efbc491ee97b2efb245cf4c31/plugins/api/src/replica_close.cpp#L227-L230

...and rsDataObjClose API:

https://github.com/irods/irods/blob/9eb6c23df45cdedff8ee9c3af71a65d304635037/server/api/src/rsDataObjClose.cpp#L306
https://github.com/irods/irods/blob/9eb6c23df45cdedff8ee9c3af71a65d304635037/server/api/src/rsDataObjClose.cpp#L587
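To make the suspected failure mode concrete, here is a toy model of the gate described above. It is a sketch of the hypothesis only, not iRODS code: the enum values, the `UNSET` member, and the `replicas_after_close` function are all hypothetical, and it assumes that a replication resource only populates its other children when fileModified fires.

```python
from enum import Enum


class OpenType(Enum):
    # Names mirror the constants mentioned above; the values and the
    # UNSET member are placeholders invented for this illustration.
    UNSET = "unset"
    CREATE_TYPE = "create"
    OPEN_FOR_WRITE_TYPE = "open_for_write"


def triggers_file_modified(open_type: OpenType) -> bool:
    """Model of the condition: only these two open types notify the hierarchy."""
    return open_type in (OpenType.CREATE_TYPE, OpenType.OPEN_FOR_WRITE_TYPE)


def replicas_after_close(open_type: OpenType, children=("ufs0", "ufs1")) -> list:
    """Toy replication resource: the first child always receives the write.

    Assumption: the remaining children are only populated when
    fileModified is triggered on close.
    """
    if triggers_file_modified(open_type):
        return list(children)
    return [children[0]]


print(replicas_after_close(OpenType.CREATE_TYPE))
print(replicas_after_close(OpenType.UNSET))
```

In this model, a client that opens without setting openType ends up with a single replica and no error, which matches the symptom reported at the top of the issue.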

@korydraughn
Collaborator

The latest NFSRODS commits do not fix this issue (NFSRODS: 77b54c9, iRODS: irods/irods@9c57ce9).

However, the results show that additional replicas are created. The good replica has the correct size while the stale replica does not.

$ ilsresc
demoResc:unixfilesystem
pt:passthru
└── repl:replication
    ├── ufs0:unixfilesystem
    └── ufs1:unixfilesystem
$ cp <file> /mnt/nfsrods/foo # NFSRODS is configured to target the "pt" resource.
$ ils -l
/tempZone/home/kory:
  kory              0 pt;repl;ufs0      2001391 2022-01-26.13:55 & foo
  kory              1 pt;repl;ufs1            0 2022-01-26.13:55 X foo

We're close, but this still needs some work.
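For anyone re-testing with different file sizes, the check above can be scripted. This is a minimal sketch, assuming the `ils -l` layout shown in this comment, where `&` marks a good replica and any other status character (here `X`) marks one that is not good. `suspect_replicas` is a hypothetical helper, not part of iRODS or NFSRODS.

```python
import re

# Sample copied from the listing above.
SAMPLE = """\
/tempZone/home/kory:
  kory              0 pt;repl;ufs0      2001391 2022-01-26.13:55 & foo
  kory              1 pt;repl;ufs1            0 2022-01-26.13:55 X foo
"""

# Assumption: owner, replica number, hierarchy, size, timestamp,
# one-character status, then the data object name.
LINE = re.compile(
    r"^\s+(?P<owner>\S+)\s+(?P<num>\d+)\s+(?P<hier>\S+)\s+(?P<size>\d+)"
    r"\s+(?P<mtime>\S+)\s+(?P<status>\S)\s+(?P<name>.+)$"
)


def suspect_replicas(listing: str) -> list:
    """Return (name, replica number, hierarchy, size) for questionable replicas.

    A replica is flagged when its status is not '&' or when its size
    differs from the first good replica of the same data object.
    """
    rows = [m.groupdict() for m in map(LINE.match, listing.splitlines()) if m]
    good_size = {}
    for r in rows:
        if r["status"] == "&":
            good_size.setdefault(r["name"], int(r["size"]))
    suspects = []
    for r in rows:
        size = int(r["size"])
        not_good = r["status"] != "&"
        mismatch = size != good_size.get(r["name"], size)
        if not_good or mismatch:
            suspects.append((r["name"], int(r["num"]), r["hier"], size))
    return suspects


print(suspect_replicas(SAMPLE))
```

Running it against listings captured after copying small and large files through the mount point gives a quick pass/fail signal per file size.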

@michael-conway
Copy link
Member

michael-conway commented Jan 26, 2022 via email

@korydraughn
Collaborator

sure thing.

@korydraughn
Collaborator

See irods/irods#6142

@trel
Member

trel commented Feb 7, 2022

I believe this is now handled in NFSRODS 2.1.0 due to Jargon 4.3.2.5-SNAPSHOT.

@korydraughn
Collaborator

We'll need to verify that using various file sizes.

@korydraughn korydraughn added this to the 2.3.0 milestone Feb 1, 2024
@korydraughn
Collaborator

Confirmed PR #202 does not resolve this issue.

The file is uploaded correctly. The first replica has the correct size and is marked good. The physical size is correct too.

The second replica has a size of 0 in the catalog and is marked stale. The second replica's physical size is 0.

@korydraughn korydraughn modified the milestones: 2.3.0, 2.4.0 May 20, 2024