Eelco steps down from the board (NixOS)

meteokr@community.adiquaints.moe · 6 months ago

I can’t seem to find it, but I think it was James Gosling, where he was blocked from reviewing code at Google because he hadn’t gone through the company’s approval process. I hope this wasn’t a myth I’ve been carrying on for this long.

meteokr@community.adiquaints.moe · 6 months ago

Right, I did hear about that lawsuit way back when, I just didn’t know of these types of consequences. Very appreciated, especially the sources.

meteokr@community.adiquaints.moe · 6 months ago

I really appreciate you linking studies about this topic, as finding this kind of research can be daunting. Those look like really interesting reads.

meteokr@community.adiquaints.moe · 6 months ago

Is this for hardware RAID controllers, or have you experienced software RAID like LVM or ZFS exhibiting the same drop-out behavior? I personally haven’t, but it would be nice to know what to look out for with future drives.

meteokr@community.adiquaints.moe · edit-2 6 months ago

Drive 1: A, Drive 2: 1/2 A, Drive 3: 2/2 A. Drive 2 + Drive 3 = Drive 1. Hmm, that would only be one set of the parity though. So you could also add 1/2 of A to Drive 1, and 2/2 to Drive 2 so that the parity on Drive 1 + Drive 2 = Drive 3. Which is extremely silly, and doesn’t make a lot of sense to use in the real world.

meteokr@community.adiquaints.moe · 6 months ago

Oh thanks for the tip! I’ve edited my comment to reflect the minimum of 4 drives for a RAID6 array.

I’ve not used RAID6 for a small array like that before, so I didn’t know it had a conventional lower limit. In a technical sense it doesn’t have to have 4 drives; it just wouldn’t make any sense to use it that way, so I see why software wouldn’t support such a use case.

meteokr@community.adiquaints.moe · 6 months ago

Yes their failure rates are usually a bit higher, but usually less than the increase in rate from using more than one disk instead. A bit of math can be done using Backblaze’s disk failure rate data to get a reasonable approximation of the overall risk of failure.

meteokr@community.adiquaints.moe · 6 months ago

Exactly! RAID gives you the breathing room to react to a partial failure of the RAID array, treated as a single disk. I appreciate your understanding.

meteokr@community.adiquaints.moe · 6 months ago

Oh, I thought there was some other CVE acronym I was unaware of. I don’t think periodically git cloning a repo every few days would be something to worry about. Ever since the Yuzu takedown I got in the habit of mirroring a bunch of repos that I’d be very sad to lose, just as a precaution. It probably won’t matter, but it’s a tiny peace of mind knowing I could at the very least compile it myself if it was lost.

meteokr@community.adiquaints.moe · edit-2 6 months ago

Consider a scenario with a degraded RAID 1 array comprised of two 1.6 TB disks capable of transferring data at a sustained rate of 6 Gbps: you should be able to recover from a single disk failure in just over half an hour.

Repeat the same scenario with 32 TB members, now we’re looking at a twelve hour recovery - twelve hours of intensive activity that could push either of your drives over the edge. Increasing data density actually increases the risk of data loss.
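The two recovery times quoted above can be checked with a bit of arithmetic. This is only a rough sketch: it assumes decimal terabytes, the full 6 Gbps sustained for the whole copy, and no other load on the array. The function name rebuild_hours is made up for illustration.

```python
def rebuild_hours(capacity_tb: float, rate_gbps: float) -> float:
    """Best-case time to copy one whole member disk, in hours.

    Assumes decimal units (1 TB = 1e12 bytes, 1 Gbps = 1e9 bits/s)
    and that the link rate is sustained for the entire rebuild.
    """
    bits = capacity_tb * 1e12 * 8        # terabytes -> bits
    seconds = bits / (rate_gbps * 1e9)   # bits / (bits per second)
    return seconds / 3600


for tb in (1.6, 32):
    print(f"{tb} TB at 6 Gbps: {rebuild_hours(tb, 6):.2f} h")
# 1.6 TB at 6 Gbps: 0.59 h
# 32 TB at 6 Gbps: 11.85 h
```

So the quoted "just over half an hour" and "twelve hour" figures are best-case numbers; real spinning disks usually sustain far less than the interface rate.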

The speed and method you use to recover from data loss is not relevant to the discussion of how to handle drive failure. That varies wildly depending on your specific setup.

Finally, we say you shouldn’t think of RAID as a backup because the entire array could fail, not for the excruciatingly literal reasons you are attempting to convey. If you lose half of a two disk mirror set, you haven’t lost any data.

My premise is that reducing the number of drives reduces the risk of drive failure which could lead to data loss. RAID is not a backup, because it literally isn’t. If you have two drives in RAID1 you have 1 set of your data. If you have 4 drives in RAID6 you have 1 set of your data. In both examples you have a single very durable drive, but you do not have a backup. A backup prevents data loss, RAID does not.

Think of it this way. You have a single very large drive, and you explicitly only use 1/2 of it. The other 1/2 of the drive becomes broken and you cannot read or write to it. The first 1/2 works perfectly fine, and fits all your data. Would you consider this drive functional, or failed? A RAID degradation is a warning to the user that a portion of the single drive is broken, and needs to be repaired. A RAID block device should always be treated as a single physical drive, with varying levels of durability and warning signs depending upon its configuration. It can’t be a backup, because all it’s doing is delaying the eventual failure. Delaying a failure does not prevent the failure from happening, and does not help you when a failure occurs.

meteokr@community.adiquaints.moe · 6 months ago

Gonna explain or just continue to mock me?

meteokr@community.adiquaints.moe · 6 months ago

What does that mean?

meteokr@community.adiquaints.moe · 6 months ago

*in some jurisdictions.

meteokr@community.adiquaints.moe · 6 months ago

Selfhosting my own git server, partially to mirror repos like this.

meteokr@community.adiquaints.moe · 6 months ago

A simple way of doing it is to just move some of the data somewhere else, and then restore that backup. If the contents are fine, then all is well, and if they aren’t, then you can delete the broken restore, and move the files back where they were. Depending on how you are doing backups, some systems have built-in “dry-run” style tests where they can test themselves, but you should still verify the contents every so often.
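For the "verify the contents" step, here is a minimal sketch of one way to do it: hash every file in the data you moved aside and in the restored copy, then list anything that differs. It doesn't depend on any particular backup tool; the function names and the two directory arguments are placeholders, not part of any real backup system.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of one file, read in 1 MiB chunks to keep memory use flat."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def tree_digests(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its digest."""
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_restore(original: Path, restored: Path) -> list[str]:
    """Return relative paths that are missing, extra, or different."""
    a, b = tree_digests(original), tree_digests(restored)
    return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))
```

An empty list from verify_restore(Path("moved-aside"), Path("restored")) means the restore matched byte for byte; anything else is the list of files to go look at.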

meteokr@community.adiquaints.moe · 6 months ago

Thank you for understanding.

meteokr@community.adiquaints.moe · 6 months ago

You misunderstand my claim.

meteokr@community.adiquaints.moe · edit-2 6 months ago

TL;DR: Bigger drives reduce the risk of data loss over time. Please back up your data. RAID is not a backup.

As drives get bigger and bigger, the emotional risk you feel when you fill them up is real. However, that is not the best way to think about it. Drives will inevitably fail, and drives are easily replaced commodities, so their failure should be expected and handled appropriately. RAID is not a backup, and does not reduce your risk of drive failure. RAID creates a safer environment for your data when a drive fails. How you should think about RAID is as if you are replacing a failed drive in advance, not as a reduction of risk of the drive failing.

To illustrate my point, say we have Y of data to store. I can either split the data across X drives, or store it all on a single drive. Which is safer? A single drive is objectively safer, given the same failure rate. So we have two cases for this situation. In both cases, this imaginary drive fails 10% of the time. The exact amount doesn’t matter so long as they are reasonably close.

Case A: You have 1 drive holding all your data. There is a 1/10 chance it fails. Your risk is 10%.

Case B: You have X drives holding all your data. Each drive has a 1/10 chance of failing, so there is a 1−(9/10)^X chance that at least one of the drives fails. For every X greater than 1, your rate of failure is higher than 1/10. For two drives you have a 19% chance of failure; for three drives it is about 27%.
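Here is the Case A versus Case B math as a small sketch. It assumes every drive fails independently with the same 10% chance, which is the simplification used in this comment rather than a measured rate; the function name any_drive_fails is just for illustration.

```python
def any_drive_fails(p: float, drives: int) -> float:
    """Chance that at least one of `drives` drives fails.

    Assumes independent failures with the same per-drive probability p:
    1 - (chance that every drive survives).
    """
    return 1 - (1 - p) ** drives


for x in (1, 2, 3, 4):
    print(f"{x} drive(s): {any_drive_fails(0.10, x):.1%}")
# 1 drive(s): 10.0%
# 2 drive(s): 19.0%
# 3 drive(s): 27.1%
# 4 drive(s): 34.4%
```

Swapping in a different per-drive rate changes the numbers but not the direction: with the same rate per drive, more drives always means a higher chance that at least one fails.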

In all cases your rate of failure increases the more drives you add to hold your data. Please do not become confused by what RAID does for this illustration. RAID will not prevent drive failures. RAID allows you to, in essence, “pre-fail” a drive in advance. A drive will fail, and some RAID configurations (1, 5, 6) will replace the functionality of the failed drive until you can replace the “real” failed drive. RAID did not prevent your drive failure; it only moved the time the failure happened to be convenient for the user. A RAID1 array with a failed drive is still a failed drive that needs to be replaced, and still needs to be restored from backup/re-striped.

Let’s take the cases of no RAID vs RAID1.

Case A: You have 1 drive holding all your data. When the drive fails, you stop your work, and replace the drive immediately.

Case RAID1: You have 1 drive holding all your data. You continue working because you’ve been very busy. You replace the drive when you have some downtime a week later.

In Case A, you lost productivity because the drive failed at an inconvenient time. In the RAID1 case, you could schedule the drive replacement for a later date when you had some spare time, a huge improvement in the user experience. But wait! I said in the case of RAID1 only one of the drives was holding my data, should I have said 2 drives were? Yes, in a literal sense the RAID1 holds a copy of the data in the second drive. However, RAID is not a backup, it is a system to schedule the time of drive failures. Your backup of the RAID array is what holds a real second copy of your data, not your mirrored drive, because RAID is not a backup. Your second drive was still present in Case A, it was just replaced after the failure occurred, rather than before the first one failed.

Be safe with your data. Please make backups, and verify you can restore from them regularly. RAID is not a backup.

meteokr@community.adiquaints.moe · 6 months ago

The person I replied to said

I’m uncomfortable storing 16TB worth of data on one drive

as a criticism of using a single 32TB drive.

I argue that a single 32TB drive carries less risk than using 2 16TB drives. Am I wrong?

meteokr@community.adiquaints.moe · 6 months ago

The risk of drives failing does not depend on your RAID config at all, ignoring excessive duty cycling. I believe you are misunderstanding the point I was making in my original reply. I’m claiming that these 32TB drives will reduce your risk of losing data compared with RAIDing 2 16TB drives, given the same failure rate.

I’m uncomfortable storing 16TB worth of data on one drive

For example, say you have 20TB of data. Which is safer?

2 16TB drives in raid0
1 32TB drive

This is completely irrelevant to your backup solution. You should have backups, of course, but I don’t see how that factors into my point? You have to put the data somewhere, and then back it up, where do you put it? I will always put it on as few physical drives as possible, to minimize the risk of drive failure over time so I don’t have to restore/re-stripe as often.

meteokr@community.adiquaints.moe · 6 months ago

Eelco steps down from the board (NixOS)

meteokr@community.adiquaints.moe · 7 months ago

Extreme Go Horse Programming Methodology