Recently, I had the somewhat laborious job of backing up a stack of websites and blogs from my remote web server. Initially I tried to do it with a simple SCP command, but after letting it run for about an hour it was obvious it was just too slow, downloading each file one at a time.
After complaining to a friend, he suggested using the RSYNC command. Surprisingly, it was very easy to get working; you simply issue:
rsync -avz -e ssh remoteuser@remotehost:/remote/dir /this/dir/
Obviously changing the appropriate parts for your case.
I found it to be at least ten-fold faster (or more) than SCP on its own, and better still, RSYNC will resume where SCP won't! Try it and see for yourself.
This is exactly what I needed. I had to transfer several GB over a flaky connection and rsync with the -P flag made an otherwise maddening task doable.
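For anyone else fighting a flaky link, the command I ended up with looked roughly like this; it's the same placeholder host and paths as above, with -P added (shorthand for --partial --progress, so interrupted transfers pick up where they left off):

rsync -avzP -e ssh remoteuser@remotehost:/remote/dir /this/dir/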
Yeah, it’s not pretty, but you just can’t beat Unix-based machines like OS X when you need to copy huge quantities of data from one machine to another. I’m glad this was helpful for you.
How is this multithreaded? I have been looking for a way to have rsync do multi-threaded transfers, but I do not see how your solution is multithreaded. It copies one file at a time, right?
I am guessing that the speed-up you see comes from compression instead (the -z option in rsync).
Of course I could be wrong, but while the transfers themselves might not be explicitly threaded, I think there is enough observational evidence to suggest they are. That being said, it is very possible that you are correct. The main point is that it’s the fastest way I’ve seen to reliably transfer data from one machine to another (and so far I’ve yet to find a method that’s faster).
HOWEVER, if you do happen to find a way to make it better, please let me know; I’d like to know personally, and I will update this post for the benefit of others. Naturally I will credit you for the addendum.
Good luck.
I think you do not understand what multi-threaded means. rsync is NOT multithreaded. This can easily be seen by the fact that rsync on a multicore/multicpu system will only use ONE core. I would like to know what other observational evidence you have that shows that it IS multi-threaded.
As I said, @mb, I could be wrong. And yes, I do know what multi-threaded means; if you do too, you’ll know that how many ‘cores’ a given CPU has is irrelevant in terms of proving or disproving that an application is multi-threaded. However, it may *not* be multi-threaded, as @Thomas pointed out. I also admit that my evidence is purely anecdotal, but my OBSERVATIONS are that in pure throughput terms rsync is an EXCELLENT alternative when copying a very large quantity of very small files from one machine to another, even faster than tar, scp, ftp, etc. At the time I wrote this post I was trying to work out how to transfer tens of billions of 1K files from one server to another. Rsync won, hands down. I believed at the time that it was threaded (and it certainly appeared to be), but I accept that may not have been the case.
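For what it’s worth, one way to approximate multi-threaded behaviour is simply to run several rsync processes side by side, one per top-level directory. A rough sketch, using the same placeholder host and paths as the post; the choice of four parallel jobs is arbitrary, and this assumes the remote directory names contain no spaces:

ssh remoteuser@remotehost ls /remote/dir | xargs -P4 -I{} rsync -az -e ssh "remoteuser@remotehost:/remote/dir/{}" /this/dir/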
Hi,
I use ssh and tar for huge transfers (pulling):
ssh user@remote_host "tar cf - source" | tar xf -
or the other way (pushing):
tar cf - source | ssh user@remote_host "tar xf -"
If you want to get rid of the base directories (pulling) or extract the files into a specific directory (pushing), use -C with tar. Of course, you can use -C on both sides of the pipe. If you want compression, use -z (gzip) or -j (bzip2) with tar.
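For instance, to pull the contents of /remote/dir without the leading path components and unpack them straight into /this/dir, -C goes on both sides (the host and paths here are placeholders):

ssh user@remote_host "tar cf - -C /remote/dir ." | tar xf - -C /this/dir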
For my set of test data (around 1GB):
scp: 50sec
rsync: 40sec
tar: 15sec
Why is it so fast? Tar does not process the transfer file by file but block by block, so you avoid the per-file overhead.
Wow, the tar way is really great! Thanks! There are no progress stats, but when running it in screen you can just put it in the background and check the file size.
Try BBCP for multi-threaded copy:
http://www.slac.stanford.edu/~abh/bbcp/
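For the record, a minimal bbcp invocation might look something like this, with -s setting the number of parallel streams and -w the window size; the host, file, and values are placeholders, so check the bbcp docs before relying on them:

bbcp -s 16 -w 8m bigfile.tar remoteuser@remotehost:/remote/dir/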
You are a saint. I have been using SCP for years and finally searched for something else. This is a huge game changer. Thanks for sharing this and making it so simple.