SVN to GitHub Migration - Removing Big Files

There are two situations in which you may want to remove big files (e.g. binaries or libraries):

  1. As a step of preparing/cleaning a cloned SVN repo
  2. When the repo is already pushed to a remote repository (e.g. GitHub)

Both situations can be handled with the same workflow, but with different preparation steps. We will use the BFG Repo Cleaner for both.

Preparation of the Repository

Preparing a Local Cloned SVN Repository

The BFG Repo Cleaner operates on a bare Git repository. The cleanup therefore has to be the very last step before pushing the repository to GitHub: we clone our local repository into a local bare mirror and clean that mirror.

Go to your working directory. In the following example, "SES" is the name of the Git repository.

|

git clone --mirror SES SES-bare.git
# remove the automatically created remote "origin"
cd SES-bare.git
git remote remove origin
cd ..

|
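
To confirm that the mirror really is bare before handing it to BFG, a small check can be used. This is a minimal sketch; the function name is_bare_repo is made up for this illustration and is not part of Git or BFG.

```shell
#!/bin/bash
# Hypothetical helper: succeeds only if the directory is a bare Git repository.
is_bare_repo() {
  # $1: path to the repository, e.g. SES-bare.git
  # "git rev-parse --is-bare-repository" prints "true" or "false"
  [ "$(git -C "$1" rev-parse --is-bare-repository 2>/dev/null)" = "true" ]
}

# Usage: is_bare_repo SES-bare.git && echo "bare mirror, ready for BFG"
```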

Cleaning a Remote Repository

The workflow is basically the same: we create a bare repo clone from the remote one.

|

git clone --mirror https://github.com/52North/SES.git SES-bare.git 

|

Performing the Cleanup

Save the following script in a file called "remove_files.sh" in your working directory, and mark it executable. Download BFG Repo Cleaner (see instructions in the script below).

|

#!/bin/bash
# Download the BFG Repo Cleaner from http://rtyley.github.com/bfg-repo-cleaner/ and rename the jar to bfg.jar
# You can also use curl for the download:
# curl -O http://repo1.maven.org/maven2/com/madgag/bfg/1.7.0/bfg-1.7.0.jar
# mv bfg-1.7.0.jar bfg.jar
# Usage: ./remove_files.sh files.txt MyRepo.git

# Each line of the list file holds "<object hash> <path>"
while read -r file_hash file_to_remove
do
  echo "Removing $file_to_remove"
  # keep only the last path component, i.e. the plain file name
  lastFile=$(echo "$file_to_remove" | awk -F/ '{print $NF}')
  java -jar bfg.jar --delete-files "$lastFile" "$2"
done < "$1"

# garbage-collect the objects that are no longer referenced
cd "$2"
git gc --prune=now --aggressive
cd ..

|

Now, navigate into the previously created bare repo and determine the files for removal. In the case below it is all .jar files.

They are stored in a file files.txt one directory up, i.e. in your working directory. Then navigate back to your working directory.

|

cd SES-bare.git
# export unwanted files to a text file
git rev-list --objects --all | grep '\.jar$' > ../files.txt
cd ..

|
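
Because the removal script strips the directory part of every path, it can help to preview which plain file names will be passed to BFG before running it. The following is a minimal sketch, assuming files.txt has one "hash path" pair per line as produced by git rev-list --objects; the function name list_bfg_targets is made up for this illustration.

```shell
#!/bin/bash
# Hypothetical helper (not part of BFG): print the unique file names that
# remove_files.sh would hand to "bfg --delete-files".
list_bfg_targets() {
  # $1: list file with lines of the form "<object hash> <path>"
  while read -r file_hash file_to_remove
  do
    # skip lines that carry a hash but no path
    [ -n "$file_to_remove" ] || continue
    # keep only the last path component, as the removal script does
    printf '%s\n' "${file_to_remove##*/}"
  done < "$1" | sort -u
}

# Usage: list_bfg_targets files.txt
```

If the same file name shows up under several directories, it appears only once in this list, and every file with that name is affected by the removal.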

Run the removal script:

|

./remove_files.sh files.txt SES-bare.git 

|

You can now compare the sizes of the local repository and the cleaned up repository and see a reduction in size depending on the number and versions of binary files that were removed.
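
One way to make that comparison is with du. This is a minimal sketch; the function name compare_repo_sizes is made up for this illustration, and the directory names in the usage line follow the "SES" example.

```shell
#!/bin/bash
# Hypothetical helper: print the disk usage of the original clone and of the
# cleaned bare mirror, one line per directory.
compare_repo_sizes() {
  # $1: original repository directory, $2: cleaned bare repository directory
  du -sh "$1" "$2"
}

# Usage: compare_repo_sizes SES SES-bare.git
```

Note that a non-bare clone also contains a checked-out working tree, so its total is not only history.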

If you have worked on a local clone of a remote repo (see SvnToGitHubMigrationBigFiles), you can push back the changes:

|

git push 

|

If you are preparing a local SVN clone for remote publishing, you must remove the remote reference to your (other) local repository. Navigate to the bare repository and execute the following command:

|

git remote rm origin 

|

You can list the existing remotes with git remote show, but nothing should be listed there.
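
The "nothing should be listed" check can also be scripted. This is a minimal sketch; the function name has_no_remotes is made up for this illustration.

```shell
#!/bin/bash
# Hypothetical helper: succeeds only if the repository has no remotes configured.
has_no_remotes() {
  # $1: path to the repository, e.g. SES-bare.git
  local remotes
  # "git remote" prints one remote name per line, or nothing at all
  remotes=$(git -C "$1" remote) || return 1
  [ -z "$remotes" ]
}

# Usage: has_no_remotes SES-bare.git && echo "no remotes left"
```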

Removing SVN remote

You can remove the SVN remote now. If you run into problems, take a look here: http://stackoverflow.com/questions/12013788/how-to-remove-subversion-remote-in-git.
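
As a rough illustration of what removing the SVN remote can involve, the sketch below drops the git-svn configuration section and metadata directory. It assumes a non-bare clone created with git svn and git-svn's default remote name "svn"; the function name remove_svn_remote is made up for this illustration, and your setup may differ.

```shell
#!/bin/bash
# Hypothetical helper: drop the git-svn configuration and metadata from a
# non-bare clone. Assumes git-svn's default remote name "svn".
remove_svn_remote() {
  # $1: path to the working clone created with git svn
  # remove the [svn-remote "svn"] section; ignore the error if it is absent
  git -C "$1" config --remove-section svn-remote.svn 2>/dev/null
  # git-svn keeps its metadata under .git/svn
  rm -rf "$1/.git/svn"
  return 0
}

# Usage: remove_svn_remote SES
```

This sketch leaves any remote-tracking refs created by git svn untouched; those would have to be deleted separately.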

Topic revision: r1 - 18 Feb 2014, danielnuest