Problem Statement:
Let's say you have a Git working directory where all your project work is stored and checked in. You also use open source code from three or four different repositories, and you want their latest code to be available whenever it is published. One way to do that is to download the zip or repository code into your project folder each time. Sleeker and more effective ways are:
- Git Submodules:
Submodules allow you to keep a Git repository as a subdirectory of another Git repository. This lets you clone another repository into your project and keep your commits separate.
Extract:
Starting with Submodules
We’ll walk through developing a simple project that has been split up into a main project and a few sub-projects.
Let’s start by adding an existing Git repository as a submodule of the repository that we’re working on. To add a new submodule you use the git submodule add command with the URL of the project you would like to start tracking. In this example, we’ll add a library called “DbConnector”.
$ git submodule add https://github.com/chaconinc/DbConnector
Cloning into 'DbConnector'...
remote: Counting objects: 11, done.
remote: Compressing objects: 100% (10/10), done.
remote: Total 11 (delta 0), reused 11 (delta 0)
Unpacking objects: 100% (11/11), done.
Checking connectivity... done.
By default, submodules will add the subproject into a directory named the same as the repository, in this case “DbConnector”. You can add a different path at the end of the command if you want it to go elsewhere.
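As a rough sketch of the custom-path form described above, the script below adds a submodule under libs/db instead of the default directory. Everything in it is an assumption made so the example runs offline: the two throwaway repositories, the file connect.sh, the path libs/db, and the demo identity. The protocol.file.allow setting is only there because the "remote" is a local directory; with an https URL it should not be needed.

```shell
set -eu
work=$(mktemp -d)   # scratch area; every repository below is a throwaway stand-in
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Stand-in for the upstream library (hypothetical content).
git init -q "$work/DbConnector"
echo 'connect' > "$work/DbConnector/connect.sh"
git -C "$work/DbConnector" add connect.sh
git -C "$work/DbConnector" commit -q -m 'initial library commit'

# Superproject that consumes the library.
git init -q "$work/MainProject"
cd "$work/MainProject"

# The trailing argument is the destination path inside the superproject.
# protocol.file.allow=always is only needed because the remote is a local directory.
git -c protocol.file.allow=always submodule --quiet add "$work/DbConnector" libs/db

# .gitmodules now maps the chosen path to the URL.
cat .gitmodules
```

The path given on the command line also becomes the submodule's name in .gitmodules, so later configuration keys would refer to libs/db rather than DbConnector.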
If you run git status at this point, you’ll notice a few things.
$ git status
On branch master
Your branch is up-to-date with 'origin/master'.
Changes to be committed:
(use "git reset HEAD <file>..." to unstage)
new file: .gitmodules
new file: DbConnector
First you should notice the new .gitmodules file. This is a configuration file that stores the mapping between the project’s URL and the local subdirectory you’ve pulled it into:
$ cat .gitmodules
[submodule "DbConnector"]
path = DbConnector
url = https://github.com/chaconinc/DbConnector
If you have multiple submodules, you’ll have multiple entries in this file. It’s important to note that this file is version-controlled with your other files, like your .gitignore file. It’s pushed and pulled with the rest of your project. This is how other people who clone this project know where to get the submodule projects from.
Since the URL in the .gitmodules file is what other people will first try to clone/fetch from, make sure to use a URL that they can access if possible. For example, if you use a different URL to push to than others would to pull from, use the one that others have access to. You can overwrite this value locally with git config submodule.DbConnector.url PRIVATE_URL for your own use.
The other listing in the git status output is the project folder entry. If you run git diff on that, you see something interesting:
$ git diff --cached DbConnector
diff --git a/DbConnector b/DbConnector
new file mode 160000
index 0000000..c3f01dc
--- /dev/null
+++ b/DbConnector
@@ -0,0 +1 @@
+Subproject commit c3f01dc8862123d317dd46284b05b6892c7b29bc
Although DbConnector is a subdirectory in your working directory, Git sees it as a submodule and doesn’t track its contents when you’re not in that directory. Instead, Git sees it as a particular commit from that repository.
If you want a little nicer diff output, you can pass the --submodule option to git diff.
$ git diff --cached --submodule
diff --git a/.gitmodules b/.gitmodules
new file mode 100644
index 0000000..71fc376
--- /dev/null
+++ b/.gitmodules
@@ -0,0 +1,3 @@
+[submodule "DbConnector"]
+ path = DbConnector
+ url = https://github.com/chaconinc/DbConnector
Submodule DbConnector 0000000...c3f01dc (new submodule)
When you commit, you see something like this:
$ git commit -am 'added DbConnector module'
[master fb9093c] added DbConnector module
2 files changed, 4 insertions(+)
create mode 100644 .gitmodules
create mode 160000 DbConnector
Notice the 160000 mode for the DbConnector entry. That is a special mode in Git that basically means you’re recording a commit as a directory entry rather than a subdirectory or a file.
Cloning a Project with Submodules
Here we’ll clone a project with a submodule in it. When you clone such a project, by default you get the directories that contain submodules, but none of the files within them yet:
$ git clone https://github.com/chaconinc/MainProject
Cloning into 'MainProject'...
remote: Counting objects: 14, done.
remote: Compressing objects: 100% (13/13), done.
remote: Total 14 (delta 1), reused 13 (delta 0)
Unpacking objects: 100% (14/14), done.
Checking connectivity... done.
$ cd MainProject
$ ls -la
total 16
drwxr-xr-x 9 schacon staff 306 Sep 17 15:21 .
drwxr-xr-x 7 schacon staff 238 Sep 17 15:21 ..
drwxr-xr-x 13 schacon staff 442 Sep 17 15:21 .git
-rw-r--r-- 1 schacon staff 92 Sep 17 15:21 .gitmodules
drwxr-xr-x 2 schacon staff 68 Sep 17 15:21 DbConnector
-rw-r--r-- 1 schacon staff 756 Sep 17 15:21 Makefile
drwxr-xr-x 3 schacon staff 102 Sep 17 15:21 includes
drwxr-xr-x 4 schacon staff 136 Sep 17 15:21 scripts
drwxr-xr-x 4 schacon staff 136 Sep 17 15:21 src
$ cd DbConnector/
$ ls
$
The DbConnector directory is there, but empty. You must run two commands: git submodule init to initialize your local configuration file, and git submodule update to fetch all the data from that project and check out the appropriate commit listed in your superproject:
$ git submodule init
Submodule 'DbConnector' (https://github.com/chaconinc/DbConnector) registered for path 'DbConnector'
$ git submodule update
Cloning into 'DbConnector'...
remote: Counting objects: 11, done.
remote: Compressing objects: 100% (10/10), done.
remote: Total 11 (delta 0), reused 11 (delta 0)
Unpacking objects: 100% (11/11), done.
Checking connectivity... done.
Submodule path 'DbConnector': checked out 'c3f01dc8862123d317dd46284b05b6892c7b29bc'
Now your DbConnector subdirectory is at the exact state it was in when you committed earlier.
There is another way to do this which is a little simpler, however. If you pass --recursive to the git clone command, it will automatically initialize and update each submodule in the repository.
$ git clone --recursive https://github.com/chaconinc/MainProject
Cloning into 'MainProject'...
remote: Counting objects: 14, done.
remote: Compressing objects: 100% (13/13), done.
remote: Total 14 (delta 1), reused 13 (delta 0)
Unpacking objects: 100% (14/14), done.
Checking connectivity... done.
Submodule 'DbConnector' (https://github.com/chaconinc/DbConnector) registered for path 'DbConnector'
Cloning into 'DbConnector'...
remote: Counting objects: 11, done.
remote: Compressing objects: 100% (10/10), done.
remote: Total 11 (delta 0), reused 11 (delta 0)
Unpacking objects: 100% (11/11), done.
Checking connectivity... done.
Submodule path 'DbConnector': checked out 'c3f01dc8862123d317dd46284b05b6892c7b29bc'
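For a project that was already cloned without --recursive, the separate init and update steps shown earlier can be combined into a single git submodule update --init call. The sketch below illustrates that on throwaway local repositories. The directory names, the file connect.sh and the demo identity are assumptions for illustration, and protocol.file.allow is only set because the submodule URL is a local path.

```shell
set -eu
work=$(mktemp -d)   # scratch area for stand-in repositories
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Stand-in library and a superproject that already records it as a submodule.
git init -q "$work/DbConnector"
echo 'connect' > "$work/DbConnector/connect.sh"
git -C "$work/DbConnector" add connect.sh
git -C "$work/DbConnector" commit -q -m 'initial library commit'

git init -q "$work/MainProject"
git -C "$work/MainProject" -c protocol.file.allow=always submodule --quiet add "$work/DbConnector"
git -C "$work/MainProject" commit -q -m 'add DbConnector submodule'

# A plain clone leaves the submodule directory empty.
git clone -q "$work/MainProject" "$work/clone"
cd "$work/clone"
ls -A DbConnector

# One command registers every submodule and checks out the recorded commits.
git -c protocol.file.allow=always submodule --quiet update --init --recursive
ls -A DbConnector
```

The --recursive flag only matters when submodules contain submodules of their own; it is harmless otherwise.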
Working on a Project with Submodules
Now we have a copy of a project with submodules in it and will collaborate with our teammates on both the main project and the submodule project.
Pulling in Upstream Changes
The simplest model of using submodules in a project would be if you were simply consuming a subproject and wanted to get updates from it from time to time but were not actually modifying anything in your checkout. Let’s walk through a simple example there.
If you want to check for new work in a submodule, you can go into the directory and run git fetch and git merge the upstream branch to update the local code.
$ git fetch
From https://github.com/chaconinc/DbConnector
c3f01dc..d0354fc master -> origin/master
$ git merge origin/master
Updating c3f01dc..d0354fc
Fast-forward
scripts/connect.sh | 1 +
src/db.c | 1 +
2 files changed, 2 insertions(+)
Now if you go back into the main project and run git diff --submodule you can see that the submodule was updated and get a list of commits that were added to it. If you don’t want to type --submodule every time you run git diff, you can set it as the default format by setting the diff.submodule config value to “log”.
$ git config --global diff.submodule log
$ git diff
Submodule DbConnector c3f01dc..d0354fc:
> more efficient db routine
> better connection routine
If you commit at this point then you will lock the submodule into having the new code when other people update.
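Committing "at this point" means staging the submodule path in the main project so that the new commit ID is recorded there. A minimal sketch of the whole round trip follows. It assumes throwaway local repositories, a hypothetical file db.c, a demo identity, and an upstream branch pinned to master so that origin/master exists. protocol.file.allow is only needed because the remote is a local directory.

```shell
set -eu
work=$(mktemp -d)   # scratch area for stand-in repositories
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Stand-in upstream library on a branch named master.
git init -q "$work/DbConnector"
git -C "$work/DbConnector" symbolic-ref HEAD refs/heads/master
echo 'v1' > "$work/DbConnector/db.c"
git -C "$work/DbConnector" add db.c
git -C "$work/DbConnector" commit -q -m 'initial library commit'

# Superproject with the submodule committed at the first library commit.
git init -q "$work/MainProject"
cd "$work/MainProject"
git -c protocol.file.allow=always submodule --quiet add "$work/DbConnector"
git commit -q -m 'add DbConnector submodule'

# Upstream moves on.
echo 'v2' > "$work/DbConnector/db.c"
git -C "$work/DbConnector" commit -q -am 'more efficient db routine'

# Pull the new work into the submodule checkout.
git -C DbConnector fetch -q
git -C DbConnector merge -q origin/master

# Staging the submodule path records the new commit ID in the superproject.
git add DbConnector
git commit -q -m 'update DbConnector to latest upstream'

# Show the commit the superproject now points at.
git ls-tree HEAD DbConnector
```

Until that last commit is pushed, other people who update the main project keep getting the previously recorded submodule commit.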
There is an easier way to do this as well, if you prefer to not manually fetch and merge in the subdirectory. If you run git submodule update --remote, Git will go into your submodules and fetch and update for you.
$ git submodule update --remote DbConnector
remote: Counting objects: 4, done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 4 (delta 2), reused 4 (delta 2)
Unpacking objects: 100% (4/4), done.
From https://github.com/chaconinc/DbConnector
3f19983..d0354fc master -> origin/master
Submodule path 'DbConnector': checked out 'd0354fc054692d3906c85c3af05ddce39a1c0644'
This command will by default assume that you want to update the checkout to the master branch of the submodule repository. You can, however, set this to something different if you want. For example, if you want to have the DbConnector submodule track that repository’s “stable” branch, you can set it in either your .gitmodules file (so everyone else also tracks it), or just in your local .git/config file. Let’s set it in the .gitmodules file:
$ git config -f .gitmodules submodule.DbConnector.branch stable
$ git submodule update --remote
remote: Counting objects: 4, done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 4 (delta 2), reused 4 (delta 2)
Unpacking objects: 100% (4/4), done.
From https://github.com/chaconinc/DbConnector
27cf5d3..c87d55d stable -> origin/stable
Submodule path 'DbConnector': checked out 'c87d55d4c6d4b05ee34fbc8cb6f7bf4585ae6687'
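The local-only alternative mentioned above, keeping the branch setting in .git/config rather than .gitmodules, might look like the following sketch. The repositories are throwaway stand-ins, and the file db.c, the commit messages and the demo identity are assumptions. protocol.file.allow is set only because the submodule URL is a local path.

```shell
set -eu
work=$(mktemp -d)   # scratch area for stand-in repositories
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Stand-in upstream: master plus a stable branch that is one commit ahead.
git init -q "$work/DbConnector"
git -C "$work/DbConnector" symbolic-ref HEAD refs/heads/master
echo 'v1' > "$work/DbConnector/db.c"
git -C "$work/DbConnector" add db.c
git -C "$work/DbConnector" commit -q -m 'initial library commit'
git -C "$work/DbConnector" checkout -q -b stable
echo 'v2' > "$work/DbConnector/db.c"
git -C "$work/DbConnector" commit -q -am 'stable-only fix'
git -C "$work/DbConnector" checkout -q master

# Superproject with the submodule committed.
git init -q "$work/MainProject"
cd "$work/MainProject"
git -c protocol.file.allow=always submodule --quiet add "$work/DbConnector"
git commit -q -m 'add DbConnector submodule'

# Without -f .gitmodules the setting is written to .git/config and is not shared.
git config submodule.DbConnector.branch stable
git -c protocol.file.allow=always submodule --quiet update --remote

# The submodule checkout now sits on the tip of origin/stable.
git -C DbConnector log -1 --format=%s
```

Because .gitmodules is untouched, collaborators who run the same update keep following whatever branch their own configuration selects.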
If you leave off the -f .gitmodules it will only make the change for you, but it probably makes more sense to track that information with the repository so everyone else does as well.
When we run git status at this point, Git will show us that we have “new commits” on the submodule.
$ git status
On branch master
Your branch is up-to-date with 'origin/master'.
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git checkout -- <file>..." to discard changes in working directory)
modified: .gitmodules
modified: DbConnector (new commits)
no changes added to commit (use "git add" and/or "git commit -a")
If you set the configuration setting status.submodulesummary, Git will also show you a short summary of changes to your submodules:
$ git config status.submodulesummary 1
$ git status
On branch master
Your branch is up-to-date with 'origin/master'.
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git checkout -- <file>..." to discard changes in working directory)
modified: .gitmodules
modified: DbConnector (new commits)
Submodules changed but not updated:
* DbConnector c3f01dc...c87d55d (4):
> catch non-null terminated lines
At this point if you run git diff we can see both that we have modified our .gitmodules file and also that there are a number of commits that we’ve pulled down and are ready to commit to our submodule project.
$ git diff
diff --git a/.gitmodules b/.gitmodules
index 6fc0b3d..fd1cc29 100644
--- a/.gitmodules
+++ b/.gitmodules
@@ -1,3 +1,4 @@
[submodule "DbConnector"]
path = DbConnector
url = https://github.com/chaconinc/DbConnector
+ branch = stable
Submodule DbConnector c3f01dc..c87d55d:
> catch non-null terminated lines
> more robust error handling
> more efficient db routine
> better connection routine
This is pretty cool as we can actually see the log of commits that we’re about to commit to in our submodule. Once committed, you can see this information after the fact as well when you run git log -p.
$ git log -p --submodule
commit 0a24cfc121a8a3c118e0105ae4ae4c00281cf7ae
Author: Scott Chacon <schacon@gmail.com>
Date: Wed Sep 17 16:37:02 2014 +0200
updating DbConnector for bug fixes
diff --git a/.gitmodules b/.gitmodules
index 6fc0b3d..fd1cc29 100644
--- a/.gitmodules
+++ b/.gitmodules
@@ -1,3 +1,4 @@
[submodule "DbConnector"]
path = DbConnector
url = https://github.com/chaconinc/DbConnector
+ branch = stable
Submodule DbConnector c3f01dc..c87d55d:
> catch non-null terminated lines
> more robust error handling
> more efficient db routine
> better connection routine
Git will by default try to update all of your submodules when you run git submodule update --remote so if you have a lot of them, you may want to pass the name of just the submodule you want to try to update.
Working on a Submodule
It’s quite likely that if you’re using submodules, you’re doing so because you really want to work on the code in the submodule at the same time as you’re working on the code in the main project (or across several submodules). Otherwise you would probably instead be using a simpler dependency management system (such as Maven or Rubygems).
So now let’s go through an example of making changes to the submodule at the same time as the main project and committing and publishing those changes at the same time.
So far, when we’ve run the git submodule update command to fetch changes from the submodule repositories, Git would get the changes and update the files in the subdirectory but will leave the sub-repository in what’s called a “detached HEAD” state. This means that there is no local working branch (like “master”, for example) tracking changes. So any changes you make aren’t being tracked well.
In order to set up your submodule to be easier to go in and hack on, you need to do two things. You need to go into each submodule and check out a branch to work on. Then you need to tell Git what to do if you have made changes and then git submodule update --remote pulls in new work from upstream. The options are that you can merge them into your local work, or you can try to rebase your local work on top of the new changes.
First of all, let’s go into our submodule directory and check out a branch.
$ git checkout stable
Switched to branch 'stable'
Let’s try it with the “merge” option. To specify it manually, we can just add the --merge option to our update call. Here we’ll see that there was a change on the server for this submodule and it gets merged in.
$ git submodule update --remote --merge
remote: Counting objects: 4, done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 4 (delta 2), reused 4 (delta 2)
Unpacking objects: 100% (4/4), done.
From https://github.com/chaconinc/DbConnector
c87d55d..92c7337 stable -> origin/stable
Updating c87d55d..92c7337
Fast-forward
src/main.c | 1 +
1 file changed, 1 insertion(+)
Submodule path 'DbConnector': merged in '92c7337b30ef9e0893e758dac2459d07362ab5ea'
If we go into the DbConnector directory, we have the new changes already merged into our local stable branch. Now let’s see what happens when we make our own local change to the library and someone else pushes another change upstream at the same time.
$ cd DbConnector/
$ vim src/db.c
$ git commit -am 'unicode support'
[stable f906e16] unicode support
1 file changed, 1 insertion(+)
Now if we update our submodule we can see what happens when we have made a local change and upstream also has a change we need to incorporate.
$ git submodule update --remote --rebase
First, rewinding head to replay your work on top of it...
Applying: unicode support
Submodule path 'DbConnector': rebased into '5d60ef9bbebf5a0c1c1050f242ceeb54ad58da94'
If you forget the --rebase or --merge, Git will just update the submodule to whatever is on the server and reset your project to a detached HEAD state.
$ git submodule update --remote
Submodule path 'DbConnector': checked out '5d60ef9bbebf5a0c1c1050f242ceeb54ad58da94'
If this happens, don’t worry, you can simply go back into the directory and check out your branch again (which will still contain your work) and merge or rebase origin/stable (or whatever remote branch you want) manually.
If you haven’t committed your changes in your submodule and you run a submodule update that would cause issues, Git will fetch the changes but not overwrite unsaved work in your submodule directory.
$ git submodule update --remote
remote: Counting objects: 4, done.
remote: Compressing objects: 100% (3/3), done.
remote: Total 4 (delta 0), reused 4 (delta 0)
Unpacking objects: 100% (4/4), done.
From https://github.com/chaconinc/DbConnector
5d60ef9..c75e92a stable -> origin/stable
error: Your local changes to the following files would be overwritten by checkout:
scripts/setup.sh
Please, commit your changes or stash them before you can switch branches.
Aborting
Unable to checkout 'c75e92a2b3855c9e5b66f915308390d9db204aca' in submodule path 'DbConnector'
If you made changes that conflict with something changed upstream, Git will let you know when you run the update.
$ git submodule update --remote --merge
Auto-merging scripts/setup.sh
CONFLICT (content): Merge conflict in scripts/setup.sh
Recorded preimage for 'scripts/setup.sh'
Automatic merge failed; fix conflicts and then commit the result.
Unable to merge 'c75e92a2b3855c9e5b66f915308390d9db204aca' in submodule path 'DbConnector'
You can go into the submodule directory and fix the conflict just as you normally would.
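Fixing the conflict "as you normally would" could look roughly like the sketch below: the update stops inside the submodule with an unfinished merge, which is then resolved, committed there, and finally recorded in the main project. The throwaway repositories, the file setup.sh with its timeout values, the choice of resolution and the demo identity are all assumptions for illustration. protocol.file.allow is set only because the remote is a local directory.

```shell
set -eu
work=$(mktemp -d)   # scratch area for stand-in repositories
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Stand-in upstream library on a branch named master.
git init -q "$work/DbConnector"
git -C "$work/DbConnector" symbolic-ref HEAD refs/heads/master
echo 'timeout=10' > "$work/DbConnector/setup.sh"
git -C "$work/DbConnector" add setup.sh
git -C "$work/DbConnector" commit -q -m 'initial library commit'

# Superproject with the submodule committed.
git init -q "$work/MainProject"
cd "$work/MainProject"
git -c protocol.file.allow=always submodule --quiet add "$work/DbConnector"
git commit -q -m 'add DbConnector submodule'

# Conflicting edits: one committed locally in the submodule, one upstream.
echo 'timeout=20' > DbConnector/setup.sh
git -C DbConnector commit -q -am 'local: raise timeout'
echo 'timeout=30' > "$work/DbConnector/setup.sh"
git -C "$work/DbConnector" commit -q -am 'upstream: raise timeout further'

# The update stops with a merge conflict, so a non-zero exit is expected here.
git -c protocol.file.allow=always submodule update --remote --merge >/dev/null 2>&1 || true

# Resolve inside the submodule like any other merge: edit, add, commit.
cd DbConnector
echo 'timeout=30' > setup.sh   # hypothetical resolution: keep the upstream value
git add setup.sh
git commit -q -m 'merge upstream changes into local work'

# Back in the superproject, record the merged submodule commit.
cd ..
git add DbConnector
git commit -q -m 'update DbConnector after resolving conflict'
```

The final two commands matter because the main project tracks a specific submodule commit; without them it would still point at the commit from before the merge.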
- Composer (PHP only):
Composer is a tool for dependency management in PHP. It allows you to declare the dependent libraries your project needs and it will install them in your project for you.
Dependency management
Composer is not a package manager. Yes, it deals with “packages” or libraries, but it manages them on a per-project basis, installing them in a directory (e.g. vendor) inside your project. By default it will never install anything globally. Thus, it is a dependency manager.
This idea is not new and Composer is strongly inspired by node’s npm and ruby’s bundler. But there has not been such a tool for PHP.
The problem that Composer solves is this:
a) You have a project that depends on a number of libraries.
b) Some of those libraries depend on other libraries.
c) You declare the things you depend on.
d) Composer finds out which versions of which packages need to be installed, and installs them (meaning it downloads them into your project).
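Step c) above, declaring what you depend on, is done in a composer.json file at the root of the project. The sketch below writes a minimal one. The project directory is a temporary stand-in, and the package name and version constraint are illustrative assumptions rather than a recommendation. The install step is left as a comment because it requires Composer and network access.

```shell
set -eu
project=$(mktemp -d)   # hypothetical empty project directory
cd "$project"

# Declare the project's dependencies. Package and constraint are illustrative only.
cat > composer.json <<'EOF'
{
    "require": {
        "monolog/monolog": "2.0.*"
    }
}
EOF

# With Composer installed, this would resolve versions and download them into vendor/:
#   composer install
cat composer.json
```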