Refactoring Databases: Evolutionary Database Design
Scott W. Ambler, IBM
Pramod Sadalage, ThoughtWorks
I was interested to note that our shop already has most of the infrastructure for this. But it had never occurred to me to think of it in terms of ordinary refactoring, i.e., rename = add a new thing that delegates to the old, then migrate stuff, and finally remove the old. Makes it a lot less scary (but does require things like triggers to keep the two fields in sync).
Okay, enough retrospective introduction. On to the actual notes:
This stuff works in practice. It is not theory.
Lots of talk throughout the session about politics between devs and DBAs. Glad we don’t have to deal with that…
If the devs are delivering working software every iteration, so must everyone else, including the data people. Goal is to give the data community the techniques, and eventually the tools, to do this.
What you’ll see
- Proven techniques for:
  - Evolutionary database development
  - Helping DBAs become effective members of agile teams
  - Improving the quality of data assets
  - Helping organizations improve their approach to data
Can you rename a column in your production database and safely deploy the change in a single day?
- Most people don’t expect their data people to be able to accomplish even the trivial
- The trivial should be trivial
The Rename Column Refactoring
- Renaming the column would break everything. So don’t do that.
- Instead, refactor it:
- Define a transition period. (Specify an end date? esp. if internal app)
- Add the second column, copy the data, add triggers, etc. At this point, from the point of view of people who care, the column is already renamed.
- After the transition period ends, remove the old column and the scaffolding.
- If people whine that the transition period is too short (“we can’t possibly update our mission-critical app in two years”), then how mission-critical is it, really?
- Can’t put your business at risk, but sometimes you’ve got to motivate people to change
- What happens if there are apps you don’t know about?
- You’ll take a hit on this. You’ve gotta take that hit.
- Someone in the organization has to make the conscious decision to stop screwing around.
- If people are doing data stuff, you as an IT professional should be supporting them.
- Change will be easier if you have encapsulation via e.g. stored procs. But what if you want to rename the stored proc?
- Performance hit?
- Yes, but you’re already taking a complexity hit, because your apps already have code to clean up the data. By cleaning up the schema, you can get rid of that code.
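The rename refactoring above can be sketched end to end. This is a minimal illustration using sqlite3; the table and column names are invented, and a real deployment would do each step as a separate, scheduled change. Note that SQLite leaves recursive triggers off by default, which is what keeps the two sync triggers from firing each other.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, fname TEXT);
INSERT INTO customer (id, fname) VALUES (1, 'Ada');

-- Step 1: add the new column and copy the existing data.
ALTER TABLE customer ADD COLUMN first_name TEXT;
UPDATE customer SET first_name = fname;

-- Step 2: triggers keep the two columns in sync during the transition period.
CREATE TRIGGER sync_old_to_new AFTER UPDATE OF fname ON customer
BEGIN
  UPDATE customer SET first_name = NEW.fname WHERE id = NEW.id;
END;
CREATE TRIGGER sync_new_to_old AFTER UPDATE OF first_name ON customer
BEGIN
  UPDATE customer SET fname = NEW.first_name WHERE id = NEW.id;
END;
""")

# An app that still writes the old column; the new column follows along.
db.execute("UPDATE customer SET fname = 'Grace' WHERE id = 1")
row_old_write = db.execute("SELECT fname, first_name FROM customer").fetchone()

# An app already migrated to the new column; the old column follows along.
db.execute("UPDATE customer SET first_name = 'Lin' WHERE id = 1")
row_new_write = db.execute("SELECT fname, first_name FROM customer").fetchone()

# Step 3 (after the transition period ends): drop the triggers and the old
# column. Not executed here; ALTER TABLE ... DROP COLUMN needs SQLite 3.35+.
```

From the point of view of anyone reading through the new column, the rename is already done; the triggers are the scaffolding that gets removed at the end.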
The Traditional Database
- Continuous changes not allowed
- Tools are not that good
- Production changes are rare
- Migration is a huge project, if it’s attempted at all
- Non-existent change management
- Non-existent testing
- Very few companies are doing database testing
- RDBMSes have been out for what, 30 years, and we still don’t have basic stuff like testing tools?
- There are a few, like SQLUnit, but they don’t have wide acceptance
- If this up-front stuff doesn’t work (and let’s face it, we’ve given it 20-30 years and devs still go around the DBAs), then let’s admit it doesn’t work.
The Agile Database
- Slowly and safely changing database
- Functionality added in increments
- Change management
- Facilitates automated testing
- Developers have knowledge of the functionality
- DBAs pair with developers
- Acknowledged importance of team interaction
- DBA = Role != Person
Philosophies of Agile Data Method (www.agiledata.org)
- Data is important, but it’s not the center of the universe.
- Enterprise issues are important. Our apps have to fit with those of other teams. Follow standards.
- Enterprise groups need to be flexible. It is the kiss of death to try to have a consistent repeatable process.
- Repeatable processes have nothing to do with software development success. You want repeatable results.
- Unique situation. Individuals are unique, therefore teams are unique. Problems are unique.
- Work together
- Sweet spot
Everyone should get their own copy of the database. www.agiledata.org/essays/sandboxes.html
First time deployment
- Should be handled almost exactly the same as deploying code
- Database master script to create production instance
- Branch if necessary
Change scripts for later changes, collected by day/week/iteration/etc.
- New person joins the team. Make it easy.
- Checks out from source control
- Use make/rake/ant/etc.
- “dbcreate” target to create a new, empty database
- “dbinit” target to create the tables and such
- The code is under source control; why not the schema?
- Should be run every time you test your application
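The build targets above can be mocked up in a few lines. This is a sketch, not the session’s actual ant script: the target names come from the notes, but the schema and function shapes are invented, and sqlite3 stands in for the real database.

```python
import sqlite3

# The "master script" -- in real life this lives in source control
# alongside the code.
SCHEMA_SQL = """
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders (
  id INTEGER PRIMARY KEY,
  customer_id INTEGER NOT NULL REFERENCES customer(id)
);
"""

def dbcreate(path=":memory:"):
    """The "dbcreate" target: create a new, empty database."""
    return sqlite3.connect(path)

def dbinit(db, schema_sql=SCHEMA_SQL):
    """The "dbinit" target: create the tables and such."""
    db.executescript(schema_sql)
    return db

# Run before every test run, so the schema under source control is the
# schema actually being tested.
db = dbinit(dbcreate())
```

A new team member runs the same two steps after checking out, and gets a working database with no hand-holding.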
Agile Model Driven Development (AMDD)
- BDUF only works in academic fantasy-land
- Cycle 0: Initial modeling sessions (days, /maybe/ a week)
- Initial requirements modeling
- Initial architectural modeling
- Shorten the feedback loop. Get working software sooner.
- Cycle 1..n:
- Model storming (minutes)
- Implementation (ideally test driven) (hours)
- Nobody should be doing big models up front
- Database refactoring: a simple change to the schema that improves its design while retaining both behavioral and informational semantics
- Maybe the semantics change at an academic level, but not at a practical level
- Database refactorings are a subset of schema transformations, but they do not add functionality
Why DB refactoring is hard
- Your app
- Other apps you know about
- Other apps you don’t know about
- Persistence frameworks
- Other databases
- Test code
- File imports
- File exports
“Is the design of the code the best design possible to allow me to add this new feature?”
Example: change customer number to an alphanumeric customer ID
- Add CustomerID varchar field
- But you’ve got to be able to convert back and forth, to keep the old field in sync
- So you don’t allow alphanumerics during the transition period. The new field could hold them, but the old field can’t.
- After you are finally able to delete the old field, then you can start using alphanumeric IDs
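A sketch of that transition, again with sqlite3 and invented names. The point of the guard function is the note above: while the old integer column still exists, new IDs must stay numeric so the two columns can be kept in sync; only after the old column is dropped can alphanumeric IDs actually appear.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE account (customer_number INTEGER, balance REAL);
INSERT INTO account VALUES (42, 10.0);

-- Add the new alphanumeric-capable column and back-fill it from the old one.
ALTER TABLE account ADD COLUMN customer_id TEXT;
UPDATE account SET customer_id = CAST(customer_number AS TEXT);
""")

def set_customer_id(db, old_number, new_id):
    # Transition-period rule: the value must still convert back to an
    # integer, because the old column has to stay in sync.
    if not new_id.isdigit():
        raise ValueError("alphanumeric IDs only after the old column is gone")
    db.execute(
        "UPDATE account SET customer_id = ?, customer_number = ? "
        "WHERE customer_number = ?",
        (new_id, int(new_id), old_number),
    )
```

Once the transition period ends and `customer_number` is deleted, the guard goes away with it.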
Would TDD be a good idea for these refactorings?
- Theoretically, yes.
- Practically, tools may not be there yet.
- If I want to thoroughly test the DB…
- Data going in, data going out: DBUnit works for this
- What about constraints? What about triggers? Can’t assume that this stuff works.
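“Can’t assume that this stuff works” translates directly into a test: try to violate the constraint and assert that the database rejects it. A minimal sketch with sqlite3 and an invented table:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE payment (amount REAL CHECK (amount > 0))")

def check_constraint_fires(db):
    """The constraint is part of the schema -- test it like any other code."""
    try:
        db.execute("INSERT INTO payment (amount) VALUES (-5)")
    except sqlite3.IntegrityError:
        return True   # the bad row was rejected, as it should be
    return False      # the constraint silently let bad data in

rejects_bad = check_constraint_fires(db)

# And confirm good data still goes in.
db.execute("INSERT INTO payment (amount) VALUES (10)")
accepted = db.execute("SELECT COUNT(*) FROM payment").fetchone()[0]
```

The same shape works for triggers: perform the triggering action, then assert on the side effect. If someone later drops the constraint, this test is what catches it.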
When you’re trying to remove a field, you can try removing it, then re-creating your entire schema. The “ant dbinit” will fail if you have e.g. a view that still references that field. (Obviously you still need to find code dependencies.)
Two things to do to your SQL scripts when you make a change:
- Fix the script that creates all schema from scratch
- Add a script that upgrades from build 635 to 636. Should be hand-coded, and should be done when you make the change (so you still remember what you did).
- Alternative: don’t do #1. Your “schema from scratch” is your latest released version, and your automated tests create that and apply all the upgrade scripts to it. (Volunteered from the audience.)
- Change scripts need to be in source control.
- Ant target “dbupgrade”. Give it a from version and a to version, it gets that version of the “make fresh” script, and runs all the “upgrade” scripts against it.
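A toy version of that “dbupgrade” target, sketched in Python rather than ant; the build numbers match the example above, but the scripts’ contents are invented. Each upgrade script is hand-coded at the time of the change and applied in version order:

```python
import sqlite3

# One hand-coded script per build, written when the change was made.
UPGRADES = {
    636: "ALTER TABLE customer ADD COLUMN email TEXT;",
    637: "CREATE INDEX idx_customer_email ON customer(email);",
}

def dbupgrade(db, from_version, to_version):
    """Apply every upgrade script after from_version, up through to_version."""
    for v in range(from_version + 1, to_version + 1):
        db.executescript(UPGRADES[v])

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY)")  # the build-635 schema
dbupgrade(db, 635, 637)
```

Because the scripts are in source control and keyed by version, the same runner works for any from/to pair, and the automated tests can exercise every upgrade path.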
Ruby on Rails has really good database migration support. You define a migration, and you can go forward and backward. Database-agnostic.
- Example: merging two columns
  - Add the new column to hold the merged data
  - Remove the old columns
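For consistency with the other examples, here is that forward/backward migration shape sketched in Python with sqlite3 rather than actual Rails; the table and column names are invented.

```python
import sqlite3

def up(db):
    # Forward: add the new column holding the merged data.
    db.executescript("""
    ALTER TABLE person ADD COLUMN full_name TEXT;
    UPDATE person SET full_name = first_name || ' ' || last_name;
    """)
    # Removing the old columns would follow once the transition period ends.

def down(db):
    # Backward: drop the merged column. (ALTER TABLE ... DROP COLUMN
    # needs SQLite 3.35+.)
    db.execute("ALTER TABLE person DROP COLUMN full_name")

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE person (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
INSERT INTO person VALUES (1, 'Grace', 'Hopper');
""")
up(db)
merged = db.execute("SELECT full_name FROM person").fetchone()[0]
```

Rails generates this up/down pairing for you and keeps track of which migrations have run, which is what makes going backward as routine as going forward.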
Database things are trivial for the most part, so let’s make them trivial.
If you’re not putting it into source control, why are you creating it? If you’re not testing it, why are you creating it?
Other refactorings covered:
- Remove View
- Introduce Surrogate Key
- Replace One-To-Many with Associative Table
Refactoring is continuous, deployment is controlled (the way they do it, anyway)
Allow developers to experiment
- Try multiple designs
- Change the model in their local schema
- DBA should be involved in design of functionality
- DBA should help devs design tables, indexes, tune queries for performance
- DBA and devs should pair all the time
- Putting bad data in / getting bad data out
- Database schema (what if someone drops a constraint?)
- Vendor should have a regression testing tool to sell you if not give you
Encapsulate the database
- Coupling is enemy #1
- Encapsulation strategies:
- Brute force (embedded SQL)
- Data access objects
- Persistence frameworks
- Start converting data when replacing legacy app; day 1 of the project
- If feasible, use a lightweight database, e.g. MySQL or an in-memory database
- Write a data generator, when needed
- New releases can be acceptance tested against current production data
- Migration is a snap, so why not deploy weekly?