So you're designing a database, and you think to yourself, "Hey, I'll use
VARCHAR(n) for some of my constraints!" Now, you have two problems.
You might have thought you needed VARCHAR(n) for:
- Space optimization
- Performance
- Correctness
And you'd be wrong on all three counts.
In PostgreSQL, a VARCHAR(n) is exactly a TEXT field with a limit on its
maximum length. This means you actually
waste a little bit
of space by storing that constraint.
It's also actually slower on writes because in addition to doing the checks
around TEXT, you're doing a length check.
Then, there's the "correctness" issue. Actual data integrity constraints, to
an excellent approximation,
never consist exactly of, "string
must be no longer than n characters," that being what VARCHAR(n) actually
says. They will either not have a length restriction, or they will have both
a minimum and a maximum length, along with some construction rules.
For example, ZIP (postal) codes in the US don't match VARCHAR(10). Instead,
they must consist of five digits and can have another section which consists
of a hyphen and four more digits. One regular expression that describes this
is:
'^\d{5}(-\d{4})?$'
So VARCHAR(10) would actually be the wrong constraint, and it would allow for
bad things like 'messed ZIP' where only things like '94607-2410' should really
be there.
When you see the VARCHAR(n) in the road, s/VARCHAR(\d+)/TEXT/g it
Until next time...
1) Don't want people throwing the kitchen sink in my fields and if you allow it they will come and be gluttons. I'm sorry David, but I really don't want people who have first names that are 1000 characters long in my data. I can live with if they have no first name. That to me is a problem.
2) Portability. WE do a lot of linked tables in MS Access to PostgreSQL and cross between SQL Server etc. Of course you wouldn't care about this. But needless to say most databases implement varchar very differently. In Access it comes in as a memo field which is rot with issues. Similar issue with SQL Server and oracle has that clob thing going on.
Same reason I don't always use varchar(max) in sql server by the way.
I don't want to spend a lot of time truncating my fields everytime I need to feed it to some god damn government agency.
So in short -- we live in a world that is not PostgreSQL centric so deal with it .
I'm sorry to have to disagree with you completely, but you missed the entire point of the piece, which is that VARCHAR(n) does not actually do anything good anywhere. If you need constraints, write ones that actually fit your use case. In no case that I've seen so far has VARCHAR(n) been a correct constraint. But heck, I've only been in the database business full time for a decade or so, so I could be wildly wrong here.
From a meta-data standpoint its convenient to look at a table definition and know ah no one can stuff more than n characters in there. Especially when you need to distribute fixed width data to those mainframe monsters.
I've been in databases longer than a decade .
your performance point is well-taken. Would be nice to see some benchmarks showing the differences in speed how much you are loosing.
I'm not saying no text fields, just that a world with all text fields is not ideal
That certain SQL engines have what I could generously describe as, "quirks,"--and the unnecessary pain they've caused me has not disposed me to such generosity--that doesn't make VARCHAR(n) a valid constraint. More of a workaround.
Thanks for the suggestion about "portability," which will be in another post or series
regards
Pavel Stehule
Good Point. That's one of the things I was trying to say, but you said it better.
"SQL defines two primary character types: character varying(n) and character(n), where n is a positive integer. Both of these types can store strings up to n characters (not bytes) in length. An attempt to store a longer string into a column of these types will result in an error, unless the excess characters are all spaces, in which case the string will be truncated to the maximum length. (This somewhat bizarre exception is required by the SQL standard.) If the string to be stored is shorter than the declared length, values of type character will be space-padded; values of type character varying will simply store the shorter string.
If one explicitly casts a value to character varying(n) or character(n), then an over-length value will be truncated to n characters without raising an error. (This too is required by the SQL standard.)"
Surely this strange behaviour when casting to it makes it a dangerous data type to use?
You'll get more consistent behaviour, and I would wager no less performance, from creating a TEXT column with a constraint on the length, surely.
(None of this is meant to argue against Regina's point about portability, I claim no expertise in this area).
Text fields are not displayed in the grids (Delphi, BCB in example). But varchars are. And the length modifier is taken into account.
I mean what width of the column in the grid should be.
its a convenient constraint to use and way more portable than what you are proposing
And varchar(25) is much shorter to write than text blah blah maxlength and way more understandable to most tools.
Not to mention the create table statement is way more shorter and meta data beautiful
But I think we've beaten this argument up. Can't wait to read you article on portability
VARCHAR(32) NOT NULL CHECK (username != '')
A username of length 33 will break things. A username of length 31 wont break things.
(I also get the feeling I should be able to make the check username > '' and get rid of the NOT NULL but I've not tested that)
As such I don't think the point of view can be as absolute as it appears.
There are constraints that soley enforce database integrity (not necessarily just referential integrity). This could be as simple as this:
...
filename VARCHAR(255) NOT NULL,
...
for a filename under NTFS, because longer names are actually invalid.
On the other hand checking for convenience belong into the application level:
...
name TEXT NOT NULL,
...
A name of more than 32 characters may be annoying or break the HTML table on your webfrontend, but it should be valid nonetheless.
This because that's not really the database's job to check whether it's in the perfect format. Constraints are meant to serve data integrity.
At least that's my opinion.
> with a limit on its maximum length. This means you
> actually waste a little bit of space by storing that
> constraint.
The column's max character length is stored only once for the column, in pg_attribute.atttypmod. So there is no wasted space by specifying the VARCHAR length.
Also, you didn't mention the 8.3+ optimization whereby short VARCHARs (length 126 or less) only need one byte per value to store the string length. TEXT values always needs 4 bytes, as do longer VARCHARs.
No argument with your other points.