
Happy to hear Darwen dislikes 3-value logic. A language where x=x can return something equivalent to false in a common use-case is wrong.


> A language where x=x can return something equivalent to false in a common use-case is wrong.

Then almost all common languages are wrong, if you consider floating point NaNs to be a “common use-case” - and surely they must be, since it is easy to produce them using ordinary arithmetic operations.
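For example, in Python (which follows IEEE 754 here, like most languages), plain arithmetic on ordinary values can produce a NaN, and that NaN compares unequal even to itself:

```python
import math

# Ordinary arithmetic can produce NaN without any exotic input.
nan = float("inf") - float("inf")

print(math.isnan(nan))  # True
# IEEE 754 defines NaN as unequal to every value, including itself:
print(nan == nan)       # False
```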


Maybe you do different programming from me but I find NULLs far more common than NaNs.


At my day-job, I almost never use floating point, so NaNs don’t come up. The one exception to that is JavaScript, where NaNs pop up significantly more than in most other languages - parseInt() will return NaN if you give it an invalid number, whereas in most other languages a function called “parseInt” wouldn’t return a floating-point value.

I work on a lot of code that uses SQL NULLs, but most of the time I don’t encounter them directly, because I’m using some ORM which translates them to Java nulls (or Go nil or whatever), and many query generation frameworks handle the weirdness around querying for nulls (IS NULL vs = NULL) automatically. Most programming languages don’t give their nulls the weird equality semantics that SQL nulls have.
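The `IS NULL` vs `= NULL` weirdness is easy to demonstrate with Python's stdlib sqlite3 (table and data invented for illustration) - it's exactly the case the query generators have to special-case:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, spouse TEXT)")
conn.executemany("INSERT INTO people VALUES (?, ?)",
                 [("alice", "bob"), ("carol", None)])

# `= NULL` never matches anything: the comparison yields NULL,
# which the WHERE clause treats as false.
eq = conn.execute("SELECT name FROM people WHERE spouse = NULL").fetchall()
# `IS NULL` is the special form you actually need.
is_null = conn.execute("SELECT name FROM people WHERE spouse IS NULL").fetchall()

print(eq)       # []
print(is_null)  # [('carol',)]
```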


> The one exception to that is JavaScript, where NaNs pop up significantly more than in most other languages - parseInt() will return NaN if you give it an invalid number, whereas in most other languages a function called “parseInt” wouldn’t return a floating-point value.

Ahh, every time this conversation comes up about how SQL NULLs violate the reflexive property of equality, people are like "so do NaNs" and I'm wondering "who are these people who are dealing with NaNs all the time?".

I'm mostly a back-end and DevOps guy, so I don't write that much JS. I didn't know that JS used NaN the way other languages would use Null or exceptions for numerical operations.

...

Also that's terrible.


I know pretty much nothing about relational algebra, but I have written and read a reasonable amount of SQL. And I find 3-value logic very handy, even if I occasionally see these arguments that it is somehow wrong. Can you ELI5 why I should dislike NULL, and what should I do with my database when I do not know the value of an attribute or the attribute is not applicable for the row? (The answer to the latter might be to design a "better" schema? To that my response is that there are other measures for "goodness" that may be more important than ideological purity. Say, you may want to have all your data in one table instead of creating a new table for a thousand different subcases.)


The complaints about NULL are because its semantics are inconsistent. It is used for a number of different purposes:

– a missing value (existing but unknown), for example a person's birth date that happens to be unknown

– the non-applicability of a value, for example the spouse of a person who never married

– a combination of the above, for example the death date of a person where it is unknown whether they have died yet (but if it was known that they are alive, the death date would be NULL all the same)

– an empty set (see for example https://dbfiddle.uk/UJJgVCZ_)

Depending on the context in an SQL expression, NULL sometimes behaves like an unknown value (propagates upwards in expressions), sometimes like a non-value (doesn't match any conditions), sometimes like an empty set. This makes it unintuitive and difficult to reason about.
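All three behaviors can be seen side by side via Python's stdlib sqlite3 (the table is invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# 1. Unknown value: NULL propagates upward through expressions.
propagated = conn.execute("SELECT 1 + NULL").fetchone()[0]
print(propagated)  # None

# 2. Non-value: NULL = NULL is not true, so the ELSE branch runs.
branch = conn.execute(
    "SELECT CASE WHEN NULL = NULL THEN 'equal' ELSE 'not equal' END"
).fetchone()[0]
print(branch)  # not equal

# 3. Empty-set-like: aggregates silently skip NULLs.
conn.execute("CREATE TABLE t (x INTEGER)")
conn.execute("INSERT INTO t VALUES (1), (NULL)")
print(conn.execute("SELECT COUNT(x), SUM(x) FROM t").fetchone())  # (1, 1)
```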

You are asking about what else to use. In principle you could use placeholder values (like using the date value 9999-12-31 to mean "hasn't died yet"), but that has its issues as well. The bottom line is, there is really no good alternative in SQL. And that's what people dislike about SQL. Because you could imagine a database language with a standard NULL placeholder value with consistent and straightforward semantics.


Right. If they needed NULLs, then imho full algebraic datatype tagged unions would be the natural extension of these. Now, obviously you can implement those in SQL using one column per type and one column as your discriminator tag, but that would be excruciating, especially considering how weak SQL generally is about standard libraries and reusability outside of the very fixed concepts like Views and Functions. Then you could properly define the semantics of your "missing value" as appropriate to the data.

Various SQL servers have accepted the need to support stuffing multiple data types into a single column - MSSQL has SQL_VARIANT, Oracle has ANYDATA, SQLite has... well, everything in SQLite. So between that and the existence of NULL, the ship has already sailed. May as well make it formal and schema-defined behavior using tagged unions.
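A sketch of the hand-rolled tagged union described above, via Python's stdlib sqlite3 (schema and data invented for illustration) - note how every query has to re-check the discriminator by hand, which is the "excruciating" part:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One discriminator column plus one value column per variant.
conn.execute("""
    CREATE TABLE death_date (
        person   TEXT,
        tag      TEXT CHECK (tag IN ('known', 'unknown', 'alive')),
        date_val TEXT   -- only meaningful when tag = 'known'
    )
""")
conn.executemany("INSERT INTO death_date VALUES (?, ?, ?)", [
    ("alice", "known",   "1955-04-18"),
    ("bob",   "unknown", None),
    ("carol", "alive",   None),
])
# Nothing stops you from forgetting the tag check; you must repeat it everywhere.
rows = conn.execute(
    "SELECT person, date_val FROM death_date WHERE tag = 'known'"
).fetchall()
print(rows)  # [('alice', '1955-04-18')]
```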

As the rest of the software industry has figured out, `Maybe<T>`s are better than NULLs.


Isn't a placeholder value clearly worse? If I want to find a list of people that died the same day, I don't want to include people whose death dates are unknown in one big group. And using a placeholder value for people who haven't died yet is almost asking for the world to end on that date.


The placeholder value would not be a valid value for those operations, and you would get a parse-time error when attempting to group or otherwise match on values where you didn't exclude the possibility of the placeholder value. Like how in some programming languages you get a compile-time error on `foo.bar()` when `foo` may be null, but for example `if (foo != null) { foo.bar(); }` compiles, because within the `if` body `foo` is now non-optional.

Similarly, in a database language the column/field references would be typed as optional or non-optional, and you are only allowed to perform value-related operations on non-optional references. And an optional reference can be made non-optional by first passing the data to a condition equivalent to SQL's `IS NOT NULL`, for example. Or for operations that can support NULL (like maybe sorting), those would have a different form for supporting NULLs. For example, plain `ORDER BY` would cause a parse-time error when applied to a nullable column where NULL has not been filtered away, but `ORDER BY ... NULLS FIRST` (or `LAST`) would be applicable to nullable columns. Similarly for grouping, there would be options if you do want to include a NULL group, but by default it wouldn't type-check.
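As a rough analogue in an ordinary language, here is the same narrowing discipline in Python (the function name is made up for illustration); a static checker such as mypy enforces the narrowing at check time, just as the hypothetical database language would at parse time:

```python
from typing import Optional

def death_year(death_date: Optional[str]) -> int:
    # A type checker flags death_date[:4] up here, because the value
    # may still be None.
    if death_date is None:
        raise ValueError("no death date recorded")
    # Inside this branch the type is narrowed to str, so slicing is allowed -
    # the same move as SQL's IS NOT NULL, but checked before the query runs.
    return int(death_date[:4])

print(death_year("1955-04-18"))  # 1955
```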

But you wouldn't get the nonintuitive behavior of SQL where NULLs sometimes act like NaNs - for example, `NULL not in ('foo', 'bar')` evaluating to NULL (which a WHERE clause treats as false), and `NULL in (NULL, 'foo', 'bar')` also evaluating to NULL.
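Both expressions can be checked directly (SQLite shown here via Python's stdlib sqlite3; standard-conforming engines agree on this):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Neither expression is true: both evaluate to NULL (None in Python),
# and a WHERE clause treats NULL as false.
r1 = conn.execute("SELECT NULL NOT IN ('foo', 'bar')").fetchone()[0]
r2 = conn.execute("SELECT NULL IN (NULL, 'foo', 'bar')").fetchone()[0]
print(r1, r2)  # None None
```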


You have another column which is a flag giving the state of the attribute. Note that "not known" and "not valid for the row" are two different states. If you set the attribute to NULL - which of the two do you mean?

Actually you give the answer for the attribute not being applicable - and you don't get a thousand different subcases. The measure of goodness is whether your db model fits the real-world data.


So I end up having values in the actual column that may be complete junk and I need to check another column to know that? I would expect that to cause trouble sooner or later.


Right. Attempting that workflow would be crazy without language support.

But at the same time, proper languages that have operator overloading and custom types make that kind of thing trivial - look at how many languages have a Maybe<T> these days, which is basically automating that process.

The fact that doing it in SQL would be suicidal shows that, while SQL was a brilliant language for the '70s... it has not evolved nearly enough.


Actually, systems that don't crutch on the "Law of Excluded Middle" are more robust and realistic than ones that do.



