Tracing SQL Server JDBC Unicode Parameter Behavior Inside the Library

Where the Problem Started

The existence of a configuration option does not guarantee that it is applied at the point one expects. When parameter binding is decided inside a database driver, there can be a gap between documented behavior and the actual call path. This post traces why setSendStringParametersAsUnicode=false did not behave as expected inside the SQL Server JDBC driver.

As a follow-up to the SQL Server VARCHAR index issue, I needed a way to leave parameters as the existing nvarchar values while querying parameters that use an index as varchar/char.

One option is to send all string parameters as varchar by adding setSendStringParametersAsUnicode=false to the JDBC URL.

When that is set, parameters that used to be sent as nvarchar are all sent as varchar. But a few columns in some DBs are exceptionally declared as nvarchar. To handle those, I used the @Nationalized annotation. With that in place, even if the JDBC URL declares that parameters should be sent as varchar, the attribute’s isNationalized value is set to true, so an nvarchar type query is generated.

The problem here is that when saving, the query is sent as nvarchar, so all unicode special characters are saved properly. But when reading this column value from some DBs, the driver raises Could not extract column x from JDBC ResultSet - The conversion from varchar to NCHAR is unsupported.

Implementation Path

public void contributeTypes(TypeContributions typeContributions, ServiceRegistry serviceRegistry) {
    // by default, not much to do...
    registerColumnTypes( typeContributions, serviceRegistry );
    final NationalizationSupport nationalizationSupport = getNationalizationSupport();
    final JdbcTypeRegistry jdbcTypeRegistry = typeContributions.getTypeConfiguration().getJdbcTypeRegistry();
    if ( nationalizationSupport == NationalizationSupport.EXPLICIT ) {
       jdbcTypeRegistry.addDescriptor( NCharJdbcType.INSTANCE );
       jdbcTypeRegistry.addDescriptor( NVarcharJdbcType.INSTANCE );
       jdbcTypeRegistry.addDescriptor( LongNVarcharJdbcType.INSTANCE );
       jdbcTypeRegistry.addDescriptor( NClobJdbcType.DEFAULT );
    }

When @Nationalized is declared, NVarcharJdbcType is added to the jdbcTypeRegistry in the contributeTypes method of SQLServerDialect.java. Then, when Hibernate tries to map db values → Java objects, the extract function in BasicExtractor.java, which is used for that mapping,

@Override
public J extract(ResultSet rs, int paramIndex, WrapperOptions options) throws SQLException {
    final J value = doExtract( rs, paramIndex, options );
    if ( value == null || rs.wasNull() ) {
       if ( JdbcExtractingLogging.LOGGER.isTraceEnabled() ) {
          JdbcExtractingLogging.logNullExtracted(
                paramIndex,
                getJdbcType().getDefaultSqlTypeCode()
          );
       }
       return null;
    }

takes the abstract function doExtract from NVarcharJdbcType.

@Override
public <X> ValueExtractor<X> getExtractor(final JavaType<X> javaType) {
    return new BasicExtractor<>( javaType, this ) {
       @Override
       protected X doExtract(ResultSet rs, int paramIndex, WrapperOptions options) throws SQLException {
          return javaType.wrap( rs.getNString( paramIndex ), options );
       }
       @Override
       protected X doExtract(CallableStatement statement, int index, WrapperOptions options) throws SQLException {
          return javaType.wrap( statement.getNString( index ), options );
       }
       @Override
       protected X doExtract(CallableStatement statement, String name, WrapperOptions options) throws SQLException {
          return javaType.wrap( statement.getNString( name ), options );
       }
    };
}

Looking at the getNString function,

@Override
public String getNString(int columnIndex) throws SQLException {
    loggerExternal.entering(getClassNameLogging(), "getNString", columnIndex);
    checkClosed();
    String value = (String) getValue(columnIndex, JDBCType.NCHAR);
    loggerExternal.exiting(getClassNameLogging(), "getNString", value);
    return value;
}

the driver tries to retrieve the stored value as NCHAR.

I added the @Nationalized annotation to handle columns that were declared as NVARCHAR at the request of some clients. But when most database columns that are actually stored as VARCHAR are converted this way, The conversion from varchar to NCHAR is unsupported. gets raised.

If we dig into where the error is ultimately thrown,

dtv.java

Object getValue(DTV dtv, JDBCType jdbcType, int scale, InputStreamGetterArgs streamGetterArgs, Calendar cal,
        TypeInfo typeInfo, CryptoMetadata cryptoMetadata, TDSReader tdsReader, SQLServerStatement statement) throws SQLServerException {
    SQLServerConnection con = tdsReader.getConnection();
    Object convertedValue = null;
    byte[] decryptedValue;
    boolean encrypted = false;
    SSType baseSSType = typeInfo.getSSType();
    // If column encryption is not enabled on connection or on statement, cryptoMeta will be null.
    if (null != cryptoMetadata) {
        assert (SSType.VARBINARY == typeInfo.getSSType()) || (SSType.VARBINARYMAX == typeInfo.getSSType());
        baseSSType = cryptoMetadata.baseTypeInfo.getSSType();
        encrypted = true;
        if (aeLogger.isLoggable(java.util.logging.Level.FINE)) {
            aeLogger.fine("Data is encrypted, SQL Server Data Type: " + baseSSType + ", Encryption Type: "
                    + cryptoMetadata.getEncryptionType());
        }
    }
    // Note that the value should be prepped
    // only for columns whose values can be read of the wire.
    // If valueMark == null and isNull, it implies that
    // the column is null according to NBCROW and that
    // there is nothing to be read from the wire.
    if (null == valueMark && (!isNull))
        getValuePrep(typeInfo, tdsReader);
    // either there should be a valueMark
    // or valueMark should be null and isNull should be set to true(NBCROW case)
    assert ((valueMark != null) || (valueMark == null && isNull));
    if (null != streamGetterArgs) {
        if (!streamGetterArgs.streamType.convertsFrom(typeInfo))
            DataTypes.throwConversionError(typeInfo.getSSType().toString(), streamGetterArgs.streamType.toString());
    } else {
        if (!baseSSType.convertsTo(jdbcType) && !isNull) {
            // if the baseSSType is Character or NCharacter and jdbcType is Longvarbinary,
            // does not throw type conversion error, which allows getObject() on Long Character types.
            if (encrypted) {
                if (!Util.isBinaryType(jdbcType.getIntValue())) {
                    DataTypes.throwConversionError(baseSSType.toString(), jdbcType.toString());
                }
            } else {
                DataTypes.throwConversionError(baseSSType.toString(), jdbcType.toString());
            }
        }

boolean convertsTo(JDBCType jdbcType) {
    return GetterConversion.converts(this, jdbcType);
}

static final boolean converts(SSType fromSSType, JDBCType toJDBCType) {
    return conversionMap.get(fromSSType.category).contains(toJDBCType.category);
}

The current situation is that fromSSType(DB declared type) is varchar and toJDBCType(Application declared type) is NCHAR. In this case, SSTYPE becomes CHARACTER, and because NCHAR is not in the map, the error occurs.

How I Verified It

The possible solutions I considered were all unsatisfying:

Remove @Nationalized, accept that some special characters cannot be saved/read correctly, and treat everything as varchar
At runtime, when executing a select query, inspect the column type and forcibly change the descriptor in jdbcTypeRegistry
Keep the columns declared as nvarchar in a separate table and do step 2
Remove the @Nationalized annotation and use native query in CUD query to force type casting to nvarchar (of course, since reads are done as varchar, there will be defects for some special characters in columns declared as nvarchar)
Clone the mssql-jdbc library and create v2 If I were to modify the driver, adding SSType.Category.NCHAR to CHARACTER in http://m.microsoft.sqlserver.jdbc.SSType.GetterConversion looked like the relevant point. Looking at http://m.microsoft.sqlserver.jdbc.JDBCType.UpdaterConversion, CHARACTER already contains SSType.Category.NCHAR, so the write path appears to avoid the same failure. That makes the read-side gap feel especially unfortunate.

Below is source code that forcibly adds NCHARACTER to the CHARACTER enum map of GetterConversion. Since it directly mutates library internals, it absolutely needs strict review before use.

Reflection, Unsafe, immutable enum internals, and error suppression all make this unpleasant. If there is a cleaner approach, I would much rather use that.

import jakarta.annotation.PostConstruct
import org.springframework.context.annotation.Configuration
import sun.misc.Unsafe
import java.lang.reflect.Field
import java.lang.reflect.Modifier
import java.util.Collections
import java.util.EnumSet

@Configuration
class SSTypeConfiguration {

    @PostConstruct
    fun modifyCharacterEnumSet() {
        try {
            val getterConversionClass = Class.forName("com.microsoft.sqlserver.jdbc.SSType\$GetterConversion")
            val characterField: Field = getterConversionClass.getDeclaredField("CHARACTER")
            characterField.isAccessible = true

            val characterEnum = characterField.get(null)

            val categoriesField: Field = characterEnum.javaClass.getDeclaredField("to")
            categoriesField.isAccessible = true

            val unsafeField = Unsafe::class.java.getDeclaredField("theUnsafe")
            unsafeField.isAccessible = true
            val unsafe = unsafeField.get(null) as Unsafe

            val fieldOffset = unsafe.objectFieldOffset(categoriesField)

            @Suppress("UNCHECKED_CAST")
            val enumSet = categoriesField.get(characterEnum) as EnumSet<*>

            val nCharacterCategory = Class.forName("com.microsoft.sqlserver.jdbc.JDBCType\$Category")
                .enumConstants.first { (it as Enum<*>).name == "NCHARACTER" }

            @Suppress("UNCHECKED_CAST")
            (enumSet as MutableSet<Any>).add(nCharacterCategory)

            val immutableEnumSet = Collections.unmodifiableSet(enumSet)

            unsafe.putObject(characterEnum, fieldOffset, immutableEnumSet)

            val conversionMapField = getterConversionClass.getDeclaredField("conversionMap")
            conversionMapField.isAccessible = true
            @Suppress("UNCHECKED_CAST")
            val conversionMap = conversionMapField.get(null) as MutableMap<Any, EnumSet<*>>

            val sstypeCategoryClass = Class.forName("com.microsoft.sqlserver.jdbc.SSType\$Category")
            val characterCategory = sstypeCategoryClass.enumConstants.first { (it as Enum<*>).name == "CHARACTER" }

            @Suppress("UNCHECKED_CAST")
            (conversionMap[characterCategory] as MutableSet<Any>?)?.add(nCharacterCategory)
           } catch (e: Exception) {
            // runtime에 터뜨리고 롤백과정을 밟는 것이 이상적일듯함
        }
    }
}

Driver Boundary Takeaway

Problems like this do not end at application configuration. The driver’s parameter-binding behavior has to be understood, and sometimes that means reading library source. Production performance issues often become most complex at the boundary between framework, driver, and database engine.