As I wrote about last month, following an initial presentation about code club to the year 5 assembly at Ridgeway primary school in Croydon, we’ve now started our first term of teaching and we’re 2 weeks into the course. I wanted to share some feedback about lessons learnt to date.
1. The assembly was a great idea: it enthused the children and got us lots of sign-ups for the club. We offered 17 places (the total number of machines in the IT suite) and we got 35 applicants from the 90 or so children in year 5 (we only did the assembly for this year group).
2. 17 is a slightly higher number than code club recommend as ideal, but we thought we’d give it a go. I have one of the children’s teachers with me in the class, able to help out with questions, and this makes a real difference. We do get a lot of questions when we’re doing the exercises, but not more than we can handle. If I was on my own, though, without the teacher, it would probably prove difficult.
3. The children absolutely love the badges and the club signs. Just like they say on the website, it gives them an identity with the club. I made up all the badges with conference style laminated clips and they all love clipping these to different areas of their clothing! The signs are much the same and other children have been going past the room saying “oooh, what’s that, code club, wonder what that is”, which can’t be a bad thing either.
4. I have a perfect 50/50 split of girls to boys (OK, I have an odd number of children, but it’s as close to 50/50 as you can get). As a father of 2 girls I’m especially happy about this. All we need to do now is get this ratio up in the IT industry!
5. My classes are one hour long and I make sure I get there 20 minutes before to set up and confirm all the machines are OK and the projector for my machine is OK. I also make sure I do all the printing and collating of materials the night before and am completely ready to go as soon as the children arrive. The hour feels short and there’s no time to waste with admin!
6. In the first class I did an intro about scratch generally as per the code club materials and did some initial demos. This worked well and took about 30 minutes. I then let the children try the first project “Felix and Herbert”. This went OK, but we ran out of time before many of the children could finish. With hindsight, I’d have done more demos and interactive teaching with the children in the first class and then started the full project in the second week.
7. I use a generic school account on the scratch website so that all the children upload their work to the same place (prefixing their names on the projects). Although they’re welcome to have their own accounts, and some already have done so, it provides a nice view to the school of what we’ve achieved, all in one place. Check out what’s up there from week 2 projects
8. I let the children experiment. From minute 1 of the first class they were trying their own things, making all sorts of sounds and little mini programs and sprites, laughing and joking. I think this is a great thing! I’ve heard others say that you shouldn’t allow this, and that you should make them stick exactly to the project lists and not let them deviate. I personally strongly disagree, and I think the children like the freedom. I do ensure that when I’ve a key point to make I get them to gather round on the carpet for a demo; for example, yesterday I talked a lot about variables and why they were so important and key to the project.
9. Despite point 8 above, I do encourage them to use the code club project materials. They are fantastically well designed, and the tick boxes allow the children to keep good progress notes of how far they’ve got with the project. What I observed yesterday in week 2 was that although some deviated to do their own thing slightly (I actually told them they could), by the end everyone was really focused on following the project notes anyway.
10. In week 2 I had a much better timed lesson. I did a 5 minute recap of week 1, then a chat about variables as they were a key learning point for week 2, and we did some collaborative coding with the children, using my teaching / projector machine to do the first page or 2 of that week’s project. Then I gave them all 40 minutes to do the project themselves. The teacher and I then just answered questions as they came up, and everything went really well. 5 minutes before the end I made sure that everyone had saved and uploaded something to the website, to ensure that they have something to show to friends and parents.
All in all I can already say that this is one of the most rewarding things I’ve done in many years, and if you’re thinking about volunteering you need to do so immediately! It’s a completely fantastic experience and the quality of the code club materials is superb. I can’t wait for next week!
Whilst I was waiting for some very long-running reporting code to finish recently (a 365 million row table in tempdb was never going to be a good thing, but it wasn’t my doing, the business logic was forced upon me!) I was browsing various SQL 2012 items of interest and noticed that there was a new DMV for querying the memory dumps produced by SQL Server.
When looking into it further I realised that it had been introduced in SQL 2008 R2 as well and I’d never even noticed! This is ironic, as in all the courses and talks I do on troubleshooting I’m always nagging people to check their error log directory in case they have 10000 mini dumps flooding their disks (insert large number of your choice here, but I think 10000 was the largest I saw over my time at Microsoft). Now I just need to build this check into automated scripts. I guess the natural way would be to add it into the perfstats scripts, or create a hybrid of those, which is something to add to my ever growing to-do list.
I guess another nice way to do it would be to create a job which emailed the DBA when dumps were encountered (although you could argue a good DBA would already have noticed!)
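As a minimal sketch of what such a check-and-email job step might look like, the DMV in question is sys.dm_server_memory_dumps, and the Database Mail call is standard sp_send_dbmail; the mail profile name and recipient address below are placeholders you’d swap for your own:

```sql
-- list any memory dumps SQL Server has produced (SQL 2008 R2 and later)
select [filename], creation_time, size_in_bytes
from sys.dm_server_memory_dumps
order by creation_time desc

-- sketch of an alerting step you could wrap in an agent job
if exists (select 1 from sys.dm_server_memory_dumps)
begin
    exec msdb.dbo.sp_send_dbmail
        @profile_name = 'DBA mail profile',   --placeholder
        @recipients   = 'dba@example.com',    --placeholder
        @subject      = 'Memory dumps found on this instance',
        @body         = 'Check sys.dm_server_memory_dumps and the error log directory.'
end
```

In a real job you’d probably also want to track which dumps you’ve already been notified about, so the job doesn’t email you about the same dump every time it runs.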
I’m looking to recruit a SQL Server production DBA for my current employer, dotDigitalGroup. If you fancy coming and working on a big sector leading SaaS project with some very intelligent developers and technicians, check out the advert here
and our dedicated job site with details about the team here
(Update from Monday morning, the workaround shown below works for me)
Another day, another setup problem (sigh). I was installing a second SQL 2008 R2 instance on a new cluster today. The first one had gone without incident, but the second had a crucial difference in that it used mount points. It was a fairly complex install, with 8 mount points under the root volume, and it took a while to complete all the setup configuration. When setup got going it failed with the following selection of errors:
1. A dialog box with the error
Wait on database engine recovery handle failed. Check the SQL Server error log for potential causes.
2. The setup finished and pointed me to the usual error log files in the setup directories. In the summary log file I saw an equivalent error
Detailed results:
Feature: Database Engine Services
Status: Failed: see logs for details
MSI status: Passed
Configuration status: Failed: see details below
Configuration error code: 0x4BDAF9BA@1306@24
Configuration error description: Wait on the Database Engine recovery handle failed. Check the SQL Server error log for potential causes.
Configuration log: C:\Program Files\Microsoft SQL Server\100\Setup Bootstrap\Log\20120420_114647\Detail.txt
3. When opening the referred to detail.txt file I found pretty much the same again
2012-04-20 12:17:28 Slp: Configuration action failed for feature SQL_Engine_Core_Inst during timing ConfigRC and scenario ConfigRC.
2012-04-20 12:17:28 Slp: Wait on the Database Engine recovery handle failed. Check the SQL Server error log for potential causes.
2012-04-20 12:17:28 Slp: The configuration failure category of current exception is ConfigurationFailure
2012-04-20 12:17:28 Slp: Configuration action failed for feature SQL_Engine_Core_Inst during timing ConfigRC and scenario ConfigRC.
4. The text file pointed me to the SQL Server error log of the partially installed instance. When I opened this, I found the “real” cause
2012-04-20 16:00:51.51 spid8s Clearing tempdb database.
2012-04-20 16:00:51.52 spid8s Error: 5123, Severity: 16, State: 1.
2012-04-20 16:00:51.52 spid8s CREATE FILE encountered operating system error 5(failed to retrieve text for this error. Reason: 15100) while attempting to open or create the physical file 'O:\data\tempdb.mdf'.
2012-04-20 16:00:51.52 spid8s Error: 5123, Severity: 16, State: 1.
2012-04-20 16:00:51.52 spid8s CREATE FILE encountered operating system error 5(failed to retrieve text for this error. Reason: 15105) while attempting to open or create the physical file 'O:\data\tempdb.mdf'.
The SQL Server error 5123 is why the instance couldn’t start: it was unable to create the tempdb data file, and no SQL Server instance can start without tempdb being present. The “real” error though is the underlying reason, which is error code 5 returned by the OS when SQL calls the CreateFile API. This is one that everyone probably recognises: “access denied”. So the bottom line here was that I was getting access denied on the root of my mount point for tempdb (which is where I had placed the data files as part of the setup configuration).
I checked through the other parts of setup and the rest of the system databases had managed to write to the root of other mount point drives (which seemed strange), and of more relevance, I noted that the SQL Server service accounts had been granted full access to the mount point root directories as part of the setup, so theoretically there ought not to be a permission error!
I spent a few hours digging around and tried the install again with new mount points and a clean starting position, but encountered the exact same problem. Eventually I figured out that it was due to “issues” with the permissions on root mount points. This is distinctly not well documented within official MS sources (or not that I could find) and certainly not within the official pre-reqs.
The best I could find was a connect item here
a blog post here
and a forum post here
However the accepted workaround is that you should create a sub directory under the root of the mount point and then you’ll be fine. I’ll be trying this method next week, as it’s 8pm on a Friday now and time for the weekend. If it works I’ll probably raise a bug requesting that the documentation gets changed to point this out, as it would have saved me a whole lot of time.
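For anyone wanting to try the same workaround, a sketch of the commands might look like this (the drive letter, folder name and service account are placeholders for your own environment, and you’d then point the tempdb and other data file paths at the sub directory during setup):

```shell
:: create a sub directory under the mount point root rather than
:: placing the database files directly in the root itself
mkdir O:\SQLData

:: grant the SQL Server service account full control on the new folder,
:: inherited by files and subfolders
icacls O:\SQLData /grant "MYDOMAIN\sqlservice:(OI)(CI)F"
```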
As always, good luck with your setup……
I was discussing with some developers this week why SQL Server collation mattered and what typical things can go wrong. This is a huge topic and an often overlooked feature of database and server design, especially when you want to build applications that run in multiple geographic locations and should accept multiple language inputs. Having worked extensively with the Finnish Swedish collation during my time working in Sweden, I have a reasonable level of understanding of some of the issues that can come up.
I wrote the following script to demonstrate an example of what can happen if you don’t get your collations correct. It’s probably the most common example I’ve seen over the years.
1. You have a SQL Server that is running one of the main Latin collations for English, be it UK, US, current windows collation or legacy SQL Server ones.
2. You have data from multiple languages stored in unicode columns such as nvarchar
3. You sort or search on the data, and users of non-English languages whose alphabets have special or extra characters do not get the results that they expect.
This script only shows a fraction of the things that can actually go wrong, but it provides a simple demonstration of sorts and searches producing unexpected results. It creates its own database and tables on any test server you may choose to run it on (and then deletes them afterwards). It will work as a demo on any server as long as it’s not already running a Finnish Swedish collation.
use master
go

--i'm creating a non-default latin collation as per the legacy product I was looking at
--in this case however the results are the same as if you're running the more common
--default windows server collation 'Latin1_General_CI_AS'
create database collation_test collate SQL_Latin1_General_CP1_CI_AS
go

use collation_test
go

select SERVERPROPERTY('collation') as 'server collation'

select collation_name as database_collation
from sys.databases
where name = 'collation_test'

if exists ( select name from sys.objects
            where name = 'collation_test' and type = 'u' )
begin
    drop table collation_test
end

create table collation_test
(
    test nvarchar(10)
)

set nocount on
go

insert collation_test (test) values ('aaaa')
insert collation_test (test) values ('ääää')
insert collation_test (test) values ('åååå')
insert collation_test (test) values ('öööö')
insert collation_test (test) values ('bbbb')
insert collation_test (test) values ('zzzz')

set nocount off
go

print 'select the results from the tables in differing collations'
print 'in Swedish alphabet the characters åäö follow after Z as characters 27 to 29'
print ''

select test as "order in latin collation"
from collation_test
order by test

select test as "order in Finnish_Swedish_CI_AS collation"
from collation_test
order by test collate Finnish_Swedish_CI_AS

print 'do searches on the table'
print ''

select test as "search in latin collation"
from collation_test
where test > 'b'

select test as "search in Finnish_Swedish_CI_AS collation"
from collation_test
where test > 'b' collate Finnish_Swedish_CI_AS

--clean up
use master
go

drop database collation_test
go
The bottom line is that if you want to do collations and mixed languages properly, you need to think about this stuff carefully at the design stage.
I was writing lots of TSQL today for something I was working on, which in itself is a rather rare occasion since I do so much more infrastructure and architecture stuff nowadays. I needed to write a script to populate a table with a series of rows containing sequential dates. I had to create a table with 2 columns, one of [date] datatype and the other a 6 character string representing the year and the month as numbers concatenated together, for example 15th March 2012 would become 201203.
Then this table needed several rows covering every day over a number of years. Here’s the code just in case anyone fancies reusing it.
create table #dates
(
    CustomMonth char(6),
    lookupdate date
)

declare @date_increment date

--change your start date here if you fancy
set @date_increment = '2005-01-01'

--change your end date here if you fancy
while @date_increment <= '2012-12-31'
begin
    --insert before incrementing so the start date itself is included
    insert #dates (CustomMonth, lookupdate)
    select convert(char(4),(datepart(year,@date_increment)))
           + RIGHT('0' + CONVERT(VARCHAR(2), DATEPART(MM, @date_increment)), 2),
           @date_increment

    set @date_increment = DATEADD(day,1,@date_increment)
end

--check it looks ok
select * from #dates
Previously I wrote about my efforts to automate the decryption of files with SSIS using the gpg2.exe which comes as part of the GnuPG package. The original article is here
However after deploying the working solution into production, to be run as a scheduled task, I found that the package still had some issues. It behaved rather differently when deployed into the production environment as opposed to running in the BIDS environment. When executing the exact same code in production that worked processing the exact same files in development (and I mean the exact same; same account, same files, same everything) I got an error which looked like this (sanitised for security):
Error: 2012-03-13 11:16:07.10
Code: 0xC0029151
Source: run decrypt Execute Process Task
Description: In Executing "C:\Program Files (x86)\GNU\GnuPG\gpg2.exe" "--batch --passphrase-fd 0 --decrypt-files [myfilename]" at "", The process exit code was "2" while the expected was "0".
End Error
DTExec: The package execution returned DTSER_FAILURE (1).
Started: 11:16:02  Finished: 11:16:07  Elapsed: 4.609 seconds.
The package execution failed. The step failed.
So I was again getting error code 2, which I had seen previously, and for which the --batch switch had resolved the issue in development. The error code was the same, but the reason obviously had to be different now, and this required a little more investigation to get to the bottom of. Firstly I ran Process Monitor, which is often my first port of call in such scenarios, to check whether I was hitting some obscure permissions errors when running in live with the SQL Agent. It turned out totally clean. (As an aside, I had done the same when initially installing GnuPG, to resolve an issue where it couldn’t access a temp directory it required to do decryption.)
A bit of research through the web and the full documentation of GnuPG left me using a further switch, --status-fd,
which allowed me to look at some of the status messages from the output which were previously being swallowed by the SSIS task when run in production. There was SSIS logging enabled but it wasn’t getting anything back from the gpg2 executable beyond the status code.
I used a couple of different versions of this switch which looked like this
gpg2 --batch --status-fd 2 --decrypt-files test.gpg 2> c:\gk\test\output.txt
which outputs the status messages to c:\gk\test\output.txt, or you can do this
gpg2 --batch --status-fd 2 --decrypt-files test.gpg
which outputs the messages to the console
Either way you end up with the following output (again slightly sanitised)
[GNUPG:] FILE_START 3 test.gpg
[GNUPG:] ENC_TO [hexstring] 1 0
[GNUPG:] USERID_HINT [hexstring] [mykeyname] (mykeyname) <firstname.lastname@example.org>
[GNUPG:] NEED_PASSPHRASE [hexstring] [hexstring] 1 0
[GNUPG:] PLAINTEXT 62 [integer] test_.txt
[GNUPG:] PLAINTEXT_LENGTH 901226
but unfortunately this still didn’t give me anything to go on, as it still worked in the test environment but not in the production one. Eventually, by playing with the logging levels and these switches in production, I got the details out in the SSIS log, which contained this key string:
gpg: decryption failed: No secret key
I then realised that I was being an idiot and that the service account that I was running the SQL Agent under did not have the certificate registered under that userid. I had only imported the certificate into Kleopatra for the development userid I logged in with and not for the service account. I simply imported the certificate to the service account profile and then everything worked. This meant that the original instructions and code were valid, but I thought I’d put this post up in case anyone did the same stupid thing as me. It’s worth remembering that the certificates are by default imported at a user level into Kleopatra.
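For reference, one way to do the import under the service account profile is to open a shell as that account and import the key there; the account name and key file name below are placeholders, and the gpg2 path is the default GPG4WIN install location mentioned earlier:

```shell
:: open a command prompt as the service account (you'll be prompted for its password)
runas /user:MYDOMAIN\sqlagent cmd

:: then, inside that shell, import the key into that account's profile
"C:\Program Files (x86)\GNU\GnuPG\gpg2.exe" --import mysecretkey.asc

:: confirm the secret key is now visible to the service account
"C:\Program Files (x86)\GNU\GnuPG\gpg2.exe" --list-secret-keys
```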
Update 2012-03-13 – If you still get error code 2 after running this code in production (but it continues to work in your development environment) you might like to look at the subsequent post I did about further troubleshooting of this issue
This is something that I spent a few hours on recently which I wanted to share. The requirement here is to create a SSIS task to automatically decrypt a variable number of files that have been encrypted with PGP. This task will live within a larger SSIS package which does other typical SSIS tasks; fetching files from FTP, moving them around a file system, streaming them into a database and so forth.
The key here is that the task needs to be completely automated so that no user interaction is required, i.e. no typing in the passphrase or other such matters. Whilst working this out I was browsing around the web and found various solutions, but none was 100% perfect for my particular requirements. Initially all the options I tried either required me to enter the passphrase or returned error codes even on success. This post assumes a certain familiarity with SSIS development and PGP.
The PGP tool I used was the latest GPG4WIN installed to the default location (this means that the actual executable is:
C:\Program Files (x86)\GNU\GnuPG\gpg2.exe
The PGP files I was receiving were encrypted with the public key I had passed to the external source, and were simply decrypted using the GUI or the command line if I was prepared to type in the passphrase.
The way I automated this in SSIS was as follows:
Create a Foreach Loop to allow the processing of multiple files. The collection properties looked like this:
The variable mapping look like this
Inside this Foreach Loop I create an Execute Process Task. The process properties look like this:
The Expressions properties look like this.
It’s important to note that the arguments property on the process page are set by the expression, not hard coded, although they subsequently appear here. It’s critical to form the arguments in the expression builder to get them to work properly. The expression in text format is:
"--batch --passphrase-fd 0 --decrypt-files " + @[User::receive_file_name]
Part of this syntax is undocumented in the GPG help files and had to be picked up from the web. The confusion I had was that I found an article which used gpg.exe and not gpg2.exe, and my version seemed to behave differently. The passphrase here is held in a variable in the package and then passed to the command line as the StandardInputVariable. This is what the [-fd 0] string achieves in the syntax. However, this still doesn’t work properly unless you pass the --batch parameter. If you don’t pass --batch then you still get challenged for the passphrase. If you run the package in debug mode you get the dialog box challenge, which you can then type into, but if you run in production mode the task just fails with error code 2.
Whilst looking at this problem I also experimented with storing the passphrase in a file and various other options. Although the above option worked for me, I also noted a custom component is available for purchase at http://www.cozyroc.com/ssis/pgp-task which might be worth investigating if you have a budget to acquire tools and an enterprise grade ETL requirement.
Here’s something I was up to during the week. It’s a fairly mundane task, but despite that one which is very important, both to the client I was doing it for, and for most people generally who run this type of configuration.
The databases I was working with were part of a true 24×7 configuration where any downtime whatsoever has the potential to lose data (email marketing where every remote click event is tracked, meaning that you can’t even take planned downtime as you can’t control email receipts opening and clicking emails).
The systems in question run a fairly standard database mirroring configuration, 2 physical servers (principal and mirror partners), mirroring multiple databases in high safety mode with auto failover determined by the quorum of a third witness server. The task in question was to run windows update on the 2 partner servers and then apply SP3 for SQL 2008 to bring it up to build 5500.
The guys who had been running these servers previously told me that normally they just patched the mirror partner before failing over and then patching the new mirror (which was previously the principal). This is the standard methodology of a rolling upgrade within a mirroring configuration, but it misses one important step. I’m incredibly risk averse in all situations, and in this scenario it’s essential to remove the mirror witness before starting the process. If you don’t, there’s a small risk that halfway through the upgrade and patching process you suddenly find your mirror partnership failing over.
In all fairness this is quite an unlikely scenario, as it would require a failure at the principal at the exact point in time that the patch process was running. It would also require a theoretical problem with all the servers managing their quorum, as they ought to still deal with such a failure properly. But after many years in the business, and particularly after many years within Microsoft support, I’ve had the misfortune of experiencing a wide range of very obscure failures across the SQL Server product set, and a mirroring split brain is one of them.
A split brain can very simply be described as a scenario where both partners believe that they are either the principal or the mirror, therefore invalidating the partnership. If you ever get in this scenario it’s extremely horrible and sometimes (again speaking from experience) you are obliged to do some rather dirty troubleshooting to recover the situation.
Sometimes my experiences at Microsoft support can scare people and skew their view of the product, as all we ever used to deal with in the escalation team was obscure problems or bugs that didn’t normally occur and couldn’t easily be solved. This means that whenever someone asks me about a certain procedure I’ve normally seen it break in a really horrible way! 99.9% of the time in production scenarios this doesn’t happen of course, but the moral of this story is that it makes me very risk averse.
So back to the point in hand, if you want to be risk averse when patching mirror partnerships, the thing to do first is to remove the witness and thereby drop back to high safety WITHOUT auto failover, meaning that if something stupid happens whilst patching, the mirroring won’t try to failover and mess things up further.
To achieve this process in a controlled fashion, here are my step-by-step instructions (remember, if you mirror multiple databases you need to run the scripts for each DB).
1. Disable witness
ALTER DATABASE [your database] SET WITNESS OFF
go
2. Patch current mirror
3. Reboot current mirror if necessary
4. Failover from the principal to the newly patched mirror
--RUN THE COMMANDS ON THE CURRENT PRINCIPAL TO MOVE TO THE OTHER SERVER
ALTER DATABASE [your database] SET PARTNER FAILOVER
5. there will be a short outage whilst the new principal comes online
6. patch the current mirror (original principal)
7. reboot current mirror
8. fail back to original principal server (if required – this is optional)
9. add back the witness
ALTER DATABASE [your database] SET WITNESS = 'TCP://[your FQDN]:[your port]'
go
10. you’re good to go
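As a belt-and-braces check between each of the steps above, I find it useful to confirm the mirroring role and state on each partner; a generic query against the catalog views for this might look like the following (nothing environment specific in it):

```sql
-- check role, state, safety level and witness status for every mirrored database
select db_name(database_id)          as database_name,
       mirroring_role_desc,
       mirroring_state_desc,
       mirroring_safety_level_desc,
       mirroring_witness_state_desc
from sys.database_mirroring
where mirroring_guid is not null
```

Before step 2 you’d expect to see the witness state reported as disconnected or absent, and after step 9 you’d expect it back to connected, with the roles matching whichever server you’ve failed over to.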
Good luck with your patching!
I published a blog post with Basefarm last week about tracing select statements in SQL Server without using Profiler. It demonstrates how to use SQL Audit (based on extended events) to record any select statements issued on particular tables over a prolonged time period, in a scalable, lightweight way. Although SQL Audit was designed, as its name suggests, to provide an audit framework for the product, I find it can be quite useful in other ways as well. In this example we were helping developers effectively check their code base for unexplained access to a particular object.
You can read the full post here: