Azure Synapse Analytics overlooked features


There are some options in Azure Synapse Analytics that are not obvious that I wanted to point out.

While in Azure Synapse Studio, going to the Data hub and clicking the “+” at the top to the right of Data brings up:

Synapse SQL database 
Connect to external data 
Integration dataset 
Browse samples

Under “Workspace”, choosing “Synapse SQL Database” gives you the ability to create a database in a SQL on-demand pool. A SQL on-demand pool (which you access via an endpoint) can have multiple databases. Note that SQL on-demand has no local storage so only metadata objects are stored in the databases.

After you create a database, if you then go to the Develop hub and create a new SQL script, you can choose to connect to “SQL on-demand”, and then the newly created database will be available under “Use database”. You can then create views or external tables in that database.  When you go to the Data hub, under Databases you will see the newly created database (you might need to hit refresh first).

Under “Linked”, choosing “Connect to external data” will create a linked service to ADLS Gen2 or Cosmos DB that you can use in the Data hub (under “Linked”).  This is the same method as going to the Manage hub and choosing “Linked services” and clicking “New” (but that option has more sources to connect to). These linked services can also be used in Azure Data Factory (see Linked Services in Azure Data Factory). 

Under “Linked”, choosing “Integration dataset” will allow you to create a dataset that can be used in pipeline activities and in data flows.  When creating the dataset you will need to specify a linked service to use for the dataset (which you could of done via “Connect to external data”), or you can create a new linked service (see Datasets in Azure Data Factory).

Under “Linked”, choosing “Browse samples” will give you a ton of samples to choose from broken into four sets:

  • Datasets: A bunch of datasets that when added will show up under Data -> Linked -> Azure Blob Storage.  You can then choose an action (via “…” next to any of the containers in the dataset) and choose New SQL script -> Select TOP 100 rows to examine the data as well as choose “New notebook” to load the data into a Spark dataframe.  Any dataset you add is a linked service to files in a blob storage container using SAS authentication.  You can also create an external table in a SQL on-demand pool or SQL provisioned pool to each dataset via an action (via “…” next to “External tables” under the database, then New SQL script -> New external table) and then query it or insert the data into a SQL provisioned database
  • Notebooks: A bunch of sample Apache Spark notebooks in various languages (PySpark/Scala/Spark.NET C#/SparkSQL) that will show up under Develop -> Notebooks
  • SQL scripts: A bunch of sample SQL scripts that will show up under Develop -> SQL scripts
  • Pipelines: This is similar to choosing the “Create pipeline from template” on the home page of a Azure Data Factory workspace and will create a pipeline that will show up under Orchestrate -> Pipelines

These samples offer a great way to shortcut the process of building a solution in Synapse.

The post Azure Synapse Analytics overlooked features first appeared on James Serra's Blog.

Original post (opens in new tab)
View comments in original post (opens in new tab)