standard SQL to support pagination #Indeed

Context — In the Indeed SDI, each company page shows the latest “reviews” or “articles” submitted by members. When you scroll down (or hit Next10 button) you will see the next most recent articles.

Q: What standard SQL query can support pagination? Suppose each record is an “article” and page size is 10 articles.

I will assume each article has an auto-increment id (easier than a unique timestamp) maintained in the database table. This id enables the “seek”” method.  First page (usually latest 10 articles) is sent to browser.  Then the “fetch-next” command from browser would contain the last id fetched. When this command hits the server, should we return the next 10 articles after that id (AA), or (BB) should we check the latest articles again, and skip first 10? I prefer AA. BB can wait until user refreshes the web page.

The SQL-2008 industry standard supports both (XX) top-N feature and (YY) offset feature, but for several reasons [1], only XX is recommended :

select * from Articles where id < lastFetchedId fetch first 10

[1] http://www.use-the-index-luke.com clearly explains that the “seek” method is superior to the “offset” method. The BB scenario above is one confusing scenario affecting the offset method. Performance is also problematic when offset value is high. Fetching the 900,000th page is roughly 900,000 times slower than the first page.

Advertisements

##RDBMS=architect’s favorite

A specific advantage .. stored proc can _greatly_ simplify business logic as the business logic lives with the data …

Even without stored proc, a big join can replace tons of application code implementing non-trivial business logic. Hash table lookup can be implemented in SQL (join or sub-query) with better clarity and instrumentation because

* Much fewer implicit complexities in initialization, concurrency, input validation, null pointers, state validity, invariants, immutabilities, …
* OO and concurrency design patterns are often employed to manage these complexities, but SQL can sidestep these complexities.

Modularity is another advantage. The query logic can be complex and maintained as an independent module (Similarly, on the presentation layer, javascript offers modularity too). Whatever modules dependent on the query logic has an dependency interface that’s well-defined and easy to test, easy to investigate.

In fact testability might be the most high-profile feature of RDBMS.

I think one inflexibility is adding new column. There are probably some workarounds but noSQL is still more flexible.

Another drawback is concurrency though there are various solutions.

bi-temporal database – first lesson

Milestone in PWM — similar. I was using only the validFrom/validTill columns.

I think this is all about accountably modifying historical record.

snapshot — For a given observation date like 31/12/1999, “where TxStart < observDate and observDate < TxEnd” would return a snapshot as of 31/12/1999.

A “Tx” or “Transaction” — means a data entry. TxStart/TxEnd marks the observation window or period of belief. If we observe the price on a day within this window, i.e. take a snapshot on that day, then all the data rows outside this window is filtered out.

Indexing for performance — an index on (TxStart, TxEnd) would speed up such “snapshot” queries.

Data error 1 — If a wrong price was entered but quickly corrected, and never affected any customer, then I doubt bi-temporal is designed for this. Therefore the “never-delete” is not so strict.

Data error 2 — However, if a customer was charged the wrong price, then “never-delete” principle holds. The wrong price was in force at that time, so it is valid history (though not valid by company policy). Valid history is never deleted and must be milestoned.

Example — http://sqlmag.com/business-intelligence/bitemporal-design-time-after-time

c#Reuters database IV

Database Question

Table E
=========
int id
string empType
string email
string firstName
string lastName

indices:
======
CLUSTERED (id, empType)
UNIQUE(id)
UNIQUE (empType, id, firstName)
DUPLICATE (empType, firstName)

Table E has: 10,000,000,000 rows

Table E has: one row with firstName “bettlejuice”

Table E has: 1,000,000,000 rows with empType = ‘fulltime’

SELECT * FROM E WHERE email = ‘foo@example.com‘;

SELECT * FROM E WHERE id = 100 AND firstName = ‘beetlejuice’;

SELECT * FROM E WHERE empType = ‘fulltime’ AND firstName = ‘beetlejuice’;

SELECT * FROM E WHERE empType = ‘fulltime’ AND firstName = ‘beetlejuice’;

How many pages for leaf nodes does the above query read?

INSERT: Writes how many pages?

oracle tablespace striping

If an audit table or a trade capture table gets a lot of concurrent writes, you may want to stripe it across disks to increase parallelism and write performance. One way is to configure the table with multiple tablespaces, according to a friend. Each tablespace maps to a physical disk.

The same stripe technique also speeds up reading.

See http://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:7116328455352

basic oracle instance tuning tips #Mithun

You can turn on tracing for a given session. All queries will be traced. You can also trace across all sessions.

Oracle provides dynamic performance stats in the form of so-called V$ views, but I think the trace files are more detailed.

Another common technique is to record timestamps

– client log – issuing query
– server log – receiving query
– server log – returning data
– client log – receiving data
– client log – sending out data to requester, which could be in the same or another process.

Latency is additive. Find the largest component.

simple snoop table to monitor DB access

CREATE TABLE snoop( 
 charp1 VARCHAR(16384) DEFAULT  ”  NULL,
 charv1 VARCHAR(16384) DEFAULT  ”  NULL,
 charp2 VARCHAR(16384) DEFAULT  ”  NULL,
 charv2 VARCHAR(16384) DEFAULT  ”  NULL,
 nump1 VARCHAR(99)     DEFAULT  ”  NULL,
 numv1 FLOAT           DEFAULT  0   NULL,
 datep1 VARCHAR(99)    DEFAULT  ”  NULL,
 datev1 DATETIME NULL,
 ts DATETIME DEFAULT  getDate()  NOT NULL,

/* one way to use this table is to save multiple params when calling a proc
insert snoop(charp1,charv1,nump1,numv1,datep1,datev1,sproc)
      values(‘param1’,?, ‘param2’,?,  ‘param3’,?, ‘myProc’)
*/
 sproc VARCHAR(99) DEFAULT  ”  NULL,
 ident INT IDENTITY NOT NULL,
 remark VARCHAR(99) DEFAULT  ”  NULL
)

SQL code generation – ETL^ORM

Hibernate does SQL code generation. Often sub-optimal. Now I think ETL tools probably avoid SQL code generation. Reason is efficiency.

Suppose your legacy app has business logic in query or sproc. ETL tools often emulate the same business logic but outside the database, and often at a much faster throughput.

[practically]proc to return 0-row, null or default value

We often write lookup procedures to return a single joined record.
Better distinguish between these scenarios below. The same stored proc
can
– returns 0-row
– return a special value to indicate 0-row
– return a null value for a field
– return a default value
If possible, I generally avoid returning null value, because they
require extra parsing in java. Besides, null values can be a consequence
of many scenarios — ambiguous.
If the one and only select from the proc simply selects a bunch of
variables, then 0-row won’t happen. How do you indicate 0-row? A very
common scenario. I often use a @rowct variable, that’s updated by the
earlier table selects. In this context, we can also put special values
into other fields to indicate 0-row.
If you want the caller to know it’s 0-row, null or default value, when
all scenarios are possible.
– then choosing a default value can be tricky
– null can be tricky because in the @rowct case, a lot of fields of the
last select might be null.

Oracle article on exclusive/shared locks, row/table level locks, non-repeatable-read

http://download.oracle.com/docs/cd/B10501_01/server.920/a96524/c21cnsis.htm#2841

Here are a small sample of the knowledge pearls —

– If a transaction obtains a row lock for a row, the transaction also acquires a table lock for the corresponding table. The table lock prevents conflicting DDL operations (like alter table).

– A detailed comparison of read-committed vs serializable isolation levels in Oracle

To my surprise,

* “Readers of data do not wait for writers of the same data rows”. Maybe Reader thread simply goes ahead and read-committed. Reads either the BEFORE or the AFTER image of the row. Now, If isolation is at Serializable, then still no wait, but this thread will throw exception if it re-reads the row and detects update. This is a classic non-repeatable-read error.

foreign key on Wall St

An experienced consultant told me he only saw FK used once in the many
banks he knows. He basically said if you don't know how to use FK then
don't use it. His team uses FK during paper design only.

I too worked in several banks. Only relatively inexperienced developers
use FK but I find them inconvenient.

One of the problems is update/delete. FK often blocks you. We all know
some data is wrong but we can't fix them, in real time trading!

Insert? I don't remember but I think you can't insert a dependent table
before inserting all primary tables. Imagine your dependent table
depends on 2 tables, which in turn depend on 3 other tables.

Remember Wall St is all about quick changes and quick fixes (2 different
things). That's why jmx is so popular – no restart required.

not-null when creating table

Null values mess up numeric aggregates. If you often do that then consider a meaningful default value, though it distorts average().

Nulls mess up composite boolean exp. Worse still, nulls can even mess up basic where-clause.

My first choice is not-null.
– For string fields, default to empty string.
– For integer date fields, wall street systems often default to 99991231.

In terms of null handling, SQL beats java. As java developers going into SQL, we don't need to be too paranoid about nulls.

same case-expression in q[SELECT/GROUP-BY]

See P417 [[transact-sql programming]]. In GS and Citi I have written many ugly queries where a large case expression appears in SELECT and again in GROUP-BY.

Not a performance problem, but a maintenance problem.

My advice — Have no fear. You can use Perl or Java to manage the ugly query. Not possible if you need it in a stored proc and have no experience with execute-immediate. It’s still ok. There’s often no better alternative.

sybae ltrim() and rtrim() can change a empty string to a null string

I do get some null cusips from this query —

select cusip=ltrim(rtrim(p.cusip)), price = case
when traderMarkInd = 'Y' then traderMarkPrice
when autoMarkEnabled = 'Y' then autoMarkPrice
end, p.account, i.position, maturityDt=99991231
from TblMuniMarkPrice p left join TblMuniInventory2 i on p.cusip = i.cusip
and p.account = i.account
where 1=1
and p.cusip is not null
and ltrim(rtrim(p.cusip)) is null

$dbproc in PWM perl database programs

Now I know where the $dbproc came from. It’s a common variable name in sybase DB-library programs. Note DB-library is the older of the 2 open-client libraries, and still used in many wall street banks.

http://manuals.sybase.com/onlinebooks/group-cnarc/cng1110e/dblib/@Generic__BookTextView/503;pt=505#X — shows a standard DB-library program

http://manuals.sybase.com/onlinebooks/group-cnarc/cng1110e/dblib/@Generic__BookTextView/34250;pt=39614
http://manuals.sybase.com/onlinebooks/group-cnarc/cng1110e/dblib/@ebt-link;pt=34250?target=%25N%15_34741_START_RESTART_N%25

sybase – number of user connection

1> sp_configure ‘number of user connection’

2> go

 Parameter Name                 Default     Memory Used Config Value Run Value   Unit                 Type      

 —————————— ———– ———– ———— ———– ——————– ———-

 number of user connections              25     1990886        1700         1700 number               dynamic   

 

 

1> sp_monitorconfig ‘number of user connection’

2> go

Usage information at date and time: Dec  2 2010 10:35AM.

 

 Name                      Num_free    Num_active  Pct_act Max_Used    Num_Reuse  

 ————————- ———– ———– ——- ———– ———–

 number of user connection         376        1324  77.88         1363           0

 

DB as audit trail for distributed cache and MOM

MOM and distributed cache are popular in trading apps. Developers tend to shy away from DB due to latency. However, for rapid development, relational DB offers excellent debugging, tracing, and correlation capabilities in a context of event-driven, callback-driven, concurrent processing. When things fail mysteriously and intermittently, logging is the key, but u often have multiple log files. You can query the cache but much less easily than DB.

Important events can be logged in DB tables and

* joined (#1 most powerful)
* sorted,
* searched in complex ways
* indexed
* log data-mining. We can discover baselines, trends and anti-trends.
* Log files are usually archived (less accessible) and then removed, but DB data are usually more permanent. Don't ask me why:)
* selectively delete log events, easily, quickly.

* Data can be transformed.
* accessible by web service
* concurrent access
* extracted into another, more usable table.
* More powerful than XML.

Perhaps the biggest logistical advantage of DB is easy availability. Most applications can access the DB.

Adding db-logging requires careful design. When time to market is priority, I feel the debug capability of DB can be a justification for the effort.

A GS senior manager preferred logging in DB. Pershing developers generally prefer searching the same data in DB rather than file.

y prepareStatement uses the connection object

 

                  PreparedStatement ps = conn.prepareStatement (sql);

 

Preparing (compiling) a preparedStatement is done on the DB server, so the java client must reach out to server.

 

PreparedStatement is faster because of pre-compiling. Pre-compiling means a 2 stepper – pre-compiling and executing-with-argument. For N iterations, You need to send 1 + N “messages” to the server. Only the first call (pre-compiling) need to use the connection object.

Fw: milestoning vs log table

Thanks for sharing your implementation. I agree some apps may need a log of user activity. A log table.

Milestone is probably designed for change-history. By the way I’m not a big fan of milestone. It has many drawbacks – unenforceable data rules, manual updates, foreign key, hibernate, data corrections …

table constraints

For simple data validation like Price > 0, you can implement in the application loading into the table, or in the application reading from the table, or you can use table constraints.

* reusable — What if another source system (beside LE) needs to load into this table? Table constraints are automatically reusable with zero effort.

* testing — Table constraints are extremely simple and reliable that we need not test them.

* Table constraints are easier to switch on/off. We can drop/add each constraint individually, without source code migration and regression tests

* flexible — we can adjust a table constraint more easily than in application. Drop and create a new constraint. No source code change. No test required. No regression test either.

* modular — table constraints are inherently more modular and less coupled than application modules. Each constraint exists on its own and can be removed and adjusted on its own. They don’t interfere with each other.

* risk — Table constraints are more reliable and there will be less risk of bad data affecting our trailers. Validation in application can fail due to bugs. That’s why people measure “test coverage”.

* gate keeper — Table constraints are more reliable than validations in application. They are gate keepers — There’s absolutely no way to bypass a constraint, not even by bcp.

* visible — Table constraints are more visible and gives us confidence that no data could possibly exist in violation of the rule.

* data quality — You know …. both suffer from data quality. They know they should validate more, but validation in application is non-trivial. Reality is we are short on resources.

* if we keep a particular validation as a table constraint, then we don’t have to check that particular validation inside the loader AND again in the downstream trailer calculator (less testing too). You mentioned reusable valiation module. This way we don’t need any.

* hand-work — In contingency situations, it’s extremely valuable to have the option to issue SQL insert/update/bcp. Table constraints will offer some data validation. From my experience, i have not seen many input feed tables that never need hand work. I believe within 6 months after go-live, we will need hand insert/update on this table.

* Informatica — Informatica is a huge investment waiting for ROI and we might one day consider using it to load lot data. Table constraints work well with Informatica.

* LOE — The more validation we implement in application, the higher the total LOE. That’s one reason we have zero constraint in our tables. We are tight on resources.

* As a principle, people usually validate as early as possible, and avoid inserting any invalid data at all. Folks reading our
tables (perhaps from another team) may not know “Hey this is a raw input table so not everything is usable.” Once we load stuff into
a commissions table, people usually think it’s usable data. Out of our 100+ tables, do you know which ones can have invalid data?

MS iview – DB centric

Q: how do u speed up a stored proc?
%A: join order; index selection; avoid table scan; update stats

Now I think we should first establish performance target, by estimating minimum I/O required. (U can’t make a huge query run faster than a tiny query.) Then measure actual logical i/o. See the gap? Usually u can find clues in the execution plan. Improve the plan.

Q: How could a string object’s content be modified?
A (right): reflection to access the char array.

Q: how does a singleton become a 2-ton?
A: 2 class loaders, but i thought class loaders form a delegating hierarchy so no 2 loaders compete to load a class?
A (from interviewer): you can create a class loader outside the hierarchy. Indeed singletons are per-classloader.

Q:diff between clustered and non-clustered index
The answer I forgot to give: a column with lots of updates is no good as clustered index

Q: Users complain about a delay/slowness. we know it’s due to a Unix process. How do you investigate.
A: top; thrashing.

Q: What’s garbage collection?

Q: I have an object in a cache. it’s not referenced from anywhere outside the cache. After using it once, i don’t need it anymore. How do i get it garbage collected?
A: weakHashMap
A: just delete the entry from the cache

Q: lots of insert/delete/updates to a table. how do u keep track of all changes and by who? How do u ensure changes are always tracked?
A: %_hist table and milestoning

Stored proc in an informatica pipe

2 common uses:

A) as a source qualifier. Overwrite the query with “exec yourSP param1, param2 ..”. Resultset will appear in the pipe

B) Stored Procedure transformation. The SP’s input parameters become the in-ports. Output params become out-ports. If you are the
SP’s author, result set columns can be “redirected” to out params. One input record coming down the pipe can produce multiple
records — active transformation.

y Trigger

A few benefits of triggers in the mortgage commissions system

* testability — To test the triggers, just run one or a bunch of inserts. Use sqsh, aqua studio, fitnesse or any other tool. If the logic is in Perl/Java, i don’t think it would be so simple.

* Modular — All (and only) the logic related to inserting records are contained in the triggers. This is separate from other logic. Separation of concern. Other developers are shielded from the complexity of saving records.

* complexity — The insert trigger in the MtgAccts table involves more than 5 queries. To implement the same logic using Java or Perl would take 3-5 times more lines of code, by my estimate.

* flexibility — during production support. Suppose we need to load a few records for a production issue. Just write the query by hand and test in QA and then run it in sqsh/aqua or another tool.
* reuse — 70% of the trigger code in MtgAccts table are used in other tables and also used in the spApproveMtg stored proc.

There are some concerns …

* performance — is not a top priority. Each month’s load job takes less than 30 seconds now.

* UDB — There’s some concern over migrating stored proc and triggers to UDB. Trigger support in UDB is probably richer than sybase. Therefore the triggers are probably UDB-friendly.

don’t repeat yourself ] sproc

Basic constructs/ideas/techniques to avoid repeating yourself ] sproc:

* dynamic sql
* nested sproc — sproc1 calling sproc2, which calls sproc3. i think the code in the nested sproc can be invoked repeatedly but coded once only
* app to construct the “sister statements”
* converge — short, non-repetitive queries insert into a shared table, then common logic applied to the shared table
* temp tables are essential to many of the constructs above
* case-expressions are powerful tools for cutting such repetition

include an irrelevant table in a n-way join with no effect

Say you use a 6-table join to return 1000 rows for a large set of conditions. Now for some reason[1] you need to include another table, but want exactly the same output. The additional table should have zero impact on query output and near-zero impact on performance.
 
[1] one reason: to combine this query with another query.
 
Solution: try to find a condition involving only columns in the irrelevant table, to return exactly one row. Usually the primary key can help. However, this condition can be very hard to understand.

filter^output column

2 types of columns

Most (if not all) columns in a table/view are either
– filter columns, used in where, having, on …
– output columns, returned in select-list
– or both

Examples of filter columns: most join columns, id ….

Given a complex query, if u need a quick summary of one of the tables it’s worthwhile to classify the columns this way. This is obvious stuff but it pays to develop this instinct. It should become a 2nd nature

to see who holds a DB lock

An experienced friend suggested “You can find the DB-loginname and the client machine connected to the DB server.” I don't
completely agree with this solution.

Very often an application loginname like “KP2_reporting” is used by 200 (realistic) applications, all by the same IT team? Many of
these applications could connect from the same production machine. At any time, up to 20 applications could be running.

I think we can also try to get the process id on the client machine, or the TCP port.

some pro^con of triggers

+ if 2 programs need to CUD but can’t share the DAO.
Example: 2 java programs running on 2 machines. U may have to copy the jar to share code.
Example: sometimes u may even use command line to CUD

+ to achieve the same data quality level, those same SQL must be run anyway, in trigger or in app

————-

– perf. extra load on the bottleneck — DB server
– migrating to another DB vendor

half-open interval — From^To

The common milestone design attaches 2 non-nullable datetime columns F (fromDate) and T (toDate) to each row. “My” F is always identical to my predecessor’s T, and my T is identical to my sucessor’s F.

[2] In otherwords, the entire timespan (till 9999/12/31) is fully covered, with overlaps on the joint instants.

Fruit of the labor is — given any instant, there’s always 1 and only 1 matching row. Standard query is getDate() >= From and getDate() < To. Therefore known as half-open interval — closed on F, open on T.

What if F and T are date types not datetime? Statement [2] still holds, with each joint-instant at a day’s first moment.

trigger to prevent duplicate data

Background: the table may have an ID column, DateCreated, DateModified. Also the primary key is not this ID but identiy column or something else.

This trigger basically tests all the account-detail columns and detect a duplicate row. However, if you define a uniqu composite index covering all these columns, then you won’t need this trigger. But this trigger lets you
– add more complex checks
– apply specialized remedies based on context

43 if ( select count(*) from Accts m, inserted i
44 where m.ID=i.ID
45 and m.OriginationDate = i.OriginationDate
46 and m.USState = i.USState
47 and m.AcctNum = i.AcctNum
49 and m.OfficeID = i.OfficeID ) = 2 rollback trigger — duplicate row

sybase trigger — unique id column

specific Purpose: input validation. Let’s zoom in on this. insert, update. not delete.

My observation: a unique id column is important to any input-validation trigger in sybase.

My reasons: you usually need UPDATE yourtable set …. WHERE yourtable.id=inserted.id. Without the where-clause, you could update pre-existing rows. Perhaps millions of pre-existing rows for every row inserted.

Because sybase doesn’t support before-trigger, you need this trick to update the rows just inserted.

SQL-CASE expressions — keywords

* evaluate — each case expression evaluates to a …
* per-row — the expression is usually evaluated for each row
* if-then-else

All above keywords apply to both simple and searched case expressions

which parts of the case expression can be “expanded”?
– when clause
– after-case expression, right after the “CASE” keyword
– ??? then clause

Where can a case-expression appear?
* group-by
* update->set
* select-columns

start/end of a unit of work — DB2

J4: it’s important to understand this (simple) concept when reading about savepoints, ….

Practically, at any (except right after a commit/rollback) moment in your session, you are in an open[1] transaction.

The DB2 trainer said a tx would start when u are outside any tx and issue a select or another sql statement. I still don’t understand.

[1] or “active transaction”

joining an existing transaction

an EJB method has 6 transaction attributes (described in tref). A basic concept is “joining a transaction”. It means “joining an existing transaction”.

This joining concept is important not only in EJB.

According to the DB2 trainer, DB2 triggers and functions JOIN transactions, whereas sproc often CONTROL transactions. Let’s explore

* a function is used in a single select.
* a trigger is invoked as part of a CUD

triggers for input-validation

Use a BEFORE-trigger, not an AFTER-trigger or INSTEADOF-trigger.

Input-validation often contains real business logic (age > 18). The same business logic must go into app, perhaps javascript too.

Q: Maintenance nightmare?
A: No. Some input come through the app, some input come through a DBA tool

tiny sql database embedable in java

http://robsanheim.com/2006/03/10/searching-for-the-perfect-in-memory-database/ mentions derby, hsql, sqllite

I think this kind of thing might be relevant to our systems:

status-quo: We have sybase/udb. I think we use temp tables or staging tables to hold temporary data. a little slow.

status-quo: We create custom objects (java collections) to hold stuff in memory, but need to build custom-methods to access them. A pain to test. Great if we can use standard sql instead of those methods. The idea is to leverage on existing standard and avoid custom coding to duplicate existing standard.

We have lots of feeds.

deterministic — DB stored program

sproc? usually non-deterministic
table functions in db2? usually non-deterministic
simple scalar functions? usually deterministic

For every input data, if every time you pass the the input into this blackbox you always get consistent output, then …. deterministic

If you find some input data that could lead to inconsistent output, then …. non-deterministic

tablespace in db2

When you hear people mention db2 “tablespace”, remember these 5 key points (knowledge pearls)

– each tablespace has its page size. page size affects many things including performance

– tablespaces are often named after the departments like Eq, FICC, Legal …

– a tablespace can hold many tables. I think a table can span tablespaces.

– the term “tablespace” means the same to db2 and oracle,
db2: tablespace
oracle: tablespace
sybase: segment

tool vs server in DB2

Certain features (like autocommit) are in “tools” and not the “server”. Make this fundamental distinction.

For DB2, Tools = Command-line-processor, Command-editor, Developer-work-bench, Aqua-Studio.

I call these “db2 clients”

Q: what db2 client do you use in a shell script?
A: CLP

ROLE of a DB table in a financial app

typical financial applications could have these groups of elements:
– lots (dozens) of tables
– lots of stored procedures
– lots of classes
– [c] lots of batch jobs, all in a scheduling system with their complete command lines
– [c] lots of standalone [1] batch scripts, not counting library files. Each standalone is started from the command line.

As in chemistry, elements interact. When learning about one element, It’s imperative albeit pains-taking to /map out/ [1] how it interacts with other elements. Tips for studying a db table:

– Your goal is, let’s repeat it, mapping out how the table interacts with other elements
– search for the table name in app source code. Many DB commands aren’t store procs.
– search in the stored proc source
– once u have an important stored proc name, search for it in the app source. Start with one stored proc
– for sybase, try sp_depends

[c = a strategic choke point, where you can search…]
[1] document if possible, which clarifies and confirms your own understanding.

stored-proc ^ dynamic-sql

In some apps, these are the 2 main solutions.

I think dynamic sql isn’t industry-standardized (stored proc has an ANSI standard) and depends heavily on rdbms server support.

In theory, once microsoft/oracle has a good dynamic-sql feature, other vendors could follow.

Search for “dynamic sql oracle/sybase/db2/…”

pl/sql q&&a non-iview

A1: No. Just an empty identifier.
q: how does an insteadof trigger support DML on any view, even non-updatable views?
a: I think this trigger can be placed on any view.

A4: erro code + erro message

q: what constitutes a pl/sql block?
A: I think BEGIN and END

A7: declare a user-defined exception, to be *raised* later
A89: I think the context surrounding the raise

q7: what is meant by declaring an exception?

q: can you connect to your oracle instance from a cybercafe and see your colleague’s pl/sql source with original comments?
A: yes see tref

q: can you obfuscate your pl/sql source?
A: yes see tref

Q: can i raise a standard exception?

q4: a pair of things to be defined for an exception, beside its name?

q: why do some say java try/catch and exception propagation borrowed from pl/sql?
A: “catch or propagate”. In other words, uncaught exceptions are propagated.

Q1: When declaring a user-defined exception, is there any content in it?

Q89: a user-defined exception usually signifies a specific (rather than vague) condition. Where is the specific condition specified?

—-jargons:
–session as in “session object”

pl/sql package q&&a

j4package: access control — achieved by declaration. This is A more important j4 than “grouping and code-organization”. Access to what? [1]
j4package: fully-qualified-names — to avoid namespace pollution.

#1 nlg pearl: – – > { spec-body } forms a twin for a pl/sql package.

a: short for “package specification”, which appears before “CREATE OR REPLACE PACKAGE BODY”. Stuff (what stuff? [1]) declared in a spec are “global” ie usable outside the pkg. What aren’t? [2]
A: same meaning — vars declared in a spec. Usable outside the pkg. Read online about why package-variables aren’t persistent in http context.
A: “CREATE OR REPLACE PACKAGE my_pkg8 ..” Note there’s no “BODY”. A spec has nothing to do with “DECLARE ….”
A: a “body”, short for “PACKAGE BODY”.
A: in your “schema”, basically your login account (??? confirmed!)
q: what does a spec look like? How do I know i’m looking at a spec?
Q: what’s a “spec” in pl/sql?
Q: what’s the twin sister of a “spec”?
Q: what are package-variables && global variables?
Q: Where is a function saved if i don’t have a package?
[1] vars, functions, procedures. Yes you can declare all of these in the spec
[2] stuff declared only in the body.

Oracle cursor porperties in 3 modes

J4: Obscure topics, but it’s easy to mater the issues surrounding and beneath them, and your mastery will lead to confidence and better understanding of related cursor topics.

B) Implicit => no %ISOPEN [1]
SQL%ROWCOUNT is usable
SQL%FOUND is usable [2]

C) cursor for loop => %ROWCOUNT is the only usable property
%ISOPEN, %FOUND are completely unnecessary.

A) Explicit cursor
All cursor properties were designed and usable for explicit.

“cursor property” is also known as “cursor attribute” or “cursor variable”.
[1] always false. There’s no loop in which you can test ISOPEN
[2] notice there’s no name for the cursor as it’s implicit. use “SQL%..”

sum(field_with_all_null) == null

http://www.oracle.com/technology/oramag/oracle/05-jul/o45sql.html shows how aggregate functions react to empty sets ie “no entry except nulls”.

COUNT returns zero whereas AVG, SUM, MAX, and MIN return null.

If you have no values to count, it’s fair to say that you have zero values, whereas you can’t really come up with, say, a maximum value without at least one value from which to choose. We can make a reasonable argument that SUM should return zero instead of null, that the sum of no values is zero, but Oracle’s implementation of the behavior we describe here is fully compliant with the SQL standard.

duo+trio intro 2 mysql cursor

Lesson 1: “open, fetch, close” is the first thing to internalize. You have mastered the most important thing in Mysql cursor if you can recall this trio.

open my_cursor ;
fetch my_cursor ….
close my_cursor ;

Side show: Cursors are used for multi-row selects. For single-row selects, simply “select into…”

Lesson 2: “declare, open, fetch, close”

Lesson 3: “1-to-1 mapping” between a cursor and a multi-row select

Lesson 4: “declare-declare, open, fetch, close”, since we usually need to declare the cursor and the error handler. This forms the world-famous “duo + trio”

Lesson 5: Each multi-row select usually has 1 cursor and 1 error handler to handle end-of-loop.

choose index vs FTS(full table scan)

http://www.isug.com/Sybase_FAQ/ASE/section1.5.html#1.5.3 has a good story. Here’s my version of the story, perhaps less accurate.

limited RAM => can’t keep all data blocks in cache. frequent swap-out/page-out => index may point to Page 5 first, and 986 pages later, Page 5 again => same disk data-block retrieved twice => index solution may require more disk-reads than a FTS.

The simplified story above helps us focus on a small, bite-sized topic. When you feel ready, read the longer version below.

Besides, Remember index itself need to be retrieved from disk.
How to choose between index and FTS? [[ oracle sql tuning ]] and [[mysql stored procedure ]] both mention somewhere around 10% [1] as a threshold to switch to FTS.
[1] [[ oracle sql tuning ]] p 233 gives different thresholds for row-sequenced tables. Same chapters also mention the influence of parallel query.

in-line view ^ ANSI view: SELECT from SELECT

select * from (select * …. )
NOT a sub-query. supported in oracle and mysql5
Now an ANSI standard view is very similar. Basically a saved query to be used as a fake table, a select-query to be used in FROM, ie select from select.
The starting point to learn inline view is a standard ANSI view.

I would say Oracle supports 2+1 types of views
* ANSI standard view
* materialized view — see earlier posts
* inline view — no name. query not saved with a name

materialized view

I think there are only 2 important types of views — traditional view ^ materialized view

materialized view = a db-server-side cache. Compare with mysql query cache [[ mysql stored procedures ]]

Compared to a standard view, A materialized view takes a different approach in which the query result is cached as a concrete (rather than virtual) table that is infrequently (rather than on-each-query) updated from the original base tables. This enables much more efficient access, at the cost of some data being potentially out-of-date. Whenever the underlying tables of a materialized view are updated, Oracle marks the materialized view as stale.

When a SQL query is issued, the query rewrite facility within Oracle will determine whether a suitable materialized view is available. If it is, Oracle will transparently rewrite the query to use the materialized view instead. This results in significantly improved performance since the query accesses the small set of rows in the materialized view rather than all the rows in the tables referenced by the query.

materialized view is most frequently used in data warehouse applications. Data warehousing applications also do not typically require access to real-time data.

MySQL does not provide Materialised Views by itself. But it is easy to build Materialised Views yourself.

MS SQL server has something similiar to materialized view — “indexed view”

sql to list mutual funds

we don’t know when the database may get updated. we only know it’s very rare

Jingsong J. Feng if the change frequency is very low, create a trigger in the database, when data change, trigger a query, and update result

Jingsong J. Feng schedule Thread to run it every miniute or every 5 miniutes
Bin X. Tan/VEN… to update the local cache?
Jingsong J. Feng yes

Jingsong J. Feng oh, set a timeout variable — Hibernate

How about an update/insert/delete trigger to send a GET request to our servlet and expire the cache immediately? The next visitor to the website would see the updated list fresh from the database.

If there’s a jms broker, trigger can send a cache-expiry alert to the queue even when the web system is unavailable. When the web site comes up again, it receives the jms message and expires the cache?

Without a JMS broker, the GET request could update some persistent expiry-flag on the cache server. Even with frequent reboots, the cache server could check the expiry-flag periodically.

Q: Can a trigger send a GET?
A: yes

Q: Can a trigger be a jms client?
A: Not so common. http://www.oracle.com/technology/products/integration/bam/10.1.3/TechNotes/TechNote_BAM_AQ_Configuration.pdf
(AQ = Oracle Advanced Queuing )

#1 error when selecting tuples failing a predicate

Q: write a SELECT to
– show those … (ie tuples) which are not …
– show students who are not ….
– show accounts which do not …
– show ships which are not ….
– show shops which are not ….

All of these are similar. Usually you need a condition like

  where col8 …
  where col8 not like “%..”

Do you remember a common mistake you tend to make, assuming no nulls?

A: You are assuming each student is represented by just 1 row. Example: an enrollment table with columns for student_id and subject_code. One student can show up in many rows.

ROTI [[oracle sql tun`]]

cost-based optimizer vs rule-based optimizer
optimizer modes — choose, rule, first_rows, all_rows
diminishing return, most significant queries to target
4 areas of ora tuning — instance, norm/denorm(with triggers), OS, SQL
self-join examples
oracle implementations of joining, in relation with subqueries
function-based index
what can disable an index
optimizer hints
sql cache
hash storage for often-joined tables
essential table-access methods
essential index-access methods
reading execution plans

NY iview — meetup

q: with and without sub queries, write sql to show employees not assigned any project, given employee (id, name), employee_project (eid, pid)
%%A: outer join with count() = 0 ? Yes see post on payment history

http://sqlzoo.net/3.htm contains actors not casted

select count(name) from actor left outer join casting on actor.id=casting.actorid where casting.actorid is null

Based on IBM webpage: “The following query returns only rows in which customers have not made any calls

SELECT c.customer_num, c.lname, c.company, c.phone, u.call_dtime, u.call_descr
FROM customer c LEFT OUTER JOIN cust_calls u
ON c.customer_num = u.customer_num
WHERE u.customer_num IS NULL “

This is a case of a normal “unmatched prikey”. No referential integrity violation.

q: xp with mysql replication? (programmers need to know mysql replication.)

q: describe an xq java web project

q: difference btw spring mvc vs struts

q: java 5 features? generic, enum

q: describe a personal pet project (“side project”)

q: why do u say smarty is cleaner separation of presentation and logic than jsp or asp?

q: name a technology that excites u now and name one feature

q: php caching

q: php connection pool

q: unit testing?

q: ajax xp?

q: web services xp?

q: state sync between 2 instances of php?

%%q: important technical skills for this job
a: mysql cluster, fast sql queries, core java, junit,

2min intro to NULL (sql)

http://www.oracle.com/technology/oramag/oracle/05-jul/o45sql.html is a more complete list of jolts. But for now …

#1 jolt: “the NOT-trap” for any cond3 such as “amount > 10” or “id=order.customer_id” … ,
if
select … where cond3 –> 30% of the rows,
then
select … where NOT cond3 –> ?


#2 jolt: “ignore”

where col3 = …
where col3 > …
where col3 LIKE …
where col3 IN …

These filters will ignore NULL entries. Yes, ignore their existence. Reason? null means “unknown” so such an entry is perhaps IN and perhaps not IN.

composite prikeys includ`nullable columns@@ NO

Remember primary key must be unique.

If one part of a composite-key is null/unknown, the entire composite-key’s value becomes unknown. => Every member of every composite prikey need a not-null constraint. If you attempt to create/define an offending prikey, i think somehow the DB server will refuse.

In general, if either of the 2 things you compare is unknown, result is unknown. Remember “null AND true = null”, “NOT” applied to a condition with null value -> null?

Non-comparable => can’t test uniqueness

open/fetch/close cursor ] pl/sql, part 3

Lesson 1: For a regular loop
declare/ %ROWTYPE / open / fetch..into/ %NOTFOUND / close

Lesson 2: For a cursor for loop: DECLARE / FOR / END_LOOP

DECLARE
— %ROWTYPE omitted
— OPEN omitted
FOR
— FETCH..INTO omitted
— %NOTFOUND omitted
END_LOOP
— CLOSE omitted

– – – – Now a real example: – – – –

  DECLARE
CURSOR cursor3 IS SELELCT ...
...
FOR undeclared_var in cursor3 LOOP
-- use the var
END LOOP;

anon ^ named programs ] pl/sql

As far as I know, every pl/sql program is either A) anonymous or B) named. You need to know them because

– Both are demonstrated frequently in literature.
– Their syntax differ, causing confusion to programmers.
– Authors don’t point out their sample code is A or B, causing confusion to programmers.

Triggers and functions must be named to be invoked. A procedure actually takes parameters just like functions. I would say a procedure [1], too, must be named to be invoked.

[1] some define “procedure” differently and accept “anonymous procedures”.

B is not always created/replaced in DB. It can be “created” in an anon-block and thrown away. see [[ making sense of …]] for a wider explanation.

A is also known as an “anonymous block”, essentially a throw-away program, not worth saving. Lacking a name, you can’t save in the DB such as in a pl/sql package.

Nevertheless, anon blocks are popular — You type up the text into sqlplus, run it and throw it away. => You can also save the text, and later copy-paste.

The most confusing thing about anon is nested anon. In anon-block1’s DECLARE section, you an put an entire function (named) or procedure (named to be invoked).

open/fetch/close cursor ] pl/sql, part 2

Lesson 1: open / fetch..into / close
Lesson 2: open / fetch..into / %NOTFOUND / close

  open cursor3;
loop
fetch cursor3 into record_var2;
exit when cursor3%NOTFOUND
-- do something with record_var2
end loop;
close cursor3;

Lesson 99:
declare/ %ROWTYPE / open / fetch..into/ %NOTFOUND / close

For a simple yet complete sample code, combine the last sample in this post and the previous post.

open/fetch/close cursor ] pl/sql, part 1

Lesson 1: open/fetch/close
Lesson 2: declare / open / fetch /close
declare the cursor

Lesson 3: open / fetch..into / close

  open cursor3;
fetch cursor3 into record_var2;
-- do something with record_var2
close cursor3;

Lesson 4: declare / %ROWTYPE / open / fetch..into / close
After declaring the cursor, you declare a record-var to hold one row of data

  DECLARE
CURSOR cursor3 IS SELECT ....
record_var2 cursor3%ROWTYPE ;

%% pl/sql iview quiz #7 — jargons

jargons: For each, interviewer can ask for examples(eg), suitable(j4)/unsuitable(n4) uses, or interviewer may mention them in a question, assuming your knowledge. Ranked roughly in order of likelihood

–sqlcode, sqlerrm
eg:
–“instead of” triggers
j4:
eg:
–FOR EACH ROW
P13-9
eg: I actually created many for-each-row triggers in zed.
— deterministic
j4: perf? questionable — “If Oracle did attempt to buffer rows from a deterministic table function that returned many thousands of rows, it could result in performance issues due to excessive memory consumption. ” — http://www.dba-oracle.com/plsql/t_plsql_deterministic.htm

eg: (in function definition)

–“when other”
eg:
–varray
–nocopy
–anonymous block
A regular stored program has a name — for invoke-by-name
j4: no need to invoke-by-name later. Just run once and throw-away

view ^ stored-proc

A) you can create a table and revoke access of PUBLIC to it; then you can grant limited access privileges to the table through views. — http://publib.boulder.ibm.com/infocenter/idshelp/v10/index.jsp?topic=/com.ibm.ddi.doc/ddi132.htm

B) Same for a stored procedure. You can remove read/write access to a table, but provide controlled read/write access to the table via stored procedures. Benefit? Better control. See post on [[ j4stored procedures — a sybase perspective ]]

A and B are related. Both can hide a group of “internal, raw tables” and present a “better” facade, better in some sense. Both use basic SQL and nothing else. Both provide access control. Difference? The 2 facades
– A View’s facade looks and feels like a table, to be used in FROM.
– A stored proc’s facade looks like …. an alien to SQL.

pl/sql function — in-DB ^ in-anon

Every function[1] exists either (X) in an anon or (y) permanently in the databse.

This longevity view is a different view from anon-vs-named. “Named” is irrelevant in this view, since functions are always named.

Y) CREATE OR REPLACE FUNCTION my_func1 …
X) DECLARE FUNCTION my_func2 …

– as part of the declaration in an “anonymous-block”, X creates a short-lived, throw-away function, to be used in the enclosing anon only. A case of “nested block”. You will see 2 pairs of BEGIN/END.

– Y would save the function in the DB

See also [[ anon ^ named ]] for a wider discussion. This post focuses on how to create a function.

Making sense of anonymity && longevity of pl/sql programs

The biggest confusion is nesting.

Longevity depends solely on the o u t e r m o s t layer:
* If outermost is anon, then your unit (even a function) is a throwaway.
* If outermost is in-DB ie an DB entity, then your unit (even if anon) is in-DB and permanent.

– In-DB => outermost is named, even though your unit could be a nested anon.
– named outermost =usually=> in-DB. Throwaway programs need no name.
– throwaway => outermost is anon
– anon outermost => throwaway

in a nested context, Anonymity is layer-specific:
– Your unit could be anon, but the outermost could be named. At the same time, A function may be nested inside your unit.
– Your unit could be a named procedure, but the outermost could be anon. Another anon may also be nested inside your unit.(http://www.unix.org.ua/orelly/oracle/prog2/ch02_07.htm).

No need to combine this post with [[ anon ^ named ]] or [[ in-DB ^ in-anon ]], which offer useful insight. Combining these is time-consuming.

%%iview quiz #1 — pl/sql cursor

Let’s build a good list of questions on cursor and feel prepared.

A3: other processes must queue up for lock-release. Exactly like “synchronized”
a5: everything in the schema (ie DB account)

a1: race condition, exactly like java threads
a6: grant

q1: why does a cursor need to lock a column?
q1.2: how about other CUD statements? Implicit locking? I think every RDBMS must.
q2: how to lock?
q3: What problems can it lead to?
q: u can declare a cursor in a pkg-spec or a pkg-body. Why?
q5: U declare a cursor in a pkg spec. what’s the access scope?
q6: how do u widen access further?
q7: how does an “external” program use this “global” cursor?
q: cursor properties (attributes/properties) for an implicit cursor?
A: see [[ jargon ]]

—- jargons:
–%ROWTYPE
eg?
–%NOTFOUND
eg?

— cursor properties
eg: my_cursor%NOTFOUND, my_cursor7%ROWCOUNT
eg: sql%FOUND for an implicit cursor
–“for update of”
eg: nailing down a column

a JSP-based batch

User? A golf specialty shop

Startup? User logs in to a website to configure and launch the batch job. After successful start, the JSP page shows a progress indicator, refreshed every 15 minutes.

Code sharing? POSBean ie Point-of-Sales bean is a big (thousands of lines) java component containing business logic for our batch and for other modules.
* parse correlated records from multiple sources including branch POS and membership databases
* calculate loyalty points, redemptions and discounts
* “roaming customers” — need to reference remote data maintained across borders
* sophisticated logic to reconcile conflicts among data sources

Database triggers? i think there were a few types
* input validation triggers
* in-DB logging triggers

Threads? single-threaded. Some kind of singletons. One or more “application/context scope attributes”.

Duration? typically less than an hour for small records

Cancel button on a web page? I think it was supported. I think the thread checks a cancellation flag periodically

Progress indicator? At any time, user can see a progess indicator. The thread updates a flag periodically

Realtime log viewing? At any time, user can view the last x lines of transaction logs

Realtime SQL statement viewing? Similar to log viewing

Schedule? on-demand. no automatic schedule.

generic trigger Q&&A

Q3: explain the important term “trigger-event”, in relation to “trigger-action”

Q: can a trigger-event invoke both a before-action and an after-action?
A: yes

Q: can a trigger-action be defined with select-statements alone? Does it make sense?

Q: can a trigger-action call stored programs?
A: yes

Q8: Beside before-action and after-action, what other trigger-actions are supported?
A8: for-each-row-action is NOT a third type. Both mysql and oracle let you combine BEFORE and for-each-row

Q: what if a DML where-clause matches no record? Will a before-action run? How about an after-action?

A3: an insert on a table, a delete on a table, or an update OF one or more columns ON a table. In some databases(which?), a call to a stored procedure also qualifies as a trigger event

basic pl/sql templates — anon ^ named

FUNCTION my_func2…. IS — CREATE? not always present

DECLARE — keyword u s u a l l y omitted when declaring cursors, exceptions ..
BEGIN
EXCEPTION
END;

See http://infolab.stanford.edu/~ullman/fcdb/oracle/or-plsql.html and
http://www.cise.ufl.edu/~jhammer/classes/Oracle/PL_SQL.htm

— Now for anon,
DECLARE — required keyword when declaring cursors, exceptions, functions ….
BEGIN
EXCEPTION
END;

RETURN-NULL ^ NULL statement ] pl/sql

In PL/SQL, these 2 are different.

I believe “return null” is a kind of “return statement”, returning from the stored program.

I believe the “null” statement has the same meaning as “no-op” in other languages??? You need it when your if/then/else has no “action” and need a no-op filler.

illustrating minimal DB index

–an analogy to illustrate index concept (master copy on recoll->tref): a pile of receipts sorted by shipping date

one record number on each receipt => row-id
one non-unique customer name on each page.
one unique but unsorted order number on each page

Goal: given names or order numbers, quickly get to the relevant receipts without Scan.

To build an index on the unique order number is easy — use a spreadsheet or a stack of cards to record the order-numbers vs receipt-numbers, and sort by order-number. (B*-tree is slightly diff)

To build an index on names… After sorting, cards with same names would come together. U need to consolidate them on 1 card.

sql normenclature

Clarity and brevity are the #1 and #2 goals of technical communication.

A stored function can serve as a “select ITEM”, an “order-by-ATTRIBUTE”, a “group-by-FIELD”, or part of a “where-CRITERIA” or “having-PREDICATE” — “criteria/predicate” are more precise than “condition”.

A view and a sub-select can serve as a “from TARGET”

A forkey entry/value “references” a prikey entry/value.
3 forkeys (columns) can “refererence/point-to/refer-to” the same prikey.
A table can have 2 forkeys (columns) pointing to 2 prikeys, in 2 tables.

“20 rows, each containg 4 FIELDS” —- Avoid “column” .

A column can “hold 14 ENTRIES / values ” (Avoid “values” ) but only 9 “distinct values” including the special “null value”.

The 14 entries include 2 “null ENTRIES / values” (Avoid “values” ) , so there are only 12 KNOWN-VALUES.

“Entry” has a more specific meaning. “2 null entries” sound better than “2 null values”.

Many say “These 2 values are identical.”, but It’s clearer to say “These 2 RECORDS are identical” when you select several non-key columns.

“duplicate ENTRIES / RECORDS” —- Avoid “values”

pl/sql exception ^ java exception

— raise == re-throw that same exception you caught
Unlikely java, “raise” is a one-word statement. No need to mention the exception name.

–propagation
An uncaught exception is thrown to the enclosing caller on the call stack.

–when when when == catch catch catch

–when other == catch (Exception ex)

–oracle standard exception == java standard exception
–pl/sql user-defined exception == java custom exception
error message = toString() of the custom exception

pl/sql exception — template

A basic template (anon or named) is —
DECLARE
BEGIN
EXCEPTION
WHEN
WHEN

WHEN
END;

A longer template is

DECLARE -- sometimes u don't see DECLARE
some_var NUMBER
some_var2 DATE
...
user_defined_exception8 EXCEPTION; -- create a name for your custom exception
BEGIN
statements -- a lot
...
EXCEPTION -- one occurence only
WHEN standard_exception1 THEN
statement82
...
RAISE; -- you don't have to raise.

WHEN standard_exception25 THEN
NULL -- do nothing. Just jump to END.
-- If there's an enclosing block, continue with it

WHEN standard_exception7 THEN
statement28
...
RETURN NULL

WHEN user_defined_exception8 THEN
statement7
...

WHEN OTHER THEN -- catch-all
statement61
...
-- no other statements between END and the last WHEN
END;

THEN && END – in shell and pl/sql

Most block constructs need an “End …” since they don’t use “{” for blocks.
Most conditionals need a “Then” , since they don’t use “{” for blocks.

Applicable to most stored procedure and shell scripts.

1) First, let’s internalize these simple, approximate rules. You will get ahead, since many people don’t even notice these simple patterns.

2) When we have the resources and the need, we can list the exceptions, but don’t need to memorize 100% or even 80%. Just memorize some.

3) When we have more resources and need, we try to memorize more.